Kaplan-Meier Estimator

Created: December 8, 2025

Last Updated: June 9, 2026

The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from time-to-event data. It accounts for censored observations (incomplete data) and provides survival probabilities at different time points, making it essential for clinical trials, reliability analysis, and epidemiological studies.

What You'll Get:

Survival Curve: Interactive plot showing survival probability over time
Survival Table: Detailed survival probabilities at each event time
Confidence Intervals: Upper and lower bounds for survival estimates
Median Survival Time: Time at which 50% of subjects have experienced the event
Number at Risk: Subjects remaining at each time point
Group Comparison: Compare survival curves across groups (optional)
Log-Rank Test: Statistical test for comparing survival curves (when groups provided)
APA-Formatted Report: Professional results ready for publication

💡 Pro Tip: Your data should include a time column (time to event or censoring) and an event column (1 = event occurred, 0 = censored). Optionally include a group column to compare survival curves. For a formal group comparison, use the Log-Rank Test. To model covariates, consider the Cox Proportional Hazards Model.

Ready to analyze survival data? Load the sample data (clinical trial recovery times) to see the Kaplan-Meier estimator in action, or upload your own time-to-event data to estimate survival curves.

Calculator

1. Load Your Data

2. Select Columns & Options

Time Column *

Time to event or censoring

Event Column *

1 = event, 0 = censored

Group Column (optional)

For comparing groups

Confidence Level:

Show Confidence Intervals on Plot

Related Calculators

Log-Rank Test Calculator

Cox Proportional Hazards Model Calculator

Descriptive Statistics Calculator

Learn More

Definition

The Kaplan-Meier estimator (also known as the product-limit estimator) is a non-parametric statistic used to estimate the survival function from lifetime data. It provides the probability that an individual survives past a certain time, accounting for censored observations where the event of interest has not yet occurred.

The Kaplan-Meier survival function at time t:

$$\hat{S}(t) = \prod_{t_i \leq t} \left(1 - \frac{d_i}{n_i}\right)$$

where $d_i$ is the number of events at time $t_i$, and $n_i$ is the number at risk just before time $t_i$.
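
To make the product concrete, here is a minimal pure-Python sketch that applies the formula step by step. The function name `kaplan_meier` is hypothetical (it is not part of any library), and the input is the five-row example table from this page with the groups pooled.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return (t_i, n_i, d_i, S(t_i)) for each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # events and censorings both leave the risk set
    n_at_risk = len(times)        # everyone is at risk at the start
    surv = 1.0
    rows = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:                 # the curve only steps down at event times
            surv *= 1 - d / n_at_risk
            rows.append((t, n_at_risk, d, surv))
        n_at_risk -= leaving[t]   # shrink the risk set after time t
    return rows

times = [5, 8, 12, 15, 18]
events = [1, 0, 1, 0, 1]          # 1 = event occurred, 0 = censored

rows = kaplan_meier(times, events)
for t, n, d, s in rows:
    print(f"t={t:>2}  at risk={n}  events={d}  S(t)={s:.3f}")
```

The censored subjects at times 8 and 15 never produce a step, but they do reduce the number at risk for later event times, which is how the estimator uses their partial follow-up.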

Data Format Requirements

Correct Data Format

Time	Event	Group	Meaning
5	1	Treatment	Event occurred
8	0	Treatment	Censored
12	1	Control	Event occurred
15	0	Control	Censored
18	1	Treatment	Event occurred

Time: Any non-negative number (days, months, years, etc.)

Event: 1 = event occurred, 0 = censored

Group: Any text labels to identify groups (e.g., Treatment, Control, Drug A, Drug B)

Common Data Issues

Event column has wrong values

Time	Event	Group
5	yes	Treatment
8	no	Control

❌ Event must be 1 (event) or 0 (censored)

Negative time values

Time	Event	Group
-5	1	Treatment
8	0	Control

❌ Time values must be zero or positive

Missing required columns

Time	Event	Group
5	—	Treatment
8	—	Control

❌ Missing Event column
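
The three issues above can be caught before any analysis with a small check. The following is a sketch only: `validate_rows` is a hypothetical helper, it assumes rows are plain dictionaries keyed by lowercase column names, and it is not the validation this calculator actually runs.

```python
def validate_rows(rows, required=("time", "event")):
    """Return a list of human-readable problems found in `rows` (a list of dicts)."""
    problems = []
    for i, row in enumerate(rows, start=1):
        # A required column that is absent or empty counts as missing
        missing = [c for c in required if c not in row or row[c] in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
            continue
        if row["event"] not in (0, 1):
            problems.append(f"row {i}: event must be 1 or 0, got {row['event']!r}")
        if not isinstance(row["time"], (int, float)) or row["time"] < 0:
            problems.append(f"row {i}: time must be zero or positive, got {row['time']!r}")
    return problems

good = [
    {"time": 5, "event": 1, "group": "Treatment"},
    {"time": 8, "event": 0, "group": "Treatment"},
]
bad = [
    {"time": 5, "event": "yes", "group": "Treatment"},   # wrong event coding
    {"time": -5, "event": 1, "group": "Treatment"},      # negative time
    {"time": 8, "group": "Control"},                     # no event value
]

print(validate_rows(good))
for problem in validate_rows(bad):
    print(problem)
```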

When to Use Kaplan-Meier Estimator

Use the Kaplan-Meier estimator when:

Time-to-event data: You're analyzing survival, failure, or duration data
Censored observations: Some subjects have incomplete follow-up or haven't experienced the event
Survival curve estimation: You want to visualize how survival probability changes over time
Median survival time: You need to estimate when 50% of subjects experience the event
Group comparison: You want to visually compare survival between groups (e.g., treatment vs. control)
Non-parametric approach: You don't want to assume a specific distribution for survival times

Requirements & Assumptions

Time variable: Continuous or discrete time values (must be non-negative)
Event indicator: Binary variable (1 = event occurred, 0 = censored)
Independent censoring: Censoring should be unrelated to the probability of the event
Survival probabilities: Same for subjects who enter the study early and those who enter late (no change in prognosis over the enrollment period)
Event times are known precisely: Or can be adequately approximated
No negative times: Time values must be zero or positive

Understanding Censored Data

Censoring occurs when we have incomplete information about a subject's survival time. This is a fundamental concept in survival analysis:

Right-censoring (most common): Study ends or subject withdraws before the event occurs. We know the subject survived at least until the censoring time, but don't know when (or if) the event will occur.
Event = 1: The event of interest occurred at the recorded time. This is a complete observation.
Event = 0: Subject was censored. We only know they survived at least until that time, but the final outcome is unknown.
Example: In a 5-year cancer study, a patient who moves away after 3 years cancer-free is censored at 3 years. We know they were cancer-free for at least 3 years, but don't know what happened after they left the study.
Why censoring matters: The Kaplan-Meier estimator uses each subject's follow-up up to the censoring time, which avoids bias as long as censoring is independent of the event. Simply excluding censored observations would bias the survival estimates, typically downward, because subjects known to have survived for some time are discarded.
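
The effect of discarding censored rows can be seen on a tiny example. This sketch uses the five-row example table from this page and a hypothetical helper, `km_survival_at`, that evaluates the product-limit estimate at one time point.

```python
def km_survival_at(times, events, t_query):
    """Kaplan-Meier estimate of S(t_query) from right-censored data."""
    surv = 1.0
    for t in sorted(set(times)):
        if t > t_query:
            break
        n_at_risk = sum(1 for u in times if u >= t)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - d / n_at_risk   # factor is 1 when only censorings occur at t
    return surv

times = [5, 8, 12, 15, 18]
events = [1, 0, 1, 0, 1]            # 1 = event occurred, 0 = censored

km = km_survival_at(times, events, 12)

# Naive alternative: delete the censored rows first, then estimate
kept = [(t, e) for t, e in zip(times, events) if e == 1]
naive = km_survival_at([t for t, _ in kept], [e for _, e in kept], 12)

print(f"Kaplan-Meier S(12):          {km:.3f}")
print(f"Censored rows dropped S(12): {naive:.3f}")
```

On these values the naive estimate is lower, because the subject censored at time 8 is known to have survived past time 5 and the subject censored at time 15 past time 12, yet both are ignored.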

Understanding Censoring: Timeline View

[Figure: follow-up timelines for five subjects. Subjects 1, 3, and 5 end with an event (Event = 1); Subjects 2 and 4 are censored (Event = 0).]

Green checkmarks: We know exactly when the event happened. Red X marks: The subject left the study before the event occurred, so we only know they survived at least until that time.

Interpreting Results

Anatomy of a Survival Curve

Survival Curve: Steps down when events occur, stays flat when observations are censored

Confidence Band: Shows uncertainty in survival estimates (widens over time)

Median Survival: Time when survival probability reaches 50%

Survival Curve:

Steps down at each event time (when subjects experience the event)
Stays flat when subjects are censored (no step down)
Steeper drops indicate higher hazard (more events occurring)
Flatter sections indicate lower hazard (fewer events)

Median Survival Time:

Earliest time at which the estimated survival probability falls to 0.5 (50%) or below
Undefined (not reached) if the curve never falls to 0.5, which can happen when many observations are censored or follow-up is short
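
Reading the median off a fitted curve can be sketched as follows. The helper name `median_survival` and the choice to return infinity when the median is not reached are assumptions of this sketch; conventions differ between packages, for example when the curve sits at exactly 0.5 over an interval.

```python
import math

def median_survival(curve):
    """curve: list of (time, survival) steps sorted by time.

    Returns the first time with survival <= 0.5, or math.inf if never reached.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return math.inf

# Steps of a curve that reaches zero, and of one that never reaches 0.5
reached = [(5, 0.8), (12, 0.8 * 2 / 3), (18, 0.0)]
not_reached = [(5, 0.9), (10, 0.7)]

print(median_survival(reached))
print(median_survival(not_reached))
```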

Confidence Intervals:

Wider intervals indicate more uncertainty
Typically widen over time as the number at risk decreases
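
The widening can be made concrete with Greenwood's formula, a common variance estimate for the Kaplan-Meier curve. This is a sketch under the assumption of a plain (untransformed) interval; this page does not state which method the calculator applies, and many packages default to log or log-log transformed intervals instead. Here $z_{1-\alpha/2}$ is the standard normal quantile for the chosen confidence level.

$$\widehat{\mathrm{Var}}\left[\hat{S}(t)\right] = \hat{S}(t)^2 \sum_{t_i \leq t} \frac{d_i}{n_i (n_i - d_i)}, \qquad \hat{S}(t) \pm z_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}\left[\hat{S}(t)\right]}$$

Each term in the sum grows as $n_i$ shrinks, which is why the band tends to widen late in follow-up.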

Example Code

R Code

library(survival)
library(survminer)
library(tidyverse)

# Clinical trial data: time to recovery (days), event (1=recovered, 0=censored)
data <- tibble(
  time = c(5, 8, 12, 15, 18, 23, 25, 30, 35, 40, 45, 48, 50, 55, 60),
  event = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1),
  group = c('Treatment', 'Treatment', 'Treatment', 'Treatment', 'Treatment',
            'Treatment', 'Treatment', 'Treatment', 'Control', 'Control',
            'Control', 'Control', 'Control', 'Control', 'Control')
)

# Overall Kaplan-Meier estimate
km_fit <- survfit(Surv(time, event) ~ 1, data = data)

# Summary with survival probabilities
summary(km_fit)

# Median survival time
print(km_fit)

# Survival plot
ggsurvplot(km_fit,
           data = data,
           conf.int = TRUE,
           risk.table = TRUE,
           xlab = "Time (days)",
           ylab = "Survival Probability",
           title = "Kaplan-Meier Survival Curve")

# Kaplan-Meier by group
km_group <- survfit(Surv(time, event) ~ group, data = data)

# Summary by group
summary(km_group)

# Comparison plot
ggsurvplot(km_group,
           data = data,
           conf.int = TRUE,
           pval = TRUE,
           risk.table = TRUE,
           xlab = "Time (days)",
           ylab = "Survival Probability",
           title = "Kaplan-Meier Curves by Group")

# Log-rank test
survdiff(Surv(time, event) ~ group, data = data)

Python Code

import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
import matplotlib.pyplot as plt

# Clinical trial data: time to recovery (days), event (1=recovered, 0=censored)
data = pd.DataFrame({
    'time': [5, 8, 12, 15, 18, 23, 25, 30, 35, 40, 45, 48, 50, 55, 60],
    'event': [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
    'group': ['Treatment', 'Treatment', 'Treatment', 'Treatment', 'Treatment',
              'Treatment', 'Treatment', 'Treatment', 'Control', 'Control',
              'Control', 'Control', 'Control', 'Control', 'Control']
})

# Overall Kaplan-Meier estimate
kmf = KaplanMeierFitter()
kmf.fit(durations=data['time'], event_observed=data['event'])

# Print survival table
print(kmf.survival_function_)

# Print median survival time
print(f"Median survival time: {kmf.median_survival_time_:.2f} days")

# Plot survival curve
kmf.plot_survival_function()
plt.title('Kaplan-Meier Survival Curve')
plt.xlabel('Time (days)')
plt.ylabel('Survival Probability')
plt.ylim(0, 1)
plt.grid(True, alpha=0.3)
plt.show()

# Kaplan-Meier by group
fig, ax = plt.subplots(figsize=(10, 6))

for group in data['group'].unique():
    group_data = data[data['group'] == group]
    kmf_group = KaplanMeierFitter()
    kmf_group.fit(
        durations=group_data['time'],
        event_observed=group_data['event'],
        label=group
    )
    kmf_group.plot_survival_function(ax=ax, ci_show=True)

plt.title('Kaplan-Meier Curves by Group')
plt.xlabel('Time (days)')
plt.ylabel('Survival Probability')
plt.ylim(0, 1)
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Log-rank test
treatment_data = data[data['group'] == 'Treatment']
control_data = data[data['group'] == 'Control']

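# Keyword arguments make it explicit which series are durations and which are event flags
results = logrank_test(
    durations_A=treatment_data['time'],
    durations_B=control_data['time'],
    event_observed_A=treatment_data['event'],
    event_observed_B=control_data['event']
)

print("\nLog-rank test:")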
print(f"Test statistic: {results.test_statistic:.3f}")
print(f"p-value: {results.p_value:.4f}")

Choosing the Right Test

Which Survival Analysis Test Should I Use?

I have time-to-event data. What is my goal?

Describe/estimate survival → Kaplan-Meier Estimator: Get survival curves & median survival
Compare 2+ groups → Log-Rank Test: Test if groups differ significantly
Model multiple factors → Cox Regression: Adjust for covariates & estimate hazard ratios

Kaplan-Meier

• Estimate survival curves
• Find median survival time
• Visualize survival by group
• No formal hypothesis test

Log-Rank Test

• Compare 2+ groups
• Get p-value for difference
• Test overall survival
• Single grouping variable

Cox Regression

• Multiple covariates
• Hazard ratios
• Adjust for confounders
• Most comprehensive

Quick Decision Guide:

Start with Kaplan-Meier to visualize your data and understand survival patterns
Use Log-Rank Test when you have 2+ groups and want to test if they differ (e.g., treatment vs control)
Use Cox Regression when you have multiple variables (age, treatment, stage, etc.) and want to model their effects
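
For readers who want to see what the log-rank step in this guide computes, here is a minimal two-group sketch in pure Python. The function name `logrank_two_groups` is hypothetical, the data are the treatment and control values from the example code on this page, and the p-value uses the chi-square distribution with 1 degree of freedom. Results from a dedicated package should be preferred in practice.

```python
import math

def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 df) and its p-value."""
    pooled = zip(list(times_a) + list(times_b), list(events_a) + list(events_b))
    event_times = sorted({t for t, e in pooled if e == 1})
    obs_minus_exp = 0.0   # observed minus expected events in group A
    variance = 0.0        # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)
        n_b = sum(1 for u in times_b if u >= t)
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e == 1)
        d_b = sum(1 for u, e in zip(times_b, events_b) if u == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square tail probability, 1 df
    return chi2, p_value

treat_t = [5, 8, 12, 15, 18, 23, 25, 30]
treat_e = [1, 1, 1, 0, 1, 1, 0, 1]
ctrl_t = [35, 40, 45, 48, 50, 55, 60]
ctrl_e = [1, 0, 1, 1, 1, 0, 1]

chi2, p = logrank_two_groups(treat_t, treat_e, ctrl_t, ctrl_e)
print(f"Test statistic: {chi2:.3f}")
print(f"p-value: {p:.4f}")
```

At each event time the sketch compares the events observed in one group with the number expected if both groups shared the same hazard, which is why a single grouping variable is all the test needs.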
