The Kaplan-Meier estimator is a non-parametric statistic used to estimate the survival function from time-to-event data. It accounts for censored observations (incomplete data) and provides survival probabilities at different time points, making it essential for clinical trials, reliability analysis, and epidemiological studies.
💡 Pro Tip: Your data should include a time column (time to event or censoring) and an event column (1 = event occurred, 0 = censored). Optionally, include a group column to compare survival curves. For a formal comparison between groups, use the Log-Rank Test. To model covariates, consider the Cox Proportional Hazards Model.
Ready to analyze survival data? Load the sample dataset (clinical trial recovery times) to see the Kaplan-Meier estimator in action, or upload your own time-to-event data to estimate survival curves.
The Kaplan-Meier estimator (also known as the product-limit estimator) is a non-parametric statistic used to estimate the survival function from lifetime data. It estimates the probability that an individual survives past a given time while accounting for censored observations, where the event of interest had not been observed by the end of follow-up.
The Kaplan-Meier survival function at time $t$:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where $d_i$ is the number of events at time $t_i$, and $n_i$ is the number at risk just before time $t_i$.
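To make the product-limit formula concrete, here is a minimal sketch in plain Python. It multiplies one factor per distinct event time: one minus the number of events divided by the number still at risk. The function name `kaplan_meier` is illustrative rather than part of any library, and the input is the five example rows from the data format table, pooled without regard to group.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(event_time, survival_estimate)] using the product-limit formula."""
    # Count events (not censorings) at each distinct time.
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(events_at):
        # Number at risk just before t: everyone whose time is >= t.
        n_at_risk = sum(1 for u in times if u >= t)
        d = events_at[t]
        survival *= 1.0 - d / n_at_risk
        curve.append((t, survival))
    return curve

# Example rows from the data format table (groups pooled)
times = [5, 8, 12, 15, 18]
events = [1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.4f}")
```

Censored subjects (times 8 and 15) never trigger a step in the curve, but they do count toward the number at risk until they leave, which is how the estimator uses their partial information.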
| Time | Event | Group | Meaning |
|---|---|---|---|
| 5 | 1 | Treatment | Event occurred |
| 8 | 0 | Treatment | Censored |
| 12 | 1 | Control | Event occurred |
| 15 | 0 | Control | Censored |
| 18 | 1 | Treatment | Event occurred |
Time: Any non-negative number (days, months, years, etc.)
Event: 1 = event occurred, 0 = censored
Group: Any text labels to identify groups (e.g., Treatment, Control, Drug A, Drug B)
| Time | Event | Group |
|---|---|---|
| 5 | yes | Treatment |
| 8 | no | Control |
❌ Event must be 1 (event) or 0 (censored)
| Time | Event | Group |
|---|---|---|
| -5 | 1 | Treatment |
| 8 | 0 | Control |
❌ Time values must be zero or positive
| Time | Event | Group |
|---|---|---|
| 5 | — | Treatment |
| 8 | — | Control |
❌ Missing Event column
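The format rules above can be checked before any analysis runs. The following is a rough sketch, assuming rows arrive as dictionaries with lowercase `time`, `event`, and `group` keys; the function name `validate_survival_rows` and the key names are hypothetical, and the messages simply mirror the three errors shown above.

```python
def validate_survival_rows(rows):
    """Return a list of error messages; an empty list means the rows look valid."""
    if not rows:
        return ["No data rows"]

    # Required columns must be present in every row.
    errors = [
        f"Missing {column.capitalize()} column"
        for column in ("time", "event")
        if any(column not in row for row in rows)
    ]
    if errors:
        return errors

    for i, row in enumerate(rows, start=1):
        t, e = row["time"], row["event"]
        # Time must be a number that is zero or positive (NaN fails t >= 0).
        if not isinstance(t, (int, float)) or not t >= 0:
            errors.append(f"Row {i}: Time values must be zero or positive")
        # Event must be exactly 1 (event) or 0 (censored).
        if e not in (0, 1):
            errors.append(f"Row {i}: Event must be 1 (event) or 0 (censored)")
    return errors

valid = [{"time": 5, "event": 1, "group": "Treatment"},
         {"time": 8, "event": 0, "group": "Treatment"}]
bad_event = [{"time": 5, "event": "yes", "group": "Treatment"},
             {"time": 8, "event": "no", "group": "Control"}]
bad_time = [{"time": -5, "event": 1, "group": "Treatment"},
            {"time": 8, "event": 0, "group": "Control"}]
no_event = [{"time": 5, "group": "Treatment"},
            {"time": 8, "group": "Control"}]

for name, rows in [("valid", valid), ("bad_event", bad_event),
                   ("bad_time", bad_time), ("no_event", no_event)]:
    print(name, validate_survival_rows(rows))
```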
Use the Kaplan-Meier estimator when:
Censoring occurs when we have incomplete information about a subject's survival time. It is a fundamental concept in survival analysis.
Green checkmarks: We know exactly when the event happened. Red X marks: The subject left the study before the event occurred, so we only know they survived at least until that time.
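In practice, the time and event columns are often derived from raw follow-up records. The sketch below shows one way to do that, assuming each record carries an enrollment day, an event day (or `None` if no event was observed), and the last day the subject was seen. The field names and the helper `to_time_event` are hypothetical.

```python
def to_time_event(enrolled_day, event_day, last_seen_day):
    """Convert one follow-up record into a (time, event) pair.

    event_day is None when the event was never observed for this subject.
    """
    if event_day is not None:
        # Exact event time is known.
        return event_day - enrolled_day, 1
    # Censored: we only know the subject survived at least this long.
    return last_seen_day - enrolled_day, 0

# Hypothetical follow-up records, for illustration only
records = [
    {"id": "A", "enrolled_day": 0, "event_day": 5, "last_seen_day": 5},
    {"id": "B", "enrolled_day": 2, "event_day": None, "last_seen_day": 10},
    {"id": "C", "enrolled_day": 0, "event_day": 12, "last_seen_day": 12},
]

rows = [
    to_time_event(r["enrolled_day"], r["event_day"], r["last_seen_day"])
    for r in records
]
print(rows)
```

Subject B left after 8 days of follow-up without an event, so the record becomes a time of 8 with event 0 rather than being dropped.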
```r
library(survival)
library(survminer)
library(tidyverse)

# Clinical trial data: time to recovery (days), event (1 = recovered, 0 = censored)
data <- tibble(
  time  = c(5, 8, 12, 15, 18, 23, 25, 30, 35, 40, 45, 48, 50, 55, 60),
  event = c(1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1),
  group = c('Treatment', 'Treatment', 'Treatment', 'Treatment', 'Treatment',
            'Treatment', 'Treatment', 'Treatment', 'Control', 'Control',
            'Control', 'Control', 'Control', 'Control', 'Control')
)

# Create survival object
surv_obj <- Surv(time = data$time, event = data$event)

# Overall Kaplan-Meier estimate
km_fit <- survfit(surv_obj ~ 1, data = data)

# Summary with survival probabilities
summary(km_fit)

# Median survival time
print(km_fit)

# Survival plot (data is passed explicitly so ggsurvplot does not have to recover it from the fit)
ggsurvplot(km_fit,
           data = data,
           conf.int = TRUE,
           risk.table = TRUE,
           xlab = "Time (days)",
           ylab = "Survival Probability",
           title = "Kaplan-Meier Survival Curve")

# Kaplan-Meier by group
km_group <- survfit(Surv(time, event) ~ group, data = data)

# Summary by group
summary(km_group)

# Comparison plot
ggsurvplot(km_group,
           data = data,
           conf.int = TRUE,
           pval = TRUE,
           risk.table = TRUE,
           xlab = "Time (days)",
           ylab = "Survival Probability",
           title = "Kaplan-Meier Curves by Group")

# Log-rank test
survdiff(Surv(time, event) ~ group, data = data)
```

```python
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
import matplotlib.pyplot as plt

# Clinical trial data: time to recovery (days), event (1 = recovered, 0 = censored)
data = pd.DataFrame({
    'time': [5, 8, 12, 15, 18, 23, 25, 30, 35, 40, 45, 48, 50, 55, 60],
    'event': [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
    'group': ['Treatment', 'Treatment', 'Treatment', 'Treatment', 'Treatment',
              'Treatment', 'Treatment', 'Treatment', 'Control', 'Control',
              'Control', 'Control', 'Control', 'Control', 'Control']
})

# Overall Kaplan-Meier estimate
kmf = KaplanMeierFitter()
kmf.fit(durations=data['time'], event_observed=data['event'])

# Print survival table
print(kmf.survival_function_)

# Print median survival time
print(f"Median survival time: {kmf.median_survival_time_:.2f} days")

# Plot survival curve
kmf.plot_survival_function()
plt.title('Kaplan-Meier Survival Curve')
plt.xlabel('Time (days)')
plt.ylabel('Survival Probability')
plt.ylim(0, 1)
plt.grid(True, alpha=0.3)
plt.show()

# Kaplan-Meier by group: fit one curve per group and draw them on shared axes
fig, ax = plt.subplots(figsize=(10, 6))
for group in data['group'].unique():
    group_data = data[data['group'] == group]
    kmf_group = KaplanMeierFitter()
    kmf_group.fit(
        durations=group_data['time'],
        event_observed=group_data['event'],
        label=group
    )
    kmf_group.plot_survival_function(ax=ax, ci_show=True)

plt.title('Kaplan-Meier Curves by Group')
plt.xlabel('Time (days)')
plt.ylabel('Survival Probability')
plt.ylim(0, 1)
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

# Log-rank test
treatment_data = data[data['group'] == 'Treatment']
control_data = data[data['group'] == 'Control']
results = logrank_test(
    treatment_data['time'], control_data['time'],
    event_observed_A=treatment_data['event'],
    event_observed_B=control_data['event']
)
print("\nLog-rank test:")
print(f"Test statistic: {results.test_statistic:.3f}")
print(f"p-value: {results.p_value:.4f}")
```
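The median survival time reported by the code above is commonly read off the Kaplan-Meier curve as the earliest time at which the estimated survival probability falls to 0.5 or below. The sketch below applies that convention to the pooled 15-row sample data in plain Python. The helper names are illustrative, and some software handles the case where the curve equals exactly 0.5 differently.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(event_time, survival_estimate)] using the product-limit formula."""
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(events_at):
        n_at_risk = sum(1 for u in times if u >= t)  # at risk just before t
        survival *= 1.0 - events_at[t] / n_at_risk
        curve.append((t, survival))
    return curve

def median_survival(curve):
    """Earliest time with S(t) <= 0.5, or infinity if the curve never gets there."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return float("inf")

# Same sample data as the examples above (groups pooled)
times = [5, 8, 12, 15, 18, 23, 25, 30, 35, 40, 45, 48, 50, 55, 60]
events = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
median = median_survival(curve)
print(f"Median survival time: {median} days")
```

On this sample the estimate first drops below 0.5 at day 35 (about 0.49), so the median is 35 days. When heavy censoring keeps the curve above 0.5, the median is undefined, which the sketch reports as infinity.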