The Cox Proportional Hazards Model is a semi-parametric regression model used to examine the relationship between covariates and survival time. It estimates hazard ratios (HR) that quantify the effect of each covariate on the risk of the event occurring, making it the most widely used method for multivariable survival analysis in clinical research and epidemiology.
Pro Tip: Your data needs a time column, an event column (1 = event, 0 = censored), and one or more covariate columns. Categorical columns (text values) are automatically dummy-coded. For simple group comparison without covariates, use the Log-Rank Test.
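To make "dummy-coded" concrete, here is a minimal sketch of how a text column could be turned into 0/1 indicator columns with pandas. The rows are a small subset of the sample data used later on this page. The choice of `pd.get_dummies` with `drop_first=True` (first level becomes the reference group) is an assumption for illustration; the tool's exact encoding and reference level are not stated here.

```python
import pandas as pd

# Four rows taken from the sample clinical trial data (two per treatment arm)
df = pd.DataFrame({
    "time": [6, 7, 4, 8],
    "event": [1, 1, 1, 1],
    "age": [45, 52, 42, 58],
    "treatment": ["A", "A", "B", "B"],
})

# Assumption: the first level ("A") is dropped and acts as the reference group,
# so a single indicator column "treatment_B" remains.
encoded = pd.get_dummies(df, columns=["treatment"], drop_first=True, dtype=int)

print(encoded)
```

With this encoding, the coefficient for `treatment_B` would describe the hazard of treatment B relative to treatment A.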
The proportional hazards assumption test uses scaled Schoenfeld residuals (Grambsch & Therneau, 1994), implemented in Python via lifelines. Results may differ numerically from R's cox.zph(), which uses an exact score test implemented in compiled C code. Conclusions about whether the proportional hazards assumption is satisfied or violated should generally agree.
Ready to fit a Cox model? Load the sample data (clinical trial with age, treatment, and disease stage as covariates) to see Cox regression in action, or upload your own survival data.
The Cox Proportional Hazards Model, introduced by Sir David Cox in 1972, is a semi-parametric regression model for survival data. Unlike parametric models, it makes no assumption about the shape of the baseline hazard function. It models the hazard rate as the product of that baseline hazard and an exponential function of the covariates, so each covariate scales the hazard by a constant factor over time.
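The Cox model hazard function at time $t$ for a subject with covariates $X = (X_1, \dots, X_p)$:

$$h(t \mid X) = h_0(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$$

where $h_0(t)$ is the unspecified baseline hazard function and $\beta_1, \dots, \beta_p$ are the regression coefficients. The hazard ratio for covariate $X_j$ is $\exp(\beta_j)$.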
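Because the hazard ratio for a covariate is the exponential of its coefficient, it can be computed by hand once coefficients are known. The sketch below shows this, and also shows that the ratio of hazards between two subjects does not depend on the baseline hazard. The coefficient values and the helper names `hazard_ratio` and `relative_hazard` are hypothetical and chosen only for illustration; they are not fitted results.

```python
import math


def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a `delta`-unit increase in a covariate with coefficient `beta`."""
    return math.exp(beta * delta)


def relative_hazard(betas, x_a, x_b):
    """Ratio h(t | x_a) / h(t | x_b); the baseline hazard cancels out."""
    lp_a = sum(b * x for b, x in zip(betas, x_a))  # linear predictor, subject A
    lp_b = sum(b * x for b, x in zip(betas, x_b))  # linear predictor, subject B
    return math.exp(lp_a - lp_b)


# Hypothetical coefficients for (age, treatment_B, stage); NOT fitted values.
betas = [0.03, -0.50, 0.40]

subject_a = [60, 1, 2]  # age 60, treatment B, stage 2
subject_b = [50, 0, 2]  # age 50, treatment A, stage 2

print("HR, treatment B vs A:", hazard_ratio(betas[1]))
print("HR per 10 years of age:", hazard_ratio(betas[0], delta=10))
print("Hazard of subject A relative to B:", relative_hazard(betas, subject_a, subject_b))
```

A hazard ratio above 1 indicates a higher hazard than the comparison, below 1 a lower hazard, and exactly 1 no difference.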
| Time | Event | Meaning |
|---|---|---|
| 5 | 1 | Event occurred |
| 8 | 0 | Censored |
| 12 | 1 | Event occurred |
| 15 | 0 | Censored |
| 18 | 1 | Event occurred |
Time: Any non-negative number (days, months, years, etc.)
Event: 1 = event occurred, 0 = censored
| Time | Event |
|---|---|
| 5 | yes |
| 8 | no |
❌ Event must be 1 (event) or 0 (censored)
| Time | Event |
|---|---|
| -5 | 1 |
| 8 | 0 |
❌ Time values must be zero or positive
| Time | Event |
|---|---|
| 5 | — |
| 8 | — |
❌ Missing Event column
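The format rules above can be checked before fitting. The following is a minimal sketch of such a check in plain Python, assuming the data is held as a dict of column lists. The function name `validate_survival_columns` and the default column names are hypothetical, and this is not the tool's actual validation code.

```python
def validate_survival_columns(columns, time_col="time", event_col="event"):
    """Return a list of problems found; an empty list means the data passed."""
    problems = []
    for name in (time_col, event_col):
        if name not in columns:
            problems.append(f"Missing {name} column")
    if problems:
        return problems

    times, events = columns[time_col], columns[event_col]
    if len(times) != len(events):
        problems.append("Time and Event columns differ in length")

    # Row numbers in the messages are 0-based.
    for i, t in enumerate(times):
        is_number = isinstance(t, (int, float)) and not isinstance(t, bool)
        if not is_number or t != t or t < 0:  # t != t catches NaN
            problems.append(f"Row {i}: Time values must be zero or positive")
    for i, e in enumerate(events):
        if isinstance(e, bool) or e not in (0, 1):
            problems.append(f"Row {i}: Event must be 1 (event) or 0 (censored)")
    return problems


valid = {"time": [5, 8, 12, 15, 18], "event": [1, 0, 1, 0, 1]}
text_events = {"time": [5, 8], "event": ["yes", "no"]}
negative_time = {"time": [-5, 8], "event": [1, 0]}
no_event = {"time": [5, 8]}

for name, data in [("valid", valid), ("text events", text_events),
                   ("negative time", negative_time), ("no event column", no_event)]:
    print(name, "->", validate_survival_columns(data))
```

Returning a list of messages rather than raising on the first problem lets all issues in an upload be reported at once.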
Use the Cox proportional hazards model when:
library(survival)
# Sample clinical trial data
data <- data.frame(
time = c(6,7,10,15,19,25,30,33,42,45,46,52,53,54,59,
4,8,12,17,22,26,28,35,38,44,47,51,55,57,60),
event = c(1,1,1,1,0,1,1,1,0,1,0,1,0,1,0,
1,1,0,1,1,1,0,1,1,0,1,1,0,1,0),
age = c(45,52,38,60,47,55,41,63,50,48,57,44,66,39,53,
42,58,35,67,49,54,46,62,51,43,59,37,64,56,48),
treatment = c(rep("A",15), rep("B",15)),
stage = c(1,2,1,3,2,2,1,3,2,1,3,2,3,1,2,
2,1,2,3,2,1,3,2,1,2,3,1,2,2,3)
)
# Fit Cox proportional hazards model
cox_model <- coxph(Surv(time, event) ~ age + treatment + stage, data = data)
# Print summary
summary(cox_model)
# Test proportional hazards assumption
cox.zph(cox_model)

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
import matplotlib.pyplot as plt
# Sample clinical trial data
data = pd.DataFrame({
'time': [6,7,10,15,19,25,30,33,42,45,46,52,53,54,59,
4,8,12,17,22,26,28,35,38,44,47,51,55,57,60],
'event': [1,1,1,1,0,1,1,1,0,1,0,1,0,1,0,
1,1,0,1,1,1,0,1,1,0,1,1,0,1,0],
'age': [45,52,38,60,47,55,41,63,50,48,57,44,66,39,53,
42,58,35,67,49,54,46,62,51,43,59,37,64,56,48],
'treatment': ['A']*15 + ['B']*15,
'stage': [1,2,1,3,2,2,1,3,2,1,3,2,3,1,2,
2,1,2,3,2,1,3,2,1,2,3,1,2,2,3]
})
# Fit Cox proportional hazards model
cph = CoxPHFitter()
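# An explicit formula makes lifelines dummy-code the text column 'treatment'
cph.fit(data, duration_col='time', event_col='event', formula='age + treatment + stage')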
# Print summary with hazard ratios
cph.print_summary()
# Plot hazard ratios (forest plot); without hazard_ratios=True, log hazard ratios are plotted
cph.plot(hazard_ratios=True)
plt.title('Cox Regression: Hazard Ratios with 95% CI')
plt.tight_layout()
plt.show()
# Plot baseline survival curve
cph.baseline_survival_.plot()
plt.title('Baseline Survival Function')
plt.xlabel('Time')
plt.ylabel('Survival Probability')
plt.show()
# Check proportional hazards assumption
cph.check_assumptions(data, show_plots=True)
# Predict survival for new subjects
new_subject = pd.DataFrame({'age': [50], 'treatment': ['A'], 'stage': [2]})
survival = cph.predict_survival_function(new_subject)
survival.plot()
plt.title('Predicted Survival for New Subject')
plt.xlabel('Time')
plt.ylabel('Survival Probability')
plt.show()