Cox Proportional Hazards Model

Created: March 7, 2026

Last Updated: March 7, 2026

The Cox Proportional Hazards Model is a semi-parametric regression model used to examine the relationship between covariates and survival time. It estimates hazard ratios (HR) that quantify the effect of each covariate on the risk of the event occurring, making it the most widely used method for multivariable survival analysis in clinical research and epidemiology.

What You'll Get:

Hazard Ratios: Effect of each covariate on survival risk with confidence intervals
Forest Plot: Visual display of hazard ratios and confidence intervals
Model Fit: Concordance index, AIC, BIC, and log-likelihood
Baseline Survival: Estimated survival curve at reference covariate values
Schoenfeld Test: Proportional hazards assumption check for each covariate
APA-Formatted Report: Professional results ready for publication

Pro Tip: Your data needs a time column, an event column (1 = event, 0 = censored), and one or more covariate columns. Categorical columns (text values) are automatically dummy-coded. For simple group comparison without covariates, use the Log-Rank Test.
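As a rough illustration of what dummy-coding means here, the sketch below turns one text column into 0/1 indicator columns, using the first level in sorted order as the reference. The function name `dummy_code` and the `name_level` column naming are assumptions made for this example, not the calculator's actual implementation.

```python
def dummy_code(name, values):
    """Return (reference_level, indicator_columns) for one categorical column.

    The reference level is the first level in sorted order; every other level
    gets its own 0/1 indicator column named "<name>_<level>" (hypothetical naming).
    """
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    columns = {
        f"{name}_{level}": [1 if v == level else 0 for v in values]
        for level in others
    }
    return reference, columns


reference, columns = dummy_code("treatment", ["A", "B", "A", "C"])
print(reference)  # reference level
print(columns)    # one indicator column per remaining level
```

A covariate with k levels therefore contributes k - 1 coefficients to the model, each compared against the reference level.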

Software Implementation Differences

The proportional hazards assumption test uses scaled Schoenfeld residuals (Grambsch & Therneau, 1994), implemented in Python via lifelines. Results may differ numerically from R's cox.zph(), which uses an exact score test implemented in compiled C code. Conclusions about whether the proportional hazards assumption is satisfied or violated should generally agree.

Calculator

1. Load Your Data

2. Select Columns & Options

Time Column *

Time to event or censoring

Event Column *

1 = event, 0 = censored

Covariate Columns *

Categorical columns (text values) are automatically dummy-coded

Confidence Level:

Related Calculators

Kaplan-Meier Estimator Calculator

Log-Rank Test Calculator

Multiple Linear Regression Calculator

Learn More

Definition

The Cox Proportional Hazards Model, introduced by Sir David Cox in 1972, is a semi-parametric regression model for survival data. Unlike fully parametric models, it makes no assumption about the shape of the baseline hazard function; only the covariate effects are parameterized. The hazard is modeled as the baseline hazard multiplied by an exponential function of the covariates, so each covariate acts multiplicatively on risk.

The Cox model hazard function at time $t$ for a subject with covariates $X$:

$$h(t \mid X) = h_0(t) \cdot \exp(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)$$

where $h_0(t)$ is the unspecified baseline hazard function and $\beta_j$ are the regression coefficients. The hazard ratio for covariate $j$ is $\text{HR}_j = e^{\beta_j}$.
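To see why $e^{\beta_j}$ can be read as a hazard ratio, compare two subjects who differ by one unit in $X_j$ and share every other covariate value. The baseline hazard and the shared terms cancel, leaving a ratio that does not depend on $t$:

```latex
\frac{h(t \mid X_j = x + 1,\ X_k \text{ fixed for } k \neq j)}
     {h(t \mid X_j = x,\ X_k \text{ fixed for } k \neq j)}
= \frac{h_0(t) \cdot \exp\left(\beta_j (x + 1) + \sum_{k \neq j} \beta_k X_k\right)}
       {h_0(t) \cdot \exp\left(\beta_j x + \sum_{k \neq j} \beta_k X_k\right)}
= e^{\beta_j}
```

The same cancellation is what the name "proportional hazards" refers to: the two hazard curves differ by a constant factor at every time point.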

Data Format Requirements

Correct Data Format

Time	Event	Meaning
5	1	Event occurred
8	0	Censored
12	1	Event occurred
15	0	Censored
18	1	Event occurred

Time: Any non-negative number (days, months, years, etc.)

Event: 1 = event occurred, 0 = censored

Common Data Issues

Event column has wrong values

Time	Event
5	yes
8	no

❌ Event must be 1 (event) or 0 (censored)

Negative time values

Time	Event
-5	1
8	0

❌ Time values must be zero or positive

Missing required columns

Time	Event
5	—
8	—

❌ Missing Event column
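The checks above can be sketched as a small validation pass over the rows before fitting. This is a minimal sketch, assuming each row is a dict with `time` and `event` keys; the function name `find_format_problems` and the message strings are made up for this example.

```python
def find_format_problems(rows):
    """Check each row (a dict) against the time/event rules described above.

    Returns a list of (row_index, message) tuples; an empty list means
    no problems were found.
    """
    problems = []
    for i, row in enumerate(rows):
        # Both columns are required on every row.
        for col in ("time", "event"):
            if col not in row or row[col] is None:
                problems.append((i, f"missing {col}"))
        time, event = row.get("time"), row.get("event")
        # Time must be zero or positive.
        if isinstance(time, (int, float)) and not isinstance(time, bool) and time < 0:
            problems.append((i, "negative time"))
        # Event must be coded 1 (event) or 0 (censored); text such as "yes"/"no" is rejected.
        if event is not None and (isinstance(event, (bool, str)) or event not in (0, 1)):
            problems.append((i, "event must be 0 or 1"))
    return problems


rows = [
    {"time": 5, "event": 1},      # valid
    {"time": 8, "event": "no"},   # wrong event coding
    {"time": -5, "event": 1},     # negative time
    {"time": 8},                  # missing event column
]
problems = find_format_problems(rows)
print(problems)
```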

When to Use Cox Regression

Use the Cox proportional hazards model when:

Multiple predictors: You want to model the effect of several covariates on survival simultaneously
Confounding adjustment: You need to control for confounders while estimating a treatment or exposure effect
Prognostic modeling: You want to identify prognostic factors and build risk scores
Censored survival data: Some subjects have incomplete follow-up (right-censoring)
No distribution assumption: You don't want to commit to a specific parametric survival distribution
Sufficient events: You have at least 5–10 events per covariate for stable estimates

Requirements & Assumptions

Proportional hazards: The hazard ratio between any two individuals must be constant over time — the most critical assumption. Test with the Schoenfeld residuals test (provided in results).
Linear log-hazard: The log-hazard is a linear function of covariates. Check with martingale residuals for continuous variables.
Independent censoring: Censoring must be unrelated to the probability of the event occurring.
Events per variable (EPV): At least 5–10 events per covariate to avoid overfitting and unstable estimates.
No negative times: All time values must be non-negative.
Event indicator: Binary variable (1 = event occurred, 0 = censored).
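The events-per-variable guideline is simple arithmetic: count the rows with an observed event and divide by the number of covariates. The sketch below applies it to the event column from the example dataset used in the code section of this page, with three covariates (age, treatment, stage); the helper name `events_per_variable` is an assumption for this example.

```python
def events_per_variable(events, n_covariates):
    """Return (number of observed events, events per covariate)."""
    # Censored rows (event == 0) do not count toward EPV.
    n_events = sum(1 for e in events if e == 1)
    return n_events, n_events / n_covariates


events = [1,1,1,1,0,1,1,1,0,1,0,1,0,1,0,
          1,1,0,1,1,1,0,1,1,0,1,1,0,1,0]
n_events, epv = events_per_variable(events, n_covariates=3)
print(n_events, round(epv, 2))
```

Note that a categorical covariate with several levels adds one coefficient per non-reference level, so counting model coefficients rather than columns gives a more conservative EPV.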

Interpreting Results

Hazard Ratios (HR = exp(β)):

HR = 1: No effect on hazard — covariate does not change risk
HR > 1: Increased hazard (worse survival) — e.g., HR = 2.0 means twice the instantaneous risk
HR < 1: Decreased hazard (protective effect) — e.g., HR = 0.5 means half the risk
For continuous covariates, the HR represents the multiplicative change in hazard per one-unit increase in the covariate
For categorical covariates, the HR compares each level to the reference category (lowest alphabetically or numerically)
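The conversion from a coefficient to a hazard ratio with a confidence interval can be sketched as follows. The interval is computed on the log-hazard scale and then exponentiated. The coefficient and standard error below are hypothetical values chosen for illustration, not output from a fitted model, and `hazard_ratio` is a name invented for this sketch.

```python
import math


def hazard_ratio(beta, se, z=1.96):
    """Return (HR, lower, upper) from a log-hazard coefficient and its standard error.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for age, per one-year increase.
beta_age, se_age = 0.03, 0.01
hr, lower, upper = hazard_ratio(beta_age, se_age)

# For a continuous covariate, the HR for a k-unit increase is exp(k * beta),
# which equals the one-unit HR raised to the power k.
hr_per_decade = math.exp(10 * beta_age)

print(round(hr, 3), round(lower, 3), round(upper, 3), round(hr_per_decade, 3))
```

Reporting the HR per clinically meaningful increment (for example, per 10 years of age) often reads more naturally than the per-unit value.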

Concordance Index (C-statistic):

Measures the model's ability to correctly rank pairs of subjects by risk
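Ranges from 0 to 1: 0.5 means ranking no better than chance, 1.0 means perfect discrimination, and values below 0.5 mean the predicted ranking is systematically reversed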
C ≥ 0.7 is generally considered acceptable
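To make the pairwise ranking idea concrete, here is a simplified sketch of Harrell's concordance index in plain Python. A pair is treated as comparable when the subject with the shorter time had an observed event; pairs with tied times are skipped, which is a simplification compared with library implementations. The function name and toy data are assumptions for this example.

```python
def concordance_index(times, events, risk_scores):
    """Share of comparable pairs in which the higher-risk subject fails first.

    Tied risk scores count as half-concordant. Pairs with tied times are
    ignored in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable: subject i had an observed event strictly before time j.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [5, 8, 12, 15, 18]
events = [1, 0, 1, 0, 1]

# Risk scores that fall as survival time rises rank every comparable pair correctly.
c_perfect = concordance_index(times, events, [0.9, 0.7, 0.6, 0.3, 0.1])
# Identical risk scores carry no ranking information.
c_uninformative = concordance_index(times, events, [0.5] * 5)
print(c_perfect, c_uninformative)
```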

Schoenfeld Residuals Test (Proportional Hazards):

Tests whether the hazard ratio for each covariate is constant over time
A significant p-value (p < 0.05) suggests the proportional hazards assumption is violated for that covariate
Remedies include adding time-interaction terms, stratification, or using a time-varying coefficient model

p-values for Coefficients:

Tests whether the covariate has a statistically significant effect on survival
Based on the Wald test: $z = \hat{\beta} / \text{SE}(\hat{\beta})$
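The Wald statistic and its two-sided p-value can be computed with the standard library alone, as in the sketch below. The coefficient and standard error are hypothetical values, and `wald_test` is a name chosen for this example.

```python
import math


def wald_test(beta, se):
    """Return (z, two-sided p-value) for H0: beta = 0."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical estimate: beta = 0.5 with standard error 0.25.
z, p_value = wald_test(0.5, 0.25)
print(z, round(p_value, 4))
```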

Example Code

R Code

library(survival)

# Sample clinical trial data
data <- data.frame(
  time = c(6,7,10,15,19,25,30,33,42,45,46,52,53,54,59,
           4,8,12,17,22,26,28,35,38,44,47,51,55,57,60),
  event = c(1,1,1,1,0,1,1,1,0,1,0,1,0,1,0,
            1,1,0,1,1,1,0,1,1,0,1,1,0,1,0),
  age = c(45,52,38,60,47,55,41,63,50,48,57,44,66,39,53,
          42,58,35,67,49,54,46,62,51,43,59,37,64,56,48),
  treatment = c(rep("A",15), rep("B",15)),
  stage = c(1,2,1,3,2,2,1,3,2,1,3,2,3,1,2,
            2,1,2,3,2,1,3,2,1,2,3,1,2,2,3)
)

# Fit Cox proportional hazards model
cox_model <- coxph(Surv(time, event) ~ age + treatment + stage, data = data)

# Print summary
summary(cox_model)

# Test proportional hazards assumption
cox.zph(cox_model)

Python Code

import pandas as pd
from lifelines import CoxPHFitter
import matplotlib.pyplot as plt

# Sample clinical trial data
data = pd.DataFrame({
    'time': [6,7,10,15,19,25,30,33,42,45,46,52,53,54,59,
             4,8,12,17,22,26,28,35,38,44,47,51,55,57,60],
    'event': [1,1,1,1,0,1,1,1,0,1,0,1,0,1,0,
              1,1,0,1,1,1,0,1,1,0,1,1,0,1,0],
    'age': [45,52,38,60,47,55,41,63,50,48,57,44,66,39,53,
            42,58,35,67,49,54,46,62,51,43,59,37,64,56,48],
    'treatment': ['A']*15 + ['B']*15,
    'stage': [1,2,1,3,2,2,1,3,2,1,3,2,3,1,2,
              2,1,2,3,2,1,3,2,1,2,3,1,2,2,3]
})

# lifelines expects numeric columns, so dummy-code treatment (reference = 'A')
model_data = data.assign(
    treatment_B=(data['treatment'] == 'B').astype(int)
).drop(columns='treatment')

# Fit Cox proportional hazards model
cph = CoxPHFitter()
cph.fit(model_data, duration_col='time', event_col='event')

# Print summary with hazard ratios
cph.print_summary()

# Plot hazard ratios (forest plot); the default plot shows log hazard ratios
cph.plot(hazard_ratios=True)
plt.title('Cox Regression: Hazard Ratios with 95% CI')
plt.tight_layout()
plt.show()

# Plot baseline survival curve
cph.baseline_survival_.plot()
plt.title('Baseline Survival Function')
plt.xlabel('Time')
plt.ylabel('Survival Probability')
plt.show()

# Check proportional hazards assumption (pass the same data used for fitting)
cph.check_assumptions(model_data, show_plots=True)

# Predict survival for a new subject (treatment A, so treatment_B = 0)
new_subject = pd.DataFrame({'age': [50], 'treatment_B': [0], 'stage': [2]})
survival = cph.predict_survival_function(new_subject)
survival.plot()
plt.title('Predicted Survival for New Subject')
plt.xlabel('Time')
plt.ylabel('Survival Probability')
plt.show()

Choosing the Right Test

Which Survival Analysis Test Should I Use?

I have time-to-event data

What is my goal?

Describe/estimate survival

Kaplan-Meier Estimator

Get survival curves & median survival

Compare 2+ groups

Log-Rank Test

Test if groups differ significantly

Model multiple factors

Cox Regression

Adjust for covariates & hazard ratios

Kaplan-Meier

• Estimate survival curves
• Find median survival time
• Visualize survival by group
• No formal hypothesis test

Log-Rank Test

• Compare 2+ groups
• Get p-value for difference
• Test overall survival
• Single grouping variable

Cox Regression

• Multiple covariates
• Hazard ratios
• Adjust for confounders
• Most comprehensive

Quick Decision Guide:

Start with Kaplan-Meier to visualize your data and understand survival patterns
Use Log-Rank Test when you have 2+ groups and want to test if they differ (e.g., treatment vs control)
Use Cox Regression when you have multiple variables (age, treatment, stage, etc.) and want to model their effects
