Regression Discontinuity Design (RDD) is a rigorous quasi-experimental method for estimating causal effects when treatment assignment is determined by whether an observable variable (the "running variable") crosses a known threshold. RDD provides credible causal estimates by comparing outcomes of units just above and just below the cutoff, exploiting the quasi-random assignment near the threshold.
💡 Key Assumptions: RDD requires (1) units cannot precisely manipulate their position relative to the threshold, and (2) potential outcomes are continuous at the cutoff. This calculator provides diagnostic tests to help you assess these assumptions.
Ready to estimate causal effects? (scholarship eligibility and college enrollment) to see how RDD works, or check the guide below to prepare your own data and discover the true impact of your policy or intervention.
Enter the mean outcomes just below and just above the cutoff. Standard deviations are optional but recommended for statistical inference.
Regression Discontinuity Design (RDD) is a quasi-experimental method for estimating causal effects when treatment is assigned based on whether a continuous "running variable" crosses a known threshold. Units just above and below the cutoff are assumed to be similar except for treatment status, allowing for causal identification of the local average treatment effect at the threshold.
Sharp RDD (Treatment Deterministic):
Where c is the cutoff, X is the running variable, and Y is the outcome
Local Linear Regression:
D = 1 if X ≥ c (treated), 0 otherwise. τ is the treatment effect at the cutoff.
The bandwidth determines which observations are included in the analysis. This calculator supports:
library(tidyverse)
library(rdrobust) # For RDD analysis
library(rddensity) # For McCrary density test
# ── Load sample data (download CSV from the calculator) ──────────────
data <- read_csv("regression-discontinuity-sample.csv")
cutoff <- 70
cat("=== Data Summary ===\n")
cat("N:", nrow(data), "\n")
cat("Below cutoff:", sum(data$test_score < cutoff), "\n")
cat("Above cutoff:", sum(data$test_score >= cutoff), "\n\n")
# ── Method 1: rdrobust (automatic bandwidth, triangular kernel) ──────
rdd_result <- rdrobust(
y = data$enrollment,
x = data$test_score,
c = cutoff,
kernel = "triangular",
p = 1 # Linear (matches our calculator default)
)
summary(rdd_result)
# ── Method 2: Manual local linear regression (matches backend) ───────
# Bandwidth selection (Imbens-Kalyanaraman)
bw_select <- rdbwselect(
y = data$enrollment,
x = data$test_score,
c = cutoff,
kernel = "triangular",
bwselect = "IK"
)
h_opt <- bw_select$bws[1, 1]
cat("\n=== Optimal Bandwidth (IK) ===\n")
cat("h_opt:", h_opt, "\n\n")
# Local linear regression within bandwidth (triangular kernel weights)
data_bw <- data %>%
mutate(x_centered = test_score - cutoff) %>%
filter(abs(x_centered) <= h_opt) %>%
mutate(
above = as.numeric(test_score >= cutoff),
kernel_weight = 1 - abs(x_centered) / h_opt # Triangular kernel
)
cat("=== Sample within bandwidth ===\n")
cat("N below:", sum(data_bw$above == 0), "\n")
cat("N above:", sum(data_bw$above == 1), "\n")
cat("N total:", nrow(data_bw), "\n")
cat("Effective N (sum of weights):", sum(data_bw$kernel_weight), "\n\n")
# Weighted least squares: Y = a + b1*X + tau*D + b2*D*X
lm_rdd <- lm(
enrollment ~ x_centered + above + x_centered:above,
data = data_bw,
weights = kernel_weight
)
cat("=== Treatment Effect (Local Linear Regression) ===\n")
coefs <- summary(lm_rdd)$coefficients
tau <- coefs["above", "Estimate"]
se_tau <- coefs["above", "Std. Error"]
t_stat <- coefs["above", "t value"]
p_val <- coefs["above", "Pr(>|t|)"]
df_resid <- lm_rdd$df.residual
ci <- confint(lm_rdd, "above", level = 0.95)
cat("Treatment effect (tau):", tau, "\n")
cat("Standard error:", se_tau, "\n")
cat("t-statistic:", t_stat, "\n")
cat("p-value:", p_val, "\n")
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
cat("df:", df_resid, "\n\n")
# ── McCrary density test ─────────────────────────────────────────────
cat("=== McCrary Density Test ===\n")
density_test <- rddensity(data$test_score, c = cutoff)
summary(density_test)
# ── Robustness checks ───────────────────────────────────────────────
cat("\n=== Robustness Checks ===\n")
# Half bandwidth
cat("\n--- 50% Bandwidth (h =", h_opt * 0.5, ") ---\n")
rdrobust(data$enrollment, data$test_score, c = cutoff,
h = h_opt * 0.5, kernel = "triangular")
# 1.5x bandwidth
cat("\n--- 150% Bandwidth (h =", h_opt * 1.5, ") ---\n")
rdrobust(data$enrollment, data$test_score, c = cutoff,
h = h_opt * 1.5, kernel = "triangular")
# Quadratic polynomial
cat("\n--- Quadratic Polynomial ---\n")
rdrobust(data$enrollment, data$test_score, c = cutoff,
p = 2, kernel = "triangular")
# ── Visualization ────────────────────────────────────────────────────
data %>%
mutate(above = test_score >= cutoff) %>%
ggplot(aes(x = test_score, y = enrollment, color = above)) +
geom_point(alpha = 0.5, size = 2) +
geom_smooth(method = "lm", se = TRUE) +
geom_vline(xintercept = cutoff, linetype = "dashed", linewidth = 1) +
annotate("rect",
xmin = cutoff - h_opt, xmax = cutoff + h_opt,
ymin = -Inf, ymax = Inf,
fill = "gray", alpha = 0.15
) +
labs(
title = "Regression Discontinuity Design",
subtitle = paste0("Cutoff = ", cutoff, ", Bandwidth = ", round(h_opt, 2)),
x = "Test Score (Running Variable)",
y = "Enrollment Rate (Outcome)",
color = "Above Cutoff"
) +
theme_minimal()import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
# Example: Effect of scholarship eligibility on college enrollment
np.random.seed(123)
n = 500
test_score = np.random.uniform(40, 100, n)
cutoff = 70
# Treatment assignment
scholarship = (test_score >= cutoff).astype(int)
# True effect = 0.15 (15 percentage points)
enrollment = (0.3 + 0.01 * (test_score - cutoff) +
0.15 * scholarship +
0.005 * scholarship * (test_score - cutoff) +
np.random.normal(0, 0.1, n))
enrollment = np.clip(enrollment, 0, 1)
data = pd.DataFrame({
'test_score': test_score,
'scholarship': scholarship,
'enrollment': enrollment
})
# Optimal bandwidth (simplified Imbens-Kalyanaraman)
def optimal_bandwidth_ik(X, Y, cutoff):
X_centered = X - cutoff
left_mask = X_centered < 0
right_mask = X_centered >= 0
# Standard deviations
var_left = np.var(Y[left_mask])
var_right = np.var(Y[right_mask])
# Sample sizes
n_left = np.sum(left_mask)
n_right = np.sum(right_mask)
n = n_left + n_right
# Simplified formula
h_opt = 1.84 * np.std(X_centered) * (n ** (-1/5))
return h_opt
bandwidth = optimal_bandwidth_ik(
data['test_score'].values,
data['enrollment'].values,
cutoff
)
print(f"Optimal bandwidth: {bandwidth:.3f}")
# Local linear regression within bandwidth
data['x_centered'] = data['test_score'] - cutoff
data_bw = data[np.abs(data['x_centered']) <= bandwidth].copy()
data_bw['above'] = (data_bw['test_score'] >= cutoff).astype(int)
data_bw['above_x_centered'] = data_bw['above'] * data_bw['x_centered']
# Design matrix
X = data_bw[['x_centered', 'above', 'above_x_centered']].values
X = np.column_stack([np.ones(len(X)), X])
y = data_bw['enrollment'].values
# OLS estimation
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_pred = X @ beta
residuals = y - y_pred
# Standard errors
n_bw = len(y)
k = X.shape[1]
df = n_bw - k
mse = np.sum(residuals ** 2) / df
var_beta = mse * np.linalg.inv(X.T @ X)
se_beta = np.sqrt(np.diag(var_beta))
# Treatment effect (coefficient on 'above')
tau_idx = 2
tau = beta[tau_idx]
se_tau = se_beta[tau_idx]
t_stat = tau / se_tau
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))
print(f"Treatment Effect: {tau:.4f}")
print(f"Standard Error: {se_tau:.4f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"95% CI: [{tau - 1.96*se_tau:.4f}, {tau + 1.96*se_tau:.4f}]")
# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# RDD plot
data['bin'] = pd.cut(data['test_score'], bins=30)
binned = data.groupby(['bin', 'scholarship']).agg({
'test_score': 'mean',
'enrollment': 'mean'
}).reset_index()
for treated in [0, 1]:
subset = binned[binned['scholarship'] == treated]
label = 'Scholarship' if treated == 1 else 'No Scholarship'
color = 'red' if treated == 1 else 'blue'
ax1.scatter(subset['test_score'], subset['enrollment'],
label=label, color=color, alpha=0.6, s=50)
ax1.axvline(x=cutoff, color='black', linestyle='--',
label='Cutoff', linewidth=2)
ax1.set_xlabel('Test Score (Running Variable)')
ax1.set_ylabel('College Enrollment Rate')
ax1.set_title('Regression Discontinuity Design')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Density plot (McCrary test)
ax2.hist(data[data['test_score'] < cutoff]['test_score'],
bins=20, alpha=0.6, color='blue', label='Below Cutoff')
ax2.hist(data[data['test_score'] >= cutoff]['test_score'],
bins=20, alpha=0.6, color='red', label='Above Cutoff')
ax2.axvline(x=cutoff, color='black', linestyle='--',
label='Cutoff', linewidth=2)
ax2.set_xlabel('Test Score')
ax2.set_ylabel('Frequency')
ax2.set_title('Density of Running Variable (McCrary Test)')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Robustness checks
print("\n=== Robustness Checks ===")
for bw_mult in [0.5, 1.5]:
h = bandwidth * bw_mult
data_rob = data[np.abs(data['x_centered']) <= h].copy()
# ... (repeat estimation with different bandwidth)
print(f"Bandwidth {bw_mult}x: n={len(data_rob)}")