Under Construction

This calculator is still under development. Some of the functionality may not work as expected. Please use with caution and verify results with other tools.

Regression Discontinuity Design (RDD)

Created:November 28, 2025

Last Updated:November 28, 2025

Regression Discontinuity Design (RDD) is a rigorous quasi-experimental method for estimating causal effects when treatment assignment is determined by whether an observable variable (the "running variable") crosses a known threshold. RDD provides credible causal estimates by comparing outcomes of units just above and just below the cutoff, exploiting the quasi-random assignment near the threshold.

What You'll Get:

Local Average Treatment Effect: Causal effect at the cutoff threshold
Optimal Bandwidth Selection: Automatic calculation using Imbens-Kalyanaraman method
RDD Visualization: Clear plot showing discontinuity at cutoff
McCrary Density Test: Test for manipulation of running variable
Robustness Checks: Sensitivity analysis with different bandwidths and polynomial orders
Step-by-Step Calculations: Understand exactly how RDD estimates are derived
APA-Ready Report: Publication-quality results for your research

💡 Key Assumptions: RDD requires (1) units cannot precisely manipulate their position relative to the threshold, and (2) potential outcomes are continuous at the cutoff. This calculator provides diagnostic tests to help you assess these assumptions.

Ready to estimate causal effects? (scholarship eligibility and college enrollment) to see how RDD works, or check the guide below to prepare your own data and discover the true impact of your policy or intervention.

Calculator

Enter Summary Statistics

Enter the mean outcomes just below and just above the cutoff. Standard deviations are optional but recommended for statistical inference.

Below Cutoff (Control)

Mean Outcome *

Sample Size *

Std Dev (optional)

Above Cutoff (Treatment)

Mean Outcome *

Sample Size *

Std Dev (optional)

Cutoff Value *

Confidence Level:

Related Calculators

Difference-in-Differences Calculator

Linear Regression Calculator

Two-Sample T-Test Calculator

Learn More

What is Regression Discontinuity Design (RDD)?

Definition

Regression Discontinuity Design (RDD) is a quasi-experimental method for estimating causal effects when treatment is assigned based on whether a continuous "running variable" crosses a known threshold. Units just above and below the cutoff are assumed to be similar except for treatment status, allowing for causal identification of the local average treatment effect at the threshold.

The RDD Estimator

Sharp RDD (Treatment Deterministic):

\tau_{RDD} = \lim_{x \downarrow c} E[Y_i | X_i = x] - \lim_{x \uparrow c} E[Y_i | X_i = x]

Where c is the cutoff, X is the running variable, and Y is the outcome

Local Linear Regression:

Y_i = \alpha + \beta_1(X_i - c) + \tau \cdot D_i + \beta_2 \cdot D_i(X_i - c) + \varepsilon_i

D = 1 if X ≥ c (treated), 0 otherwise. τ is the treatment effect at the cutoff.

Key Assumptions

No Manipulation: Units cannot precisely control their position relative to the cutoff
Continuity: Potential outcomes are continuous at the cutoff in the absence of treatment
Local Randomization: Units near the threshold are essentially randomly assigned
Correct Functional Form: Relationship between running variable and outcome is properly modeled

When to Use RDD

Program Eligibility: Financial aid, scholarship programs based on test scores or GPA

Policy Thresholds: Minimum wage changes, pollution regulations based on firm size

Electoral Studies: Incumbent advantage, party effects at 50% vote threshold

Health Interventions: Treatment based on clinical thresholds (e.g., BMI, blood pressure)

Bandwidth Selection

The bandwidth determines which observations are included in the analysis. This calculator supports:

Automatic (Recommended): Uses the Imbens-Kalyanaraman optimal bandwidth selector
Manual: Specify your own bandwidth based on domain knowledge
Robustness: Results include sensitivity analysis with 50% and 150% of optimal bandwidth

Code Examples

library(tidyverse)
library(rdrobust)  # For RDD analysis
library(rddensity) # For McCrary density test

# ── Load sample data (download CSV from the calculator) ──────────────
data <- read_csv("regression-discontinuity-sample.csv")
cutoff <- 70

cat("=== Data Summary ===\n")
cat("N:", nrow(data), "\n")
cat("Below cutoff:", sum(data$test_score < cutoff), "\n")
cat("Above cutoff:", sum(data$test_score >= cutoff), "\n\n")

# ── Method 1: rdrobust (automatic bandwidth, triangular kernel) ──────
rdd_result <- rdrobust(
  y = data$enrollment,
  x = data$test_score,
  c = cutoff,
  kernel = "triangular",
  p = 1  # Linear (matches our calculator default)
)
summary(rdd_result)

# ── Method 2: Manual local linear regression (matches backend) ───────
# Bandwidth selection (Imbens-Kalyanaraman)
bw_select <- rdbwselect(
  y = data$enrollment,
  x = data$test_score,
  c = cutoff,
  kernel = "triangular",
  bwselect = "IK"
)
h_opt <- bw_select$bws[1, 1]
cat("\n=== Optimal Bandwidth (IK) ===\n")
cat("h_opt:", h_opt, "\n\n")

# Local linear regression within bandwidth (triangular kernel weights)
data_bw <- data %>%
  mutate(x_centered = test_score - cutoff) %>%
  filter(abs(x_centered) <= h_opt) %>%
  mutate(
    above = as.numeric(test_score >= cutoff),
    kernel_weight = 1 - abs(x_centered) / h_opt  # Triangular kernel
  )

cat("=== Sample within bandwidth ===\n")
cat("N below:", sum(data_bw$above == 0), "\n")
cat("N above:", sum(data_bw$above == 1), "\n")
cat("N total:", nrow(data_bw), "\n")
cat("Effective N (sum of weights):", sum(data_bw$kernel_weight), "\n\n")

# Weighted least squares: Y = a + b1*X + tau*D + b2*D*X
lm_rdd <- lm(
  enrollment ~ x_centered + above + x_centered:above,
  data = data_bw,
  weights = kernel_weight
)
cat("=== Treatment Effect (Local Linear Regression) ===\n")
coefs <- summary(lm_rdd)$coefficients
tau <- coefs["above", "Estimate"]
se_tau <- coefs["above", "Std. Error"]
t_stat <- coefs["above", "t value"]
p_val <- coefs["above", "Pr(>|t|)"]
df_resid <- lm_rdd$df.residual
ci <- confint(lm_rdd, "above", level = 0.95)
cat("Treatment effect (tau):", tau, "\n")
cat("Standard error:", se_tau, "\n")
cat("t-statistic:", t_stat, "\n")
cat("p-value:", p_val, "\n")
cat("95% CI: [", ci[1], ",", ci[2], "]\n")
cat("df:", df_resid, "\n\n")

# ── McCrary density test ─────────────────────────────────────────────
cat("=== McCrary Density Test ===\n")
density_test <- rddensity(data$test_score, c = cutoff)
summary(density_test)

# ── Robustness checks ───────────────────────────────────────────────
cat("\n=== Robustness Checks ===\n")

# Half bandwidth
cat("\n--- 50% Bandwidth (h =", h_opt * 0.5, ") ---\n")
rdrobust(data$enrollment, data$test_score, c = cutoff,
         h = h_opt * 0.5, kernel = "triangular")

# 1.5x bandwidth
cat("\n--- 150% Bandwidth (h =", h_opt * 1.5, ") ---\n")
rdrobust(data$enrollment, data$test_score, c = cutoff,
         h = h_opt * 1.5, kernel = "triangular")

# Quadratic polynomial
cat("\n--- Quadratic Polynomial ---\n")
rdrobust(data$enrollment, data$test_score, c = cutoff,
         p = 2, kernel = "triangular")

# ── Visualization ────────────────────────────────────────────────────
data %>%
  mutate(above = test_score >= cutoff) %>%
  ggplot(aes(x = test_score, y = enrollment, color = above)) +
  geom_point(alpha = 0.5, size = 2) +
  geom_smooth(method = "lm", se = TRUE) +
  geom_vline(xintercept = cutoff, linetype = "dashed", linewidth = 1) +
  annotate("rect",
    xmin = cutoff - h_opt, xmax = cutoff + h_opt,
    ymin = -Inf, ymax = Inf,
    fill = "gray", alpha = 0.15
  ) +
  labs(
    title = "Regression Discontinuity Design",
    subtitle = paste0("Cutoff = ", cutoff, ", Bandwidth = ", round(h_opt, 2)),
    x = "Test Score (Running Variable)",
    y = "Enrollment Rate (Outcome)",
    color = "Above Cutoff"
  ) +
  theme_minimal()

Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# Example: Effect of scholarship eligibility on college enrollment
np.random.seed(123)
n = 500
test_score = np.random.uniform(40, 100, n)
cutoff = 70

# Treatment assignment
scholarship = (test_score >= cutoff).astype(int)

# True effect = 0.15 (15 percentage points)
enrollment = (0.3 + 0.01 * (test_score - cutoff) +
              0.15 * scholarship +
              0.005 * scholarship * (test_score - cutoff) +
              np.random.normal(0, 0.1, n))
enrollment = np.clip(enrollment, 0, 1)

data = pd.DataFrame({
    'test_score': test_score,
    'scholarship': scholarship,
    'enrollment': enrollment
})

# Optimal bandwidth (simplified Imbens-Kalyanaraman)
def optimal_bandwidth_ik(X, Y, cutoff):
    X_centered = X - cutoff
    left_mask = X_centered < 0
    right_mask = X_centered >= 0

    # Standard deviations
    var_left = np.var(Y[left_mask])
    var_right = np.var(Y[right_mask])

    # Sample sizes
    n_left = np.sum(left_mask)
    n_right = np.sum(right_mask)
    n = n_left + n_right

    # Simplified formula
    h_opt = 1.84 * np.std(X_centered) * (n ** (-1/5))
    return h_opt

bandwidth = optimal_bandwidth_ik(
    data['test_score'].values,
    data['enrollment'].values,
    cutoff
)
print(f"Optimal bandwidth: {bandwidth:.3f}")

# Local linear regression within bandwidth
data['x_centered'] = data['test_score'] - cutoff
data_bw = data[np.abs(data['x_centered']) <= bandwidth].copy()
data_bw['above'] = (data_bw['test_score'] >= cutoff).astype(int)
data_bw['above_x_centered'] = data_bw['above'] * data_bw['x_centered']

# Design matrix
X = data_bw[['x_centered', 'above', 'above_x_centered']].values
X = np.column_stack([np.ones(len(X)), X])
y = data_bw['enrollment'].values

# OLS estimation
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_pred = X @ beta
residuals = y - y_pred

# Standard errors
n_bw = len(y)
k = X.shape[1]
df = n_bw - k
mse = np.sum(residuals ** 2) / df
var_beta = mse * np.linalg.inv(X.T @ X)
se_beta = np.sqrt(np.diag(var_beta))

# Treatment effect (coefficient on 'above')
tau_idx = 2
tau = beta[tau_idx]
se_tau = se_beta[tau_idx]
t_stat = tau / se_tau
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))

print(f"Treatment Effect: {tau:.4f}")
print(f"Standard Error: {se_tau:.4f}")
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
print(f"95% CI: [{tau - 1.96*se_tau:.4f}, {tau + 1.96*se_tau:.4f}]")

# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# RDD plot
data['bin'] = pd.cut(data['test_score'], bins=30)
binned = data.groupby(['bin', 'scholarship']).agg({
    'test_score': 'mean',
    'enrollment': 'mean'
}).reset_index()

for treated in [0, 1]:
    subset = binned[binned['scholarship'] == treated]
    label = 'Scholarship' if treated == 1 else 'No Scholarship'
    color = 'red' if treated == 1 else 'blue'
    ax1.scatter(subset['test_score'], subset['enrollment'],
                label=label, color=color, alpha=0.6, s=50)

ax1.axvline(x=cutoff, color='black', linestyle='--',
            label='Cutoff', linewidth=2)
ax1.set_xlabel('Test Score (Running Variable)')
ax1.set_ylabel('College Enrollment Rate')
ax1.set_title('Regression Discontinuity Design')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Density plot (McCrary test)
ax2.hist(data[data['test_score'] < cutoff]['test_score'],
         bins=20, alpha=0.6, color='blue', label='Below Cutoff')
ax2.hist(data[data['test_score'] >= cutoff]['test_score'],
         bins=20, alpha=0.6, color='red', label='Above Cutoff')
ax2.axvline(x=cutoff, color='black', linestyle='--',
            label='Cutoff', linewidth=2)
ax2.set_xlabel('Test Score')
ax2.set_ylabel('Frequency')
ax2.set_title('Density of Running Variable (McCrary Test)')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Robustness checks
print("\n=== Robustness Checks ===")
for bw_mult in [0.5, 1.5]:
    h = bandwidth * bw_mult
    data_rob = data[np.abs(data['x_centered']) <= h].copy()
    # ... (repeat estimation with different bandwidth)
    print(f"Bandwidth {bw_mult}x: n={len(data_rob)}")

Interpreting Results

Treatment Effect: The estimated causal effect for units at the cutoff threshold
Local Effect: RDD estimates a local effect, not the average effect across all units
McCrary Test: Non-significant p-value (p 0.05) suggests no manipulation
Robustness: Consistent estimates across different bandwidths strengthen causal claims

Common Pitfalls

Manipulation: If units can game the system to cross the threshold, RDD estimates are biased
Incorrect Bandwidth: Too narrow reduces power; too wide introduces bias
Wrong Polynomial Order: Overfitting with high-order polynomials can distort estimates
Multiple Cutoffs: Other thresholds near the focal cutoff can confound results
Discontinuities in Covariates: If other variables jump at cutoff, continuity assumption is violated

Verification

library(tidyverse) library(rdrobust) # For RDD analysis library(rddensity) # For McCrary density test # ── Load sample data (download CSV from the calculator) ────────────── data <- read_csv("regression-discontinuity-sample.csv") cutoff <- 70 cat("=== Data Summary ===\n") cat("N:", nrow(data), "\n") cat("Below cutoff:", sum(data$test_score < cutoff), "\n") cat("Above cutoff:", sum(data$test_score >= cutoff), "\n\n") # ── Method 1: rdrobust (automatic bandwidth, triangular kernel) ────── rdd_result <- rdrobust( y = data$enrollment, x = data$test_score, c = cutoff, kernel = "triangular", p = 1 # Linear (matches our calculator default) ) summary(rdd_result) # ── Method 2: Manual local linear regression (matches backend) ─────── # Bandwidth selection (Imbens-Kalyanaraman) bw_select <- rdbwselect( y = data$enrollment, x = data$test_score, c = cutoff, kernel = "triangular", bwselect = "IK" ) h_opt <- bw_select$bws[1, 1] cat("\n=== Optimal Bandwidth (IK) ===\n") cat("h_opt:", h_opt, "\n\n") # Local linear regression within bandwidth (triangular kernel weights) data_bw <- data %>% mutate(x_centered = test_score - cutoff) %>% filter(abs(x_centered) <= h_opt) %>% mutate( above = as.numeric(test_score >= cutoff), kernel_weight = 1 - abs(x_centered) / h_opt # Triangular kernel ) cat("=== Sample within bandwidth ===\n") cat("N below:", sum(data_bw$above == 0), "\n") cat("N above:", sum(data_bw$above == 1), "\n") cat("N total:", nrow(data_bw), "\n") cat("Effective N (sum of weights):", sum(data_bw$kernel_weight), "\n\n") # Weighted least squares: Y = a + b1*X + tau*D + b2*D*X lm_rdd <- lm( enrollment ~ x_centered + above + x_centered:above, data = data_bw, weights = kernel_weight ) cat("=== Treatment Effect (Local Linear Regression) ===\n") coefs <- summary(lm_rdd)$coefficients tau <- coefs["above", "Estimate"] se_tau <- coefs["above", "Std. Error"] t_stat <- coefs["above", "t value"] p_val <- coefs["above", "Pr(>|t|)"] df_resid <- lm_rdd$df.residual ci <- confint(lm_rdd, "above", level = 0.95) cat("Treatment effect (tau):", tau, "\n") cat("Standard error:", se_tau, "\n") cat("t-statistic:", t_stat, "\n") cat("p-value:", p_val, "\n") cat("95% CI: [", ci[1], ",", ci[2], "]\n") cat("df:", df_resid, "\n\n") # ── McCrary density test ───────────────────────────────────────────── cat("=== McCrary Density Test ===\n") density_test <- rddensity(data$test_score, c = cutoff) summary(density_test) # ── Robustness checks ─────────────────────────────────────────────── cat("\n=== Robustness Checks ===\n") # Half bandwidth cat("\n--- 50% Bandwidth (h =", h_opt * 0.5, ") ---\n") rdrobust(data$enrollment, data$test_score, c = cutoff, h = h_opt * 0.5, kernel = "triangular") # 1.5x bandwidth cat("\n--- 150% Bandwidth (h =", h_opt * 1.5, ") ---\n") rdrobust(data$enrollment, data$test_score, c = cutoff, h = h_opt * 1.5, kernel = "triangular") # Quadratic polynomial cat("\n--- Quadratic Polynomial ---\n") rdrobust(data$enrollment, data$test_score, c = cutoff, p = 2, kernel = "triangular") # ── Visualization ──────────────────────────────────────────────────── data %>% mutate(above = test_score >= cutoff) %>% ggplot(aes(x = test_score, y = enrollment, color = above)) + geom_point(alpha = 0.5, size = 2) + geom_smooth(method = "lm", se = TRUE) + geom_vline(xintercept = cutoff, linetype = "dashed", linewidth = 1) + annotate("rect", xmin = cutoff - h_opt, xmax = cutoff + h_opt, ymin = -Inf, ymax = Inf, fill = "gray", alpha = 0.15 ) + labs( title = "Regression Discontinuity Design", subtitle = paste0("Cutoff = ", cutoff, ", Bandwidth = ", round(h_opt, 2)), x = "Test Score (Running Variable)", y = "Enrollment Rate (Outcome)", color = "Above Cutoff" ) + theme_minimal()

import numpy as np import pandas as pd import matplotlib.pyplot as plt from scipy import stats # Example: Effect of scholarship eligibility on college enrollment np.random.seed(123) n = 500 test_score = np.random.uniform(40, 100, n) cutoff = 70 # Treatment assignment scholarship = (test_score >= cutoff).astype(int) # True effect = 0.15 (15 percentage points) enrollment = (0.3 + 0.01 * (test_score - cutoff) + 0.15 * scholarship + 0.005 * scholarship * (test_score - cutoff) + np.random.normal(0, 0.1, n)) enrollment = np.clip(enrollment, 0, 1) data = pd.DataFrame({ 'test_score': test_score, 'scholarship': scholarship, 'enrollment': enrollment }) # Optimal bandwidth (simplified Imbens-Kalyanaraman) def optimal_bandwidth_ik(X, Y, cutoff): X_centered = X - cutoff left_mask = X_centered < 0 right_mask = X_centered >= 0 # Standard deviations var_left = np.var(Y[left_mask]) var_right = np.var(Y[right_mask]) # Sample sizes n_left = np.sum(left_mask) n_right = np.sum(right_mask) n = n_left + n_right # Simplified formula h_opt = 1.84 * np.std(X_centered) * (n ** (-1/5)) return h_opt bandwidth = optimal_bandwidth_ik( data['test_score'].values, data['enrollment'].values, cutoff ) print(f"Optimal bandwidth: {bandwidth:.3f}") # Local linear regression within bandwidth data['x_centered'] = data['test_score'] - cutoff data_bw = data[np.abs(data['x_centered']) <= bandwidth].copy() data_bw['above'] = (data_bw['test_score'] >= cutoff).astype(int) data_bw['above_x_centered'] = data_bw['above'] * data_bw['x_centered'] # Design matrix X = data_bw[['x_centered', 'above', 'above_x_centered']].values X = np.column_stack([np.ones(len(X)), X]) y = data_bw['enrollment'].values # OLS estimation beta = np.linalg.lstsq(X, y, rcond=None)[0] y_pred = X @ beta residuals = y - y_pred # Standard errors n_bw = len(y) k = X.shape[1] df = n_bw - k mse = np.sum(residuals ** 2) / df var_beta = mse * np.linalg.inv(X.T @ X) se_beta = np.sqrt(np.diag(var_beta)) # Treatment effect (coefficient on 'above') tau_idx = 2 tau = beta[tau_idx] se_tau = se_beta[tau_idx] t_stat = tau / se_tau p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df)) print(f"Treatment Effect: {tau:.4f}") print(f"Standard Error: {se_tau:.4f}") print(f"t-statistic: {t_stat:.4f}") print(f"p-value: {p_value:.4f}") print(f"95% CI: [{tau - 1.96*se_tau:.4f}, {tau + 1.96*se_tau:.4f}]") # Visualization fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5)) # RDD plot data['bin'] = pd.cut(data['test_score'], bins=30) binned = data.groupby(['bin', 'scholarship']).agg({ 'test_score': 'mean', 'enrollment': 'mean' }).reset_index() for treated in [0, 1]: subset = binned[binned['scholarship'] == treated] label = 'Scholarship' if treated == 1 else 'No Scholarship' color = 'red' if treated == 1 else 'blue' ax1.scatter(subset['test_score'], subset['enrollment'], label=label, color=color, alpha=0.6, s=50) ax1.axvline(x=cutoff, color='black', linestyle='--', label='Cutoff', linewidth=2) ax1.set_xlabel('Test Score (Running Variable)') ax1.set_ylabel('College Enrollment Rate') ax1.set_title('Regression Discontinuity Design') ax1.legend() ax1.grid(True, alpha=0.3) # Density plot (McCrary test) ax2.hist(data[data['test_score'] < cutoff]['test_score'], bins=20, alpha=0.6, color='blue', label='Below Cutoff') ax2.hist(data[data['test_score'] >= cutoff]['test_score'], bins=20, alpha=0.6, color='red', label='Above Cutoff') ax2.axvline(x=cutoff, color='black', linestyle='--', label='Cutoff', linewidth=2) ax2.set_xlabel('Test Score') ax2.set_ylabel('Frequency') ax2.set_title('Density of Running Variable (McCrary Test)') ax2.legend() ax2.grid(True, alpha=0.3) plt.tight_layout() plt.show() # Robustness checks print("\n=== Robustness Checks ===") for bw_mult in [0.5, 1.5]: h = bandwidth * bw_mult data_rob = data[np.abs(data['x_centered']) <= h].copy() # ... (repeat estimation with different bandwidth) print(f"Bandwidth {bw_mult}x: n={len(data_rob)}")

Under Construction

Regression Discontinuity Design (RDD)

What You'll Get:

Calculator

Enter Summary Statistics

Below Cutoff (Control)

Above Cutoff (Treatment)

Related Calculators

Difference-in-Differences Calculator

Linear Regression Calculator

Two-Sample T-Test Calculator

Learn More

What is Regression Discontinuity Design (RDD)?

Definition

The RDD Estimator

Key Assumptions

When to Use RDD

Bandwidth Selection

Code Examples

Interpreting Results

Common Pitfalls

Verification

View Verification Details

Under Construction

Regression Discontinuity Design (RDD)

What You'll Get:

Calculator

Enter Summary Statistics

Below Cutoff (Control)

Above Cutoff (Treatment)

Related Calculators

Difference-in-Differences Calculator

Linear Regression Calculator

Two-Sample T-Test Calculator

Learn More

What is Regression Discontinuity Design (RDD)?

Definition

The RDD Estimator

Key Assumptions

When to Use RDD

Bandwidth Selection

Code Examples

Interpreting Results

Common Pitfalls

Verification

View Verification Details