This calculator performs Tobit (censored) regression analysis using Maximum Likelihood Estimation (MLE). The Tobit model is used when the dependent variable is censored, meaning observations beyond a threshold remain in the sample but are recorded at the limit value rather than at their true value. Common examples include hours worked (censored at 0), expenditures (non-negative), and test scores with floor or ceiling effects.
Censoring vs. Truncation: Censored data means observations at the limit are still in the sample but their true value is unknown (e.g., hours = 0 for non-workers, but they are still in the dataset). If observations beyond the limit are completely excluded from the sample, use a Truncated Regression Calculator instead.
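The distinction is easy to see on a toy example (hypothetical numbers, NumPy assumed): censoring clips values at the limit but keeps every row, while truncation drops rows from the sample entirely.

```python
import numpy as np

# Latent hours people would like to work (can be negative for non-workers)
latent = np.array([-3.0, -1.0, 2.0, 5.0, 8.0])

# Censoring at 0: every observation stays; values below 0 are recorded as 0
censored = np.maximum(latent, 0)   # [0, 0, 2, 5, 8] — still n = 5

# Truncation at 0: observations at or below 0 are excluded from the sample
truncated = latent[latent > 0]     # [2, 5, 8] — now n = 3
```

Tobit regression handles the first case; a truncated regression model is required for the second.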
Ready to analyze censored data? Load the sample dataset (hours worked, censored at 0) to see the analysis in action, or upload your own data.
The Tobit Model (named after James Tobin, 1958) is a censored regression model designed for situations where the dependent variable is observed for all cases, but its value is constrained (censored) at one or both ends. Unlike truncated data where observations are removed, censored observations remain in the sample at their limit value. The model estimates both the probability of being uncensored and the expected value conditional on being uncensored.
The Tobit model assumes a latent variable:

$$y_i^* = \mathbf{x}_i^\top \boldsymbol\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

The observed variable (left-censored at $c$):

$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > c \\ c & \text{if } y_i^* \le c \end{cases}$$

The log-likelihood:

$$\ln L(\boldsymbol\beta, \sigma) = \sum_{y_i = c} \ln \Phi\!\left(\frac{c - \mathbf{x}_i^\top\boldsymbol\beta}{\sigma}\right) + \sum_{y_i > c} \left[\ln \phi\!\left(\frac{y_i - \mathbf{x}_i^\top\boldsymbol\beta}{\sigma}\right) - \ln \sigma\right]$$

where $\Phi$ and $\phi$ denote the standard normal CDF and PDF.
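Note that the slope coefficients $\beta_j$ describe effects on the latent variable $y^*$, not on the observed $y$. For effects on the observed outcome, a standard result for left-censoring at $c$ gives the expected value and its derivative (this is the quantity the average-marginal-effect code in the R and Python examples computes):

```latex
E[y \mid \mathbf{x}]
  = c\,\Phi\!\left(\frac{c - \mathbf{x}^\top\boldsymbol\beta}{\sigma}\right)
  + \mathbf{x}^\top\boldsymbol\beta\;\Phi\!\left(\frac{\mathbf{x}^\top\boldsymbol\beta - c}{\sigma}\right)
  + \sigma\,\phi\!\left(\frac{\mathbf{x}^\top\boldsymbol\beta - c}{\sigma}\right),
\qquad
\frac{\partial\, E[y \mid \mathbf{x}]}{\partial x_j}
  = \beta_j\,\Phi\!\left(\frac{\mathbf{x}^\top\boldsymbol\beta - c}{\sigma}\right)
```

Averaging $\Phi\bigl((\mathbf{x}_i^\top\boldsymbol\beta - c)/\sigma\bigr)$ over the sample and multiplying by $\beta_j$ yields the average marginal effect, which is always smaller in magnitude than the raw coefficient.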
# Tobit (Censored) Regression in R
library(AER)
library(tibble)
# Example sample data (same structure as this calculator)
data <- tibble(
hours_worked = c(0, 0, 0, 0, 5, 8, 12, 15, 18, 20,
22, 25, 28, 30, 32, 35, 38, 40, 40, 42,
45, 48, 0, 10, 35),
education = c(8, 10, 9, 11, 12, 12, 14, 14, 16, 16,
16, 18, 18, 18, 20, 20, 20, 22, 22, 22,
24, 24, 10, 13, 19),
age = c(55, 60, 62, 58, 35, 40, 30, 28, 32, 45,
38, 27, 33, 42, 29, 35, 40, 25, 30, 48,
35, 28, 65, 42, 37)
)
# Left-censored Tobit at 0
model <- tobit(hours_worked ~ education + age,
data = data,
left = 0)
summary(model)
# Marginal effects (manual AME for E[y | X] with left-censoring at 0)
xb <- predict(model, type = "lp")
sigma <- model$scale
p_uncensored <- pnorm(xb / sigma)
beta <- coef(model)[c("education", "age")]
ame <- beta * mean(p_uncensored)
ame

# Tobit (Censored) Regression in Python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
# Example sample data (same structure as this calculator)
hours_worked = np.array([
0, 0, 0, 0, 5, 8, 12, 15, 18, 20,
22, 25, 28, 30, 32, 35, 38, 40, 40, 42,
45, 48, 0, 10, 35
])
education = np.array([
8, 10, 9, 11, 12, 12, 14, 14, 16, 16,
16, 18, 18, 18, 20, 20, 20, 22, 22, 22,
24, 24, 10, 13, 19
])
age = np.array([
55, 60, 62, 58, 35, 40, 30, 28, 32, 45,
38, 27, 33, 42, 29, 35, 40, 25, 30, 48,
35, 28, 65, 42, 37
])
# Left censoring point
c = 0
y = hours_worked
X = np.column_stack([np.ones_like(education), education, age])
def neg_log_likelihood(params, y, X, c):
    beta = params[:-1]
    sigma = np.exp(params[-1])  # ensures sigma > 0
    xb = X @ beta
    censored = y <= c
    uncensored = ~censored
    eps = 1e-12
    ll_cens = np.log(np.maximum(norm.cdf((c - xb[censored]) / sigma), eps)).sum()
    ll_uncens = (
        norm.logpdf((y[uncensored] - xb[uncensored]) / sigma) - np.log(sigma)
    ).sum()
    return -(ll_cens + ll_uncens)
# OLS starting values
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
resid0 = y - X @ beta0
sigma0 = max(np.std(resid0), 1e-6)
init_params = np.append(beta0, np.log(sigma0))
result = minimize(
neg_log_likelihood,
init_params,
args=(y, X, c),
method='BFGS'
)
beta_hat = result.x[:-1]
sigma_hat = np.exp(result.x[-1])
# Average marginal effects on E[y|X] for covariates (education, age)
xb_hat = X @ beta_hat
p_uncensored = norm.cdf((xb_hat - c) / sigma_hat)
ame = beta_hat[1:] * p_uncensored.mean()
print('Converged:', result.success)
print('Log-likelihood:', -result.fun)
print('Intercept:', beta_hat[0])
print('Education coef:', beta_hat[1])
print('Age coef:', beta_hat[2])
print('Sigma:', sigma_hat)
print('AME (education, age):', ame)
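A quick way to see why the censored-normal MLE matters: on simulated left-censored data, OLS fit directly to the observed outcome attenuates the slope toward zero, while a Tobit-style MLE (the same likelihood used above, on simulated data with an assumed true slope of 2.0) recovers it.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)  # latent outcome
y = np.maximum(y_star, 0.0)                             # left-censor at 0
X = np.column_stack([np.ones(n), x])

# OLS on the censored outcome: slope biased toward zero
ols_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Censored-normal (Tobit) negative log-likelihood, as in the calculator
def nll(params):
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    cens = y <= 0
    ll = norm.logcdf(-xb[cens] / sigma).sum()
    ll += (norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)).sum()
    return -ll

res = minimize(nll, x0=np.zeros(3), method='BFGS')
tobit_slope = res.x[1]

print(f'true slope 2.0 | OLS {ols_slope:.2f} | Tobit {tobit_slope:.2f}')
# OLS is noticeably attenuated; the Tobit estimate is close to 2.0
```

The gap between the two estimates grows with the share of censored observations, which is why OLS is a poor substitute whenever a nontrivial fraction of the sample sits at the limit.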