This calculator performs Tobit (censored) regression analysis using Maximum Likelihood Estimation (MLE). The Tobit model is used when the dependent variable is censored, meaning observations beyond a threshold remain in the sample but are recorded at the limit value rather than at their true value. Common examples include hours worked (censored at 0), expenditures (non-negative), and test scores with floor or ceiling effects.
Censoring vs. Truncation: Censored data means observations at the limit are still in the sample but their true value is unknown (e.g., hours = 0 for non-workers, but they are still in the dataset). If observations beyond the limit are completely excluded from the sample, use a Truncated Regression Calculator instead.
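The distinction is easy to see on a toy example (hypothetical numbers, NumPy assumed): censoring clips values at the limit but keeps every row, while truncation drops rows from the sample entirely.

```python
import numpy as np

# Latent hours people would like to work (can be negative for non-workers)
latent = np.array([-3.0, -1.0, 2.0, 5.0, 8.0])

# Censoring at 0: every observation stays; values below 0 are recorded as 0
censored = np.maximum(latent, 0)   # [0, 0, 2, 5, 8] — still n = 5

# Truncation at 0: observations at or below 0 are excluded from the sample
truncated = latent[latent > 0]     # [2, 5, 8] — now n = 3
```

Tobit regression handles the first case; a truncated regression model is required for the second.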
Ready to analyze censored data? Load the sample dataset (hours worked, censored at 0) to see the analysis in action, or upload your own data.
The Tobit Model (named after James Tobin, 1958) is a censored regression model designed for situations where the dependent variable is observed for all cases, but its value is constrained (censored) at one or both ends. Unlike truncated data where observations are removed, censored observations remain in the sample at their limit value. The model estimates both the probability of being uncensored and the expected value conditional on being uncensored.
The Tobit model assumes a latent variable:

$$y_i^* = \mathbf{x}_i^\top \boldsymbol\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

The observed variable (left-censored at $c$):

$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > c \\ c & \text{if } y_i^* \le c \end{cases}$$

The log-likelihood:

$$\ln L(\boldsymbol\beta, \sigma) = \sum_{y_i = c} \ln \Phi\!\left(\frac{c - \mathbf{x}_i^\top\boldsymbol\beta}{\sigma}\right) + \sum_{y_i > c} \left[\ln \phi\!\left(\frac{y_i - \mathbf{x}_i^\top\boldsymbol\beta}{\sigma}\right) - \ln \sigma\right]$$

where $\Phi$ and $\phi$ denote the standard normal CDF and PDF.
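Note that the slope coefficients $\beta_j$ describe effects on the latent variable $y^*$, not on the observed $y$. For effects on the observed outcome, a standard result for left-censoring at $c$ gives the expected value and its derivative (this is the quantity the average-marginal-effect code in the R and Python examples computes):

```latex
E[y \mid \mathbf{x}]
  = c\,\Phi\!\left(\frac{c - \mathbf{x}^\top\boldsymbol\beta}{\sigma}\right)
  + \mathbf{x}^\top\boldsymbol\beta\;\Phi\!\left(\frac{\mathbf{x}^\top\boldsymbol\beta - c}{\sigma}\right)
  + \sigma\,\phi\!\left(\frac{\mathbf{x}^\top\boldsymbol\beta - c}{\sigma}\right),
\qquad
\frac{\partial\, E[y \mid \mathbf{x}]}{\partial x_j}
  = \beta_j\,\Phi\!\left(\frac{\mathbf{x}^\top\boldsymbol\beta - c}{\sigma}\right)
```

Averaging $\Phi\bigl((\mathbf{x}_i^\top\boldsymbol\beta - c)/\sigma\bigr)$ over the sample and multiplying by $\beta_j$ yields the average marginal effect, which is always smaller in magnitude than the raw coefficient.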
# Tobit (Censored) Regression in R
library(AER)
library(tibble)
# Example sample data (same structure as this calculator)
data <- tibble(
hours_worked = c(0, 0, 0, 0, 5, 8, 12, 15, 18, 20,
22, 25, 28, 30, 32, 35, 38, 40, 40, 42,
45, 48, 0, 10, 35),
education = c(8, 10, 9, 11, 12, 12, 14, 14, 16, 16,
16, 18, 18, 18, 20, 20, 20, 22, 22, 22,
24, 24, 10, 13, 19),
age = c(55, 60, 62, 58, 35, 40, 30, 28, 32, 45,
38, 27, 33, 42, 29, 35, 40, 25, 30, 48,
35, 28, 65, 42, 37)
)
# Left-censored Tobit at 0
model <- tobit(hours_worked ~ education + age,
data = data,
left = 0)
summary(model)
# Marginal effects (manual AME for E[y | X] with left-censoring at 0)
xb <- predict(model, type = "lp")
sigma <- model$scale
p_uncensored <- pnorm(xb / sigma)
beta <- coef(model)[c("education", "age")]
ame <- beta * mean(p_uncensored)
ame

# Tobit (Censored) Regression in Python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
# Example sample data (same structure as this calculator)
hours_worked = np.array([
0, 0, 0, 0, 5, 8, 12, 15, 18, 20,
22, 25, 28, 30, 32, 35, 38, 40, 40, 42,
45, 48, 0, 10, 35
])
education = np.array([
8, 10, 9, 11, 12, 12, 14, 14, 16, 16,
16, 18, 18, 18, 20, 20, 20, 22, 22, 22,
24, 24, 10, 13, 19
])
age = np.array([
55, 60, 62, 58, 35, 40, 30, 28, 32, 45,
38, 27, 33, 42, 29, 35, 40, 25, 30, 48,
35, 28, 65, 42, 37
])
# Left censoring point
c = 0
y = hours_worked
X = np.column_stack([np.ones_like(education), education, age])
def neg_log_likelihood(params, y, X, c):
    beta = params[:-1]
    sigma = np.exp(params[-1])  # ensures sigma > 0
    xb = X @ beta
    censored = y <= c
    uncensored = ~censored
    eps = 1e-12
    ll_cens = np.log(np.maximum(norm.cdf((c - xb[censored]) / sigma), eps)).sum()
    ll_uncens = (
        norm.logpdf((y[uncensored] - xb[uncensored]) / sigma) - np.log(sigma)
    ).sum()
    return -(ll_cens + ll_uncens)
# OLS starting values
beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
resid0 = y - X @ beta0
sigma0 = max(np.std(resid0), 1e-6)
init_params = np.append(beta0, np.log(sigma0))
result = minimize(
neg_log_likelihood,
init_params,
args=(y, X, c),
method='BFGS'
)
beta_hat = result.x[:-1]
sigma_hat = np.exp(result.x[-1])
# Average marginal effects on E[y|X] for covariates (education, age)
xb_hat = X @ beta_hat
p_uncensored = norm.cdf((xb_hat - c) / sigma_hat)
ame = beta_hat[1:] * p_uncensored.mean()
print('Converged:', result.success)
print('Log-likelihood:', -result.fun)
print('Intercept:', beta_hat[0])
print('Education coef:', beta_hat[1])
print('Age coef:', beta_hat[2])
print('Sigma:', sigma_hat)
print('AME (education, age):', ame)
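A quick way to see why the censored-normal MLE matters: on simulated left-censored data, OLS fit directly to the observed outcome attenuates the slope toward zero, while a Tobit-style MLE (the same likelihood used above, on simulated data with an assumed true slope of 2.0) recovers it.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y_star = 1.0 + 2.0 * x + rng.normal(scale=1.5, size=n)  # latent outcome
y = np.maximum(y_star, 0.0)                             # left-censor at 0
X = np.column_stack([np.ones(n), x])

# OLS on the censored outcome: slope biased toward zero
ols_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Censored-normal (Tobit) negative log-likelihood, as in the calculator
def nll(params):
    beta, sigma = params[:-1], np.exp(params[-1])
    xb = X @ beta
    cens = y <= 0
    ll = norm.logcdf(-xb[cens] / sigma).sum()
    ll += (norm.logpdf((y[~cens] - xb[~cens]) / sigma) - np.log(sigma)).sum()
    return -ll

res = minimize(nll, x0=np.zeros(3), method='BFGS')
tobit_slope = res.x[1]

print(f'true slope 2.0 | OLS {ols_slope:.2f} | Tobit {tobit_slope:.2f}')
# OLS is noticeably attenuated; the Tobit estimate is close to 2.0
```

The gap between the two estimates grows with the share of censored observations, which is why OLS is a poor substitute whenever a nontrivial fraction of the sample sits at the limit.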