This calculator performs truncated regression analysis using Maximum Likelihood Estimation (MLE). Truncated regression is used when the sample is drawn from a restricted part of the population — for example, when you only observe wages for employed individuals (left truncation) or test scores below a ceiling (right truncation).
Truncation vs. Censoring: Truncated data means observations outside the truncation point are completely excluded from the sample. If observations are recorded but capped at a limit value, that is censoring — use a Censored Regression (Tobit) Calculator instead.
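The difference can be seen directly in a small numpy sketch (the limit value of 10 and the distribution parameters are illustrative, not from the calculator's data):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=20, scale=10, size=1000)  # latent outcome
limit = 10

# Truncation: observations at or below the limit are dropped entirely
y_truncated = y[y > limit]

# Censoring: every observation is kept, but values are capped at the limit
y_censored = np.maximum(y, limit)

print(len(y_truncated))   # fewer than 1000 rows survive
print(len(y_censored))    # still 1000 rows, but piled up at the limit
```

A truncated sample does not even record how many observations were lost; a censored sample at least records that the capped observations exist.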
Ready to analyze truncated data? Load the sample dataset (wages truncated from below) to see the analysis in action, or upload your own data.
Truncated Regression is a statistical model for data where the dependent variable is only observed within a certain range. Unlike censored data (where values are capped), truncated data means observations outside the range are entirely missing from the sample. Standard OLS regression on truncated data produces biased and inconsistent estimates; truncated regression corrects this using Maximum Likelihood Estimation.
The log-likelihood for left-truncated data (observed when Y > a):

\ln L(\beta, \sigma) = \sum_{i=1}^{n} \left[ \ln \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln \sigma - \ln\!\left(1 - \Phi\!\left(\frac{a - x_i'\beta}{\sigma}\right)\right) \right]

Where:
- \phi is the standard normal density and \Phi its cumulative distribution function
- x_i'\beta is the linear predictor for observation i
- \sigma is the standard deviation of the error term
- a is the left truncation point
# Truncated Regression in R
library(truncreg)
library(tidyverse)
# Example sample data (same structure as this calculator)
data <- tibble(
  wage = c(12.5, 15.3, 18.7, 22.1, 25.4, 28.9, 31.2, 35.6, 38.4, 42.1,
           45.7, 48.3, 52.8, 55.1, 58.9, 62.3, 65.7, 70.2, 74.8, 80.1),
  education = c(10, 12, 12, 14, 14, 16, 16, 16, 18, 18,
                18, 20, 20, 20, 20, 22, 22, 22, 22, 24),
  experience = c(2, 3, 5, 4, 8, 6, 10, 12, 8, 15,
                 18, 10, 14, 20, 22, 12, 16, 25, 28, 15)
)
# Left-truncated at 10
model <- truncreg(wage ~ education + experience,
                  data = data,
                  point = 10,
                  direction = "left")
summary(model)

# Truncated Regression in Python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from scipy.optimize import minimize
# Example sample data (same structure as this calculator)
wage = np.array([
    12.5, 15.3, 18.7, 22.1, 25.4, 28.9, 31.2, 35.6, 38.4, 42.1,
    45.7, 48.3, 52.8, 55.1, 58.9, 62.3, 65.7, 70.2, 74.8, 80.1
])
education = np.array([
    10, 12, 12, 14, 14, 16, 16, 16, 18, 18,
    18, 20, 20, 20, 20, 22, 22, 22, 22, 24
])
experience = np.array([
    2, 3, 5, 4, 8, 6, 10, 12, 8, 15,
    18, 10, 14, 20, 22, 12, 16, 25, 28, 15
])
# Left truncation point
a = 10
X = sm.add_constant(np.column_stack([education, experience]))
y = wage
def neg_log_likelihood(params, X, y, a):
    beta = params[:-1]
    sigma = np.exp(params[-1])  # ensures sigma > 0
    mu = X @ beta
    z = (y - mu) / sigma
    alpha = (a - mu) / sigma
    eps = 1e-12
    log_pdf = stats.norm.logpdf(z) - np.log(sigma)
    log_survival = np.log(np.maximum(1 - stats.norm.cdf(alpha), eps))
    ll = np.sum(log_pdf - log_survival)
    return -ll
# OLS starting values
ols = sm.OLS(y, X).fit()
beta0 = ols.params
sigma0 = np.std(ols.resid, ddof=X.shape[1])
init_params = np.concatenate([beta0, [np.log(max(sigma0, 1e-6))]])
result = minimize(
    neg_log_likelihood,
    init_params,
    args=(X, y, a),
    method='BFGS'
)
beta_hat = result.x[:-1]
sigma_hat = np.exp(result.x[-1])
print('Converged:', result.success)
print('Log-likelihood:', -result.fun)
print('Intercept:', beta_hat[0])
print('Education coef:', beta_hat[1])
print('Experience coef:', beta_hat[2])
print('Sigma:', sigma_hat)