Fixed Effects (FE) and Random Effects (RE) models are fundamental techniques for panel (longitudinal) data, in which the same units are observed over multiple time periods. Both models account for unobserved heterogeneity across units. When their assumptions hold, this yields more credible estimates of causal effects than a pooled regression that ignores the panel structure.
💡 Key Decision: Fixed effects control for all time-invariant unit characteristics, observed or unobserved (such as innate ability or gender), but cannot estimate the effects of those characteristics. Random effects can estimate the effects of time-invariant variables, but they assume the unobserved unit effects are uncorrelated with your predictors. The Hausman test helps decide which model is appropriate.
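FE cannot estimate the effect of a time-invariant characteristic because FE estimation works on deviations from each unit's own mean (the "within" transformation), and a variable that never changes within a unit has zero deviation everywhere. The minimal NumPy sketch below applies that transformation to the education and experience columns of the wage example used later on this page. `within_transform` is a hypothetical helper written for illustration, not a plm or linearmodels function.

```python
import numpy as np

# Columns copied from the wage example: persons 1-10, three years each
education = np.array([12, 12, 12, 16, 16, 16, 12, 12, 12, 16, 16, 16,
                      10, 10, 10, 14, 14, 14, 12, 12, 12, 16, 16, 16,
                      12, 12, 12, 14, 14, 14], dtype=float)
experience = np.array([5, 6, 7, 3, 4, 5, 2, 3, 4, 8, 9, 10, 1, 2, 3,
                       6, 7, 8, 4, 5, 6, 7, 8, 9, 3, 4, 5, 5, 6, 7], dtype=float)
person_id = np.repeat(np.arange(1, 11), 3)  # 1,1,1,2,2,2,...,10,10,10

def within_transform(v, groups):
    """Hypothetical helper: subtract each group's mean from its own observations."""
    out = v.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= v[mask].mean()
    return out

edu_w = within_transform(education, person_id)
exp_w = within_transform(experience, person_id)

# Education is constant within each person, so every demeaned value is zero:
# there is no within-person variation left from which FE could estimate its effect.
print("demeaned education:", edu_w)
# Experience changes over time, so within-person variation survives.
print("demeaned experience (person 1):", exp_w[:3])
```

This is also why the FE models in the R and Python code leave `education` out of the formula.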
Panel data (also called longitudinal data) consists of observations on the same units (individuals, firms, countries) across multiple time periods. This structure allows you to control for unobserved time-invariant characteristics that might confound your analysis.
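A small simulation shows what "control for unobserved time-invariant characteristics" buys you. This is a minimal NumPy sketch under assumed values: N = 200 units, T = 5 periods, a true slope of 1.5, and a unit effect α_i deliberately correlated with X. The helper `ols_slope` is hypothetical. Pooled OLS leaves α_i in the error term and is biased, while demeaning within each unit removes α_i exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 200, 5, 1.5                      # assumed simulation settings

alpha = rng.normal(size=N)                    # unobserved, time-invariant unit effect
x = alpha[:, None] + rng.normal(size=(N, T))  # predictor correlated with alpha (confounding)
y = beta * x + alpha[:, None] + rng.normal(size=(N, T))

def ols_slope(x, y):
    """Hypothetical helper: slope of a simple OLS regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / (xc ** 2).sum())

# Pooled OLS ignores the panel structure, so alpha stays in the error term.
pooled = ols_slope(x.ravel(), y.ravel())

# Within transformation: subtract each unit's own mean, which removes alpha_i.
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
within = ols_slope(x_w.ravel(), y_w.ravel())

print(f"true beta = {beta}, pooled OLS = {pooled:.3f}, within (FE) = {within:.3f}")
```

With these settings the pooled estimate should land near 2.0, because its bias is Cov(X, α)/Var(X) = 1/2, while the within estimate should be close to the true 1.5.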
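Notation:
$Y_{it}$ = outcome for unit $i$ at time $t$
$i = 1, \dots, N$ (cross-sectional units)
$t = 1, \dots, T$ (time periods)

Fixed Effects Model
Key Idea: Control for all time-invariant differences between units (observed and unobserved).
$$Y_{it} = \beta_1 X_{it} + \alpha_i + \varepsilon_{it}$$
where $\alpha_i$ is a separate intercept for each unit.

Random Effects Model
Key Idea: Entity-specific effects are random and uncorrelated with the predictors.
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_i + \varepsilon_{it}, \qquad \alpha_i \sim N(0, \sigma^2_\alpha)$$

Hausman Test
$H_0$: $\alpha_i$ is uncorrelated with $X$. The random effects model is consistent and efficient (FE is also consistent, but less efficient).
$H_1$: $\alpha_i$ is correlated with $X$. Only the fixed effects model is consistent.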
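The Hausman hypotheses reduce to one statistic, $H = (b_{FE} - b_{RE})' [\widehat{Var}(b_{FE}) - \widehat{Var}(b_{RE})]^{-1} (b_{FE} - b_{RE})$. Under $H_0$ it is asymptotically $\chi^2$ with $k$ degrees of freedom, where $k$ is the number of compared (time-varying) coefficients. The R code computes this with plm's `phtest()`, but the Python example on this page does not run a Hausman test. The minimal NumPy sketch below fills that gap. The helpers `hausman` and `chi2_sf` are hypothetical. The covariance difference is inverted with a pseudo-inverse as a safeguard, because it need not be positive definite in finite samples. The inputs in the usage line are illustrative numbers, not estimates from the wage data.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2            # sf for df = 2
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1  # sf for df = 1
    # Recurrence: sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        total += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return total

def hausman(b_fe, b_re, v_fe, v_re):
    """Hypothetical helper: Hausman statistic, degrees of freedom, and p-value.

    All inputs must refer to the same coefficients, in the same order
    (time-invariant regressors and the intercept are excluded).
    """
    d = np.asarray(b_fe, float) - np.asarray(b_re, float)
    v_diff = np.asarray(v_fe, float) - np.asarray(v_re, float)
    stat = float(d @ np.linalg.pinv(v_diff) @ d)  # pinv guards against a singular difference
    df = d.size
    return stat, df, chi2_sf(stat, df)

# Illustrative inputs only (two coefficients, e.g. experience and union)
stat, df, p = hausman(b_fe=[0.80, 0.50], b_re=[0.70, 0.60],
                      v_fe=[[0.010, 0.0], [0.0, 0.020]],
                      v_re=[[0.006, 0.0], [0.0, 0.012]])
print(f"H = {stat:.2f}, df = {df}, p = {p:.4f}")
```

With linearmodels results, the inputs would be the shared coefficients, for example `fe_results.params[['experience', 'union']]` and `re_results.params[['experience', 'union']]`, together with the matching blocks of each result's `.cov`. Both models should be fit with the default (unadjusted) covariance, because the classic statistic assumes RE is efficient under $H_0$.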
library(plm) # Panel linear models (plm, phtest, pFtest, plmtest)
library(lmtest) # coeftest() for inference with robust standard errors
# Example: Wage panel data
# person_id: Individual identifier
# year: Time period
# wage: Hourly wage (outcome)
# experience: Years of work experience
# union: Union membership (0/1)
# education: Years of education (time-invariant)
# Create sample data
data <- data.frame(
person_id = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5,
6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10),
year = c(2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012,
2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012,
2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012,
2010, 2011, 2012),
wage = c(15.2, 16.1, 17.3, 18.5, 19.2, 20.1, 14.8, 15.5, 16.2,
22.3, 23.1, 24.5, 12.5, 13.2, 14.1, 19.8, 20.5, 21.3,
16.7, 17.4, 18.2, 21.2, 22.0, 23.1, 13.9, 14.6, 15.4,
17.5, 18.3, 19.2),
experience = c(5, 6, 7, 3, 4, 5, 2, 3, 4, 8, 9, 10, 1, 2, 3,
6, 7, 8, 4, 5, 6, 7, 8, 9, 3, 4, 5, 5, 6, 7),
union = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1),
education = c(12, 12, 12, 16, 16, 16, 12, 12, 12, 16, 16, 16,
10, 10, 10, 14, 14, 14, 12, 12, 12, 16, 16, 16,
12, 12, 12, 14, 14, 14)
)
# Convert to panel data format
pdata <- pdata.frame(data, index = c("person_id", "year"))
# Method 1: Pooled OLS (ignores panel structure)
pooled <- plm(wage ~ experience + union + education,
data = pdata,
model = "pooling")
summary(pooled)
# Method 2: Fixed Effects (one-way, entity effects)
fe_model <- plm(wage ~ experience + union, # education dropped (time-invariant)
data = pdata,
model = "within",
effect = "individual")
summary(fe_model)
# Extract fixed effects (entity-specific intercepts)
fixef(fe_model)
# Method 3: Two-way Fixed Effects (entity + time effects)
fe_twoway <- plm(wage ~ experience + union,
data = pdata,
model = "within",
effect = "twoways")
summary(fe_twoway)
# Method 4: Random Effects
re_model <- plm(wage ~ experience + union + education, # can include education
data = pdata,
model = "random")
summary(re_model)
# Hausman Test (FE vs RE)
hausman_test <- phtest(fe_model, re_model)
print(hausman_test)
# Interpretation:
# If p < 0.05: Reject H0; RE is inconsistent, so use Fixed Effects
# If p >= 0.05: Fail to reject; both are consistent and RE is more efficient
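# F-test for fixed effects significance
# (compare FE with a pooled model that has the same regressors)
pooled_same <- plm(wage ~ experience + union, data = pdata, model = "pooling")
pFtest(fe_model, pooled_same)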
# Breusch-Pagan LM test for random effects
plmtest(pooled, type = "bp")
# Clustered standard errors (by entity)
library(sandwich)
coeftest(fe_model, vcov = vcovHC(fe_model, cluster = "group"))

# Python version (linearmodels)
import pandas as pd
import numpy as np
from linearmodels.panel import PanelOLS, RandomEffects
from linearmodels.panel.results import compare
import statsmodels.api as sm
# Example: Wage panel data
# Create sample data
data = pd.DataFrame({
'person_id': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5,
6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10],
'year': [2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012,
2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012,
2010, 2011, 2012, 2010, 2011, 2012, 2010, 2011, 2012,
2010, 2011, 2012],
'wage': [15.2, 16.1, 17.3, 18.5, 19.2, 20.1, 14.8, 15.5, 16.2,
22.3, 23.1, 24.5, 12.5, 13.2, 14.1, 19.8, 20.5, 21.3,
16.7, 17.4, 18.2, 21.2, 22.0, 23.1, 13.9, 14.6, 15.4,
17.5, 18.3, 19.2],
'experience': [5, 6, 7, 3, 4, 5, 2, 3, 4, 8, 9, 10, 1, 2, 3,
6, 7, 8, 4, 5, 6, 7, 8, 9, 3, 4, 5, 5, 6, 7],
'union': [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1],
'education': [12, 12, 12, 16, 16, 16, 12, 12, 12, 16, 16, 16,
10, 10, 10, 14, 14, 14, 12, 12, 12, 16, 16, 16,
12, 12, 12, 14, 14, 14]
})
# Set multi-index for panel data
data = data.set_index(['person_id', 'year'])
# Dependent variable
y = data['wage']
# Independent variables
X = data[['experience', 'union', 'education']]
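# Method 1: Pooled OLS (ignores panel structure)
# linearmodels' PooledOLS is used so the result can be passed to compare() below
from linearmodels.panel import PooledOLS
pooled = PooledOLS(y, sm.add_constant(X)).fit()
print(pooled)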
# Method 2: Fixed Effects (entity effects)
# Note: education is time-invariant, so it is left out of X here;
# the entity effects would absorb it completely
fe_model = PanelOLS(y, X[['experience', 'union']],
entity_effects=True)
fe_results = fe_model.fit(cov_type='clustered',
cluster_entity=True)
print(fe_results)
# Extract fixed effects
print("Entity Fixed Effects:")
print(fe_results.estimated_effects)
# Method 3: Two-way Fixed Effects (entity + time effects)
fe_twoway = PanelOLS(y, X[['experience', 'union']],
entity_effects=True,
time_effects=True)
fe_twoway_results = fe_twoway.fit(cov_type='clustered',
cluster_entity=True)
print(fe_twoway_results)
# Method 4: Random Effects
# linearmodels does not add an intercept automatically, so add one explicitly
re_model = RandomEffects(y, sm.add_constant(X))
re_results = re_model.fit()
print(re_results)
# Variance components
print("Variance Components:")
print(f"Effects variance (sigma_alpha^2): {re_results.variance_decomposition['Effects']}")
print(f"Residual variance (sigma_epsilon^2): {re_results.variance_decomposition['Residual']}")
# Compare models
comparison = compare({'Pooled OLS': pooled,
'Fixed Effects': fe_results,
'Random Effects': re_results})
print(comparison)
# Model selection interpretation:
# - Use FE if you believe unobserved heterogeneity is correlated with X
# - Use RE if you believe it's uncorrelated (and want to estimate time-invariant effects)
# - Hausman test helps decide statistically
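The variance components printed by the RE model also determine how RE combines within-unit and between-unit variation. In a balanced panel, RE is equivalent to OLS on "quasi-demeaned" data, $y_{it} - \theta \bar{y}_i$ on $x_{it} - \theta \bar{x}_i$, with $\theta = 1 - \sqrt{\sigma^2_\varepsilon / (\sigma^2_\varepsilon + T \sigma^2_\alpha)}$. Setting $\theta = 0$ gives pooled OLS, and setting $\theta = 1$ gives the FE within estimator. The minimal NumPy sketch below assumes a balanced panel, one regressor, and known variance components taken from a simulation. In practice the components are estimated, as in the values linearmodels reports in `variance_decomposition`. `re_theta` and `quasi_demeaned_ols` are hypothetical helpers, not linearmodels API.

```python
import numpy as np

def re_theta(sigma2_eps, sigma2_alpha, T):
    """Hypothetical helper: quasi-demeaning weight for a balanced panel with T periods."""
    return 1.0 - np.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))

def quasi_demeaned_ols(x, y, theta):
    """Hypothetical helper: intercept and slope from OLS on quasi-demeaned data.

    x, y have shape (N, T). theta = 0 reproduces pooled OLS. theta = 1 reproduces
    the within (FE) slope; the intercept column is then all zeros, and lstsq
    returns the minimum-norm solution with intercept 0.
    """
    x_star = x - theta * x.mean(axis=1, keepdims=True)
    y_star = y - theta * y.mean(axis=1, keepdims=True)
    const = np.full(x.shape, 1.0 - theta)  # the intercept column is quasi-demeaned too
    X = np.column_stack([const.ravel(), x_star.ravel()])
    coef, *_ = np.linalg.lstsq(X, y_star.ravel(), rcond=None)
    return coef  # [beta0, beta1]

# Simulated balanced panel that satisfies the RE assumption (alpha independent of x)
rng = np.random.default_rng(1)
N, T = 300, 3
sigma2_alpha, sigma2_eps = 2.0, 1.0
alpha = rng.normal(scale=np.sqrt(sigma2_alpha), size=(N, 1))
x = rng.normal(size=(N, T))
y = 1.0 + 0.5 * x + alpha + rng.normal(scale=np.sqrt(sigma2_eps), size=(N, T))

theta = re_theta(sigma2_eps, sigma2_alpha, T)  # 1 - sqrt(1/7), about 0.622
for label, th in [("pooled (theta=0)", 0.0), ("RE", theta), ("FE (theta=1)", 1.0)]:
    b0, b1 = quasi_demeaned_ols(x, y, th)
    print(f"{label:>16}: slope = {b1:.3f}")
```

Because α_i is independent of X in this simulation, all three slopes should be near the true 0.5, and RE uses the data most efficiently. If α_i were correlated with X, only the θ = 1 (FE) estimate would stay consistent, which is exactly the difference the Hausman test looks for.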