This Multiple Linear Regression Calculator helps you analyze the relationship between a dependent variable and multiple independent variables. It provides comprehensive analysis, including model summary statistics, coefficient estimates, confidence intervals, and diagnostic tests, and generates diagnostic plots to check regression assumptions. To see the required data format and try the calculator, load the sample data.
Multiple Linear Regression
Definition
Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables, assuming a linear relationship. It extends simple linear regression to account for multiple predictors.
Model Equation

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₚxₚ + ε

Where:
- y = dependent variable
- x₁, x₂, …, xₚ = independent variables
- β₀ = intercept
- β₁, β₂, …, βₚ = regression coefficients
- ε = error term
Key Formulas:

Sum of Squares:
- SST = Σ(yᵢ − ȳ)² (total)
- SSR = Σ(ŷᵢ − ȳ)² (regression)
- SSE = Σ(yᵢ − ŷᵢ)² (error)

Where ŷᵢ is the predicted value and ȳ is the mean of y.

R-squared: R² = SSR / SST = 1 − SSE / SST

Adjusted R-squared: R²adj = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the number of observations and p the number of predictors
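As a quick numeric illustration of these formulas, the sketch below computes R² and adjusted R² from the sums of squares. The predicted values here are made-up numbers for demonstration, not output from any fitted model.

```python
import numpy as np

# Observed responses and hypothetical predicted values (assumed for illustration)
y = np.array([300, 250, 400, 550, 317, 389], dtype=float)
y_hat = np.array([305, 248, 398, 552, 315, 388], dtype=float)

n = len(y)  # number of observations
p = 3       # number of predictors (assumed)

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
sse = np.sum((y - y_hat) ** 2)     # error (residual) sum of squares
ssr = sst - sse                    # regression sum of squares

r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print("R-squared:", r2)
print("Adjusted R-squared:", adj_r2)
```

Note that adjusted R² is always at most R², since it penalizes the fit for each predictor added.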
Key Assumptions
- Linearity: the relationship between the predictors and the dependent variable is linear
- Independence: observations (and their errors) are independent of one another
- Homoscedasticity: the errors have constant variance across all levels of the predictors
- Normality: the errors are approximately normally distributed
- No severe multicollinearity among the predictors
Practical Example
Step 1: State the Data
Housing prices model:
| House | Price ($K) | Sqft | Age | Bedrooms |
|---|---|---|---|---|
| 1 | 300 | 1500 | 15 | 3 |
| 2 | 250 | 1200 | 20 | 2 |
| 3 | 400 | 2000 | 10 | 4 |
| 4 | 550 | 2400 | 5 | 4 |
| 5 | 317 | 1600 | 12 | 3 |
| 6 | 389 | 1800 | 8 | 3 |
Step 2: Calculate Matrix Operations
Design matrix X (a column of ones for the intercept, followed by Sqft, Age, Bedrooms):

X =
[ 1  1500  15  3 ]
[ 1  1200  20  2 ]
[ 1  2000  10  4 ]
[ 1  2400   5  4 ]
[ 1  1600  12  3 ]
[ 1  1800   8  3 ]

Coefficients calculation (ordinary least squares): β̂ = (XᵀX)⁻¹Xᵀy
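The matrix step can be reproduced with NumPy. This sketch applies the ordinary least-squares normal equations to the example data; in practice `np.linalg.lstsq` is the more numerically robust route, with `solve` shown here for clarity.

```python
import numpy as np

# Data from the worked example above
price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
sqft  = np.array([1500, 1200, 2000, 2400, 1600, 1800], dtype=float)
age   = np.array([15, 20, 10, 5, 12, 8], dtype=float)
beds  = np.array([3, 2, 4, 4, 3, 3], dtype=float)

# Design matrix: a column of ones (intercept) followed by the predictors
X = np.column_stack([np.ones_like(price), sqft, age, beds])

# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ price)
print(beta_hat)  # [intercept, b_sqft, b_age, b_bedrooms]
```

The least-squares solution makes the residuals orthogonal to every column of X, which is what the normal equations encode.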
Step 3: Model Results
Fitted equation:

Price = −130.81 + 0.3879·Sqft + 2.5381·Age − 66.3024·Bedrooms
- R² = 0.997
- Adjusted R² = 0.993
- F-statistic = 238.4124 (p-value = 0.0042)
Step 4: Interpretation
- For each additional square foot, price increases by about $388, holding the other predictors constant (coefficient = 0.3879, with price in $K)
- Each additional year of age is associated with a price increase of about $2,538 (coefficient = 2.5381)
- Each additional bedroom is associated with a price decrease of about $66,302 (coefficient = -66.3024)
- Model explains 99.7% of price variation (R² = 0.997)
- The model is statistically significant (F = 238.41, p = 0.0042)
Model Diagnostics
Key diagnostic measures:
- VIF (Variance Inflation Factor): VIFⱼ = 1 / (1 − R²ⱼ), where R²ⱼ is the R² from regressing predictor j on the remaining predictors; values above about 10 are commonly taken to indicate problematic multicollinearity
- Residual Standard Error: RSE = √(SSE / (n − p − 1)), the typical size of a residual in the units of the dependent variable
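A minimal NumPy sketch of these two diagnostics on the example data; the `vif` helper here is illustrative, not a library function.

```python
import numpy as np

# Predictors from the worked example (VIF uses only the predictors)
sqft = np.array([1500, 1200, 2000, 2400, 1600, 1800], dtype=float)
age  = np.array([15, 20, 10, 5, 12, 8], dtype=float)
beds = np.array([3, 2, 4, 4, 3, 3], dtype=float)
predictors = {'sqft': sqft, 'age': age, 'bedrooms': beds}

def vif(target, others):
    """VIF_j = 1 / (1 - R^2_j), regressing predictor j on the others."""
    X = np.column_stack([np.ones_like(target)] + others)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

vifs = {name: vif(col, [v for k, v in predictors.items() if k != name])
        for name, col in predictors.items()}
print(vifs)

# Residual standard error of the full model: sqrt(SSE / (n - p - 1))
price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
Xf = np.column_stack([np.ones_like(price), sqft, age, beds])
b, *_ = np.linalg.lstsq(Xf, price, rcond=None)
sse = np.sum((price - Xf @ b) ** 2)
rse = np.sqrt(sse / (len(price) - 3 - 1))
print("RSE:", rse)
```

With only six observations and correlated predictors like square footage and bedroom count, expect the VIFs here to be well above 1.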
Code Examples

R (using tidyverse and broom):

```r
library(tidyverse)
library(broom)

# Housing data from the worked example
data <- tibble(
  price = c(300, 250, 400, 550, 317, 389),
  sqft = c(1500, 1200, 2000, 2400, 1600, 1800),
  age = c(15, 20, 10, 5, 12, 8),
  bedrooms = c(3, 2, 4, 4, 3, 3)
)

# Fit the model and inspect coefficient- and model-level statistics
model <- lm(price ~ sqft + age + bedrooms, data = data)
tidy(model)    # coefficient estimates, standard errors, p-values
glance(model)  # R-squared, adjusted R-squared, F-statistic
```
Python (using statsmodels):

```python
import pandas as pd
from statsmodels.formula.api import ols

# Housing data from the worked example
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3]
})

# Fit the model and inspect the results
model = ols('price ~ sqft + age + bedrooms', data=df).fit()
print(model.summary())
print("Coefficients:")
print(model.params)
print("R-squared:", model.rsquared)
```
Alternative Methods
Consider these alternatives:
- Ridge Regression: For handling multicollinearity
- Lasso Regression: For feature selection
- Polynomial Regression: For non-linear relationships
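As an illustration of the first alternative, the sketch below implements ridge regression in closed form on the example data. The penalty strength `lam = 1.0` is an arbitrary value chosen for demonstration.

```python
import numpy as np

# Ridge regression adds an L2 penalty lam * ||beta||^2, which shrinks the
# coefficients and stabilizes (X'X + lam*I) when predictors are correlated.
price = np.array([300, 250, 400, 550, 317, 389], dtype=float)
X_raw = np.column_stack([
    [1500, 1200, 2000, 2400, 1600, 1800],  # sqft
    [15, 20, 10, 5, 12, 8],                # age
    [3, 2, 4, 4, 3, 3],                    # bedrooms
]).astype(float)

# Standardize predictors and center the response (usual practice for ridge)
Xs = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
yc = price - price.mean()

lam = 1.0  # penalty strength (assumed value for illustration)
p = Xs.shape[1]
beta_ridge = np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)
print(beta_ridge)
```

For any positive penalty, the ridge coefficient vector has a smaller norm than the ordinary least-squares solution on the same standardized data; tuning `lam` (e.g. by cross-validation) trades a little bias for reduced variance.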