StatsCalculators.com

Multiple Linear Regression

Created: December 15, 2024
Last Updated: April 6, 2025

This Multiple Linear Regression Calculator helps you analyze the relationship between a dependent variable and multiple independent variables. It provides comprehensive analysis including model summary statistics, coefficient estimates, confidence intervals, and diagnostic tests. The calculator also generates diagnostic plots to check regression assumptions. To learn about the data format required and test this calculator, click here to populate the sample data.

Calculator

1. Load Your Data

Note: Column names will be converted to snake_case (e.g., "Product ID" → "product_id") for processing.

2. Select Variables & Options

Related Calculators

Learn More

Multiple Linear Regression

Definition

Multiple Linear Regression models the relationship between a dependent variable and two or more independent variables, assuming a linear relationship. It extends simple linear regression to account for multiple predictors.

Model Equation

y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_kx_k + \epsilon

Where:

  • y = dependent variable
  • x_i = independent variables
  • \beta_0 = intercept
  • \beta_i = regression coefficients
  • \epsilon = error term
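
To make the equation concrete, here is a minimal R sketch (with made-up coefficient values, not tied to the housing example below) that simulates data from this model and then recovers the coefficients with lm():

# Simulate from y = beta_0 + beta_1*x1 + beta_2*x2 + error (illustrative, assumed values)
set.seed(42)
n   <- 200
x1  <- runif(n, 0, 10)
x2  <- runif(n, 0, 5)
eps <- rnorm(n, sd = 2)                 # error term epsilon
y   <- 2 + 1.5 * x1 - 0.8 * x2 + eps    # assumed beta_0 = 2, beta_1 = 1.5, beta_2 = -0.8

coef(lm(y ~ x1 + x2))                   # estimates should land near 2, 1.5, -0.8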

Key Formulas:

Sum of Squares:

SST = \sum(y_i - \bar{y})^2
SSR = \sum(\hat{y}_i - \bar{y})^2
SSE = \sum(y_i - \hat{y}_i)^2

Where \hat{y}_i is the predicted value and \bar{y} is the mean

R-squared:

R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}

Adjusted R-squared:

R^2_{adj} = 1 - (1-R^2)\frac{n-1}{n-k-1}
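
As a quick illustrative check of these formulas, here is a self-contained R sketch; the data frame `d` and its values are placeholders, not the housing example:

# Sum-of-squares decomposition and R-squared from a fitted lm model
d   <- data.frame(y = c(3, 5, 7, 6, 9), x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 4, 3, 5))
fit <- lm(y ~ x1 + x2, data = d)

y_hat <- fitted(fit)
y_bar <- mean(d$y)
sst <- sum((d$y - y_bar)^2)        # SST
ssr <- sum((y_hat - y_bar)^2)      # SSR
sse <- sum(resid(fit)^2)           # SSE; note SST = SSR + SSE

r2     <- 1 - sse / sst                          # matches summary(fit)$r.squared
n <- nrow(d); k <- 2                             # k = number of predictors
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - k - 1)   # matches summary(fit)$adj.r.squared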

Key Assumptions

  • Linearity: Linear relationship between variables
  • Independence: Independent residuals
  • Homoscedasticity: Constant variance of residuals
  • Normality: Normal distribution of residuals
  • No Multicollinearity: Independent variables not highly correlated
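
One possible way to check these assumptions in R, assuming a fitted lm object `fit` (for example the housing model from the code examples below) and the optional `car` and `lmtest` packages:

# Graphical and formal checks of the regression assumptions for a fitted model `fit`
par(mfrow = c(2, 2))
plot(fit)                      # residuals vs fitted (linearity, homoscedasticity) and Q-Q plot (normality)

lmtest::bptest(fit)            # Breusch-Pagan test for non-constant variance
lmtest::dwtest(fit)            # Durbin-Watson test for autocorrelated residuals
shapiro.test(residuals(fit))   # Shapiro-Wilk test for normality of residuals
car::vif(fit)                  # variance inflation factors (multicollinearity)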

Practical Example

Step 1: State the Data

Housing prices model:

House  Price (K)  Sqft  Age  Bedrooms
1      300        1500  15   3
2      250        1200  20   2
3      400        2000  10   4
4      550        2400  5    4
5      317        1600  12   3
6      389        1800  8    3

Step 2: Calculate Matrix Operations

Design matrix X:

\mathbf{X} = \begin{bmatrix} 1 & 1500 & 15 & 3 \\ 1 & 1200 & 20 & 2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 1800 & 8 & 3 \end{bmatrix}

Coefficient estimation (normal equations):

\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
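
A minimal R sketch of this normal-equations calculation for the housing data above (solve() inverts the XᵀX matrix; lm() uses a more numerically stable QR decomposition but yields the same estimates):

# beta_hat = (X'X)^{-1} X'y for the housing example
y <- c(300, 250, 400, 550, 317, 389)
X <- cbind(intercept = 1,
           sqft      = c(1500, 1200, 2000, 2400, 1600, 1800),
           age       = c(15, 20, 10, 5, 12, 8),
           bedrooms  = c(3, 2, 4, 4, 3, 3))

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # should match the fitted equation below
beta_hat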
Step 3: Model Results

Fitted equation:

\hat{y} = -130.8551 + 0.3879x_{\text{sqft}} + 2.5381x_{\text{age}} - 66.3024x_{\text{bedrooms}}
  • R² = 0.997
  • Adjusted R² = 0.993
  • F-statistic = 238.4124 (p-value = 0.0042)
Step 4: Interpretation
  • For each additional square foot, price increases by $0.39K, about $388 (coefficient = 0.3879; price is measured in thousands of dollars)
  • Each additional year of age increases price by $2.54K, about $2,538 (coefficient = 2.5381)
  • Each additional bedroom decreases price by $66.30K, about $66,302 (coefficient = -66.3024)
  • Model explains 99.7% of price variation (R² = 0.997)
  • The model is statistically significant (F = 238.41, p = 0.0042)
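
As a hypothetical illustration (this house is not in the data), plugging a 1,700 sqft, 10-year-old, 3-bedroom house into the fitted equation gives

\hat{y} = -130.8551 + 0.3879(1700) + 2.5381(10) - 66.3024(3) \approx 355.05

i.e. a predicted price of roughly $355K.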

Model Diagnostics

Key diagnostic measures:

  • VIF (Variance Inflation Factor):
    VIF_j = \frac{1}{1-R^2_j}
    where R^2_j is the R² from regressing the j-th predictor on the remaining predictors
  • Residual Standard Error:
    RSE = \sqrt{\frac{SSE}{n-k-1}}
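
As a sketch of how these could be computed by hand in R (assuming the `data` and `model` objects created in the R code example below):

# VIF for sqft: regress sqft on the other predictors, then apply the formula
r2_sqft  <- summary(lm(sqft ~ age + bedrooms, data = data))$r.squared
vif_sqft <- 1 / (1 - r2_sqft)                    # car::vif(model) returns all three at once

# Residual standard error
n <- nrow(data)                                  # 6 observations
k <- 3                                           # 3 predictors
rse <- sqrt(sum(resid(model)^2) / (n - k - 1))   # same value as sigma(model)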

Code Examples

R
library(tidyverse)
library(broom)

# Housing example data (price in thousands of dollars)
data <- tibble(
  price = c(300, 250, 400, 550, 317, 389),
  sqft = c(1500, 1200, 2000, 2400, 1600, 1800),
  age = c(15, 20, 10, 5, 12, 8),
  bedrooms = c(3, 2, 4, 4, 3, 3)
)

# Fit the multiple linear regression model
model <- lm(price ~ sqft + age + bedrooms, data = data)

tidy(model)    # coefficient estimates, standard errors, p-values
glance(model)  # model-level statistics: R-squared, adjusted R-squared, F-statistic
Python
import pandas as pd
from statsmodels.formula.api import ols

# Housing example data (price in thousands of dollars)
df = pd.DataFrame({
    'price': [300, 250, 400, 550, 317, 389],
    'sqft': [1500, 1200, 2000, 2400, 1600, 1800],
    'age': [15, 20, 10, 5, 12, 8],
    'bedrooms': [3, 2, 4, 4, 3, 3]
})

# Fit the multiple linear regression model
model = ols('price ~ sqft + age + bedrooms', data=df).fit()

# Full summary: coefficients, R-squared, F-statistic, diagnostics
print(model.summary())

print("Coefficients:")
print(model.params)
print("R-squared:", model.rsquared)

Alternative Methods

Consider these alternatives:

  • Ridge Regression: For handling multicollinearity
  • Lasso Regression: For feature selection
  • Polynomial Regression: For non-linear relationships
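
As a rough, illustrative sketch only (the six-observation housing data is far too small for regularization in practice), ridge and lasso fits could be obtained in R with the glmnet package:

library(glmnet)

# Predictor matrix and response from the housing example
X <- cbind(sqft     = c(1500, 1200, 2000, 2400, 1600, 1800),
           age      = c(15, 20, 10, 5, 12, 8),
           bedrooms = c(3, 2, 4, 4, 3, 3))
y <- c(300, 250, 400, 550, 317, 389)

ridge <- glmnet(X, y, alpha = 0)   # alpha = 0: ridge penalty, shrinks correlated coefficients
lasso <- glmnet(X, y, alpha = 1)   # alpha = 1: lasso penalty, can set coefficients exactly to zero

coef(ridge, s = 1)                 # coefficients at an arbitrarily chosen penalty lambda = 1
coef(lasso, s = 1)                 # (lambda would normally be chosen by cross-validation)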

Verification