
Simple Linear Regression

Created: November 15, 2024
Last Updated: October 9, 2025

This calculator performs a comprehensive analysis of the linear relationship between two continuous variables. It provides everything you need for professional statistical analysis, from basic model fitting to advanced diagnostic testing, helping you verify that your regression model meets the standard statistical assumptions and delivers reliable insights.

What You'll Get:

  • Complete Model Summary: R-squared, adjusted R-squared, F-statistic, and significance testing
  • Detailed Coefficients: Slope and intercept estimates with standard errors, t-values, and confidence intervals
  • Professional Visualizations: Regression plot with fitted line, confidence bands, and prediction intervals
  • Comprehensive Diagnostics: Four essential diagnostic plots to validate model assumptions
  • Statistical Tests: Durbin-Watson, heteroscedasticity, and normality tests for thorough validation
  • Publication-Ready Output: APA-formatted results ready for academic or professional reporting

💡 Pro Tip: Always examine the diagnostic plots before interpreting your results! The residuals vs fitted plot reveals non-linear patterns, while the Q-Q plot checks normality assumptions. For multiple regression analysis, check out our Multiple Linear Regression Calculator to analyze relationships with several predictor variables.

Ready to explore the linear relationship in your data? Load our sample dataset to see the required data format and regression analysis in action, or upload your own data to discover the strength and direction of the relationship between your variables.

Learn More

Simple Linear Regression

Definition

Simple Linear Regression models the relationship between a predictor variable (X) and a response variable (Y) using a linear equation. It finds the line that minimizes the sum of squared residuals.
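
In symbols, least squares picks the coefficients b_0 and b_1 that minimize:

\min_{b_0,\, b_1} \; \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2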

Key Formulas

Regression Line:

\hat{Y} = b_0 + b_1 X

Slope:

b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}

Intercept:

b_0 = \bar{y} - b_1 \bar{x}

R-squared:

R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
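
Translated directly into code, these formulas take only a few lines. Below is a minimal NumPy sketch, using the same five-point sample that appears in the worked example and code examples later on this page:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.8, 6.2, 7.8, 9.3])

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: mean of y minus slope times mean of x
b0 = y.mean() - b1 * x.mean()

# R-squared: 1 minus residual sum of squares over total sum of squares
y_hat = b0 + b1 * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(b1, b0, r2)  # 1.84, 0.32, ~0.993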

Key Assumptions

  • Linearity: Relationship between X and Y is linear
  • Independence: Observations are independent
  • Homoscedasticity: Constant variance of residuals
  • Normality: Residuals are normally distributed
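
Linearity is usually judged visually from the residuals vs fitted plot; the remaining three assumptions can also be tested numerically. Below is a minimal sketch using statsmodels and SciPy on the same sample data (Breusch-Pagan and Shapiro-Wilk are common test choices here, not the only ones):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.8, 6.2, 7.8, 9.3])
model = sm.OLS(y, sm.add_constant(x)).fit()

# Independence: Durbin-Watson statistic near 2 suggests uncorrelated residuals
print("Durbin-Watson:", durbin_watson(model.resid))

# Homoscedasticity: Breusch-Pagan test (small p-value signals heteroscedasticity)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, model.model.exog)
print("Breusch-Pagan p-value:", lm_pvalue)

# Normality: Shapiro-Wilk test on the residuals (small p-value signals non-normality)
print("Shapiro-Wilk p-value:", stats.shapiro(model.resid).pvalue)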

Practical Example

Step 1: Data
X       Y       (X − X̄)   (Y − Ȳ)   (X − X̄)²   (X − X̄)(Y − Ȳ)
1       2.1     −2         −3.74      4           7.48
2       3.8     −1         −2.04      1           2.04
3       6.2      0          0.36      0           0.00
4       7.8      1          1.96      1           1.96
5       9.3      2          3.46      4           6.92
Σ = 15  Σ = 29.2  Σ = 0     Σ = 0     Σ = 10      Σ = 18.4

Means: \bar{X} = 3, \bar{Y} = 5.84

Step 2: Calculate Slope (b_1)
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{18.4}{10} = 1.84
Step 3: Calculate Intercept (b_0)
b_0 = \bar{y} - b_1 \bar{x} = 5.84 - 1.84(3) = 0.32
Step 4: Regression Equation
\hat{Y} = 0.32 + 1.84X
Step 5: Calculate R^2

With \hat{Y} = 0.32 + 1.84X, the residual sum of squares is \sum (y_i - \hat{y}_i)^2 = 0.236 and the total sum of squares is \sum (y_i - \bar{y})^2 = 34.09, so:

R^2 = 1 - \frac{0.236}{34.09} \approx 0.993 (99.3% of the variation in Y is explained by X)
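
As a quick check on the hand calculation, NumPy's polyfit recovers the same line in one call:

import numpy as np

# Degree-1 polynomial fit returns [slope, intercept]
b1, b0 = np.polyfit([1, 2, 3, 4, 5], [2.1, 3.8, 6.2, 7.8, 9.3], 1)
print(b1, b0)  # 1.84, 0.32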

Code Examples

R
library(tidyverse)

# Example data
data <- tibble(x = c(1, 2, 3, 4, 5),
               y = c(2.1, 3.8, 6.2, 7.8, 9.3))

# Fit the simple linear regression model
model <- lm(y ~ x, data = data)

# Coefficients, standard errors, t-values, and R-squared
summary(model)

# Scatter plot with the fitted regression line
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_minimal()

# Four diagnostic plots: residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(2, 2))
plot(model)
Python
import numpy as np
import statsmodels.api as sm

# Example data
x = np.array([1, 2, 3, 4, 5])
y = np.array([2.1, 3.8, 6.2, 7.8, 9.3])

# Add an intercept column and fit by ordinary least squares
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()

# Coefficients, standard errors, t-values, and R-squared
print(model.summary())

Alternative Regression Methods

Consider these alternatives when assumptions are violated:

  • Robust Regression: When outliers significantly impact the model fit (see the sketch after this list)
  • Polynomial Regression: For curved relationships between variables
  • Quantile Regression: When variance changes across X values (heteroscedasticity)
  • Weighted Least Squares: When observations have different levels of precision
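
As one concrete illustration, robust regression is available in statsmodels through RLM. Below is a minimal sketch with Huber's M-estimator; the sixth data point is an artificial outlier added purely for demonstration:

import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2.1, 3.8, 6.2, 7.8, 9.3, 30.0])  # last point: artificial outlier

X = sm.add_constant(x)

# Huber's T norm downweights large residuals instead of squaring them
robust_fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(robust_fit.params)  # compare with sm.OLS(y, X).fit().params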
