This calculator helps you compare two related measurements from the same subjects to determine whether there is a statistically significant difference. It is well suited to before-and-after studies (weight loss, test scores, blood pressure), matched-subject designs (twins, siblings, matched controls), or any situation where you have paired measurements.
What You'll Get:
- Step-by-Step Calculations: Complete mathematical breakdown from hypothesis to conclusion
- Visual Analysis: T-distribution curves and difference histograms
- Assumption Checking: Automatic normality testing with alternative test recommendations
- APA-Style Report: Publication-ready results you can copy directly
Ready to analyze your paired data? Start with our sample dataset to see how it works, or upload your own data to begin your analysis.
Two Sample Paired t-Test
Definition
The paired t-test is a statistical test used to compare two related (dependent) samples to determine whether there is a significant difference between their means. It is particularly useful when measurements are taken from the same subjects before and after a treatment, or when subjects form matched pairs.
Formula
Test Statistic:
\[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \]
Degrees of freedom:
\[ df = n - 1 \]
Confidence Intervals:
Two-sided confidence interval:
\[ \bar{d} \pm t_{\alpha/2,\,n-1} \cdot \frac{s_d}{\sqrt{n}} \]
One-sided confidence intervals:
\[ \left(-\infty,\ \bar{d} + t_{\alpha,\,n-1} \cdot \frac{s_d}{\sqrt{n}}\right) \quad \text{or} \quad \left(\bar{d} - t_{\alpha,\,n-1} \cdot \frac{s_d}{\sqrt{n}},\ \infty\right) \]
Where:
- \(\bar{d}\) = mean difference between paired observations
- \(s_d\) = standard deviation of the differences
- \(n\) = number of pairs
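To make the formulas concrete, here is a minimal Python sketch that computes the test statistic from the definitions above (the function name and the before/after data are illustrative):

```python
import math

def paired_t_statistic(before, after):
    """Compute the paired t-statistic, df, mean difference, and SD of differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_d = sum(diffs) / n                                   # d-bar
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)   # sample variance
    sd_d = math.sqrt(var_d)                                   # s_d
    se = sd_d / math.sqrt(n)                                  # standard error of d-bar
    return mean_d / se, n - 1, mean_d, sd_d

before = [70, 75, 80, 85, 90]
after = [68, 72, 77, 82, 87]
t, df, mean_d, sd_d = paired_t_statistic(before, after)
print(round(t, 2), df)  # -14.0 4
```

Note that the test only ever sees the differences; the individual before/after values matter only through them.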
Key Assumptions
- Paired observations: each value in one sample has a natural partner in the other (same subject measured twice, or matched pairs)
- The differences between pairs are approximately normally distributed
- The pairs are independent of one another
- The differences are measured on a continuous (interval or ratio) scale
Practical Example
Testing the effectiveness of a weight loss program by measuring participants' weights before and after the program:
Given Data:
- Before weights (kg): 70, 75, 80, 85, 90
- After weights (kg): 68, 72, 77, 82, 87
- Differences (After - Before): -2, -3, -3, -3, -3
- \(\alpha = 0.05\) (two-tailed test)
Hypotheses:
Null Hypothesis (\(H_0\)): \(\mu_d = 0\) (no difference between before and after)
Alternative Hypothesis (\(H_1\)): \(\mu_d \neq 0\) (there is a difference)
Step-by-Step Calculation:
- Calculate mean difference: \(\bar{d} = \frac{(-2) + (-3) + (-3) + (-3) + (-3)}{5} = -2.8\)
- Calculate standard deviation of differences: \(s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{0.8}{4}} \approx 0.447\)
- Degrees of freedom: \(df = n - 1 = 4\)
- Calculate t-statistic: \(t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-2.8}{0.447 / \sqrt{5}} = \frac{-2.8}{0.2} = -14.0\)
- Critical value: \(t_{0.025,\,4} = \pm 2.776\)
- Confidence interval: \(-2.8 \pm 2.776 \times 0.2 = (-3.36,\ -2.24)\)
Conclusion:
Since \(|t| = 14.0 > 2.776\), we reject the null hypothesis. There is sufficient evidence to conclude that the weight loss program resulted in a significant change in participants' weights (\(p < .001\)). We are 95% confident that the true mean difference lies between -3.36 and -2.24 kg.
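The worked example can be checked against a library implementation. A sketch using `scipy.stats.ttest_rel` (the `confidence_interval` method assumes SciPy ≥ 1.10):

```python
from scipy import stats

before = [70, 75, 80, 85, 90]
after = [68, 72, 77, 82, 87]

# Paired t-test on the example data
res = stats.ttest_rel(after, before)
print(round(res.statistic, 2))  # -14.0

# 95% confidence interval for the mean difference
ci = res.confidence_interval(confidence_level=0.95)
print(round(ci.low, 2), round(ci.high, 2))  # -3.36 -2.24
```

Passing `(after, before)` in that order makes the sign of the statistic match the "After - Before" differences used above.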
Effect Size
Cohen's d for paired samples:
\[ d = \frac{\bar{d}}{s_d} \]
Interpretation guidelines:
- \(|d| \approx 0.2\): small effect
- \(|d| \approx 0.5\): medium effect
- \(|d| \geq 0.8\): large effect
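As a sketch, Cohen's d for the worked example can be computed directly from the differences (the helper name is illustrative):

```python
import statistics

def cohens_d_paired(before, after):
    """Cohen's d for paired samples: mean of differences / SD of differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

d = cohens_d_paired([70, 75, 80, 85, 90], [68, 72, 77, 82, 87])
print(round(d, 2))  # -6.26
```

A magnitude this large is unusual in real studies; it reflects the extremely consistent differences in the toy data.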
Power Analysis
Required sample size (n) for desired power (1 - \(\beta\)):
\[ n = \left( \frac{(z_{1-\alpha/2} + z_{1-\beta}) \cdot \sigma_d}{\delta} \right)^2 \]
Where:
- \(\alpha\) = significance level
- \(\beta\) = probability of a Type II error
- \(\sigma_d\) = standard deviation of the differences
- \(\delta\) = minimum detectable difference
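A sample-size sketch based on the normal-approximation formula above (function name and the example numbers are illustrative; exact t-based power requires iteration, e.g. via statsmodels):

```python
import math
from scipy import stats

def required_pairs(alpha, power, sd_diff, min_diff):
    """Normal-approximation sample size for a two-sided paired t-test; rounds up."""
    z_a = stats.norm.ppf(1 - alpha / 2)   # z for 1 - alpha/2
    z_b = stats.norm.ppf(power)           # z for 1 - beta
    n = ((z_a + z_b) * sd_diff / min_diff) ** 2
    return math.ceil(n)

# e.g. detect a 5-unit change with SD of differences 10, alpha = .05, power = .80
print(required_pairs(0.05, 0.80, 10, 5))  # 32
```

Because the approximation slightly understates n for small samples, adding a few extra pairs is common practice.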
Decision Rules
Reject \(H_0\) if:
- Two-sided test: \(|t| > t_{\alpha/2,\,n-1}\)
- Left-tailed test: \(t < -t_{\alpha,\,n-1}\)
- Right-tailed test: \(t > t_{\alpha,\,n-1}\)
- Or if \(p < \alpha\)
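The rejection rules translate directly into code; a minimal sketch (function name illustrative):

```python
from scipy import stats

def decide(t_stat, df, alpha=0.05, tail="two-sided"):
    """Apply the rejection rule for the chosen alternative hypothesis."""
    if tail == "two-sided":
        return bool(abs(t_stat) > stats.t.ppf(1 - alpha / 2, df))
    if tail == "left":
        return bool(t_stat < stats.t.ppf(alpha, df))
    if tail == "right":
        return bool(t_stat > stats.t.ppf(1 - alpha, df))
    raise ValueError(f"unknown tail: {tail}")

print(decide(-14.0, 4))  # True: |-14.0| > 2.776
print(decide(2.0, 4))    # False: 2.0 < 2.776
```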
Reporting Results
Standard format for scientific reporting: \(t(df) = \text{value},\ p = \text{value},\ d = \text{value}\). For the worked example above: \(t(4) = -14.00,\ p < .001,\ d = -6.26\).
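A small helper that assembles an APA-style results string from the test outputs (the function name and formatting choices are illustrative):

```python
def apa_report(t, df, p, d):
    """Format paired t-test results in APA style, e.g. t(29) = 2.10, p = .042, d = 0.40."""
    # APA drops the leading zero for p-values and reports very small ones as p < .001
    p_str = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".")
    return f"t({df}) = {t:.2f}, {p_str}, d = {d:.2f}"

# Values from the worked weight-loss example
print(apa_report(-14.0, 4, 0.00015, -6.26))
# t(4) = -14.00, p < .001, d = -6.26
```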
Code Examples
R:

```r
library(tidyverse)
library(car)       # assumption-checking helpers
library(effsize)   # effect size helpers

# Simulate paired data
set.seed(42)
n <- 30
baseline <- rnorm(n, mean = 100, sd = 15)
followup <- baseline + rnorm(n, mean = -5, sd = 5)  # average decrease of 5 units

# Create data frame
data <- tibble(
  subject = 1:n,
  baseline = baseline,
  followup = followup,
  difference = followup - baseline
)

# Summary statistics of the differences
summary_stats <- data %>%
  summarise(
    mean_diff = mean(difference),
    sd_diff = sd(difference),
    n = n()
  )

# Paired t-test
t_test_result <- t.test(data$followup, data$baseline, paired = TRUE)

# Effect size (Cohen's d for paired samples)
cohens_d <- mean(data$difference) / sd(data$difference)

# Visualization: baseline vs follow-up with identity line
ggplot(data) +
  geom_point(aes(x = baseline, y = followup)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  theme_minimal() +
  labs(title = "Baseline vs Follow-up Measurements",
       subtitle = paste("Mean difference:", round(mean(data$difference), 2)))
```
Python:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.stats.power import TTestPower

# Generate example data
np.random.seed(42)
n = 30
baseline = np.random.normal(100, 15, n)
followup = baseline + np.random.normal(-5, 5, n)
differences = followup - baseline

# Basic statistics
mean_diff = np.mean(differences)
sd_diff = np.std(differences, ddof=1)  # sample SD (n - 1 denominator)
se_diff = sd_diff / np.sqrt(n)

# Paired t-test
t_stat, p_value = stats.ttest_rel(followup, baseline)

# Effect size (Cohen's d for paired samples)
cohens_d = mean_diff / sd_diff

# Post-hoc power for the observed effect size (magnitude, hence abs)
analysis = TTestPower()
power = analysis.power(effect_size=abs(cohens_d), nobs=n, alpha=0.05)

# Visualization
plt.figure(figsize=(12, 5))

# Scatterplot with identity line
plt.subplot(1, 2, 1)
plt.scatter(baseline, followup)
min_val = min(baseline.min(), followup.min())
max_val = max(baseline.max(), followup.max())
plt.plot([min_val, max_val], [min_val, max_val], '--', color='red')
plt.xlabel('Baseline')
plt.ylabel('Follow-up')
plt.title('Baseline vs Follow-up')

# Differences histogram
plt.subplot(1, 2, 2)
sns.histplot(differences, kde=True)
plt.axvline(mean_diff, color='red', linestyle='--')
plt.xlabel('Differences (Follow-up - Baseline)')
plt.title('Distribution of Differences')
plt.tight_layout()
plt.show()

print(f"Mean difference: {mean_diff:.2f}")
print(f"Standard deviation of differences: {sd_diff:.2f}")
print(f"t-statistic: {t_stat:.2f}")
print(f"p-value: {p_value:.4f}")
print(f"Cohen's d: {cohens_d:.2f}")
print(f"Statistical Power: {power:.4f}")
```
Alternative Tests
Consider these alternatives when assumptions are violated:
- Wilcoxon Signed-Rank Test: When normality of differences is violated or data is ordinal
- Independent t-test: When samples are independent rather than paired
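When the normality check fails, the Wilcoxon signed-rank test is the usual drop-in replacement; a sketch on the weight-loss example (with only five pairs and tied differences, SciPy falls back to a normal approximation, so treat the p-value as rough):

```python
from scipy import stats

before = [70, 75, 80, 85, 90]
after = [68, 72, 77, 82, 87]

# Wilcoxon signed-rank test on the paired differences
stat, p = stats.wilcoxon(after, before)
print(stat, round(p, 4))
```

Because every difference is negative, the smaller rank sum is 0; with realistic sample sizes the exact distribution is used automatically when there are no ties.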