StatsCalculators.com

Permutation Test

Created: April 16, 2025

This Permutation Test Calculator helps you determine if there's a significant difference between groups without assuming a specific data distribution. It's ideal for small sample sizes or when parametric assumptions are violated. The test works by randomly shuffling (permuting) the data between groups many times to create a null distribution, allowing you to assess how likely your observed result would be by chance alone. To learn about the data format required and test this calculator, click here to populate the sample data.

Calculator

1. Load Your Data

Note: Column names will be converted to snake_case (e.g., "Product ID" → "product_id") for processing.

2. Select Columns & Options

Setting a seed ensures reproducible results across multiple runs

Related Calculators

Learn More

Permutation Test

Definition

The permutation test is a non-parametric statistical method used to determine whether there is a significant difference between groups: the data are randomly shuffled (permuted) between groups many times to build a null distribution, and the observed difference is then compared against that distribution.

When to Use Permutation Tests

Permutation tests are particularly useful in these situations:

  • When your sample size is small
  • When your data doesn't meet parametric test assumptions (like normality)
  • When you want to make minimal assumptions about underlying distributions
  • For complex test statistics without known sampling distributions
  • When you need a test that maintains good statistical power with non-normal data
  • When testing for independence between variables

How Permutation Tests Work (Step by Step)

  1. Calculate the observed test statistic:
    T_{obs} = \text{statistic}(\text{group}_1, \text{group}_2, \ldots)

    This could be a difference in means, medians, or any other statistic of interest.

  2. Combine all data from all groups:
    \text{combined} = \text{group}_1 \cup \text{group}_2 \cup \ldots
  3. Repeat many times (e.g., 10,000 iterations):
    1. Randomly shuffle the combined data
    2. Reassign data points to groups with original group sizes
    3. Calculate the test statistic for this permutation
    4. Store this permuted test statistic
  4. Calculate the p-value:
    p = \frac{\text{Number of permuted statistics} \geq |T_{obs}|}{\text{Number of permutations}}

    For a two-sided test, we count how many permuted statistics are as or more extreme than the observed statistic.
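The steps above can be sketched as a small, generic Python function. The statistic is passed in as a callable, so the same routine works for a difference in means, medians, or any other statistic; the function and variable names here are illustrative, not part of the calculator.

```python
import numpy as np

def permutation_test(group1, group2, statistic, n_perm=10_000, seed=42):
    """Two-sided permutation test for any two-sample statistic."""
    rng = np.random.default_rng(seed)
    group1, group2 = np.asarray(group1), np.asarray(group2)
    t_obs = statistic(group1, group2)            # Step 1: observed statistic
    combined = np.concatenate([group1, group2])  # Step 2: pool all the data
    n1 = len(group1)
    perm_stats = np.empty(n_perm)
    for i in range(n_perm):                      # Step 3: shuffle, split, recompute
        perm = rng.permutation(combined)
        perm_stats[i] = statistic(perm[:n1], perm[n1:])
    # Step 4: two-sided p-value — share of permuted statistics at least as extreme
    return np.mean(np.abs(perm_stats) >= abs(t_obs))

# Any statistic works; here, the difference in means:
diff_means = lambda a, b: np.mean(b) - np.mean(a)
p = permutation_test([75, 72, 80, 78, 76], [85, 86, 83, 87, 84], diff_means)
print(f"p-value: {p:.4f}")
```

Swapping `diff_means` for, say, a difference in medians requires no other changes, which is exactly the flexibility described above.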

Key Advantages

Distribution-Free: No assumptions about underlying data distributions
Flexible: Can be applied to many different test statistics
Small Sample Size: Works well when sample sizes are too small for parametric tests
Exact p-values: Provides exact p-values (limited only by number of permutations)

Practical Example

Step 1: State the Data
| Group A | Group B |
|---------|---------|
| 75      | 85      |
| 72      | 86      |
| 80      | 83      |
| 78      | 87      |
| 76      | 84      |
Step 2: Calculate Observed Difference
  • Mean of Group A: (75 + 72 + 80 + 78 + 76) / 5 = 76.2
  • Mean of Group B: (85 + 86 + 83 + 87 + 84) / 5 = 85.0
  • Observed difference: 85.0 - 76.2 = 8.8
Step 3: Perform Permutation Test

Combine all data: 75, 72, 80, 78, 76, 85, 86, 83, 87, 84

Randomly shuffle and split into groups many times (10,000 permutations)

Calculate the difference for each permutation

Step 4: Calculate p-value

Suppose that 19 out of the 10,000 permutations produced an absolute difference greater than or equal to the observed 8.8. The p-value would then be:

p-value = 19/10,000 = 0.0019

Step 5: Draw Conclusion

Since the p-value (0.0019) is less than our significance level (0.05), we reject the null hypothesis. There is statistically significant evidence to conclude that the groups differ.

Effect Size

Cohen's d can be used to measure effect size:

d = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}}

where s_{pooled} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}

Guidelines:

  • Small effect: d \approx 0.2
  • Medium effect: d \approx 0.5
  • Large effect: d \approx 0.8

For our example: d = \frac{|85.0 - 76.2|}{2.419} \approx 3.638, which indicates a very large effect.
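As a quick arithmetic check on this example, a few lines of Python reproduce the pooled standard deviation and Cohen's d (using sample variances, i.e. `ddof=1`):

```python
import numpy as np

a = np.array([75, 72, 80, 78, 76])
b = np.array([85, 86, 83, 87, 84])

# Pooled standard deviation from the two sample variances
s_pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                   / (len(a) + len(b) - 2))
d = abs(a.mean() - b.mean()) / s_pooled
print(round(s_pooled, 3), round(d, 3))  # → 2.419 3.638
```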

Code Examples

R
library(tidyverse)

set.seed(42)
group1 <- c(75, 72, 80, 78, 76)
group2 <- c(85, 86, 83, 87, 84)

observed_diff <- mean(group2) - mean(group1)
print(str_glue("Observed difference: {observed_diff}"))

combined <- c(group1, group2)
n1 <- length(group1)
n <- length(combined)

n_perm <- 10000
perm_diffs <- numeric(n_perm)

for (i in 1:n_perm) {
  perm <- sample(combined, n, replace = FALSE)
  perm_group1 <- perm[1:n1]
  perm_group2 <- perm[(n1+1):n]
  perm_diffs[i] <- mean(perm_group2) - mean(perm_group1)
}

p_value <- mean(abs(perm_diffs) >= abs(observed_diff))
print(str_glue("Permutation test p-value: {p_value}"))


# plot permuted differences with observed difference

ggplot(data.frame(perm_diffs), aes(x = perm_diffs)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "lightblue", color = "black") +
  geom_vline(aes(xintercept = observed_diff), color = "red", linetype = "dashed", linewidth = 1) +
  geom_vline(aes(xintercept = -observed_diff), color = "red", linetype = "dashed", linewidth = 1) +
  labs(title = "Permutation Test: Distribution of Permuted Differences",
       x = "Difference in Means",
       y = "Density") +
  theme_minimal()
Python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(42)

group1 = np.array([75, 72, 80, 78, 76])
group2 = np.array([85, 86, 83, 87, 84])

# observed difference
observed_diff = np.mean(group2) - np.mean(group1)
print(f"Observed difference: {observed_diff}")

combined = np.concatenate([group1, group2])
n1 = len(group1)
n = len(combined)

n_perm = 10000
perm_diffs = np.zeros(n_perm)

for i in range(n_perm):
    # Randomly permute the combined data
    perm = np.random.permutation(combined)
    # Split into two groups of original sizes
    perm_group1 = perm[:n1]
    perm_group2 = perm[n1:]
    # Calculate and store the difference in means
    perm_diffs[i] = np.mean(perm_group2) - np.mean(perm_group1)

# p-value
p_value = np.mean(np.abs(perm_diffs) >= np.abs(observed_diff))
print(f"Permutation test p-value: {p_value}")

# plot the permuted differences with observed difference
plt.figure(figsize=(10, 6))
sns.histplot(perm_diffs, kde=True, stat='density', color='lightblue', edgecolor='black')
plt.axvline(x=observed_diff, color='red', linestyle='dashed', linewidth=2, label='Observed difference')
plt.axvline(x=-observed_diff, color='red', linestyle='dashed', linewidth=2)
plt.title('Permutation Test: Distribution of Permuted Differences')
plt.xlabel('Difference in Means')
plt.ylabel('Density')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

Comparison to Other Tests

How permutation tests compare to other common statistical methods:

  • t-test: Permutation tests are more flexible and don't require normality assumptions, but t-tests are simpler to compute and have closed-form solutions.
  • Mann-Whitney U Test: Both are non-parametric, but permutation tests can use any test statistic, while Mann-Whitney focuses on ranks.
  • Bootstrap Tests: Bootstrap tests resample with replacement, while permutation tests shuffle existing data. Permutation tests are better for testing differences between groups.
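For data like the worked example above, these alternatives can be compared side by side. The sketch below assumes SciPy ≥ 1.7, which ships its own `scipy.stats.permutation_test`; with only C(10,5) = 252 distinct group assignments here, SciPy enumerates them all and returns an exact p-value.

```python
import numpy as np
from scipy import stats

a = np.array([75, 72, 80, 78, 76])
b = np.array([85, 86, 83, 87, 84])

def diff_means(x, y):
    return np.mean(y) - np.mean(x)

# Permutation test (SciPy >= 1.7); exact here, since all 252 splits fit in n_resamples
perm = stats.permutation_test((a, b), diff_means,
                              permutation_type='independent',
                              n_resamples=10_000, alternative='two-sided')
t_p = stats.ttest_ind(a, b).pvalue                              # parametric t-test
u_p = stats.mannwhitneyu(a, b, alternative='two-sided').pvalue  # rank-based
print(f"permutation: {perm.pvalue:.4f}, t-test: {t_p:.4f}, Mann-Whitney: {u_p:.4f}")
```

All three agree that the difference is significant for these data; they diverge mainly when normality fails or when the statistic of interest is something other than a mean or rank comparison.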

Verification