Understanding the formulas behind sample size calculations gives you the ability to critically evaluate study designs, communicate with statisticians, and make informed decisions when planning research. In this tutorial, we break down the sample size formula for six commonly used statistical tests, each with a step-by-step worked example and code in R.
If you're looking for a high-level process rather than formulas, see our How to Determine Sample Size for a Study guide. For a deeper dive into the concept of statistical power, see Power Analysis: A Complete Guide.
Skip the math? Use our Sample Size & Power Analysis Calculator to compute sample sizes instantly with interactive power curves.
All sample size formulas share the same underlying logic. You need four quantities, and knowing any three lets you solve for the fourth:
Significance level (α)
Statistical power (1 − β)
Effect size (ES)
Sample size (n)
The general pattern for most formulas is:

$n \propto \dfrac{(z_{1-\alpha/2} + z_{1-\beta})^2}{ES^2}$

This tells us two key relationships: the required sample size falls with the square of the effect size (halving the effect quadruples the n you need), and it rises as you demand a smaller significance level or higher power (both increase the z-values in the numerator).
Note: The manual formulas below use z-values (normal approximation), while R's pwr package uses the t-distribution, which has heavier tails. As a result, software-computed sample sizes are often slightly larger than the manual formula results. The software values are more accurate in practice, especially for smaller sample sizes.
Compares the means of two independent groups. The effect size is Cohen's d, the standardized mean difference.
$n_{\text{per group}} = \dfrac{2(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}$, where $d = (\mu_1 - \mu_2)/\sigma$ is Cohen's d.
Detect a medium effect (d = 0.5) with 80% power at $\alpha = 0.05$:
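The manual calculation can be sketched in base R with qnorm (normal approximation; note it lands one below the t-based pwr answer):

```r
# Manual normal-approximation formula:
# n per group = 2 * (z_{1-alpha/2} + z_{1-beta})^2 / d^2
d <- 0.5
z_alpha <- qnorm(1 - 0.05 / 2)  # 1.96 for two-sided alpha = 0.05
z_beta  <- qnorm(0.80)          # 0.84 for 80% power
n <- 2 * (z_alpha + z_beta)^2 / d^2
ceiling(n)  # 63 per group; pwr's t-based answer is 64
```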
library(pwr)
result <- pwr.t.test(
d = 0.5, sig.level = 0.05,
power = 0.8, type = "two.sample"
)
cat("n per group:", ceiling(result$n))
# n per group: 64

Unequal groups? With allocation ratio r = n₂/n₁: $n_1 = \left(1 + \frac{1}{r}\right)\dfrac{(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}$, $n_2 = r \cdot n_1$.
Compares means from the same subjects measured twice (e.g., before/after). The correlation between measurements reduces the required sample size.
$n_{\text{pairs}} = \dfrac{2(1 - \rho)(z_{1-\alpha/2} + z_{1-\beta})^2}{d^2}$, where $\rho$ is the correlation between paired measurements.
Detect d = 0.5 with 80% power, $\alpha = 0.05$, and correlation $\rho = 0.6$:
Compare to the independent t-test which needed 63 per group — the paired design needs only 26 pairs because the correlation reduces variance.
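A base-R sketch of the manual paired formula, using the values above:

```r
# Manual formula: n pairs = 2 * (1 - rho) * (z_{1-alpha/2} + z_{1-beta})^2 / d^2
d <- 0.5; rho <- 0.6
z_sum <- qnorm(1 - 0.05 / 2) + qnorm(0.80)
n <- 2 * (1 - rho) * z_sum^2 / d^2
ceiling(n)  # 26 pairs
```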
library(pwr)
# Paired t-test
result <- pwr.t.test(
d = 0.5, sig.level = 0.05,
power = 0.8, type = "paired"
)
cat("n pairs:", ceiling(result$n))
# n pairs: 34 (without rho adjustment)
# With rho = 0.6, effective d:
rho <- 0.6
d_eff <- 0.5 / sqrt(2 * (1 - rho))
result2 <- pwr.t.test(
d = d_eff, sig.level = 0.05,
power = 0.8, type = "one.sample"
)
cat("n pairs (rho=0.6):", ceiling(result2$n))
# n pairs (rho=0.6): 28

Compares two proportions (e.g., conversion rates in A/B testing). Uses the arcsine transformation or the direct proportion formula.
$n_{\text{per group}} = \dfrac{\left(z_{1-\alpha/2}\sqrt{2\bar{p}(1-\bar{p})} + z_{1-\beta}\sqrt{p_1(1-p_1) + p_2(1-p_2)}\right)^2}{(p_1 - p_2)^2}$, where $\bar{p} = (p_1 + p_2)/2$ is the average proportion.
Detect a change from 10% to 15% conversion, 80% power, $\alpha = 0.05$:
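A base-R sketch of the manual calculation. This is one common textbook version that mixes pooled and unpooled variance terms; with the rounded critical values 1.96 and 0.84 it reproduces the 685 figure, while other variants land a few units away:

```r
# Manual formula with pooled (p-bar) and unpooled variance terms,
# using the rounded critical values 1.96 and 0.84
p1 <- 0.10; p2 <- 0.15
pbar <- (p1 + p2) / 2
num <- (1.96 * sqrt(2 * pbar * (1 - pbar)) +
        0.84 * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2
ceiling(num / (p1 - p2)^2)  # 685 per group
```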
The code below uses Cohen's h (arcsine transformation) rather than the raw proportion difference, which is why it gives 681 instead of 685.
library(pwr)
# Cohen's h effect size
h <- ES.h(0.10, 0.15)
result <- pwr.2p.test(
h = h, sig.level = 0.05,
power = 0.8, alternative = "two.sided"
)
cat("n per group:", ceiling(result$n))
# n per group: 681

Compares means across three or more groups. The effect size is Cohen's f (related to $\eta^2$).
ANOVA sample size uses the non-central F-distribution. An approximation for the per-group sample size is:

$n_{\text{per group}} \approx \dfrac{\lambda}{k f^2}$

where k is the number of groups, $\lambda$ is the non-centrality parameter required for the target power at $df_1 = k - 1$ (about 9.6 for 80% power at $\alpha = 0.05$ with three groups), and $f = \sigma_{\text{means}}/\sigma$.
3 groups, medium effect (f = 0.25), 80% power, $\alpha = 0.05$:
Using the pwr package: n = 53 per group (159 total).
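To see the non-central F machinery directly, this base-R sketch computes the power achieved at 53 per group, assuming the standard non-centrality parameterization lambda = N * f^2:

```r
# Power achieved at n = 53 per group, via the non-central F-distribution
k <- 3; n <- 53; f <- 0.25; alpha <- 0.05
N <- k * n                            # total sample size
lambda <- N * f^2                     # non-centrality parameter
crit <- qf(1 - alpha, k - 1, N - k)   # critical F value under H0
power <- 1 - pf(crit, k - 1, N - k, ncp = lambda)
round(power, 3)  # just above 0.80, since 53 rounds the exact n up
```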
library(pwr)
result <- pwr.anova.test(
f = 0.25, k = 3,
sig.level = 0.05, power = 0.8
)
cat("n per group:", ceiling(result$n))
# n per group: 53

Tests associations between categorical variables. The effect size is Cohen's w.
$N = \dfrac{\lambda}{w^2}$, where $w = \sqrt{\sum_i \dfrac{(p_{1i} - p_{0i})^2}{p_{0i}}}$, $\lambda$ is the non-centrality parameter required for the target power at the table's df, and df = (rows - 1)(columns - 1).
2×2 table (df = 1), medium effect (w = 0.3), 80% power, $\alpha = 0.05$:
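A base-R sketch of the manual calculation. It assumes the df = 1 shortcut that the required non-centrality is about (z_{1-α/2} + z_{1-β})², which ignores a negligible lower-tail term:

```r
# For df = 1 the non-central chi-square is a squared shifted normal,
# so the required non-centrality is roughly (z_{1-alpha/2} + z_{1-beta})^2
w <- 0.3
lambda <- (qnorm(1 - 0.05 / 2) + qnorm(0.80))^2   # about 7.85
N <- ceiling(lambda / w^2)                        # 88 total
# Check with the non-central chi-square distribution:
power <- 1 - pchisq(qchisq(0.95, df = 1), df = 1, ncp = N * w^2)
c(N = N, power = round(power, 3))
```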
Using software: n = 88 total.
library(pwr)
result <- pwr.chisq.test(
w = 0.3, df = 1,
sig.level = 0.05, power = 0.8
)
cat("Total n:", ceiling(result$N))
# Total n: 88

Tests whether a set of predictors explains a significant portion of variance. The effect size is Cohen's $f^2$.
$N = \dfrac{\lambda}{f^2} + p + 1$, where p is the number of predictors, $\lambda$ is the non-centrality parameter required for the target power with $df_1 = p$, and $f^2 = \dfrac{R^2}{1 - R^2}$.
Cohen's conventions: $f^2 = 0.02$ (small), $f^2 = 0.15$ (medium), $f^2 = 0.35$ (large).
3 predictors, medium effect ($f^2 = 0.15$), 80% power, $\alpha = 0.05$:
Using software: n = 77 total.
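As a cross-check, this base-R sketch reproduces the power at N = 77 using the same non-central F parameterization as pwr.f2.test (lambda = f2 * (u + v + 1)):

```r
# Power achieved at total N = 77, so denominator df v = N - u - 1 = 73
u <- 3; f2 <- 0.15; N <- 77
v <- N - u - 1
lambda <- f2 * (u + v + 1)   # non-centrality parameter
crit <- qf(0.95, u, v)       # critical F value under H0
power <- 1 - pf(crit, u, v, ncp = lambda)
round(power, 3)  # just above 0.80
```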
library(pwr)
result <- pwr.f2.test(
f2 = 0.15, # Cohen's f²
u = 3, # numerator df (predictors)
sig.level = 0.05,
power = 0.8
)
# Total n = v + u + 1
total_n <- ceiling(result$v) + 3 + 1
cat("Total n:", total_n)
# Total n: 77

A psychologist wants to test whether cognitive behavioral therapy (CBT) reduces anxiety scores more than a waitlist control group. Previous studies report that CBT reduces anxiety by about 8 points on the GAD-7 scale, with a pooled standard deviation of 12 points. The study expects 10% dropout. Budget allows recruiting up to 100 participants.
The expected effect size is $d = 8/12 \approx 0.667$, a medium-to-large effect.
Plugging into the two-sample formula: $n = 2(1.96 + 0.84)^2 / 0.667^2 \approx 35.3$. Round up: 36 per group (72 total).
Adjusting for 10% dropout: 72 / (1 - 0.10) = 80. Recruit 80 participants (40 per group). This is within the budget of 100.
The study needs 80 participants (40 per group) to detect a Cohen's d of 0.667 with 80% power at the 0.05 significance level, accounting for 10% dropout. This is feasible within the budget.
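The full calculation can be sketched in base R (normal approximation, matching the manual steps above):

```r
# End-to-end: d = 8/12, 80% power, alpha = 0.05, 10% dropout
d <- 8 / 12                                  # about 0.667
z_sum <- qnorm(1 - 0.05 / 2) + qnorm(0.80)
n_group <- ceiling(2 * z_sum^2 / d^2)        # 36 per group
n_total <- 2 * n_group                       # 72 before dropout
n_recruit <- ceiling(n_total / (1 - 0.10))   # 80 to recruit
c(per_group = n_group, total = n_total, recruit = n_recruit)
```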
Try this example yourself with our Sample Size & Power Analysis Calculator. Select "Mean Difference (Independent t-test)", set d = 0.667, power = 0.80, and alpha = 0.05.
Practice applying the sample size formulas with these problems. Work through each step before revealing the answer.