Statistical power is one of the most important — yet often overlooked — concepts in research design. A study without adequate power is like searching for your keys in the dark: even if they're there, you might not find them.
This guide covers everything you need to know about power analysis: what statistical power is, why it matters, the different types of power analysis, and how to conduct one in Python and R. By the end, you'll be able to plan studies that are properly powered to detect the effects you care about.
Run a power analysis now: Use our Sample Size & Power Analysis Calculator to calculate sample size, power, or minimum detectable effect size for your study.
Statistical power is the probability that a study will correctly detect an effect when one truly exists. Formally:

Power = 1 − β

where β is the probability of a Type II error (failing to detect a real effect).
The 80% convention: Most fields consider 80% power as the minimum acceptable level. This means accepting a 20% chance of a Type II error. For high-stakes research (clinical trials, regulatory decisions), 90% or higher is often required.
Power analysis involves four interconnected quantities. If you know any three, you can solve for the fourth. This is what makes power analysis so versatile.
The number of observations per group. Larger samples increase power by providing more information about the population.
Relationship: n ↑ = Power ↑
The magnitude of the difference or relationship you're trying to detect. Larger effects are easier to detect.
Relationship: Effect Size ↑ = Power ↑
The threshold for rejecting the null hypothesis. A more lenient alpha (e.g., 0.10) increases power but also increases the risk of false positives.
Relationship: α ↑ = Power ↑ (but Type I error ↑)
The probability of correctly rejecting a false null hypothesis. This is what you're trying to maximize (usually to at least 0.80).
Target: 0.80 (minimum) to 0.95 (clinical trials)
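The "know three, solve for the fourth" relationship can be tried directly in statsmodels: `TTestIndPower.solve_power` solves for whichever quantity you leave unspecified. A minimal sketch for a two-sample t-test:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Fix three quantities; solve_power returns the one left unspecified.
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)   # sample size
pw = analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05)   # power
d = analysis.solve_power(nobs1=64, alpha=0.05, power=0.8)          # effect size

print(f"n per group:          {n:.1f}")   # about 63.8
print(f"power at n=64:        {pw:.3f}")  # about 0.80
print(f"detectable d at n=64: {d:.3f}")   # about 0.50
```

Note how the three answers are mutually consistent: plugging any two results back in recovers the third.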
There are three main types of power analysis, each answering a different question. Understanding which to use is critical.
Question: "How many participants do I need?"
You specify: α, power, and effect size. You solve for: sample size.
This is the gold standard for study planning and is typically required in grant proposals, IRB applications, and clinical trial registrations.
from statsmodels.stats.power import TTestIndPower
import numpy as np
analysis = TTestIndPower()
# A priori: find n given power, alpha, effect size
n = analysis.solve_power(
    effect_size=0.5,  # medium effect
    alpha=0.05,
    power=0.8,
    alternative='two-sided'
)
print(f"Required n per group: {np.ceil(n):.0f}")
# Required n per group: 64

library(pwr)
# A priori: find n given power, alpha, effect size
result <- pwr.t.test(
  d = 0.5,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample"
)
cat("Required n per group:", ceiling(result$n))
# Required n per group: 64

For a step-by-step planning process, see our How to Determine Sample Size for a Study guide.
Question: "How much power did my study have?"
You specify: α, sample size, and observed effect size. You solve for: power.
Post-hoc power analysis using the observed effect size is widely criticized by statisticians. The observed power is a direct function of the p-value — if p < 0.05, observed power will always be > 50%, and vice versa. It adds no information beyond the p-value itself.
Better alternative: If you want to assess power after a study, use a hypothetical effect size from prior research or the minimum clinically important difference — not the observed effect.
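As a sketch of this recommendation, suppose prior research or clinical judgment puts the minimum important difference at d = 0.4 (a hypothetical value chosen for illustration). The informative question is how much power the completed study had for that effect, not for the effect it happened to observe:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical minimum clinically important difference (NOT the observed effect)
mcid_d = 0.4

power_mcid = analysis.power(effect_size=mcid_d, nobs1=50, alpha=0.05,
                            alternative='two-sided')
print(f"Power to detect the MCID with n=50 per group: {power_mcid:.3f}")
```

A result near 50% here would tell you the study was poorly suited to detect effects of practical importance, regardless of what p-value it produced.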
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# Post-hoc: find power given n, alpha, effect size
power = analysis.power(
    effect_size=0.5,
    nobs1=50,  # actual n per group
    alpha=0.05,
    alternative='two-sided'
)
print(f"Achieved power: {power:.4f} ({power*100:.1f}%)")
# Achieved power: 0.6969 (69.7%)

library(pwr)
# Post-hoc: find power given n, alpha, effect size
result <- pwr.t.test(
  d = 0.5,
  n = 50,
  sig.level = 0.05,
  type = "two.sample"
)
cat("Achieved power:", round(result$power, 4))
# Achieved power: 0.6969

Question: "What is the smallest effect my study can detect?"
You specify: α, power, and sample size. You solve for: minimum detectable effect size.
Sensitivity analysis is particularly useful when the sample size is fixed in advance, for example when you are analyzing an existing dataset or when budget and recruitment constraints cap enrollment.
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# Sensitivity: find minimum detectable effect
mde = analysis.solve_power(
    effect_size=None,  # solve for this
    nobs1=100,
    alpha=0.05,
    power=0.8,
    alternative='two-sided'
)
print(f"Minimum detectable d: {mde:.4f}")
# Minimum detectable d: 0.3981

library(pwr)
# Sensitivity: find minimum detectable effect
result <- pwr.t.test(
  n = 100,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample"
)
cat("Minimum detectable d:", round(result$d, 4))
# Minimum detectable d: 0.3981

Different tests have different power functions. A t-test, ANOVA, chi-square, and regression all require different approaches.
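statsmodels provides parallel power classes for other designs. The sketch below (the effect sizes are illustrative defaults, not recommendations) shows the analogous a priori calculations for a one-way ANOVA (Cohen's f) and a chi-square goodness-of-fit test (Cohen's w):

```python
from statsmodels.stats.power import FTestAnovaPower, GofChisquarePower

# One-way ANOVA with 3 groups, medium effect f = 0.25 (total N, not per group)
n_anova = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.8, k_groups=3)
print(f"ANOVA: total N = {n_anova:.0f}")

# Chi-square goodness of fit with 4 categories, medium effect w = 0.3
n_chisq = GofChisquarePower().solve_power(effect_size=0.3, alpha=0.05,
                                          power=0.8, n_bins=4)
print(f"Chi-square: N = {n_chisq:.0f}")
```

Watch the units: some power functions return total sample size, others per-group size, so always check the documentation of the class you are using.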
Choosing the effect size is the most critical and challenging step. Base it on prior research, pilot data, or the smallest effect that would be practically meaningful.
Standard: α = 0.05, power = 0.80. Adjust based on the consequences of Type I vs. Type II errors in your context.
Use our Sample Size Calculator, Python (statsmodels), R (pwr), or G*Power.
Account for dropout, multiple comparisons, and budget. Consider running a sensitivity analysis across a range of effect sizes.
library(pwr)
# Scenario: Testing new checkout flow
# Baseline conversion: 3%, expected: 4%
p1 <- 0.03
p2 <- 0.04
# Step 1: Calculate effect size (Cohen's h)
h <- ES.h(p2, p1)
cat(sprintf("Cohen's h: %.4f\n", h))
# Step 2: A priori power analysis
result <- pwr.2p.test(h = h, sig.level = 0.05, power = 0.80,
                      alternative = "two.sided")
cat(sprintf("Required n per group: %.0f\n", ceiling(result$n)))
# Step 3: Sensitivity - what if we can only afford 3000 per group?
sens <- pwr.2p.test(n = 3000, sig.level = 0.05, power = 0.80,
                    alternative = "two.sided")
cat(sprintf("Min detectable Cohen's h with n=3000: %.4f\n", sens$h))
# Step 4: Power curve across sample sizes
for (sample_n in c(1000, 3000, 5000, 7000, 10000)) {
  p <- pwr.2p.test(h = h, n = sample_n, sig.level = 0.05,
                   alternative = "two.sided")
  cat(sprintf("  n=%5d: power = %.2f%%\n", sample_n, p$power * 100))
}

Output:
Cohen's h: 0.0545
Required n per group: 5276
Min detectable Cohen's h with n=3000: 0.0723
  n= 1000: power = 23.03%
  n= 3000: power = 56.07%
  n= 5000: power = 77.86%
  n= 7000: power = 89.75%
  n=10000: power = 97.11%
Adjust the sample size and effect size below to see how they influence statistical power for a two-sample t-test (α = 0.05, two-sided). At the default settings of n = 50 per group (100 total) and d = 0.50 (medium), the study is underpowered; increase the sample size to reach at least 80% power.
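The same exploration can be scripted in a few lines. This sketch traces the power curve for d = 0.50 across a handful of sample sizes, passing through the underpowered n = 50 default:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sample t-test (d = 0.5, alpha = 0.05, two-sided) at several n
for n in [20, 50, 64, 100, 150]:
    p = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05,
                       alternative='two-sided')
    print(f"n = {n:3d} per group: power = {p:.1%}")
```

Note the diminishing returns: the curve flattens as power approaches 1, so each additional percentage point of power costs progressively more participants.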
Observed power is a monotonic function of the p-value. A non-significant result will always show low observed power. This is circular reasoning and adds no information.
Cohen's conventions are generic defaults, not universal truths. A "small" effect in psychology might be a "large" effect in economics. Always base your effect size on domain knowledge or prior research when possible.
"How many participants do I need?" is incomplete without specifying what effect size you want to detect. For a simple two-sample t-test at 80% power, the answer ranges from 26 per group for a large effect (d = 0.8) to 394 per group — 788 participants in total — for a small effect (d = 0.2).
If your study involves multiple tests (e.g., testing 5 outcomes), the effective alpha for each test is smaller (Bonferroni: 0.05/5 = 0.01), which reduces power. Your sample size calculation should use the adjusted alpha.
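A quick sketch of the cost of that adjustment: the same d = 0.5 planned at the Bonferroni-adjusted alpha for five outcomes requires roughly half again as many participants per group.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Same effect and power, unadjusted vs. Bonferroni-adjusted alpha (5 outcomes)
n_single = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
n_bonf = analysis.solve_power(effect_size=0.5, alpha=0.05 / 5, power=0.8)

print(f"n per group at alpha = 0.05: {n_single:.0f}")
print(f"n per group at alpha = 0.01: {n_bonf:.0f}")
```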
Choosing an optimistic effect size to justify a smaller sample is a form of self-deception. If the true effect is smaller than assumed, the study will be underpowered. Be honest and conservative in your effect size estimate.
These questions test your conceptual and practical understanding of power analysis. Work through each one carefully.