Statistical power is one of the most important — yet often overlooked — concepts in research design. A study without adequate power is like searching for your keys in the dark: even if they're there, you might not find them.
This guide covers everything you need to know about power analysis: what statistical power is, why it matters, the different types of power analysis, and how to conduct one in Python and R. By the end, you'll be able to plan studies that are properly powered to detect the effects you care about.
Run a power analysis now: Use our Sample Size & Power Analysis Calculator to calculate sample size, power, or minimum detectable effect size for your study.
Statistical power is the probability that a study will correctly detect an effect when one truly exists. Formally:

Power = 1 − β

where β is the probability of a Type II error (failing to detect a real effect).
The 80% convention: Most fields consider 80% power as the minimum acceptable level. This means accepting a 20% chance of a Type II error. For high-stakes research (clinical trials, regulatory decisions), 90% or higher is often required.
Power analysis involves four interconnected quantities. If you know any three, you can solve for the fourth. This is what makes power analysis so versatile.
The number of observations per group. Larger samples increase power by providing more information about the population.
Relationship: n ↑ = Power ↑
The magnitude of the difference or relationship you're trying to detect. Larger effects are easier to detect.
Relationship: Effect Size ↑ = Power ↑
The threshold for rejecting the null hypothesis. A more lenient alpha (e.g., 0.10) increases power but also increases the risk of false positives.
Relationship: α ↑ = Power ↑ (but Type I error ↑)
The probability of correctly rejecting a false null hypothesis. This is what you're trying to maximize (usually to at least 0.80).
Target: 0.80 (minimum) to 0.95 (clinical trials)
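The "know three, solve for the fourth" relationship can be tried directly in statsmodels: `TTestIndPower.solve_power` solves for whichever quantity you leave unspecified. A minimal sketch for a two-sample t-test:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Fix three quantities; solve_power returns the one left unspecified.
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)   # sample size
pw = analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05)   # power
d = analysis.solve_power(nobs1=64, alpha=0.05, power=0.8)          # effect size

print(f"n per group:          {n:.1f}")   # about 63.8
print(f"power at n=64:        {pw:.3f}")  # about 0.80
print(f"detectable d at n=64: {d:.3f}")   # about 0.50
```

Note how the three answers are mutually consistent: plugging any two results back in recovers the third.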
There are three main types of power analysis, each answering a different question. Understanding which to use is critical.
Question: "How many participants do I need?"
You specify: α, power, and effect size. You solve for: sample size.
This is the gold standard for study planning and is typically required in grant proposals, IRB applications, and clinical trial registrations.
from statsmodels.stats.power import TTestIndPower
import numpy as np
analysis = TTestIndPower()
# A priori: find n given power, alpha, effect size
n = analysis.solve_power(
    effect_size=0.5,  # medium effect
    alpha=0.05,
    power=0.8,
    alternative='two-sided'
)
print(f"Required n per group: {np.ceil(n):.0f}")
# Required n per group: 64

library(pwr)
# A priori: find n given power, alpha, effect size
result <- pwr.t.test(
  d = 0.5,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample"
)
cat("Required n per group:", ceiling(result$n))
# Required n per group: 64

For a step-by-step planning process, see our How to Determine Sample Size for a Study guide.
Question: "How much power did my study have?"
You specify: α, sample size, and observed effect size. You solve for: power.
Post-hoc power analysis using the observed effect size is widely criticized by statisticians. The observed power is a direct function of the p-value — if p < 0.05, observed power will always be > 50%, and vice versa. It adds no information beyond the p-value itself.
Better alternative: If you want to assess power after a study, use a hypothetical effect size from prior research or the minimum clinically important difference — not the observed effect.
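As a sketch of this recommendation, suppose prior research or clinical judgment puts the minimum important difference at d = 0.4 (a hypothetical value chosen for illustration). The informative question is how much power the completed study had for that effect, not for the effect it happened to observe:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Hypothetical minimum clinically important difference (NOT the observed effect)
mcid_d = 0.4

power_mcid = analysis.power(effect_size=mcid_d, nobs1=50, alpha=0.05,
                            alternative='two-sided')
print(f"Power to detect the MCID with n=50 per group: {power_mcid:.3f}")
```

A result near 50% here would tell you the study was poorly suited to detect effects of practical importance, regardless of what p-value it produced.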
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# Post-hoc: find power given n, alpha, effect size
power = analysis.power(
    effect_size=0.5,
    nobs1=50,  # actual n per group
    alpha=0.05,
    alternative='two-sided'
)
print(f"Achieved power: {power:.4f} ({power*100:.1f}%)")
# Achieved power: 0.6969 (69.7%)

library(pwr)
# Post-hoc: find power given n, alpha, effect size
result <- pwr.t.test(
  d = 0.5,
  n = 50,
  sig.level = 0.05,
  type = "two.sample"
)
cat("Achieved power:", round(result$power, 4))
# Achieved power: 0.6969

Question: "What is the smallest effect my study can detect?"
You specify: α, power, and sample size. You solve for: minimum detectable effect size.
Sensitivity analysis is particularly useful when the sample size is fixed in advance, for example when you are analyzing an existing dataset or when budget and recruitment constraints cap enrollment.
from statsmodels.stats.power import TTestIndPower
analysis = TTestIndPower()
# Sensitivity: find minimum detectable effect
mde = analysis.solve_power(
    effect_size=None,  # solve for this
    nobs1=100,
    alpha=0.05,
    power=0.8,
    alternative='two-sided'
)
print(f"Minimum detectable d: {mde:.4f}")
# Minimum detectable d: 0.3981

library(pwr)
# Sensitivity: find minimum detectable effect
result <- pwr.t.test(
  n = 100,
  sig.level = 0.05,
  power = 0.8,
  type = "two.sample"
)
cat("Minimum detectable d:", round(result$d, 4))
# Minimum detectable d: 0.3981

Different tests have different power functions. A t-test, ANOVA, chi-square, and regression all require different approaches.
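statsmodels provides parallel power classes for other designs. The sketch below (the effect sizes are illustrative defaults, not recommendations) shows the analogous a priori calculations for a one-way ANOVA (Cohen's f) and a chi-square goodness-of-fit test (Cohen's w):

```python
from statsmodels.stats.power import FTestAnovaPower, GofChisquarePower

# One-way ANOVA with 3 groups, medium effect f = 0.25 (total N, not per group)
n_anova = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.8, k_groups=3)
print(f"ANOVA: total N = {n_anova:.0f}")

# Chi-square goodness of fit with 4 categories, medium effect w = 0.3
n_chisq = GofChisquarePower().solve_power(effect_size=0.3, alpha=0.05,
                                          power=0.8, n_bins=4)
print(f"Chi-square: N = {n_chisq:.0f}")
```

Watch the units: some power functions return total sample size, others per-group size, so always check the documentation of the class you are using.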
Choosing the effect size is the most critical and challenging step. Base it on prior research, pilot data, or the smallest effect that would be practically meaningful.
Standard: α = 0.05, power = 0.80. Adjust based on the consequences of Type I vs. Type II errors in your context.
Use our Sample Size Calculator, Python (statsmodels), R (pwr), or G*Power.
Account for dropout, multiple comparisons, and budget. Consider running a sensitivity analysis across a range of effect sizes.
library(pwr)
# Scenario: Testing new checkout flow
# Baseline conversion: 3%, expected: 4%
p1 <- 0.03
p2 <- 0.04
# Step 1: Calculate effect size (Cohen's h)
h <- ES.h(p2, p1)
cat(sprintf("Cohen's h: %.4f\n", h))
# Step 2: A priori power analysis
result <- pwr.2p.test(h = h, sig.level = 0.05, power = 0.80,
                      alternative = "two.sided")
cat(sprintf("Required n per group: %.0f\n", ceiling(result$n)))
# Step 3: Sensitivity - what if we can only afford 3000 per group?
sens <- pwr.2p.test(n = 3000, sig.level = 0.05, power = 0.80,
                    alternative = "two.sided")
cat(sprintf("Min detectable Cohen's h with n=3000: %.4f\n", sens$h))
# Step 4: Power curve across sample sizes
for (sample_n in c(1000, 3000, 5000, 7000, 10000)) {
  p <- pwr.2p.test(h = h, n = sample_n, sig.level = 0.05,
                   alternative = "two.sided")
  cat(sprintf("  n=%5d: power = %.2f%%\n", sample_n, p$power * 100))
}

Output:
Cohen's h: 0.0545
Required n per group: 5276
Min detectable Cohen's h with n=3000: 0.0723
  n= 1000: power = 23.03%
  n= 3000: power = 56.07%
  n= 5000: power = 77.86%
  n= 7000: power = 89.75%
  n=10000: power = 97.11%
Adjust the sample size and effect size below to see how they influence statistical power for a two-sample t-test (α = 0.05, two-sided). At the default settings of n = 50 per group (100 total) and d = 0.50 (medium), the study is underpowered; increase the sample size to reach at least 80% power.
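The same exploration can be scripted in a few lines. This sketch traces the power curve for d = 0.50 across a handful of sample sizes, passing through the underpowered n = 50 default:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sample t-test (d = 0.5, alpha = 0.05, two-sided) at several n
for n in [20, 50, 64, 100, 150]:
    p = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05,
                       alternative='two-sided')
    print(f"n = {n:3d} per group: power = {p:.1%}")
```

Note the diminishing returns: the curve flattens as power approaches 1, so each additional percentage point of power costs progressively more participants.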
Observed power is a monotonic function of the p-value. A non-significant result will always show low observed power. This is circular reasoning and adds no information.
Cohen's conventions are generic defaults, not universal truths. A "small" effect in psychology might be a "large" effect in economics. Always base your effect size on domain knowledge or prior research when possible.
"How many participants do I need?" is incomplete without specifying what effect size you want to detect. For a simple two-sample t-test at 80% power, the answer ranges from 26 per group for a large effect (d = 0.8) to 394 per group — 788 participants in total — for a small effect (d = 0.2).
If your study involves multiple tests (e.g., testing 5 outcomes), the effective alpha for each test is smaller (Bonferroni: 0.05/5 = 0.01), which reduces power. Your sample size calculation should use the adjusted alpha.
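A quick sketch of the cost of that adjustment: the same d = 0.5 planned at the Bonferroni-adjusted alpha for five outcomes requires roughly half again as many participants per group.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Same effect and power, unadjusted vs. Bonferroni-adjusted alpha (5 outcomes)
n_single = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
n_bonf = analysis.solve_power(effect_size=0.5, alpha=0.05 / 5, power=0.8)

print(f"n per group at alpha = 0.05: {n_single:.0f}")
print(f"n per group at alpha = 0.01: {n_bonf:.0f}")
```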
Choosing an optimistic effect size to justify a smaller sample is a form of self-deception. If the true effect is smaller than assumed, the study will be underpowered. Be honest and conservative in your effect size estimate.
These questions test your conceptual and practical understanding of power analysis. Work through each one carefully.