
P-Value Simulation

Interactive Simulation

[Interactive widget: choose the simulation parameters, setting the actual population mean equal to the H₀ mean to simulate Type I errors, or different from it to simulate power. The resulting p-value distribution is plotted with each test marked as p ≤ α (0.05) (Reject H₀) or p > α (Fail to Reject H₀).]


Understanding the P-Value Simulation

What This Simulation Shows

This simulation demonstrates the behavior of p-values under two critical scenarios: when the null hypothesis is true, so every rejection is a Type I error, and when it is false, so the rejection rate measures statistical power.

The histogram visualization is particularly informative: the p-value distribution is roughly flat (uniform) when H₀ is true, and piles up near 0 (right-skewed) when H₀ is false.
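The same behavior can be reproduced offline. Below is a minimal sketch, assuming the simulation draws normal samples and runs a two-sided one-sample t-test on each (the exact test the widget uses is an assumption):

```python
# Minimal sketch of the simulation loop: draw many samples, test each one,
# and inspect the resulting p-value distribution. The one-sample t-test is
# an assumption about what the widget runs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_p_values(true_mean, null_mean=0.0, sigma=1.0, n=30, reps=10_000):
    """Return one p-value per simulated experiment."""
    samples = rng.normal(true_mean, sigma, size=(reps, n))
    return stats.ttest_1samp(samples, popmean=null_mean, axis=1).pvalue

p_null = simulate_p_values(true_mean=0.0)  # H0 true: p ~ Uniform(0, 1)
p_alt = simulate_p_values(true_mean=0.5)   # H0 false: p piles up near 0

print(f"Rejected when H0 true:  {np.mean(p_null <= 0.05):.3f}")  # ~ alpha = 0.05
print(f"Rejected when H0 false: {np.mean(p_alt <= 0.05):.3f}")   # = power
```

Plotting a histogram of p_null versus p_alt reproduces the flat and right-skewed shapes described above.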

Key Concepts

Understanding P-Values

A p-value is the probability of observing test results at least as extreme as the actual results, assuming the null hypothesis is true. It's NOT:

  • The probability that H₀ is true
  • The probability that your results occurred by chance
  • The probability of making a Type I error
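In symbols, for a two-sided test with observed statistic t_obs and test statistic T distributed under H₀:

p = P(|T| ≥ |t_obs| ∣ H₀ is true)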

Type I vs Type II Errors

Type I Error (α)

Rejecting a TRUE null hypothesis (false positive). Its probability is controlled by the significance level (α).

Type II Error (β)

Failing to reject a FALSE null hypothesis (false negative). Statistical power, 1 − β, is the probability of avoiding this error.

Statistical Power

Power is the probability of correctly rejecting a false null hypothesis. It depends on:

  • Effect size: Larger effects are easier to detect
  • Sample size: More data provides more power
  • Significance level: Higher α increases power but also Type I error risk
  • Population variability: Less variance makes effects easier to detect
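Each of these dependencies can be checked numerically. The sketch below computes the exact power of a two-sided one-sample t-test via the noncentral t distribution (the specific test is an assumption; the simulation estimates the same quantity empirically):

```python
# Exact power of a two-sided one-sample t-test; each call below varies
# one factor relative to the baseline.
import numpy as np
from scipy import stats

def power(d, n, alpha=0.05):
    """Power for effect size d (Cohen's d) and sample size n."""
    df, nc = n - 1, d * np.sqrt(n)           # degrees of freedom, noncentrality
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided rejection threshold
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

print(power(d=0.5, n=30))              # baseline            ~0.75
print(power(d=0.8, n=30))              # larger effect       ~0.99
print(power(d=0.5, n=60))              # larger sample       ~0.97
print(power(d=0.5, n=30, alpha=0.10))  # more lenient alpha  ~0.85
# Variability acts through d = (mu1 - mu0) / sigma: halving sigma doubles d.
```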

Effect Size (Cohen's d)

Measures the standardized difference between means:

d = (μ₁ - μ₀) / σ
  • Small effect: d ≈ 0.2
  • Medium effect: d ≈ 0.5
  • Large effect: d ≈ 0.8
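In practice d is estimated from data by replacing μ₁ with the sample mean and σ with the sample standard deviation. A quick illustration on made-up data:

```python
# Hypothetical example: estimate Cohen's d from a sample drawn with true d = 0.5.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.5, 1.0, size=30)   # 30 observations, true mean 0.5, sd 1
null_mean = 0.0

d_hat = (x.mean() - null_mean) / x.std(ddof=1)  # ddof=1 -> sample std dev
print(f"Estimated d: {d_hat:.2f}")               # near 0.5, a medium effect
```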

How to Use This Simulation

  1. Explore Type I Errors: Set the actual population mean equal to the null hypothesis mean. Run simulations and observe that the rejection rate approximates your chosen significance level (α).
  2. Investigate Power: Set the actual population mean different from the null hypothesis mean. Observe how the rejection rate (power) changes with different effect sizes and sample sizes.
  3. Sample Size Impact: Keep other parameters constant and vary only the sample size. Notice how larger samples increase power when H₀ is false.
  4. Significance Level Trade-offs: Try different α values to see the trade-off between Type I error control and statistical power.
  5. Visualization Insights: Toggle between histogram and scatter plot views. The histogram clearly shows the distribution shape, while the scatter plot shows individual test outcomes.

Example Scenarios to Try

Scenario 1: Understanding Type I Errors

Null Hypothesis Mean: 0
Population Mean: 0
Population Std Dev: 1
Sample Size: 30
Significance Level: 0.05
Simulations: 100

Expected Result: The histogram should be roughly flat (uniform distribution), and you should see about 5% of tests incorrectly rejecting H₀. This demonstrates that even when the null hypothesis is true, we make errors at the rate of α.
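A sketch of this scenario in code (assuming, as above, a two-sided one-sample t-test):

```python
# Scenario 1: H0 is true, so every rejection is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, alpha, reps = 30, 0.05, 100

samples = rng.normal(0.0, 1.0, size=(reps, n))  # population mean = H0 mean = 0
p = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

# With only 100 simulations the rate is noisy; it converges to alpha as reps grows.
print(f"Type I error rate: {np.mean(p <= alpha):.2f}")
```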

Scenario 2: Detecting a Large Effect

Null Hypothesis Mean: 0
Population Mean: 1.0
Population Std Dev: 1
Sample Size: 30
Significance Level: 0.05
Simulations: 100

Expected Result: Effect size (Cohen's d) = 1.0 (large). The histogram should be heavily concentrated near 0 (right-skewed). You should see very high power (a 95%+ rejection rate). This shows that large effects are easy to detect.
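The same check in code (again assuming a two-sided one-sample t-test):

```python
# Scenario 2: a large effect (d = 1.0) is detected almost every time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, alpha, reps = 30, 0.05, 100

samples = rng.normal(1.0, 1.0, size=(reps, n))  # true mean 1.0 vs H0 mean 0
p = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

print(f"Empirical power: {np.mean(p <= alpha):.2f}")  # ~1.0 (exact power ~0.9997)
```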

Scenario 3: Sample Size Matters for Small Effects

Try these two configurations:

Small Sample

Population Mean: 0.3
Sample Size: 20
Other params: Default

Large Sample

Population Mean: 0.3
Sample Size: 100
Other params: Default

Expected Result: Same small effect (d = 0.3), but drastically different power. The small sample has only about 25% power, while the large sample reaches about 84%. This illustrates why sample size planning is crucial.
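A simulation sketch comparing the two configurations (assuming the same two-sided one-sample t-test, with more repetitions than the widget's default for a stabler estimate):

```python
# Scenario 3: the same small effect (d = 0.3) at two sample sizes.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def sim_power(n, d=0.3, alpha=0.05, reps=10_000):
    samples = rng.normal(d, 1.0, size=(reps, n))  # sigma = 1, so mean shift = d
    p = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    return np.mean(p <= alpha)

print(f"n = 20:  power ~ {sim_power(20):.2f}")   # about 0.25
print(f"n = 100: power ~ {sim_power(100):.2f}")  # about 0.84
```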

Scenario 4: The Trade-off Between α and Power

Keep all parameters the same but vary α:

Null Hypothesis Mean: 0
Population Mean: 0.5
Sample Size: 30
Try α = 0.001, 0.01, 0.05, 0.10 (one at a time)

Expected Result: As α increases, you'll see higher power (more correctly detected effects) but also higher Type I error risk. This demonstrates the fundamental trade-off in hypothesis testing.
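Since α is just a cutoff applied to p-values, one batch of simulated tests can answer all four settings at once. A sketch under the same t-test assumption:

```python
# Scenario 4: sweep alpha over a single batch of p-values (d = 0.5, n = 30).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
samples = rng.normal(0.5, 1.0, size=(10_000, 30))
p = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue

for alpha in (0.001, 0.01, 0.05, 0.10):
    # Higher alpha rejects more often: more power, but more Type I error risk.
    print(f"alpha = {alpha:<5}  power ~ {np.mean(p <= alpha):.2f}")
```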

Common Misconceptions