P-Value Simulation
[Interactive simulation: adjust the simulation parameters and watch the resulting p-value distribution. Setting the actual population mean equal to the H₀ mean simulates Type I errors; setting it to a different value simulates statistical power.]
Understanding the P-Value Simulation
What This Simulation Shows
This simulation demonstrates the behavior of p-values under two critical scenarios:
When the actual population mean equals the null hypothesis mean, we expect p-values to be uniformly distributed between 0 and 1. The rejection rate should match the significance level (α), demonstrating the controlled Type I error rate.
When the actual population mean differs from the null hypothesis mean, p-values cluster near 0. The rejection rate represents the test's power: its ability to correctly detect a false null hypothesis.
The histogram visualization is particularly informative: a flat distribution when H₀ is true, and a right-skewed distribution (p-values piled up near 0) when H₀ is false.
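To reproduce the core behavior outside the interactive widget, here is a minimal Python sketch (assuming NumPy and SciPy are available; the function name and parameter defaults are illustrative, not the simulation's actual internals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_p_values(true_mean, null_mean=0.0, sigma=1.0, n=30, n_sims=10_000):
    """Draw repeated samples and run a one-sample t-test against null_mean."""
    p_values = np.empty(n_sims)
    for i in range(n_sims):
        sample = rng.normal(true_mean, sigma, size=n)
        p_values[i] = stats.ttest_1samp(sample, null_mean).pvalue
    return p_values

alpha = 0.05
p_null = simulate_p_values(true_mean=0.0)  # H0 true: p-values roughly uniform
p_alt = simulate_p_values(true_mean=0.5)   # H0 false: p-values pile up near 0

print(f"Type I error rate: {np.mean(p_null < alpha):.3f}")  # close to alpha
print(f"Power:             {np.mean(p_alt < alpha):.3f}")   # roughly 0.75 here
```

A histogram of `p_null` comes out approximately flat, while `p_alt` shows the right-skewed pile-up near 0 described above.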
Key Concepts
Understanding P-Values
A p-value is the probability of observing test results at least as extreme as the actual results, assuming the null hypothesis is true. It's NOT:
- The probability that H₀ is true
- The probability that your results occurred by chance
- The probability of making a Type I error
Type I vs Type II Errors
Type I error: rejecting a TRUE null hypothesis (false positive). Its rate is controlled by the significance level α.
Type II error: failing to reject a FALSE null hypothesis (false negative). Its rate (β) is related to statistical power (1 − β).
Statistical Power
Power is the probability of correctly rejecting a false null hypothesis. It depends on the following factors (see the sketch after this list):
- Effect size: Larger effects are easier to detect
- Sample size: More data provides more power
- Significance level: Higher α increases power but also Type I error risk
- Population variability: Less variance makes effects easier to detect
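These dependencies can also be checked analytically. A rough sketch assuming the statsmodels package (values are approximate and specific to a two-sided one-sample t-test):

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
# Holding n and alpha fixed, power rises with effect size
for d in (0.2, 0.5, 0.8):
    power = analysis.power(effect_size=d, nobs=30, alpha=0.05,
                           alternative="two-sided")
    print(f"d = {d:.1f}, n = 30, alpha = 0.05: power ≈ {power:.2f}")
```

Rerunning the loop with a larger `nobs` or a higher `alpha` raises every power value, matching the list above.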
Effect Size (Cohen's d)
Measures the standardized difference between means (a short computational sketch follows this list):
- Small effect: d ≈ 0.2
- Medium effect: d ≈ 0.5
- Large effect: d ≈ 0.8
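A minimal sketch of the one-sample calculation, assuming NumPy (the helper name is illustrative):

```python
import numpy as np

def cohens_d(sample, null_mean):
    """One-sample Cohen's d: (sample mean - null mean) / sample SD."""
    return (np.mean(sample) - null_mean) / np.std(sample, ddof=1)

rng = np.random.default_rng(0)
sample = rng.normal(0.5, 1.0, size=100)  # true standardized difference = 0.5
print(f"d ≈ {cohens_d(sample, 0.0):.2f}")  # near 0.5 for this draw
```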
How to Use This Simulation
- Explore Type I Errors: Set the actual population mean equal to the null hypothesis mean. Run simulations and observe that the rejection rate approximates your chosen significance level (α).
- Investigate Power: Set the actual population mean different from the null hypothesis mean. Observe how the rejection rate (power) changes with different effect sizes and sample sizes.
- Sample Size Impact: Keep other parameters constant and vary only the sample size. Notice how larger samples increase power when H₀ is false.
- Significance Level Trade-offs: Try different α values to see the trade-off between Type I error control and statistical power.
- Visualization Insights: Toggle between histogram and scatter plot views. The histogram clearly shows the distribution shape, while the scatter plot shows individual test outcomes.
Example Scenarios to Try
Scenario 1: Understanding Type I Errors
Expected Result: With the actual mean set equal to the H₀ mean and α = 0.05, the histogram should be roughly flat (uniform distribution), and you should see about 5% of tests incorrectly rejecting H₀. This demonstrates that even when the null hypothesis is true, we make errors at the rate of α.
Scenario 2: Detecting a Large Effect
Expected Result: With a large effect (Cohen's d = 1.0), the histogram should be heavily right-skewed, with most p-values near 0. You should see very high power (a 95%+ rejection rate). This shows that large effects are easy to detect.
Scenario 3: Sample Size Matters for Small Effects
Try two configurations with the same small effect (d = 0.3): one with a small sample and one with a much larger sample.
Expected Result: Same small effect (d = 0.3), but drastically different power. The small sample might have ~30% power, while the large sample could have ~80% power. This illustrates why sample size planning is crucial.
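To see where figures like these come from, a power calculator can solve for the sample size needed at a target power. A sketch assuming statsmodels (the exact n depends on the test):

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
# Sample size needed for 80% power at d = 0.3, alpha = 0.05 (two-sided)
n_needed = analysis.solve_power(effect_size=0.3, power=0.80, alpha=0.05,
                                alternative="two-sided")
print(f"n ≈ {n_needed:.0f}")  # roughly 90 observations
```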
Scenario 4: The Trade-off Between α and Power
Keep all parameters the same but vary α (for example, 0.01, 0.05, and 0.10):
Expected Result: As α increases, you'll see higher power (more correctly detected effects) but also higher Type I error risk. This demonstrates the fundamental trade-off in hypothesis testing.
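A quick way to make the trade-off concrete, again assuming statsmodels and a two-sided one-sample t-test:

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()
# Fixed effect size and sample size; only alpha changes
for alpha in (0.01, 0.05, 0.10):
    power = analysis.power(effect_size=0.5, nobs=30, alpha=alpha,
                           alternative="two-sided")
    print(f"alpha = {alpha:.2f}: power ≈ {power:.2f}")
```

Power rises with α, but so does the rate of false positives when H₀ is true.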
Common Misconceptions
Misconception: The p-value is the probability that H₀ is true.
Reality: The p-value is computed assuming H₀ is true, so it can't tell you the probability that H₀ is true.
Misconception: A non-significant result (p > α) proves the null hypothesis.
Reality: It only means insufficient evidence against H₀. The test might lack the power to detect a real effect.
Misconception: A small p-value means a large or important effect.
Reality: P-values depend on both effect size AND sample size. Large samples can produce tiny p-values even for trivial effects.