Set the true mean equal to the H₀ mean to simulate Type I errors, or to a different value to simulate power.
This simulation demonstrates the behavior of p-values under two critical scenarios:
The histogram visualization is particularly informative: a flat (uniform) distribution when H₀ is true, and a right-skewed distribution, with p-values piled up near 0, when H₀ is false.
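The two scenarios can be sketched in a few lines. This is a minimal stdlib-only illustration, assuming a two-sided one-sample z-test with known σ = 1; the function names (`z_test_p_value`, `simulate`) are hypothetical, not part of the simulation tool itself:

```python
import math
import random

def z_test_p_value(sample, mu0, sigma=1.0):
    """Two-sided one-sample z-test p-value (known sigma)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n) / sigma
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def simulate(true_mean, mu0=0.0, n=20, trials=2000, alpha=0.05, seed=42):
    """Fraction of simulated tests with p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0) < alpha:
            rejections += 1
    return rejections / trials

# H0 true: rejection rate should hover near alpha (the Type I error rate)
print(f"H0 true : {simulate(true_mean=0.0):.3f}")
# H0 false (d = 1.0): rejection rate estimates power, close to 1 here
print(f"H0 false: {simulate(true_mean=1.0):.3f}")
```

Plotting a histogram of the individual p-values from these two runs reproduces the flat-versus-skewed contrast described above.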
A p-value is the probability of observing test results at least as extreme as the actual results, assuming the null hypothesis is true. It is NOT the probability that H₀ is true, and NOT the probability that the result occurred by chance.
Type I error: rejecting a TRUE null hypothesis (false positive). Its rate is controlled by the significance level α.
Type II error: failing to reject a FALSE null hypothesis (false negative). Its rate β is tied to statistical power (1 − β).
Power is the probability of correctly rejecting a false null hypothesis. It depends on the effect size, the sample size, the significance level α, and the variability of the data.
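Those dependencies can be made concrete with the standard normal approximation for a two-sided one-sample z-test: power ≈ Φ(d·√n − z₍α/2₎). A sketch under that assumption (the function names are illustrative, and `normal_quantile` uses bisection because the stdlib has no inverse normal CDF):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_quantile(p, lo=-10.0, hi=10.0):
    """Invert the normal CDF by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test at effect size d."""
    z_crit = normal_quantile(1 - alpha / 2)
    return 1 - normal_cdf(z_crit - d * math.sqrt(n))

# Power rises with effect size, sample size, and alpha
print(f"{power(0.5, 30):.2f}  {power(0.8, 30):.2f}  {power(0.5, 60):.2f}")
```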
Cohen's d measures the standardized difference between means: d = (μ₁ − μ₂) / σ_pooled.
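For two samples, the formula above is usually computed with the pooled standard deviation. A minimal sketch (the helper name `cohens_d` is illustrative):

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

print(round(cohens_d([5, 6, 7, 8], [3, 4, 5, 6]), 2))  # a large effect by the usual thresholds
```

Conventional rules of thumb label d ≈ 0.2 as small, 0.5 as medium, and 0.8 as large.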
Expected Result: The histogram should be roughly flat (uniform distribution), and you should see about 5% of tests incorrectly rejecting H₀. This demonstrates that even when the null hypothesis is true, we make errors at the rate of α.
Expected Result: Effect size (Cohen's d) = 1.0 (large). The histogram should be heavily right-skewed, with most p-values near 0. You should see very high power (95%+ rejection rate). This shows that large effects are easy to detect.
Try these two configurations:
Small Sample
Large Sample
Expected Result: Same small effect (d = 0.3), but drastically different power. The small sample might have ~30% power, while the large sample could have ~80% power. This illustrates why sample size planning is crucial.
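The two power figures quoted above can be checked against the normal approximation for a one-sample z-test, power ≈ Φ(d·√n − 1.96). A sketch assuming hypothetical sample sizes of 20 and 100 (the configuration's actual values may differ):

```python
import math

def approx_power(d, n):
    """Normal-approximation power for a two-sided test at alpha = 0.05."""
    z_crit = 1.959964  # Phi^{-1}(0.975)
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return 1 - phi(z_crit - d * math.sqrt(n))

for n in (20, 100):
    # d = 0.3: roughly 27% power at n = 20 vs roughly 85% at n = 100
    print(f"n={n:3d}: power ~ {approx_power(0.3, n):.0%}")
```

The numbers land close to the ~30% and ~80% quoted above, which is why sample size planning is done with exactly this kind of calculation.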
Keep all parameters the same but vary α:
Expected Result: As α increases, you'll see higher power (more correctly detected effects) but also higher Type I error risk. This demonstrates the fundamental trade-off in hypothesis testing.
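The trade-off can be tabulated directly: the Type I error risk is α itself, while power grows as the critical value shrinks. A sketch under the same one-sample z-test approximation, with an illustrative effect size d = 0.5 and n = 30 (`power_at` is a hypothetical helper, not part of the tool):

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# z-critical values for common two-sided significance levels
Z_CRIT = {0.01: 2.576, 0.05: 1.960, 0.10: 1.645}

def power_at(alpha, d=0.5, n=30):
    """Approximate power of a two-sided one-sample z-test at significance level alpha."""
    return 1 - phi(Z_CRIT[alpha] - d * math.sqrt(n))

for alpha in sorted(Z_CRIT):
    # Raising alpha raises power, but the Type I error risk is alpha itself
    print(f"alpha={alpha:.2f}  Type I risk={alpha:.2f}  power={power_at(alpha):.2f}")
```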