Set the true mean equal to the H₀ mean to simulate Type I errors, or to a different value to simulate power.
This simulation demonstrates the behavior of p-values under two critical scenarios:
The histogram visualization is particularly informative: a flat (uniform) distribution when H₀ is true, and a right-skewed distribution, with p-values piled up near 0, when H₀ is false.
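The two scenarios can be sketched in a few lines. This is a minimal stdlib-only illustration, assuming a two-sided one-sample z-test with known σ = 1; the function names (`z_test_p_value`, `simulate`) are hypothetical, not part of the simulation tool itself:

```python
import math
import random

def z_test_p_value(sample, mu0, sigma=1.0):
    """Two-sided one-sample z-test p-value (known sigma)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n) / sigma
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def simulate(true_mean, mu0=0.0, n=20, trials=2000, alpha=0.05, seed=42):
    """Fraction of simulated tests with p < alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0) < alpha:
            rejections += 1
    return rejections / trials

# H0 true: rejection rate should hover near alpha (the Type I error rate)
print(f"H0 true : {simulate(true_mean=0.0):.3f}")
# H0 false (d = 1.0): rejection rate estimates power, close to 1 here
print(f"H0 false: {simulate(true_mean=1.0):.3f}")
```

Plotting a histogram of the individual p-values from these two runs reproduces the flat-versus-skewed contrast described above.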
A p-value is the probability of observing test results at least as extreme as the actual results, assuming the null hypothesis is true. It is NOT the probability that H₀ is true, and NOT the probability that the result occurred by chance.
Type I error: rejecting a TRUE null hypothesis (false positive). Its rate is controlled by the significance level α.
Type II error: failing to reject a FALSE null hypothesis (false negative). Its rate β is tied to statistical power (1 − β).
Power is the probability of correctly rejecting a false null hypothesis. It depends on the effect size, the sample size, the significance level α, and the variability of the data.
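Those dependencies can be made concrete with the standard normal approximation for a two-sided one-sample z-test: power ≈ Φ(d·√n − z₍α/2₎). A sketch under that assumption (the function names are illustrative, and `normal_quantile` uses bisection because the stdlib has no inverse normal CDF):

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def normal_quantile(p, lo=-10.0, hi=10.0):
    """Invert the normal CDF by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def power(d, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test at effect size d."""
    z_crit = normal_quantile(1 - alpha / 2)
    return 1 - normal_cdf(z_crit - d * math.sqrt(n))

# Power rises with effect size, sample size, and alpha
print(f"{power(0.5, 30):.2f}  {power(0.8, 30):.2f}  {power(0.5, 60):.2f}")
```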
Cohen's d measures the standardized difference between means: d = (μ₁ − μ₂) / σ_pooled.
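For two samples, the formula above is usually computed with the pooled standard deviation. A minimal sketch (the helper name `cohens_d` is illustrative):

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

print(round(cohens_d([5, 6, 7, 8], [3, 4, 5, 6]), 2))  # a large effect by the usual thresholds
```

Conventional rules of thumb label d ≈ 0.2 as small, 0.5 as medium, and 0.8 as large.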
Expected Result: The histogram should be roughly flat (uniform distribution), and you should see about 5% of tests incorrectly rejecting H₀. This demonstrates that even when the null hypothesis is true, we make errors at the rate of α.
Expected Result: Effect size (Cohen's d) = 1.0 (large). The histogram should be heavily right-skewed, with most p-values near 0. You should see very high power (95%+ rejection rate). This shows that large effects are easy to detect.
Try these two configurations:
Small Sample
Large Sample
Expected Result: Same small effect (d = 0.3), but drastically different power. The small sample might have ~30% power, while the large sample could have ~80% power. This illustrates why sample size planning is crucial.
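The two power figures quoted above can be checked against the normal approximation for a one-sample z-test, power ≈ Φ(d·√n − 1.96). A sketch assuming hypothetical sample sizes of 20 and 100 (the configuration's actual values may differ):

```python
import math

def approx_power(d, n):
    """Normal-approximation power for a two-sided test at alpha = 0.05."""
    z_crit = 1.959964  # Phi^{-1}(0.975)
    phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
    return 1 - phi(z_crit - d * math.sqrt(n))

for n in (20, 100):
    # d = 0.3: roughly 27% power at n = 20 vs roughly 85% at n = 100
    print(f"n={n:3d}: power ~ {approx_power(0.3, n):.0%}")
```

The numbers land close to the ~30% and ~80% quoted above, which is why sample size planning is done with exactly this kind of calculation.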
Keep all parameters the same but vary α:
Expected Result: As α increases, you'll see higher power (more correctly detected effects) but also higher Type I error risk. This demonstrates the fundamental trade-off in hypothesis testing.
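The trade-off can be tabulated directly: the Type I error risk is α itself, while power grows as the critical value shrinks. A sketch under the same one-sample z-test approximation, with an illustrative effect size d = 0.5 and n = 30 (`power_at` is a hypothetical helper, not part of the tool):

```python
import math

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# z-critical values for common two-sided significance levels
Z_CRIT = {0.01: 2.576, 0.05: 1.960, 0.10: 1.645}

def power_at(alpha, d=0.5, n=30):
    """Approximate power of a two-sided one-sample z-test at significance level alpha."""
    return 1 - phi(Z_CRIT[alpha] - d * math.sqrt(n))

for alpha in sorted(Z_CRIT):
    # Raising alpha raises power, but the Type I error risk is alpha itself
    print(f"alpha={alpha:.2f}  Type I risk={alpha:.2f}  power={power_at(alpha):.2f}")
```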