This Two-Sample Z-Test Calculator helps you compare means between two independent groups when both population standard deviations are known. For example, you could compare the average output of two production lines, given the known variability of each line's process. The calculator reports descriptive statistics and hypothesis test results, and generates an APA-format report.
Learn More
Two-Sample Z-Test
Definition
The two-sample z-test is a statistical test used to determine whether the means of two populations differ significantly when both population standard deviations are known. It is particularly useful for large samples and for processes whose population parameters are already known.
Formula
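Test Statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Where:
- $\bar{x}_1, \bar{x}_2$ = sample means
- $\mu_1, \mu_2$ = population means
- $\sigma_1, \sigma_2$ = known population standard deviations
- $n_1, n_2$ = sample sizes
Confidence Interval for Mean Difference:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$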
Key Assumptions
Practical Example
Comparing the efficiency of two production lines with known process variations:
Step 1: State the Data
- Line 1: $n_1 = 50$, $\bar{x}_1 = 95.2$ units/hour, $\sigma_1 = 4.0$
- Line 2: $n_2 = 45$, $\bar{x}_2 = 93.8$ units/hour, $\sigma_2 = 3.8$
Step 2: State Hypotheses
- $H_0: \mu_1 - \mu_2 = 0$ (no difference)
- $H_1: \mu_1 - \mu_2 \neq 0$ (there is a difference)
Step 3: Calculate Test Statistic
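Z-statistic:
$$z = \frac{95.2 - 93.8}{\sqrt{\frac{4.0^2}{50} + \frac{3.8^2}{45}}} = \frac{1.4}{0.8006} \approx 1.75$$
Step 4: Calculate P-value
For two-tailed test:
$$p = 2\,(1 - \Phi(|z|)) = 2\,(1 - \Phi(1.75)) \approx 0.080$$
Step 5: Calculate Confidence Interval
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 1.4 \pm 1.96 \times 0.8006 = (-0.17,\ 2.97)$$
Step 6: Draw Conclusion
Critical value at 5% significance level: $z_{0.025} = 1.96$
Since $|z| = 1.75 < 1.96$ and $p = 0.080 > 0.05$, we fail to reject $H_0$. At the 5% significance level, there is no statistically significant difference between the two production lines.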
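The arithmetic in the worked example can be checked from the summary statistics alone. The sketch below uses only the Python standard library; the helper name `two_sample_z_test` and its parameter names are illustrative assumptions, not part of the calculator.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sigma1, sigma2, n1, n2, alpha=0.05):
    """Two-sided two-sample z-test from summary statistics (hypothetical helper)."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    diff = mean1 - mean2
    # Standard error of the difference, using the known population SDs
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = diff / se
    # Two-sided p-value: probability of |Z| at least this large under H0
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Critical value and confidence interval for the mean difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return z, p_value, ci


# Production line example: Line 1 vs. Line 2
z, p_value, ci = two_sample_z_test(95.2, 93.8, 4.0, 3.8, 50, 45)
print(f"z = {z:.3f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the confidence interval contains 0, it leads to the same conclusion as the two-sided test at the 5% level.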
Effect Size
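Cohen's d for two-sample z-test:
$$d = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{(\sigma_1^2 + \sigma_2^2)/2}}$$
Interpretation guidelines:
- Small effect: $d \approx 0.2$
- Medium effect: $d \approx 0.5$
- Large effect: $d \approx 0.8$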
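As a rough illustration, the effect size for the production line example can be computed directly from the summary statistics. The sketch assumes the pooled standard deviation is the root mean square of the two known population SDs, and it assumes Cohen's conventional benchmarks (0.2, 0.5, 0.8) are used as lower cutoffs for each label. Both function names are hypothetical.

```python
from math import sqrt


def cohens_d_known_sd(mean1, mean2, sigma1, sigma2):
    """Cohen's d using the known population SDs (hypothetical helper)."""
    # Assumed pooling: root mean square of the two known SDs
    pooled_sd = sqrt((sigma1**2 + sigma2**2) / 2)
    return abs(mean1 - mean2) / pooled_sd


def label_effect(d):
    """Label d using 0.2 / 0.5 / 0.8 as lower cutoffs (an assumed binning)."""
    if d < 0.2:
        return "negligible"
    if d < 0.5:
        return "small"
    if d < 0.8:
        return "medium"
    return "large"


d = cohens_d_known_sd(95.2, 93.8, 4.0, 3.8)
print(f"Cohen's d = {d:.3f} ({label_effect(d)})")
```

The benchmarks are rules of thumb, so the label is best read alongside the practical meaning of a 1.4 units/hour difference.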
Power Analysis
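Required sample size per group for equal sample sizes:
$$n = \frac{(z_{\alpha/2} + z_{\beta})^2 (\sigma_1^2 + \sigma_2^2)}{\delta^2}$$
Where:
- $\alpha$ = significance level
- $\beta$ = probability of Type II error
- $\delta$ = minimum detectable difference
- $\sigma_1, \sigma_2$ = known population standard deviations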
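A sample size calculation along these lines can be sketched as follows. It assumes a two-sided test and the common formula n = (z_{α/2} + z_β)² (σ₁² + σ₂²) / δ², rounded up to the next whole unit. The function name and the default of 80% power are illustrative choices, not values taken from the calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(delta, sigma1, sigma2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z-test (hypothetical helper)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # power = 1 - beta
    n = (z_alpha + z_beta) ** 2 * (sigma1**2 + sigma2**2) / delta**2
    return ceil(n)  # round up so the target power is met


# Detecting a 1.4 units/hour difference with the known SDs from the example
n_required = sample_size_per_group(delta=1.4, sigma1=4.0, sigma2=3.8)
print(f"Required sample size per group: {n_required}")
```

Under these assumptions, the example's samples of 50 and 45 are well below the size needed for 80% power to detect a difference of 1.4 units/hour, which is one reason a non-significant result should not be read as evidence of no difference.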
Decision Rules
Reject $H_0$ if:
- Two-sided test: $|z| > z_{\alpha/2}$
- Left-tailed test: $z < -z_{\alpha}$
- Right-tailed test: $z > z_{\alpha}$
- Or if $p\text{-value} < \alpha$
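The critical-value rules can be expressed as a small function. This is a minimal sketch using the standard library; the function name and the `alternative` labels ("two-sided", "less", "greater") are assumptions chosen for illustration.

```python
from statistics import NormalDist


def reject_null(z, alpha=0.05, alternative="two-sided"):
    """Apply the critical-value decision rule (hypothetical helper)."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "less":      # left-tailed test
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":   # right-tailed test
        return z > std_normal.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")


z_example = 1.749  # z-statistic from the production line example
for alt in ("two-sided", "less", "greater"):
    print(f"{alt}: reject H0 = {reject_null(z_example, alternative=alt)}")
```

The same z-statistic can lead to different decisions depending on the alternative, so the direction of the test should be fixed before looking at the data.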
Reporting Results
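Standard format, using the production line example: "A two-sample z-test showed no statistically significant difference in output between Line 1 and Line 2, z = 1.75, p = .080, 95% CI [-0.17, 2.97]."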
Code Examples
# R: simulate data and run the two-sample z-test
library(tidyverse)
set.seed(42)
# Known population standard deviations
line1_pop_sd <- 4.0
line2_pop_sd <- 3.8
# Production Line 1 data (known σ₁ = 4.0)
line1 <- tibble(
  line = "Line 1",
  units = rnorm(50, mean = 95.2, sd = line1_pop_sd)
)
# Production Line 2 data (known σ₂ = 3.8)
line2 <- tibble(
  line = "Line 2",
  units = rnorm(45, mean = 93.8, sd = line2_pop_sd)
)
# Combine data
production_data <- bind_rows(line1, line2)
# Summarize the data and attach the known population SD for each line
summary_stats <- production_data |>
  group_by(line) |>
  summarise(
    n = n(),
    mean = mean(units),
    .groups = "drop"
  ) |>
  mutate(known_sd = if_else(line == "Line 1", line1_pop_sd, line2_pop_sd))
# Perform two-sample Z-test
line1_stats <- summary_stats |> filter(line == "Line 1")
line2_stats <- summary_stats |> filter(line == "Line 2")
# Mean difference and its standard error (based on the known population SDs)
mean_diff <- line1_stats$mean - line2_stats$mean
std_error <- sqrt((line1_stats$known_sd^2 / line1_stats$n) + (line2_stats$known_sd^2 / line2_stats$n))
# Calculate z-statistic
z_stat <- mean_diff / std_error
print(str_glue("Z-statistic: {round(z_stat, 3)}"))
# 95% confidence interval
alpha <- 0.05
z_alpha <- qnorm(1 - alpha / 2)
margin_of_error <- z_alpha * std_error
ci_lower <- mean_diff - margin_of_error
ci_upper <- mean_diff + margin_of_error
print(str_glue("95% CI: [{round(ci_lower, 2)}, {round(ci_upper, 2)}]"))
# Calculate p-value (two-sided test)
p_value <- 2 * (1 - pnorm(abs(z_stat)))
print(str_glue("P-value: {round(p_value, 4)}"))
# Calculate effect size (Cohen's d) from the known population SDs
pooled_sd <- sqrt((line1_stats$known_sd^2 + line2_stats$known_sd^2) / 2)
cohens_d <- abs(mean_diff) / pooled_sd
print(str_glue("Effect size (Cohen's d): {round(cohens_d, 3)}"))
# Visualization
ggplot(production_data, aes(x = line, y = units, fill = line)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, alpha = 0.5) +
  theme_minimal() +
  labs(
    title = "Production Output by Line",
    y = "Units per Hour",
    x = "Production Line"
  )
# Python: simulate data and run the two-sample z-test
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Set random seed for reproducibility
np.random.seed(42)
# Generate sample data
# Production Line 1 (known σ₁ = 4.0)
line1_data = np.random.normal(95.2, 4.0, 50)
# Production Line 2 (known σ₂ = 3.8)
line2_data = np.random.normal(93.8, 3.8, 45)
# Calculate sample means
sample_mean1 = np.mean(line1_data)
sample_mean2 = np.mean(line2_data)
# Calculate z-statistic
z_numerator = (sample_mean1 - sample_mean2)
z_denominator = np.sqrt((4.0**2/50) + (3.8**2/45))
z_stat = z_numerator / z_denominator
print(f"Z-statistic: {z_stat:.2f}")
# Calculate p-value (two-sided test)
p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
print(f"P-value: {p_value:.4f}")
# Calculate 95% Confidence Interval
alpha = 0.05
z_critical = stats.norm.ppf(1 - alpha/2)
margin_of_error = z_critical * z_denominator
ci_lower = z_numerator - margin_of_error
ci_upper = z_numerator + margin_of_error
print(f"95% Confidence Interval for mean difference: ({ci_lower:.2f}, {ci_upper:.2f})")
# Calculate effect size (Cohen's d)
pooled_sd = np.sqrt((4.0**2 + 3.8**2) / 2)
cohens_d = abs(sample_mean1 - sample_mean2) / pooled_sd
print(f"Cohen's d: {cohens_d:.2f}")
# Create DataFrame for plotting
df = pd.DataFrame({
    'Production Line': ['Line 1']*50 + ['Line 2']*45,
    'Units': np.concatenate([line1_data, line2_data])
})
# Create visualization
plt.figure(figsize=(12, 5))
# Subplot 1: Boxplot
plt.subplot(1, 2, 1)
sns.boxplot(data=df, x='Production Line', y='Units')
plt.title('Units per Hour by Production Line')
# Subplot 2: Distribution
plt.subplot(1, 2, 2)
sns.histplot(data=df, x='Units', hue='Production Line',
             element="step", stat="density")
plt.title('Distribution of Units per Hour')
plt.tight_layout()
plt.show()