StatsCalculators.com

Hypergeometric Distribution

Created:October 20, 2024
Last Updated:May 2, 2025

This calculator helps you compute the probabilities of a hypergeometric distribution given the parameters N (population size), K (success states), and n (number of draws). You can find the probability of values being equal to, less than, greater than, or between specific values. The distribution chart shows the probability mass function (PMF) of the hypergeometric distribution, which models sampling without replacement from a finite population.

Calculator

Parameters

Interactive Distribution Chart

Click Calculate to view the distribution chart

Related Calculators

Learn More

Hypergeometric Distribution: Definition, Formula, and Applications

Hypergeometric Distribution

Definition:The hypergeometric distribution describes the probability of obtaining exactly kk successes in nn draws without replacement from a finite population of size NN that contains exactly KK successes.

Formula:P(X=k)=(Kk)(NKnk)(Nn)P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}

Where:

  • NN is the population size
  • KK is the number of success states in the population
  • nn is the number of draws
  • kk is the number of successes in the sample
  • (nk)\binom{n}{k} represents the binomial coefficient
Example: Imagine a box containing 20 marbles, of which 8 are blue and 12 are red. If you draw 5 marbles without replacement, what is the probability of getting exactly 3 blue marbles?

In this case:

  • N=20N = 20 (total marbles)
  • K=8K = 8 (blue marbles - success states)
  • n=5n = 5 (draws)
  • k=3k = 3 (desired blue marbles)

Plugging these values into the formula:

P(X=3)=(83)(122)(205)=5666155040.238P(X = 3) = \frac{\binom{8}{3}\binom{12}{2}}{\binom{20}{5}} = \frac{56 \cdot 66}{15504} \approx 0.238

Therefore, the probability of drawing exactly 3 blue marbles is about 23.8%.

Properties

  • Mean: E(X)=nKNE(X) = n\frac{K}{N}
  • Variance: Var(X)=nKNNKNNnN1\text{Var}(X) = n\frac{K}{N}\frac{N-K}{N}\frac{N-n}{N-1}
  • Support: max(0,n(NK))kmin(n,K)\text{max}(0, n-(N-K)) \leq k \leq \text{min}(n,K)

How to Calculate Hypergeometric Distribution in R

R
library(tidyverse)

# Parameters
N <- 100  # population size
K <- 20   # success states in population
n <- 10   # number of draws
x1 <- 2   # lower comparison point
x2 <- 5   # upper comparison point

# calculate P(X = x1)
p_exact <- dhyper(x1, K, N-K, n)
print(str_glue("P(X = {x1}) = {round(p_exact, 4)}")) # P(X = 2) = 0.3182

# calculate P(X <= x1)
p_cumulative <- phyper(x1, K, N-K, n)
print(str_glue("P(X ≤ {x1}) = {round(p_cumulative, 4)}")) # P(X ≤ 2) = 0.6812

# calculate P(x1 <= X <= x2)
p_between <- phyper(x2, K, N-K, n) - phyper(x1-1, K, N-K, n)
print(str_glue("P({x1} ≤ X ≤ {x2}) = {round(p_between, 4)}")) # P(2 ≤ X ≤ 5) = 0.633

# mean and variance
mean <- n * (K/N)
variance <- n * (K/N) * ((N-K)/N) * ((N-n)/(N-1))
print(str_glue("Mean: {round(mean, 4)}")) # Mean: 2
print(str_glue("Variance: {round(variance, 4)}")) # Variance: 1.4545

# plot the PMF
x_values <- 0:min(n, K)
pmf <- dhyper(x_values, K, N-K, n)
pmf_data <- tibble(x = x_values, probability = pmf)

ggplot(pmf_data, aes(x = x, y = probability)) +
  geom_col(fill = "blue", alpha = 0.7) +
  geom_vline(xintercept = c(x1, x2), linetype = "dashed", color = "red") +
  geom_rect(data = subset(pmf_data, x >= x1 & x <= x2),
            aes(xmin = x - 0.4, xmax = x + 0.4, ymin = 0, ymax = probability),
            fill = "red", alpha = 0.3) +
  labs(title = str_glue("Hypergeometric Distribution PMF (N={N}, K={K}, n={n})"),
       subtitle = str_glue("P({x1} ≤ X ≤ {x2}) = {round(p_between, 4)}"),
       x = "Number of Successes",
       y = "Probability") +
  theme_minimal()

How to Calculate Hypergeometric Distribution in Python

Python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
import pandas as pd

# Parameters
N = 100  # population size
K = 20   # success states in population
n = 10   # number of draws
x1 = 2   # lower comparison point
x2 = 5   # upper comparison point

# Calculate P(X = x1)
p_exact = stats.hypergeom.pmf(x1, N, K, n)
print(f"P(X = {x1}) = {p_exact:.4f}")

# Calculate P(X <= x1)
p_cumulative = stats.hypergeom.cdf(x1, N, K, n)
print(f"P(X ≤ {x1}) = {p_cumulative:.4f}")

# Calculate P(x1 <= X <= x2)
p_between = stats.hypergeom.cdf(x2, N, K, n) - stats.hypergeom.cdf(x1-1, N, K, n)
print(f"P({x1} ≤ X ≤ {x2}) = {p_between:.4f}")

# Calculate mean and variance
mean = n * (K/N)
variance = n * (K/N) * ((N-K)/N) * ((N-n)/(N-1))
print(f"Mean: {mean:.4f}")
print(f"Variance: {variance:.4f}")

# Create PMF plot
x_values = np.arange(0, min(n, K) + 1)
pmf = stats.hypergeom.pmf(x_values, N, K, n)

plt.figure(figsize=(10, 6))
plt.bar(x_values, pmf, alpha=0.7, color='blue')

# Highlight the area between x1 and x2
highlight_x = x_values[(x_values >= x1) & (x_values <= x2)]
highlight_pmf = pmf[(x_values >= x1) & (x_values <= x2)]
plt.bar(highlight_x, highlight_pmf, alpha=0.5, color='red')

# Add vertical lines at x1 and x2
plt.axvline(x=x1, color='red', linestyle='--', alpha=0.7)
plt.axvline(x=x2, color='red', linestyle='--', alpha=0.7)

# Add labels and title
plt.title(f'Hypergeometric Distribution PMF (N={N}, K={K}, n={n})')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.grid(True, alpha=0.3)
plt.xticks(x_values)
plt.tight_layout()
plt.show()