Mean, Median, Mode

Created:August 24, 2024

This calculator helps you analyze the central tendency of your data distribution. It calculates the arithmetic mean (average), median (middle value), and mode (most frequent value), helping you understand the typical values and patterns in your dataset. It also provides a histogram with the mean, median, and mode marked for easy visualization if a numeric column is selected. A bar chart is also provided to visualize the frequency distribution of a categorical column.

Quick Calculator

Need a quick calculation? Enter your numbers below:

Enter numbers (separated by commas or spaces):

Calculator

1. Load Your Data

Note: Column names will be converted to snake_case (e.g., "Product ID" → "product_id") for processing.

2. Select Columns & Options

Select Numeric Column:

Exclude Outliers

Related Calculators

Range, Variance, Standard Deviation Calculator

Skewness Calculator

Histogram

Bar Chart

Learn More

Mean (Arithmetic Mean)

Definition

The arithmetic average of all values, representing the central tendency of the data.

Formula

\text{Mean} = \frac{1}{n} \sum_{i = 1}^{n} x_i = \frac{x_1 + x_2 + ... + x_n}{n}

Example

For the numbers: $2, 4, 6, 8$

\text{Mean} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5

Key Points

Sensitive to extreme values (outliers)
Best used when data is symmetrically distributed

Comparison of Different Means

Try Geometric Mean Try Harmonic Mean

Important Relationship:

For any set of positive numbers:

\text{Harmonic Mean} \leq \text{Geometric Mean} \leq \text{Arithmetic Mean}

(Equality occurs only when all numbers are the same)

Type	Formula	Example
Arithmetic Mean	$\frac{1}{n}\sum_{i=1}^n x_i = \frac{x_1 + x_2 + ... + x_n}{n}$	$\frac{2 + 4 + 6 + 8}{4} = 5$
Geometric Mean	$\sqrt[n]{\prod_{i=1}^n x_i} = \sqrt[n]{x_1 \times x_2 \times ... \times x_n}$	$\sqrt[4]{2 \times 4 \times 6 \times 8} \approx 4.4$
Harmonic Mean	$\frac{n}{\sum_{i=1}^n \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + ... + \frac{1}{x_n}}$	$\frac{4}{\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8}} \approx 3.8$

Arithmetic Mean(Current)

Best For:

Simple averages of quantities
Calculating central tendency
Equal weight to all values
Linear relationships

Common Uses:

Test scores
Heights/weights
Daily temperatures
Income levels

Limitations:

Sensitive to outliers
Not ideal for ratios or rates

Geometric Mean

Best For:

Growth rates
Percentage changes
Ratios and proportions
Exponential relationships

Common Uses:

Investment returns
Population growth
Interest rates
Price indices

Limitations:

Only works with positive numbers
More complex calculation

Harmonic Mean

Best For:

Rates and speeds
Per-unit quantities
Reciprocal relationships
Density measurements

Common Uses:

Average speed over different distances
Price per unit calculations
Density measurements
Circuit calculations (parallel resistors)

Limitations:

Only works with positive numbers
Gives more weight to smaller values

When to Use Each Mean

Use Arithmetic Mean when you need a simple average and all values should have equal weight

Use Geometric Mean when dealing with growth rates, returns, or multiplicative changes

Use Harmonic Mean when working with rates, speeds, or other measures where using reciprocals makes sense

Median

Definition

The middle value in a sorted set of numbers. For an even number of values, it's the average of the two middle numbers.

Formula

\text{Median} = \begin{cases} x_{(n+1)/2} & \text{if n is odd} \\ \frac{x_{n/2} + x_{n/2 + 1}}{2} & \text{if n is even} \end{cases}

Example

For odd number of values: $1, 3, 5, 7, 9$

\text{Median} = x_{(n+1)/2} = x_{(5+1)/2} = x_3 = 5

For even number of values: $1, 3, 5, 7$

\text{Median} = \frac{x_{n/2} + x_{n/2 + 1}}{2} = \frac{x_{4/2} + x_{4/2 + 1}}{2} = \frac{x_2 + x_3}{2} = \frac{3 + 5}{2} = 4

Key Points

Not affected by extreme values (outliers)
Best used when data is skewed or contains outliers

Mode

Definition

The most frequently occurring value(s) in a set of numbers. A dataset can have no mode, one mode, or multiple modes.

Formula

\text{Mode} = \text{value(s) with the highest frequency}

Example

For the numbers with multiple modes: $1, 2, 2, 3, 4, 4, 5$

\text{Mode} = 2, 4

Because 2 and 4 each appear twice, more than any other number

For numbers $1, 2, 3, 4$ , there is no mode because all numbers appear exactly once.

Key Points

Can have multiple modes (bimodal, trimodal, etc.) or no mode
Particularly useful for categorical data and discrete numerical data

Mean vs Median: When to Use Which?

Mean (Average)

Best for:Symmetric, normally distributed data

Sensitive to:Outliers and extreme values

Uses:All data points in calculation

Median (Middle Value)

Best for:Skewed data or data with outliers

Resistant to:Outliers and extreme values

Uses:Only middle value(s) position

Example: Impact of Outliers

Normal data: Annual salaries in thousands: $40, 42, 45, 48, 50$

Mean:\rac{40+42+45+48+50}{5} = 45

Median:

45

(middle value)

With outlier: Adding CEO salary: $40, 42, 45, 48, 500$

Mean:\rac{40+42+45+48+500}{5} = 135↑ Dramatically increased!

Median:

45

(still middle value)→ Unchanged

Decision Guide

Use Mean when:

• Data is approximately normally distributed
• No significant outliers present
• You want to include all data points in analysis

Use Median when:

• Data is skewed (left or right)
• Outliers are present
• You want a robust measure of central tendency

Quick Reference

Symmetric Data

Mean ≈ Median

Right Skewed

Mean > Median

Left Skewed

Mean < Median

Calculating Mean, Median, and Mode from a Bar Chart

Mean Calculation

1. Multiply each value by its frequency:

2 × 1 = 2
4 × 2 = 8
6 × 3 = 18
8 × 2 = 16
10 × 1 = 10

2. Sum all products: 2 + 8 + 18 + 16 + 10 = 54

3. Sum all frequencies: 1 + 2 + 3 + 2 + 1 = 9

4. Divide: 54 ÷ 9 = 6

Mean = 6

Median Calculation

1. List all values (including duplicates):

2, 4, 4, 6, 6, 6, 8, 8, 10

2. Find middle position: (n+1) ÷ 2 = (9+1) ÷ 2 = 5th

3. Find 5th value in ordered list

Median = 6

Mode Calculation

1. Count frequency of each value:

2: occurs 1 time
4: occurs 2 times
6: occurs 3 times
8: occurs 2 times
10: occurs 1 time

2. Find highest frequency (3)

3. Identify value(s) with that frequency

Mode = 6

Final Results:

Mean = 6.00

Median = 6

Mode = 6

Calculating Mean, Median, and Mode in R

Here is an example of how to calculate the mean, median, and mode of a dataset in R using the tidyverse package.

library(tidyverse)

tips <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

central_measures <- tips |>
  summarise(
    mean_tip = mean(tip), # 2.9983
    median_tip = median(tip) # 2.9
  )

day_counts <- table(tips$day)
mode_days <- names(day_counts[day_counts == max(day_counts)]) # Sat

# histogram of tip with mean, median vertical lines
ggplot(tips, aes(x = tip)) +
  geom_histogram(binwidth = 0.5, fill = "darkblue", alpha = 0.7) +
  geom_vline(aes(xintercept = central_measures$mean_tip, color = "Mean"), 
             linetype = "dashed", linewidth = 1) +
  geom_vline(aes(xintercept = central_measures$median_tip, color = "Median"), 
             linetype = "dotted", linewidth = 1) +
  scale_color_manual(name = "Statistics", 
                     values = c("Mean" = "red", "Median" = "green")) +
  labs(title = "Distribution of Tip Amounts",
       x = "Tip Amount ($)",
       y = "Frequency") +
  theme_minimal() +
  theme(legend.position = "bottom")

# bar chart of day frequencies
day_freq <- tips |>
  count(day) |>
  arrange(desc(n))

# bar chart with calculated frequencies
ggplot(day_freq, aes(x = reorder(day, -n), y = n)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 0.6) +
  geom_text(aes(label = n), vjust = -0.5, size = 3.5) +
  labs(title = "Frequency of Restaurant Visits by Day",
       x = "Day of Week",
       y = "Number of Visits") +
  theme_minimal()

Calculating Mean, Median, and Mode in Python

Here is an example of how to calculate the mean, median, and mode of a dataset in Python using the pandas, numpy, and scipy libraries.

Python

import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Load the tips dataset
tips = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Calculate central measures for tip amounts
central_measures = {
    'mean_tip': tips['tip'].mean(),      # 2.9983
    'median_tip': tips['tip'].median(),  # 2.9
}

# Calculate mode for categorical data (day)
mode_day = tips['day'].mode()[0]  # Sat

print(f"Mean tip: {central_measures['mean_tip']:.2f}")
print(f"Median tip: {central_measures['median_tip']:.2f}")
print(f"Most common day: {mode_day}")

# Create histogram with mean and median lines
plt.figure(figsize=(10, 6))
plt.hist(tips['tip'], bins=20, alpha=0.7, color='darkblue', edgecolor='black')
plt.axvline(central_measures['mean_tip'], color='red', linestyle='--', 
           linewidth=2, label=f"Mean: {central_measures['mean_tip']:.2f}")
plt.axvline(central_measures['median_tip'], color='green', linestyle=':', 
           linewidth=2, label=f"Median: {central_measures['median_tip']:.2f}")
plt.xlabel('Tip Amount ($)')
plt.ylabel('Frequency')
plt.title('Distribution of Tip Amounts')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Bar chart of day frequencies
day_counts = tips['day'].value_counts().sort_values(ascending=False)

plt.figure(figsize=(8, 6))
bars = plt.bar(day_counts.index, day_counts.values, color='steelblue', width=0.6)
plt.xlabel('Day of Week')
plt.ylabel('Number of Visits')
plt.title('Frequency of Restaurant Visits by Day')

# Add value labels on bars
for bar, value in zip(bars, day_counts.values):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
             str(value), ha='center', va='bottom')

plt.show()

# Summary statistics
print("\nSummary Statistics:")
print(tips['tip'].describe())

Calculating Mean, Median, and Mode in SQL

Here is an example of how to calculate the mean, median, and mode of a dataset in SQL using aggregate functions and window functions.

Csv

employee_id,department,salary
1,Sales,45000
2,Sales,52000
3,Engineering,75000
4,Marketing,48000
5,Engineering,72000
6,Sales,45000
7,Marketing,51000
8,Engineering,78000
9,Sales,49000
10,Marketing,48000

SQL

-- Create and populate the employee table
CREATE TABLE employees (
    employee_id INT,
    department VARCHAR(50),
    salary INT
);

INSERT INTO employees VALUES
(1, 'Sales', 45000),
(2, 'Sales', 52000),
(3, 'Engineering', 75000),
(4, 'Marketing', 48000),
(5, 'Engineering', 72000),
(6, 'Sales', 45000),
(7, 'Marketing', 51000),
(8, 'Engineering', 78000),
(9, 'Sales', 49000),
(10, 'Marketing', 48000);

-- Calculate mean, median, and mode for salaries
WITH salary_statistics AS (
  SELECT 
    -- Mean calculation
    AVG(salary) as mean_salary,
    
    -- Median calculation (PostgreSQL/SQL Server)
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary,
    
    -- Count total records
    COUNT(*) as total_employees
  FROM employees
),

-- Mode calculation for categorical data (most frequent department)
dept_mode AS (
  SELECT 
    department as mode_department,
    COUNT(*) as dept_frequency
  FROM employees
  GROUP BY department
  ORDER BY COUNT(*) DESC
  LIMIT 1
),

-- Mode calculation for numerical data (most frequent salary)
salary_mode AS (
  SELECT 
    salary as mode_salary,
    COUNT(*) as salary_frequency
  FROM employees
  GROUP BY salary
  ORDER BY COUNT(*) DESC
  LIMIT 1
)

-- Combine all statistics
SELECT 
  ROUND(ss.mean_salary, 2) as mean_salary,      -- 54300.00
  ss.median_salary,                             -- 50500.00
  sm.mode_salary,                               -- 45000 (appears twice)
  dm.mode_department,                           -- Sales (appears 4 times)
  ss.total_employees                            -- 10
FROM salary_statistics ss
CROSS JOIN dept_mode dm
CROSS JOIN salary_mode sm;

-- Alternative median calculation (works in most SQL databases)
WITH ordered_salaries AS (
  SELECT 
    salary,
    ROW_NUMBER() OVER (ORDER BY salary) as row_num,
    COUNT(*) OVER () as total_count
  FROM employees
)
SELECT AVG(salary) as median_salary
FROM ordered_salaries
WHERE row_num IN (
  FLOOR((total_count + 1) / 2.0),
  CEIL((total_count + 1) / 2.0)
);

-- Department analysis
SELECT 
  department,
  COUNT(*) as employee_count,
  ROUND(AVG(salary), 0) as avg_salary,
  MIN(salary) as min_salary,
  MAX(salary) as max_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

-- Results:
-- Engineering: 3 employees, avg $75,000, min $72,000, max $78,000
-- Sales: 4 employees, avg $47,750, min $45,000, max $52,000  
-- Marketing: 3 employees, avg $49,000, min $48,000, max $51,000

Calculating Mean, Median, and Mode in Excel

Here is an example of how to calculate the mean, median, and mode of a dataset in Excel using built-in functions with product sales data.

Csv

Product,Category,Units_Sold
Widget A,Electronics,15
Widget B,Electronics,22
Gadget X,Home,18
Tool Pro,Tools,15
Smart TV,Electronics,12
Lamp Z,Home,25
Hammer,Tools,20
Phone Y,Electronics,18
Chair,Home,15
Drill,Tools,22

Excel

// Excel worksheet layout:
//    A         B           C
// 1  Product   Category    Units_Sold
// 2  Widget A  Electronics 15
// 3  Widget B  Electronics 22
// 4  Gadget X  Home        18
// 5  Tool Pro  Tools       15
// 6  Smart TV  Electronics 12
// 7  Lamp Z    Home        25
// 8  Hammer    Tools       20
// 9  Phone Y   Electronics 18
// 10 Chair     Home        15
// 11 Drill     Tools       22

// Basic statistics for Units_Sold column (column C)
=AVERAGE(C2:C11)    // Mean: 18.2
=MEDIAN(C2:C11)     // Median: 18
=MODE.SNGL(C2:C11)  // Mode: 15 (appears 3 times)

// Category frequency analysis in columns E and F:
E1: "Category"      F1: "Count"
E2: "Electronics"   F2: =COUNTIF(B2:B11,"Electronics")  // 4
E3: "Home"          F3: =COUNTIF(B2:B11,"Home")         // 3  
E4: "Tools"         F4: =COUNTIF(B2:B11,"Tools")        // 3

// Find mode category:
=INDEX(E2:E4, MATCH(MAX(F2:F4), F2:F4, 0))  // Electronics

// Summary statistics table in columns H and I:
H1: "Statistic"           I1: "Value"
H2: "Mean Units"          I2: =ROUND(AVERAGE(C2:C11), 1)      // 18.2
H3: "Median Units"        I3: =MEDIAN(C2:C11)                 // 18
H4: "Mode Units"          I4: =MODE.SNGL(C2:C11)              // 15
H5: "Mode Category"       I5: =INDEX(E2:E4,MATCH(MAX(F2:F4),F2:F4,0))  // Electronics
H6: "Total Products"      I6: =COUNT(C2:C11)                  // 10
H7: "Min Units"           I7: =MIN(C2:C11)                    // 12
H8: "Max Units"           I8: =MAX(C2:C11)                    // 25
H9: "Standard Deviation"  I9: =ROUND(STDEV.S(C2:C11), 1)     // 4.3

// Category analysis in columns K, L, M:
K1: "Category"      L1: "Avg Units"    M1: "Total Units"
K2: "Electronics"   L2: =AVERAGEIF(B2:B11,"Electronics",C2:C11)  // 16.8
K3: "Home"          L3: =AVERAGEIF(B2:B11,"Home",C2:C11)         // 19.3
K4: "Tools"         L4: =AVERAGEIF(B2:B11,"Tools",C2:C11)        // 19.0

M2: =SUMIF(B2:B11,"Electronics",C2:C11)  // 67
M3: =SUMIF(B2:B11,"Home",C2:C11)         // 58  
M4: =SUMIF(B2:B11,"Tools",C2:C11)        // 57

// Performance classification:
// In column D, add performance labels
D1: "Performance"
D2: =IF(C2>=20,"High",IF(C2>=15,"Medium","Low"))
// Copy formula down to D11

// Count performance levels:
O1: "Performance"   P1: "Count"
O2: "High"          P2: =COUNTIF(D2:D11,"High")    // 4 products
O3: "Medium"        P3: =COUNTIF(D2:D11,"Medium")  // 3 products  
O4: "Low"           P4: =COUNTIF(D2:D11,"Low")     // 3 products

// Conditional formatting for units sold:
// Select C2:C11, then Home > Conditional Formatting > Color Scales
// Or create custom rules:
// Green: =C2>=20    (High performers)
// Yellow: =AND(C2>=15,C2<20)  (Medium performers)  
// Red: =C2<15       (Low performers)

// Chart creation:
// 1. Select category data (E2:F4) for category distribution
// 2. Insert > Pie Chart or Column Chart
// 3. Title: "Product Distribution by Category"

// For units sold distribution:
// 1. Select performance data (O2:P4)  
// 2. Insert > Column Chart
// 3. Title: "Products by Performance Level"

// Advanced formulas:
// Multiple modes (if they exist):
=MODE.MULT(C2:C11)

// Products above average:
=COUNTIF(C2:C11,">"&AVERAGE(C2:C11))  // 4 products

// Quartile analysis:
=QUARTILE(C2:C11,1)  // Q1: 15
=QUARTILE(C2:C11,3)  // Q3: 22

// Identify outliers (values beyond 1.5 * IQR):
// IQR = Q3 - Q1 = 22 - 15 = 7
// Lower bound: Q1 - 1.5*IQR = 15 - 10.5 = 4.5
// Upper bound: Q3 + 1.5*IQR = 22 + 10.5 = 32.5
// No outliers in this dataset