This calculator helps you analyze the central tendency of your data distribution. It calculates the arithmetic mean (average), median (middle value), and mode (most frequent value), helping you understand the typical values and patterns in your dataset. It also provides a histogram with the mean, median, and mode marked for easy visualization if a numeric column is selected. A bar chart is also provided to visualize the frequency distribution of a categorical column.
Quick Calculator
Need a quick calculation? Enter your numbers below:
Calculator
1. Load Your Data
2. Select Columns & Options
Related Calculators
Learn More
Mean (Arithmetic Mean)
Definition
The arithmetic average of all values, representing the central tendency of the data.
Formula
Example
For the numbers:
Key Points
- Sensitive to extreme values (outliers)
- Best used when data is symmetrically distributed
Comparison of Different Means
Important Relationship:
For any set of positive numbers:
(Equality occurs only when all numbers are the same)
Type | Formula | Example |
---|---|---|
Arithmetic Mean | ||
Geometric Mean | ||
Harmonic Mean |
Arithmetic Mean(Current)
Best For:
- Simple averages of quantities
- Calculating central tendency
- Equal weight to all values
- Linear relationships
Common Uses:
- Test scores
- Heights/weights
- Daily temperatures
- Income levels
Limitations:
- Sensitive to outliers
- Not ideal for ratios or rates
Geometric Mean
Best For:
- Growth rates
- Percentage changes
- Ratios and proportions
- Exponential relationships
Common Uses:
- Investment returns
- Population growth
- Interest rates
- Price indices
Limitations:
- Only works with positive numbers
- More complex calculation
Harmonic Mean
Best For:
- Rates and speeds
- Per-unit quantities
- Reciprocal relationships
- Density measurements
Common Uses:
- Average speed over different distances
- Price per unit calculations
- Density measurements
- Circuit calculations (parallel resistors)
Limitations:
- Only works with positive numbers
- Gives more weight to smaller values
When to Use Each Mean
Use Arithmetic Mean when you need a simple average and all values should have equal weight
Use Geometric Mean when dealing with growth rates, returns, or multiplicative changes
Use Harmonic Mean when working with rates, speeds, or other measures where using reciprocals makes sense
Median
Definition
The middle value in a sorted set of numbers. For an even number of values, it's the average of the two middle numbers.
Formula
Example
For odd number of values:
For even number of values:
Key Points
- Not affected by extreme values (outliers)
- Best used when data is skewed or contains outliers
Mode
Definition
The most frequently occurring value(s) in a set of numbers. A dataset can have no mode, one mode, or multiple modes.
Formula
Example
For the numbers with multiple modes:
Because 2 and 4 each appear twice, more than any other number
For numbers , there is no mode because all numbers appear exactly once.
Key Points
- Can have multiple modes (bimodal, trimodal, etc.) or no mode
- Particularly useful for categorical data and discrete numerical data
Mean vs Median: When to Use Which?
Mean (Average)
Median (Middle Value)
Example: Impact of Outliers
Normal data: Annual salaries in thousands:
With outlier: Adding CEO salary:
Decision Guide
- • Data is approximately normally distributed
- • No significant outliers present
- • You want to include all data points in analysis
- • Data is skewed (left or right)
- • Outliers are present
- • You want a robust measure of central tendency
Quick Reference
Calculating Mean, Median, and Mode from a Bar Chart
Mean Calculation
1. Multiply each value by its frequency:
4 × 2 = 8
6 × 3 = 18
8 × 2 = 16
10 × 1 = 10
2. Sum all products: 2 + 8 + 18 + 16 + 10 = 54
3. Sum all frequencies: 1 + 2 + 3 + 2 + 1 = 9
4. Divide: 54 ÷ 9 = 6
Mean = 6
Median Calculation
1. List all values (including duplicates):
2. Find middle position: (n+1) ÷ 2 = (9+1) ÷ 2 = 5th
3. Find 5th value in ordered list
Median = 6
Mode Calculation
1. Count frequency of each value:
4: occurs 2 times
6: occurs 3 times
8: occurs 2 times
10: occurs 1 time
2. Find highest frequency (3)
3. Identify value(s) with that frequency
Mode = 6
Final Results:
Calculating Mean, Median, and Mode in R
Here is an example of how to calculate the mean, median, and mode of a dataset in R using the tidyverse package.
library(tidyverse)
tips <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
central_measures <- tips |>
summarise(
mean_tip = mean(tip), # 2.9983
median_tip = median(tip) # 2.9
)
day_counts <- table(tips$day)
mode_days <- names(day_counts[day_counts == max(day_counts)]) # Sat
# histogram of tip with mean, median vertical lines
ggplot(tips, aes(x = tip)) +
geom_histogram(binwidth = 0.5, fill = "darkblue", alpha = 0.7) +
geom_vline(aes(xintercept = central_measures$mean_tip, color = "Mean"),
linetype = "dashed", linewidth = 1) +
geom_vline(aes(xintercept = central_measures$median_tip, color = "Median"),
linetype = "dotted", linewidth = 1) +
scale_color_manual(name = "Statistics",
values = c("Mean" = "red", "Median" = "green")) +
labs(title = "Distribution of Tip Amounts",
x = "Tip Amount ($)",
y = "Frequency") +
theme_minimal() +
theme(legend.position = "bottom")
# bar chart of day frequencies
day_freq <- tips |>
count(day) |>
arrange(desc(n))
# bar chart with calculated frequencies
ggplot(day_freq, aes(x = reorder(day, -n), y = n)) +
geom_bar(stat = "identity", fill = "steelblue", width = 0.6) +
geom_text(aes(label = n), vjust = -0.5, size = 3.5) +
labs(title = "Frequency of Restaurant Visits by Day",
x = "Day of Week",
y = "Number of Visits") +
theme_minimal()
Calculating Mean, Median, and Mode in Python
Here is an example of how to calculate the mean, median, and mode of a dataset in Python using the pandas, numpy, and scipy libraries.
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Load the tips dataset
tips = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")
# Calculate central measures for tip amounts
central_measures = {
'mean_tip': tips['tip'].mean(), # 2.9983
'median_tip': tips['tip'].median(), # 2.9
}
# Calculate mode for categorical data (day)
mode_day = tips['day'].mode()[0] # Sat
print(f"Mean tip: {central_measures['mean_tip']:.2f}")
print(f"Median tip: {central_measures['median_tip']:.2f}")
print(f"Most common day: {mode_day}")
# Create histogram with mean and median lines
plt.figure(figsize=(10, 6))
plt.hist(tips['tip'], bins=20, alpha=0.7, color='darkblue', edgecolor='black')
plt.axvline(central_measures['mean_tip'], color='red', linestyle='--',
linewidth=2, label=f"Mean: {central_measures['mean_tip']:.2f}")
plt.axvline(central_measures['median_tip'], color='green', linestyle=':',
linewidth=2, label=f"Median: {central_measures['median_tip']:.2f}")
plt.xlabel('Tip Amount ($)')
plt.ylabel('Frequency')
plt.title('Distribution of Tip Amounts')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
# Bar chart of day frequencies
day_counts = tips['day'].value_counts().sort_values(ascending=False)
plt.figure(figsize=(8, 6))
bars = plt.bar(day_counts.index, day_counts.values, color='steelblue', width=0.6)
plt.xlabel('Day of Week')
plt.ylabel('Number of Visits')
plt.title('Frequency of Restaurant Visits by Day')
# Add value labels on bars
for bar, value in zip(bars, day_counts.values):
plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
str(value), ha='center', va='bottom')
plt.show()
# Summary statistics
print("\nSummary Statistics:")
print(tips['tip'].describe())
Calculating Mean, Median, and Mode in SQL
Here is an example of how to calculate the mean, median, and mode of a dataset in SQL using aggregate functions and window functions.
employee_id,department,salary
1,Sales,45000
2,Sales,52000
3,Engineering,75000
4,Marketing,48000
5,Engineering,72000
6,Sales,45000
7,Marketing,51000
8,Engineering,78000
9,Sales,49000
10,Marketing,48000
-- Create and populate the employee table
CREATE TABLE employees (
employee_id INT,
department VARCHAR(50),
salary INT
);
INSERT INTO employees VALUES
(1, 'Sales', 45000),
(2, 'Sales', 52000),
(3, 'Engineering', 75000),
(4, 'Marketing', 48000),
(5, 'Engineering', 72000),
(6, 'Sales', 45000),
(7, 'Marketing', 51000),
(8, 'Engineering', 78000),
(9, 'Sales', 49000),
(10, 'Marketing', 48000);
-- Calculate mean, median, and mode for salaries
WITH salary_statistics AS (
SELECT
-- Mean calculation
AVG(salary) as mean_salary,
-- Median calculation (PostgreSQL/SQL Server)
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary,
-- Count total records
COUNT(*) as total_employees
FROM employees
),
-- Mode calculation for categorical data (most frequent department)
dept_mode AS (
SELECT
department as mode_department,
COUNT(*) as dept_frequency
FROM employees
GROUP BY department
ORDER BY COUNT(*) DESC
LIMIT 1
),
-- Mode calculation for numerical data (most frequent salary)
salary_mode AS (
SELECT
salary as mode_salary,
COUNT(*) as salary_frequency
FROM employees
GROUP BY salary
ORDER BY COUNT(*) DESC
LIMIT 1
)
-- Combine all statistics
SELECT
ROUND(ss.mean_salary, 2) as mean_salary, -- 54300.00
ss.median_salary, -- 50500.00
sm.mode_salary, -- 45000 (appears twice)
dm.mode_department, -- Sales (appears 4 times)
ss.total_employees -- 10
FROM salary_statistics ss
CROSS JOIN dept_mode dm
CROSS JOIN salary_mode sm;
-- Alternative median calculation (works in most SQL databases)
WITH ordered_salaries AS (
SELECT
salary,
ROW_NUMBER() OVER (ORDER BY salary) as row_num,
COUNT(*) OVER () as total_count
FROM employees
)
SELECT AVG(salary) as median_salary
FROM ordered_salaries
WHERE row_num IN (
FLOOR((total_count + 1) / 2.0),
CEIL((total_count + 1) / 2.0)
);
-- Department analysis
SELECT
department,
COUNT(*) as employee_count,
ROUND(AVG(salary), 0) as avg_salary,
MIN(salary) as min_salary,
MAX(salary) as max_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;
-- Results:
-- Engineering: 3 employees, avg $75,000, min $72,000, max $78,000
-- Sales: 4 employees, avg $47,750, min $45,000, max $52,000
-- Marketing: 3 employees, avg $49,000, min $48,000, max $51,000
Calculating Mean, Median, and Mode in Excel
Here is an example of how to calculate the mean, median, and mode of a dataset in Excel using built-in functions with product sales data.
Product,Category,Units_Sold
Widget A,Electronics,15
Widget B,Electronics,22
Gadget X,Home,18
Tool Pro,Tools,15
Smart TV,Electronics,12
Lamp Z,Home,25
Hammer,Tools,20
Phone Y,Electronics,18
Chair,Home,15
Drill,Tools,22
// Excel worksheet layout:
// A B C
// 1 Product Category Units_Sold
// 2 Widget A Electronics 15
// 3 Widget B Electronics 22
// 4 Gadget X Home 18
// 5 Tool Pro Tools 15
// 6 Smart TV Electronics 12
// 7 Lamp Z Home 25
// 8 Hammer Tools 20
// 9 Phone Y Electronics 18
// 10 Chair Home 15
// 11 Drill Tools 22
// Basic statistics for Units_Sold column (column C)
=AVERAGE(C2:C11) // Mean: 18.2
=MEDIAN(C2:C11) // Median: 18
=MODE.SNGL(C2:C11) // Mode: 15 (appears 3 times)
// Category frequency analysis in columns E and F:
E1: "Category" F1: "Count"
E2: "Electronics" F2: =COUNTIF(B2:B11,"Electronics") // 4
E3: "Home" F3: =COUNTIF(B2:B11,"Home") // 3
E4: "Tools" F4: =COUNTIF(B2:B11,"Tools") // 3
// Find mode category:
=INDEX(E2:E4, MATCH(MAX(F2:F4), F2:F4, 0)) // Electronics
// Summary statistics table in columns H and I:
H1: "Statistic" I1: "Value"
H2: "Mean Units" I2: =ROUND(AVERAGE(C2:C11), 1) // 18.2
H3: "Median Units" I3: =MEDIAN(C2:C11) // 18
H4: "Mode Units" I4: =MODE.SNGL(C2:C11) // 15
H5: "Mode Category" I5: =INDEX(E2:E4,MATCH(MAX(F2:F4),F2:F4,0)) // Electronics
H6: "Total Products" I6: =COUNT(C2:C11) // 10
H7: "Min Units" I7: =MIN(C2:C11) // 12
H8: "Max Units" I8: =MAX(C2:C11) // 25
H9: "Standard Deviation" I9: =ROUND(STDEV.S(C2:C11), 1) // 4.3
// Category analysis in columns K, L, M:
K1: "Category" L1: "Avg Units" M1: "Total Units"
K2: "Electronics" L2: =AVERAGEIF(B2:B11,"Electronics",C2:C11) // 16.8
K3: "Home" L3: =AVERAGEIF(B2:B11,"Home",C2:C11) // 19.3
K4: "Tools" L4: =AVERAGEIF(B2:B11,"Tools",C2:C11) // 19.0
M2: =SUMIF(B2:B11,"Electronics",C2:C11) // 67
M3: =SUMIF(B2:B11,"Home",C2:C11) // 58
M4: =SUMIF(B2:B11,"Tools",C2:C11) // 57
// Performance classification:
// In column D, add performance labels
D1: "Performance"
D2: =IF(C2>=20,"High",IF(C2>=15,"Medium","Low"))
// Copy formula down to D11
// Count performance levels:
O1: "Performance" P1: "Count"
O2: "High" P2: =COUNTIF(D2:D11,"High") // 4 products
O3: "Medium" P3: =COUNTIF(D2:D11,"Medium") // 3 products
O4: "Low" P4: =COUNTIF(D2:D11,"Low") // 3 products
// Conditional formatting for units sold:
// Select C2:C11, then Home > Conditional Formatting > Color Scales
// Or create custom rules:
// Green: =C2>=20 (High performers)
// Yellow: =AND(C2>=15,C2<20) (Medium performers)
// Red: =C2<15 (Low performers)
// Chart creation:
// 1. Select category data (E2:F4) for category distribution
// 2. Insert > Pie Chart or Column Chart
// 3. Title: "Product Distribution by Category"
// For units sold distribution:
// 1. Select performance data (O2:P4)
// 2. Insert > Column Chart
// 3. Title: "Products by Performance Level"
// Advanced formulas:
// Multiple modes (if they exist):
=MODE.MULT(C2:C11)
// Products above average:
=COUNTIF(C2:C11,">"&AVERAGE(C2:C11)) // 4 products
// Quartile analysis:
=QUARTILE(C2:C11,1) // Q1: 15
=QUARTILE(C2:C11,3) // Q3: 22
// Identify outliers (values beyond 1.5 * IQR):
// IQR = Q3 - Q1 = 22 - 15 = 7
// Lower bound: Q1 - 1.5*IQR = 15 - 10.5 = 4.5
// Upper bound: Q3 + 1.5*IQR = 22 + 10.5 = 32.5
// No outliers in this dataset