StatsCalculators.com

Histogram Maker

Created: November 10, 2024
Last Updated: May 16, 2025

A histogram is a data visualization that displays the distribution of continuous numerical data by dividing it into bins (intervals) and showing the frequency of observations in each bin. Histograms reveal central tendency, spread, skewness, and potential outliers, which makes them useful for understanding distributions, comparing groups, and detecting anomalies. Upload your data or use our sample datasets to create histograms with customizable bins and density curves. Not sure where to start? Check out our step-by-step tutorial.

Calculator

1. Load Your Data

Note: Column names will be converted to snake_case (e.g., "Product ID" → "product_id") for processing.
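
As a rough illustration of the renaming described in the note, here is a minimal Python sketch. The function name to_snake_case and its exact rules are assumptions made for this example; the calculator's own conversion may handle edge cases differently.

```python
import re

def to_snake_case(name: str) -> str:
    """Hypothetical helper: lowercase a column name and join its words with underscores."""
    # Collapse every run of non-alphanumeric characters into a single underscore
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Drop leading/trailing underscores and lowercase the result
    return cleaned.strip("_").lower()

print(to_snake_case("Product ID"))      # product_id
print(to_snake_case("Total Bill ($)"))  # total_bill
```

This sketch does not split camelCase names such as "productId"; it only normalizes spaces and punctuation.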

2. Select Column & Options

Learn More About Histograms

What is a Histogram?

A histogram is a graphical representation that organizes a group of numerical data points into bins, displaying the frequency of data points that fall into each bin. Unlike bar charts, histograms are used for continuous data where bins represent ranges of values. The height of each bar shows how many observations fall into that range, helping visualize the distribution shape, central tendency, and variability of the data.

[Figure: Histogram anatomy showing all components: bins, frequencies, distribution shape, and data range]

How to Read a Histogram

Understanding histograms becomes intuitive once you know what to look for in the distribution:

Key Components

  • Bins: Consecutive, non-overlapping intervals of a variable
  • Frequency: Number of observations falling in each bin
  • Bin Width: Affects the granularity of the distribution
  • Range: Span from minimum to maximum values

Distribution Patterns

  • Bell-shaped histogram: Approximately normal distribution, symmetric around the mean
  • Right/positively skewed histogram: Tail extends to the right
  • Left/negatively skewed histogram: Tail extends to the left
  • Bimodal histogram: Two distinct peaks showing two subgroups
  • Uniform histogram: Relatively equal frequencies across all bins

How to Make a Histogram with Our Calculator

  1. Click Sample Data and select Restaurant Tips
  2. For Value column, select total_bill
  3. For Group By column, select a categorical variable or leave it as None
  4. For bin type, select either Number of Bins or Custom Bin Edges
  5. Toggle Show Density on or off as needed
  6. Click Generate Histogram to visualize the data

How to Make a Histogram by Hand

Understanding how to create a histogram manually helps you grasp the underlying concepts. Here's a step-by-step guide:

Example Dataset: Test Scores

65, 72, 68, 74, 61, 76, 71, 69, 73, 67, 70, 68, 75, 62, 77

  1. Step 1: Determine the range

    Find the minimum and maximum values:

    Min = 61, Max = 77, Range = 77 - 61 = 16

  2. Step 2: Choose the number of bins

    Common rules include Sturges' rule or simply taking the square root of the number of observations. For 15 values, the square root rule gives √15 ≈ 3.9, so we use 4 bins.

  3. Step 3: Determine bin width

    Bin width = Range ÷ Number of bins

    Bin width = 16 ÷ 4 = 4

  4. Step 4: Set up bin boundaries

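    Starting at the minimum and repeatedly adding the bin width gives lower boundaries of 61, 65, 69, and 73. Because the scores are whole numbers, the bins are Bin 1: 61-64, Bin 2: 65-68, Bin 3: 69-72, Bin 4: 73-77. The last bin also includes the maximum value, 77, so that every observation is counted.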

  5. Step 5: Count frequencies

    Count how many values fall in each bin:

    Bin 1 (61-64): 2 values (61, 62)

    Bin 2 (65-68): 4 values (65, 67, 68, 68)

    Bin 3 (69-72): 4 values (69, 70, 71, 72)

    Bin 4 (73-77): 5 values (73, 74, 75, 76, 77)

  6. Step 6: Draw the histogram

    • Draw an x-axis with bin ranges and y-axis with frequencies

    • Draw adjacent bars for each bin with heights equal to their frequencies

    • Add title, labels, and any other necessary annotations
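
The six steps above can be sketched in plain Python. This is a minimal illustration using only built-in functions; the variable names are chosen for this example, and the convention that each bin includes its left edge (with the last bin also including the maximum) is an assumption that matches the boundaries used in Step 4.

```python
scores = [65, 72, 68, 74, 61, 76, 71, 69, 73, 67, 70, 68, 75, 62, 77]

# Steps 1-3: range, number of bins, bin width
lo, hi = min(scores), max(scores)
n_bins = 4
width = (hi - lo) / n_bins  # 16 / 4 = 4.0

# Step 4: bin edges 61, 65, 69, 73, 77
edges = [lo + i * width for i in range(n_bins + 1)]

# Step 5: count values per bin; each bin covers [left, right),
# except the last one, which also includes the maximum
counts = [0] * n_bins
for x in scores:
    idx = min(int((x - lo) // width), n_bins - 1)
    counts[idx] += 1

# Step 6: a text "histogram" with one # per observation
for i, c in enumerate(counts):
    print(f"{edges[i]:.0f} to {edges[i + 1]:.0f}: {'#' * c} ({c})")
```

The counts come out as 2, 4, 4, and 5, matching the frequencies tallied by hand in Step 5.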

Pro Tip:

The appropriate number of bins is a balance between too few (hiding details) and too many (creating noise). For most datasets, 5 to 15 bins work well. Many software tools use algorithms to choose the bin count automatically.
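
To make the idea of an automatic bin count concrete, here is a small Python sketch of two common rules, Sturges' rule and the square root rule. The function names are hypothetical, and rounding up to the next whole number is a common convention assumed here; individual tools may round differently or use other rules entirely.

```python
import math

def sturges_bins(n: int) -> int:
    """Sturges' rule: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))

def sqrt_bins(n: int) -> int:
    """Square root rule: sqrt(n), rounded up."""
    return math.ceil(math.sqrt(n))

for n in (15, 100, 1000):
    print(f"n = {n}: Sturges -> {sturges_bins(n)}, square root -> {sqrt_bins(n)}")
```

For small samples the two rules give similar answers; for large samples the square root rule grows much faster, so it is worth checking the resulting plot rather than trusting either rule blindly.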

Histogram Quick Reference

Key Formulas

Bin width = (Max - Min) / Number of bins

Sturges' rule for bin count = 1 + 3.322 × log10(n), rounded up (equivalent to 1 + log2(n))

Relative frequency = Frequency / Total observations

Mean = Sum of all values / Number of values

Median = Middle value when data is sorted

Mode = Most frequent value (in a histogram, the modal class is the tallest bin)
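
As a quick check of these formulas, the following Python sketch applies them to the test score dataset and the bin counts from the by-hand example. It uses only the standard library statistics module; the variable names are illustrative.

```python
import statistics

scores = [65, 72, 68, 74, 61, 76, 71, 69, 73, 67, 70, 68, 75, 62, 77]
labels = ["61-64", "65-68", "69-72", "73-77"]
counts = [2, 4, 4, 5]  # frequencies from the by-hand example

# Relative frequency = frequency / total observations
total = sum(counts)
relative = [c / total for c in counts]

mean = statistics.mean(scores)      # sum of all values / number of values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent single value

# The modal class is the bin with the largest count
modal_bin = labels[counts.index(max(counts))]

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}, modal bin = {modal_bin}")
for label, r in zip(labels, relative):
    print(f"{label}: {r:.1%}")
```

Note that the most frequent single value (68) and the tallest bin (73-77) do not coincide here, which is why the mode read off a histogram depends on how the bins are chosen.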

Frequently Asked Questions

What is the difference between a histogram and a bar graph?

Histograms display the distribution of continuous numerical data, with no gaps between bars as they represent continuous ranges. Bar graphs display categorical data with gaps between bars as they represent distinct categories. Histograms show distribution shapes while bar graphs compare categories.
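
The difference shows up in how the bar heights are computed. In this minimal Python sketch (both small datasets are hypothetical), a bar graph counts exact category labels, while a histogram counts numeric values that fall inside each interval.

```python
from collections import Counter

# Bar graph: one bar per distinct category
days = ["Sat", "Sun", "Sat", "Fri", "Sun", "Sat"]  # hypothetical categories
bar_heights = Counter(days)

# Histogram: one bar per interval of a continuous variable
bills = [10.5, 12.0, 18.2, 21.7, 24.9, 33.1]  # hypothetical amounts
edges = [10, 20, 30, 40]
hist_heights = [
    sum(left <= b < right for b in bills)
    for left, right in zip(edges, edges[1:])
]

print(dict(bar_heights))  # counts per category
print(hist_heights)       # counts per interval [10, 20), [20, 30), [30, 40)
```

Reordering the categories of a bar graph changes nothing essential, whereas the intervals of a histogram have a fixed order along the number line.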

How do I interpret a right skewed histogram?

A right (positively) skewed histogram has its peak on the left side with a longer tail extending to the right. This indicates most values are concentrated on the lower end, with fewer higher values pulling the mean to the right of the median. Examples include income distributions and reaction times.
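
A small numeric check can back up what the plot suggests. The sketch below uses a hypothetical sample with one very large value and compares the mean with the median; it also computes a simple moment-based skewness coefficient, which is one of several definitions in use.

```python
import statistics

# Hypothetical right-skewed sample: most values are low, one is very high
values = [20, 22, 25, 27, 30, 32, 35, 40, 60, 150]

mean = statistics.mean(values)      # pulled upward by the long right tail
median = statistics.median(values)  # resistant to the extreme value

# Moment-based skewness: positive for a right skew, negative for a left skew
n = len(values)
m2 = sum((x - mean) ** 2 for x in values) / n
m3 = sum((x - mean) ** 3 for x in values) / n
skewness = m3 / m2 ** 1.5

print(f"mean = {mean}, median = {median}, skewness = {skewness:.2f}")
```

Here the mean (44.1) sits well above the median (31), the typical signature of a right skew.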

How many bins should I use in my histogram?

The optimal bin count balances detail and smoothness. Too few bins can hide important features, while too many create noise. Common approaches include Sturges' rule (1 + 3.322 × log10(n), rounded up), the square root of n, or letting software choose the bins automatically. For most datasets, 5 to 15 bins work well.

Why would someone make a histogram instead of a bar chart?

Histograms are ideal for visualizing distributions of continuous data (heights, weights, temperatures, etc.) to see patterns, identify outliers, and understand central tendency. Bar charts cannot show these distribution shapes as they're designed for comparing distinct categories rather than showing how values are spread across a continuous range.

What does a bimodal histogram indicate?

A bimodal histogram shows two distinct peaks, suggesting two different subgroups or processes within the data. This could indicate a mixed population (e.g., heights of men and women combined), two different states of a system, or the influence of two different factors on the measured variable.
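
One naive way to flag a possible bimodal shape is to look for bins that are taller than both of their neighbours. The Python sketch below does this for a hypothetical set of bin counts; the function name peak_bins is made up for this example, and real data usually needs smoothing or wider bins first, because random noise also produces small local peaks.

```python
def peak_bins(counts):
    """Return the indices of bins strictly taller than both neighbours."""
    padded = [0] + list(counts) + [0]  # treat the outside of the range as empty
    return [
        i - 1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    ]

# Hypothetical bin counts with two clear humps
counts = [2, 8, 15, 9, 4, 10, 16, 7, 1]
peaks = peak_bins(counts)
print(peaks)  # indices of the peak bins
```

Two well separated peaks, as in this example, are a hint to check whether the data mixes two subgroups.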

Creating a Histogram in R

Here's a simple example of creating and customizing a histogram in R using the ggplot2 package.

R
library(ggplot2)  # the examples below only need ggplot2

tips <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# create histogram with 10 bins by setting bins = 10
ggplot(tips, aes(x = total_bill)) +
  geom_histogram(bins = 10, fill = "steelblue", color = "white") +
  labs(title = "Distribution of Total Bill",
       x = "Total Bill Amount",
       y = "Frequency") +
  theme_minimal()

# customize bin edges
custom_breaks <- seq(0, 50, by = 5)
ggplot(tips, aes(x = total_bill)) +
  geom_histogram(breaks = custom_breaks, fill = "steelblue", color = "white") +
  labs(title = "Distribution of Total Bill",
       x = "Total Bill Amount",
       y = "Frequency") +
  theme_minimal()

# showing density instead of count
ggplot(tips, aes(x = total_bill)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15, 
                 fill = "steelblue", color = "white") +
  labs(title = "Density Distribution of Total Bill",
       x = "Total Bill Amount",
       y = "Density") +
  theme_minimal()

# histogram with density curve (figure below)
ggplot(tips, aes(x = total_bill)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15, 
                 fill = "steelblue", color = "white") +
  geom_density(aes(y = after_stat(density)), color = "red", linewidth = 1) +
  labs(title = "Density Distribution of Total Bill with Density Curve",
       x = "Total Bill Amount",
       y = "Density") +
  theme_minimal()

[Figure: Histogram in R]

The ggpubr package provides the gghistogram() function, which offers a simpler way to create publication-ready histograms with built-in statistical features:

R
library(ggpubr)

tips <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Basic histogram with mean line
gghistogram(tips, x = "total_bill",
            add = "mean",            # Add vertical line for mean
            color = "steelblue",
            fill = "steelblue",
            alpha = 0.7,
            bins = 15,
            title = "Distribution of Total Bill") +
  labs(x = "Total Bill Amount", y = "Count")

# Density histogram with mean line, density curve, and rug plot
gghistogram(tips, x = "total_bill",
            y = "density",            # Density scale (older ggpubr versions may expect "..density..")
            add = "mean",             # add takes a single value: "mean" or "median"
            add.params = list(color = "red", linetype = "dashed"),
            color = "darkblue",
            fill = "lightblue",
            alpha = 0.8,
            bins = 20,
            rug = TRUE,               # Add rug plot at bottom
            add_density = TRUE,       # Overlay a kernel density curve
            title = "Total Bill Distribution with Density Curve") +
  labs(x = "Total Bill Amount", y = "Density")

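# Comparing groups with facets
# Order the days explicitly so the panel.labs below line up with the right panels
tips$day <- factor(tips$day, levels = c("Thur", "Fri", "Sat", "Sun"))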
gghistogram(tips, x = "total_bill",
            add = "mean",
            color = "time",         # Color by time (lunch/dinner)
            fill = "time",
            palette = c("#00AFBB", "#E7B800"),
            bins = 12,
            facet.by = "day",       # Create separate panels by day
            panel.labs = list(day = c("Thursday", "Friday", "Saturday", "Sunday")),
            title = "Total Bill Distribution by Day and Time") +
  labs(x = "Total Bill Amount", y = "Count")

[Figure: Histogram in R]

The final ggplot2 example (histogram with density curve) draws a density-scaled histogram with 15 bins for the total_bill variable from the restaurant tips dataset. The red line is a kernel density estimate, which gives a smoothed view of the distribution of total bill amounts.
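
On a density scale, each bar height is the bin count divided by (number of observations × bin width), so the bar areas add up to 1. A minimal Python sketch of that rescaling, reusing the bin counts and bin width from the by-hand test score example:

```python
counts = [2, 4, 4, 5]  # bin frequencies from the test score example
bin_width = 4          # bin width used in that example

n = sum(counts)
# Density = count / (n * bin width)
density = [c / (n * bin_width) for c in counts]

# The total area of the bars (height * width) is 1
area = sum(d * bin_width for d in density)

print([round(d, 4) for d in density])
print(round(area, 6))
```

This is why a density curve can be drawn on the same axes as a density-scaled histogram: both describe areas that sum to 1, regardless of the sample size.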

Creating a Histogram in Python

Here's how to create histograms in Python using popular visualization libraries like matplotlib and seaborn.

Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
tips = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Set style for better-looking plots
plt.style.use('seaborn-v0_8')

# Basic matplotlib histogram
plt.figure(figsize=(10, 6))
plt.hist(tips['total_bill'], bins=10, color='steelblue', edgecolor='white')
plt.title('Distribution of Total Bills')
plt.xlabel('Total Bill Amount')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.75)
plt.show()

# Seaborn histogram with KDE (Kernel Density Estimation)
plt.figure(figsize=(10, 6))
sns.histplot(tips['total_bill'], bins=15, kde=True, color='steelblue')
plt.title('Distribution of Total Bills with Density Curve')
plt.xlabel('Total Bill Amount')
plt.ylabel('Frequency')
plt.show()

# Comparing distributions by day with seaborn
plt.figure(figsize=(12, 6))
sns.histplot(data=tips, x='total_bill', hue='day', bins=12, 
            alpha=0.7, palette='viridis', multiple='layer')
plt.title('Distribution of Total Bills by Day')
plt.xlabel('Total Bill Amount')
plt.ylabel('Frequency')
plt.show()

[Figure: Histogram in Python]