Create professional Pareto charts to identify the vital few causes that account for the majority of effects. Apply the 80/20 rule to prioritize quality improvements, defect reduction, and resource allocation.
A Pareto chart is a combination of a bar chart and a line graph used to identify the most significant factors in a dataset. Named after Italian economist Vilfredo Pareto, it is based on the Pareto Principle (also known as the 80/20 rule), which states that roughly 80% of effects come from 20% of causes.
The bars represent individual values sorted in descending order, while the cumulative line shows the running total as a percentage. This makes it easy to identify which categories contribute most to the overall total.
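The sorted bars and the running-total line described above come from one small calculation. Here is a minimal sketch in plain Python; the function name `pareto_table` and the three-category input are made up for illustration and are not part of any tool.

```python
def pareto_table(counts):
    """Return (category, count, cumulative_pct) rows, largest count first."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("the total must be positive")
    rows = []
    running = 0
    # Sort by count, descending; ties keep their input order.
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count  # running total up to and including this bar
        rows.append((category, count, running / total * 100))
    return rows


# Hypothetical input: three categories in arbitrary order.
example = {"B": 20, "A": 50, "C": 30}
for category, count, cumulative_pct in pareto_table(example):
    print(f"{category}  {count:>3}  {cumulative_pct:6.1f}%")
```

Each row holds the bar height (`count`) and the point on the cumulative line (`cumulative_pct`). The last row always reaches 100%.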
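Follow these steps to create a Pareto chart by hand:
1. Tally the value (count, cost, time, or another measure) for each category.
2. Sort the categories in descending order of that value.
3. Compute the running total for each category and divide it by the grand total to get the cumulative percentage.
4. Draw the bars in sorted order against the left axis.
5. Plot the cumulative percentage line against a right axis that runs from 0% to 100%.
6. Mark the 80% threshold to separate the "vital few" from the "trivial many".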
The Pareto Principle states that for many outcomes, roughly 80% of consequences come from 20% of causes. In quality management and business analysis, this principle helps prioritize effort toward the causes with the largest impact.
The categories to the left of the 80% threshold line are the "vital few" — the small number of causes that produce the majority of the effect. Categories to the right are the "trivial many" — numerous small contributors.
The default threshold is 80%, but you can adjust it. A lower threshold (e.g., 70%) identifies a stricter set of priorities. A higher threshold (e.g., 90%) captures more categories. Choose based on your resource constraints.
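To see how the threshold changes the set of priorities, here is a rough sketch that selects the "vital few" for a given threshold. The function `vital_few` and the sample counts are hypothetical. It assumes the category whose bar crosses the threshold is included; other conventions exist.

```python
def vital_few(counts, threshold=80.0):
    """Return the categories, largest first, needed to reach `threshold` percent.

    Assumption: the category that crosses the threshold is included.
    """
    total = sum(counts.values())
    selected = []
    running = 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        # Stop once the categories already selected reach the threshold.
        # Comparing running * 100 with threshold * total avoids float division.
        if running * 100 >= threshold * total:
            break
        selected.append(category)
        running += count
    return selected


# Hypothetical counts that sum to 100, so each count is also a percentage.
example = {"A": 40, "B": 30, "C": 15, "D": 10, "E": 5}
for threshold in (70, 80, 90):
    print(threshold, vital_few(example, threshold))
```

With these counts, a 70% threshold selects two categories, 80% selects three, and 90% selects four. This matches the guidance above: lower thresholds give a tighter priority list.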
A manufacturing plant tracks 10 types of defects. The quality team wants to determine which defects to focus on for the greatest improvement.
defect_type,count
Surface Scratches,142
Dimensional Error,98
Color Mismatch,75
Missing Parts,52
Packaging Damage,41
Weld Defects,28
Material Contamination,19
Electrical Failure,12
Label Error,8
Assembly Misalignment,5
After generating the chart, the analysis shows:
The top 3 defects (Surface Scratches, Dimensional Error, Color Mismatch) account for about 66% of all defects. Adding Missing Parts brings it to ~76%.
Just 5 of 10 categories (50%) account for ~85% of all defects. Fixing these five would eliminate the vast majority of quality issues.
The bottom 5 defects combined represent only ~15% of total defects. Addressing them individually yields diminishing returns.
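The percentages quoted in this analysis can be reproduced from the sample counts. This short sketch recomputes them; the helper `share` is an illustrative name only.

```python
# Counts from the sample data, already in descending order.
counts = [142, 98, 75, 52, 41, 28, 19, 12, 8, 5]
total = sum(counts)


def share(values):
    """Percentage of all defects represented by `values`."""
    return sum(values) / total * 100


top3 = share(counts[:3])     # Surface Scratches, Dimensional Error, Color Mismatch
top4 = share(counts[:4])     # ... plus Missing Parts
top5 = share(counts[:5])     # ... plus Packaging Damage
bottom5 = share(counts[5:])  # the five smallest categories

print(f"total defects: {total}")
print(f"top 3: {top3:.1f}%  top 4: {top4:.1f}%  top 5: {top5:.1f}%  bottom 5: {bottom5:.1f}%")
```

The total is 480 defects. The cumulative share is still below 80% after four categories (about 76.5%) and crosses the threshold at the fifth (85%), which is why five categories form the "vital few" in this example.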
Using ggplot2 to create a Pareto chart with dual y-axes.
library(tidyverse)

df <- tibble(
  defect_type = c("Surface Scratches", "Dimensional Error", "Color Mismatch",
                  "Missing Parts", "Packaging Damage", "Weld Defects",
                  "Material Contamination", "Electrical Failure",
                  "Label Error", "Assembly Misalignment"),
  count = c(142, 98, 75, 52, 41, 28, 19, 12, 8, 5)
)

# sort and calculate cumulative percentage
df <- df |>
  arrange(desc(count)) |>
  mutate(
    defect_type = factor(defect_type, levels = defect_type),
    cumulative_pct = cumsum(count) / sum(count) * 100
  )

# scale factor for secondary axis
scale_factor <- max(df$count) / 100

# create Pareto chart
ggplot(df, aes(x = defect_type)) +
  geom_col(aes(y = count), fill = "#1565C0", width = 0.7) +
  geom_line(aes(y = cumulative_pct * scale_factor, group = 1),
            color = "#FF6F00", linewidth = 1) +
  geom_point(aes(y = cumulative_pct * scale_factor),
             color = "#FF6F00", size = 3) +
  geom_hline(yintercept = 80 * scale_factor,
             linetype = "dashed", color = "#C62828") +
  scale_y_continuous(
    name = "Count",
    sec.axis = sec_axis(~ . / scale_factor,
                        name = "Cumulative Percentage (%)",
                        labels = function(x) paste0(x, "%"))
  ) +
  labs(
    title = "Pareto Chart - Manufacturing Defects",
    x = "Defect Type"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.major.x = element_blank()
  )
Using seaborn and Matplotlib to create a Pareto chart with Python.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# build the data and sort it in descending order of count
df = pd.DataFrame({
    "defect_type": [
        "Surface Scratches", "Dimensional Error", "Color Mismatch",
        "Missing Parts", "Packaging Damage", "Weld Defects",
        "Material Contamination", "Electrical Failure",
        "Label Error", "Assembly Misalignment"
    ],
    "count": [142, 98, 75, 52, 41, 28, 19, 12, 8, 5]
}).sort_values("count", ascending=False).reset_index(drop=True)

# running total as a percentage of the grand total
df["cumulative_pct"] = df["count"].cumsum() / df["count"].sum() * 100

# bars on the left axis
fig, ax1 = plt.subplots(figsize=(12, 6))
sns.barplot(data=df, x="defect_type", y="count", color="#1565C0", ax=ax1)
ax1.set_ylabel("Count")
ax1.set_xlabel("Defect Type")
ax1.tick_params(axis="x", rotation=45)

# cumulative line and 80% threshold on a second y-axis
ax2 = ax1.twinx()
ax2.plot(df["defect_type"], df["cumulative_pct"], color="#FF6F00", marker="o")
ax2.axhline(80, color="#C62828", linestyle="--")
ax2.set_ylabel("Cumulative Percentage (%)")
ax2.set_ylim(0, 105)

plt.title("Pareto Chart - Manufacturing Defects")
plt.tight_layout()
plt.show()
A bar chart simply displays values for categories. A Pareto chart adds two key features: bars are sorted in descending order, and a cumulative percentage line is overlaid. This combination makes it easy to identify which categories contribute most to the total.
Pareto charts are not suited to data with negative values. Pareto analysis is based on cumulative contribution to a total, so negative values would make the cumulative percentage meaningless. If you have negative values, consider using absolute values or a different chart type such as a waterfall chart.
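A small worked example shows why negative values break the cumulative line. The three values below are invented for illustration.

```python
# Hypothetical data containing one negative value.
values = [50, 30, -40]
total = sum(values)  # 40

running = 0
cumulative = []
for value in sorted(values, reverse=True):
    running += value
    cumulative.append(running / total * 100)

print(cumulative)  # [125.0, 200.0, 100.0]
```

The line overshoots 100% and then falls back, so no point on it can be read as "share of the total so far", and an 80% threshold has no clear meaning.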
The threshold is adjustable. While 80% is the classic Pareto threshold, you can change it based on your analysis needs. Use a lower threshold (e.g., 70%) for stricter prioritization, or a higher threshold (e.g., 90%) to capture more categories.
Duplicate categories are automatically aggregated by summing their values. For example, if "Defect A" appears three times with values 10, 20, and 30, it will be shown as a single bar with value 60.
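If you prepare the data yourself, the same aggregation takes a few lines. This is only a sketch of summing duplicates, not the tool's actual implementation; `aggregate` is a hypothetical name.

```python
from collections import defaultdict


def aggregate(rows):
    """Sum the values of rows that share a category name."""
    totals = defaultdict(int)
    for category, value in rows:
        totals[category] += value
    return dict(totals)


rows = [("Defect A", 10), ("Defect B", 5), ("Defect A", 20), ("Defect A", 30)]
print(aggregate(rows))  # {'Defect A': 60, 'Defect B': 5}
```

Names are matched exactly here, so "Defect A" and "defect a" would stay separate unless you normalize them first.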
In Six Sigma's DMAIC methodology, Pareto charts are primarily used in the Measure and Analyze phases. They help identify the most significant defect types or process issues to focus improvement efforts on the areas with the greatest impact.
A Pareto chart ideally has 5 to 15 categories. Too few categories may not provide enough granularity, while too many make the chart hard to read. If you have more than 15 categories, consider grouping the smallest ones into an "Other" category.
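Here is one possible way to fold the smallest categories into "Other". The function `group_small`, its parameters, and the sample counts are assumptions made for illustration. It also assumes no input category is already named "Other".

```python
def group_small(counts, max_categories=15, other_label="Other"):
    """Keep the largest categories and sum the remainder under `other_label`.

    The result has at most `max_categories` entries, counting `other_label`.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) <= max_categories:
        return dict(ranked)
    keep = ranked[:max_categories - 1]
    rest = ranked[max_categories - 1:]
    grouped = dict(keep)
    grouped[other_label] = sum(value for _, value in rest)
    return grouped


# Hypothetical counts, limited to four bars to keep the example short.
example = {"A": 50, "B": 30, "C": 10, "D": 5, "E": 3, "F": 2}
print(group_small(example, max_categories=4))
# {'A': 50, 'B': 30, 'C': 10, 'Other': 10}
```

"Other" goes last here whatever its size, which is a common convention. Re-sort the result if you prefer strictly descending bars.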