Create Bland-Altman plots to assess agreement between two measurement methods. Visualize limits of agreement (LOA), detect fixed and proportional bias, and evaluate whether two methods can be used interchangeably.
A Bland-Altman plot (also known as a difference plot or Tukey mean-difference plot) is a graphical method for comparing two measurement techniques. Instead of simply correlating the two methods, it plots the difference between measurements against their mean, revealing systematic bias and the range of agreement between methods.
It was introduced by J. Martin Bland and Douglas Altman in 1986 and has become the standard method for assessing measurement agreement in clinical research.
Mean difference (bias): the average of all differences between the two methods. If it differs significantly from zero, there is a systematic (fixed) bias between the methods.
Limits of agreement (LOA): mean difference ± 1.96 × SD of the differences. Assuming the differences are approximately normally distributed, about 95% of differences are expected to fall within these limits. If the limits are clinically acceptable, the methods can be used interchangeably.
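The bias and LOA formulas can be sketched as a small Python helper. The function name `limits_of_agreement` is our own, not part of any library. The SD uses the sample (n - 1) denominator, matching `sd()` in R and `np.std(..., ddof=1)`.

```python
import numpy as np


def limits_of_agreement(a, b, z=1.96):
    """Return (bias, sd, lower, upper) for paired measurements a and b.

    Hypothetical helper: bias is the mean of a - b, sd is the sample SD of
    the differences (ddof=1), and the limits are bias -/+ z * sd.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must contain the same number of paired values")
    diffs = a - b
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    return bias, sd, bias - z * sd, bias + z * sd


# Sample data used elsewhere on this page
method_a = [120, 118, 135, 140, 125, 130, 145, 138, 122, 128]
method_b = [118, 120, 132, 138, 127, 128, 142, 140, 120, 130]
bias, sd, lower, upper = limits_of_agreement(method_a, method_b)
print(f"bias = {bias:.2f}, SD = {sd:.2f}, LOA = [{lower:.2f}, {upper:.2f}]")
```

For this sample data the bias is 0.60, the SD of the differences is about 2.27, and the LOA are roughly [-3.85, 5.05].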
Fixed bias: a constant systematic difference between methods (mean difference ≠ 0), tested with a one-sample t-test of the differences against zero.
Proportional bias: the difference between methods changes with the magnitude of the measurement. It is detected by correlating the means with the differences; a significant correlation coefficient r indicates proportional bias.
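Both bias checks can be sketched in Python with SciPy, mirroring the `t.test` and `cor.test` calls in the R example. This assumes SciPy is installed; `bias_tests` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def bias_tests(a, b):
    """Fixed- and proportional-bias checks for paired measurements a, b.

    Fixed bias: one-sample t-test of the differences against 0.
    Proportional bias: Pearson correlation of the pair means with the differences.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = a - b
    means = (a + b) / 2
    t_stat, p_fixed = stats.ttest_1samp(diffs, 0.0)
    r, p_prop = stats.pearsonr(means, diffs)
    return {"t": t_stat, "p_fixed": p_fixed, "r": r, "p_proportional": p_prop}


method_a = [120, 118, 135, 140, 125, 130, 145, 138, 122, 128]
method_b = [118, 120, 132, 138, 127, 128, 142, 140, 120, 130]
results = bias_tests(method_a, method_b)
for name, value in results.items():
    print(f"{name}: {value:.4f}")
```

For the sample data neither p-value falls below 0.05, so these tests give no evidence of fixed or proportional bias; with only 10 pairs, however, they have little power to detect either.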
Good agreement: points scattered randomly around the mean difference line with no obvious trend, narrow LOA within clinically acceptable limits, and roughly 5% of points outside the LOA.
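The "roughly 5% outside" guideline can be checked by counting points beyond the limits. This is a rough sketch with a hypothetical helper name; with small samples the observed share is very noisy (with 10 pairs, a single point is already 10%).

```python
import numpy as np


def fraction_outside_loa(diffs, z=1.96):
    """Share of differences lying strictly outside mean -/+ z * SD (sample SD)."""
    diffs = np.asarray(diffs, dtype=float)
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    lower, upper = bias - z * sd, bias + z * sd
    outside = (diffs < lower) | (diffs > upper)
    return outside.mean()  # mean of a boolean array = proportion of True values


# A - B for the sample data used on this page
diffs = np.array([2, -2, 3, 2, -2, 2, 3, -2, 2, -2])
print(f"{fraction_outside_loa(diffs):.0%} of points fall outside the LOA")
```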
Fixed bias pattern: the mean difference line sits far from zero because one method consistently reads higher or lower than the other. This can potentially be corrected by applying a constant offset.
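A constant-offset correction could look like the sketch below, which subtracts the estimated bias from Method A. It removes the fixed bias but leaves the SD of the differences, and therefore the width of the LOA, unchanged. Because the offset is estimated from the same data it is evaluated on, the result is optimistic and would ideally be checked on new measurements.

```python
import numpy as np

method_a = np.array([120, 118, 135, 140, 125, 130, 145, 138, 122, 128], dtype=float)
method_b = np.array([118, 120, 132, 138, 127, 128, 142, 140, 120, 130], dtype=float)

# Estimate the fixed bias as the mean of A - B
bias = np.mean(method_a - method_b)

# Hypothetical correction: shift every Method A reading by the estimated bias
method_a_corrected = method_a - bias

new_diffs = method_a_corrected - method_b
print(f"Bias before: {bias:.2f}, after: {new_diffs.mean():.2f}")
print(f"SD before: {np.std(method_a - method_b, ddof=1):.2f}, "
      f"after: {new_diffs.std(ddof=1):.2f}")
```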
Proportional bias pattern: a fan or funnel shape in the scatter, indicating that differences grow (or shrink) with measurement magnitude. Consider using percentage differences or a log transformation.
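The log-transformation route can be sketched as follows: compute the LOA on log(A) - log(B), then back-transform with `exp()` so the limits become ratios of A to B. This requires strictly positive values; the helper name and the small data set are hypothetical.

```python
import numpy as np


def ratio_limits_of_agreement(a, b, z=1.96):
    """LOA on the log scale, back-transformed to ratios A / B."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if np.any(a <= 0) or np.any(b <= 0):
        raise ValueError("log transformation requires strictly positive values")
    log_diffs = np.log(a) - np.log(b)  # log(A / B) for each pair
    bias = log_diffs.mean()
    sd = log_diffs.std(ddof=1)
    # exp() maps the log-scale bias and limits back to ratios A / B
    return np.exp(bias), np.exp(bias - z * sd), np.exp(bias + z * sd)


# Hypothetical readings spanning a wide range of magnitudes
a = [10, 20, 40, 80]
b = [11, 21, 44, 87]
geo_ratio, lower, upper = ratio_limits_of_agreement(a, b)
print(f"Typical A/B ratio: {geo_ratio:.3f}, ratio LOA: [{lower:.3f}, {upper:.3f}]")
```

Ratio limits of, say, [0.90, 1.10] would read as "Method A is usually between 90% and 110% of Method B".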
Correlation (Pearson's r) measures the strength of a linear relationship, not agreement. Two methods can be almost perfectly correlated yet still disagree substantially. For example, if one method always reads exactly 20 units higher, r = 1 even though every pair differs by 20 units.
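The 20-unit example can be reproduced in a few lines. The readings are hypothetical; Method B is constructed to read exactly 20 units higher than Method A.

```python
import numpy as np

method_a = np.array([100, 110, 120, 130, 140], dtype=float)
method_b = method_a + 20  # hypothetical: B always reads 20 units higher

r = np.corrcoef(method_a, method_b)[0, 1]
diffs = method_a - method_b

print(f"Pearson r: {r:.3f}")                   # perfect linear relationship
print(f"Mean difference: {diffs.mean():.1f}")  # yet every pair differs by 20
```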
Bland-Altman analysis specifically quantifies the magnitude and pattern of disagreement, making it the appropriate tool for method comparison studies.
Using ggplot2 (loaded with the tidyverse) to create a Bland-Altman plot and test for fixed and proportional bias.
library(tidyverse)
# sample data
method_a <- c(120, 118, 135, 140, 125, 130, 145, 138, 122, 128)
method_b <- c(118, 120, 132, 138, 127, 128, 142, 140, 120, 130)
# Bland-Altman statistics
means <- (method_a + method_b) / 2
diffs <- method_a - method_b
mean_diff <- mean(diffs)
sd_diff <- sd(diffs)
upper_loa <- mean_diff + 1.96 * sd_diff
lower_loa <- mean_diff - 1.96 * sd_diff
# Test for fixed bias: one-sample t-test of the differences against 0
t_test <- t.test(diffs, mu = 0)
cat("Fixed Bias p-value:", t_test$p.value, "\n")
# Test for proportional bias
cor_test <- cor.test(means, diffs)
cat("Proportional Bias p-value:", cor_test$p.value, "\n")
# Create data frame
ba_data <- data.frame(means = means, diffs = diffs)
# Reference lines: one row per line, so colour and linetype share a single legend
ref_lines <- data.frame(
  y = c(mean_diff, upper_loa, lower_loa),
  label = c(paste0("Mean (", round(mean_diff, 2), ")"),
            paste0("+1.96SD (", round(upper_loa, 2), ")"),
            paste0("-1.96SD (", round(lower_loa, 2), ")"))
)
ref_lines$label <- factor(ref_lines$label, levels = ref_lines$label)
# Bland-Altman plot with ggplot2
ggplot(ba_data, aes(x = means, y = diffs)) +
  geom_point(color = "#1565C0", size = 3) +
  geom_hline(data = ref_lines,
             aes(yintercept = y, colour = label, linetype = label),
             linewidth = 0.7) +
  # Unnamed values are matched to the factor levels in order
  scale_colour_manual(name = "Reference Lines",
                      values = c("green", "red", "red")) +
  scale_linetype_manual(name = "Reference Lines",
                        values = c("solid", "dashed", "dashed")) +
  labs(title = "Bland-Altman Plot",
       x = "Mean of Methods",
       y = "Difference (A - B)") +
  theme_minimal() +
  theme(legend.position = "bottom")
cat("\nMean Difference:", round(mean_diff, 4))
cat("\nSD:", round(sd_diff, 4))
cat("\n95% LOA: [", round(lower_loa, 4), ",", round(upper_loa, 4), "]\n")
Using Matplotlib and Seaborn to create a Bland-Altman plot with limits of agreement.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Sample: Two blood pressure measurement methods
method_a = np.array([120, 118, 135, 140, 125, 130, 145, 138, 122, 128])
method_b = np.array([118, 120, 132, 138, 127, 128, 142, 140, 120, 130])
# Calculate Bland-Altman statistics
means = (method_a + method_b) / 2
diffs = method_a - method_b
mean_diff = np.mean(diffs)
sd_diff = np.std(diffs, ddof=1)
upper_loa = mean_diff + 1.96 * sd_diff
lower_loa = mean_diff - 1.96 * sd_diff
# Plot
sns.set_theme(style="whitegrid")
fig, ax = plt.subplots(figsize=(9, 6))
ax.scatter(means, diffs, color='#1565C0', s=60, alpha=0.8, label='Data')
x_range = [means.min() - 2, means.max() + 2]
ax.set_xlim(x_range)
# Mean difference line
ax.axhline(mean_diff, color='green', linewidth=2,
label=f'Mean Diff ({mean_diff:.2f})')
# Upper and Lower LOA
ax.axhline(upper_loa, color='red', linewidth=1.5, linestyle='--',
label=f'+1.96 SD ({upper_loa:.2f})')
ax.axhline(lower_loa, color='red', linewidth=1.5, linestyle='--',
label=f'-1.96 SD ({lower_loa:.2f})')
ax.set_title('Bland-Altman Plot')
ax.set_xlabel('Mean of Method A and Method B')
ax.set_ylabel('Difference (Method A - Method B)')
ax.legend(frameon=False)
plt.tight_layout()
plt.show()
print(f"Mean Difference: {mean_diff:.4f}")
print(f"SD of Differences: {sd_diff:.4f}")
print(f"95% LOA: [{lower_loa:.4f}, {upper_loa:.4f}]")
You need paired measurements: the same subjects (patients, samples, etc.) measured by two different methods. Each row should contain one measurement from Method 1 and one from Method 2 for the same subject.
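A paired layout can be read with the standard library as sketched below. The column names `subject`, `method_1` and `method_2` are hypothetical, and rows missing either measurement are dropped, since an unpaired reading cannot contribute a difference.

```python
import csv
import io

# Hypothetical layout: one row per subject, one column per method
CSV_TEXT = """subject,method_1,method_2
S01,120,118
S02,118,120
S03,135,
S04,140,138
"""


def read_pairs(text):
    """Return (method_1, method_2) lists, keeping only complete pairs."""
    m1, m2 = [], []
    for row in csv.DictReader(io.StringIO(text)):
        v1 = (row.get("method_1") or "").strip()
        v2 = (row.get("method_2") or "").strip()
        if v1 and v2:  # skip subjects missing either measurement
            m1.append(float(v1))
            m2.append(float(v2))
    return m1, m2


method_1, method_2 = read_pairs(CSV_TEXT)
print(f"{len(method_1)} complete pairs")
```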
Bland and Altman recommended at least 50 pairs for reliable estimates. With fewer data points (minimum 5 for this tool), the confidence intervals around the LOA will be wider, reflecting greater uncertainty.
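How the interval width depends on sample size can be sketched with the widely used approximation from Bland and Altman's 1986 paper, in which the standard error of each limit is about sqrt(3s²/n). Pairing it with a t quantile on n - 1 degrees of freedom, and the helper name, are our assumptions; SciPy is assumed to be installed.

```python
import numpy as np
from scipy import stats


def loa_confidence_intervals(diffs, z=1.96, level=0.95):
    """Approximate CIs for the lower and upper LOA.

    Uses SE(LOA) ~ sqrt(3 * s**2 / n) with a t quantile on n - 1 df.
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size
    bias = diffs.mean()
    s = diffs.std(ddof=1)
    se_loa = np.sqrt(3 * s**2 / n)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)
    half = t_crit * se_loa
    lower, upper = bias - z * s, bias + z * s
    return (lower - half, lower + half), (upper - half, upper + half)


# A - B for the sample data used on this page
diffs = np.array([2, -2, 3, 2, -2, 2, 3, -2, 2, -2], dtype=float)
lower_ci, upper_ci = loa_confidence_intervals(diffs)
print(f"Lower LOA 95% CI: [{lower_ci[0]:.2f}, {lower_ci[1]:.2f}]")
print(f"Upper LOA 95% CI: [{upper_ci[0]:.2f}, {upper_ci[1]:.2f}]")
```

Because the standard error shrinks with 1/sqrt(n), quadrupling the number of pairs roughly halves the width of each interval.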
The standard Bland-Altman method assumes that the differences are approximately normally distributed. If a Shapiro-Wilk test on the differences is significant, consider a non-parametric approach (e.g., percentile-based LOA) or transforming the data (e.g., log transformation).
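A normality check followed by percentile-based limits could be sketched as below, assuming SciPy is installed. With small samples the 2.5th and 97.5th percentiles sit essentially at the smallest and largest differences, so percentile limits are only informative with many pairs.

```python
import numpy as np
from scipy import stats

# A - B for the sample data used on this page
diffs = np.array([2, -2, 3, 2, -2, 2, 3, -2, 2, -2], dtype=float)

# Shapiro-Wilk test of normality, applied to the differences
w_stat, p_normal = stats.shapiro(diffs)
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.3f}")

# Non-parametric alternative: empirical 2.5th and 97.5th percentiles
lower_pct, upper_pct = np.percentile(diffs, [2.5, 97.5])
print(f"Percentile-based LOA: [{lower_pct:.2f}, {upper_pct:.2f}]")
```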
Use percentage differences when you observe proportional bias (differences that increase with measurement magnitude) or when the clinical interpretation is more meaningful in relative terms (e.g., "Method A reads 5% higher" rather than "Method A reads 10 units higher").
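Percentage differences are commonly expressed relative to the mean of each pair, 100 × (A - B) / mean(A, B); the sketch below assumes that definition and then computes the LOA on the percentages in the usual way. The helper name is hypothetical.

```python
import numpy as np


def percentage_differences(a, b):
    """100 * (A - B) / mean(A, B) for each pair."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    means = (a + b) / 2
    if np.any(means == 0):
        raise ValueError("percentage differences are undefined when a pair's mean is 0")
    return 100 * (a - b) / means


method_a = [120, 118, 135, 140, 125, 130, 145, 138, 122, 128]
method_b = [118, 120, 132, 138, 127, 128, 142, 140, 120, 130]
pct = percentage_differences(method_a, method_b)
bias_pct = pct.mean()
sd_pct = pct.std(ddof=1)
print(f"Mean % difference: {bias_pct:.2f}%")
print(f"95% LOA (%): [{bias_pct - 1.96 * sd_pct:.2f}, {bias_pct + 1.96 * sd_pct:.2f}]")
```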
Before analyzing, define clinically acceptable limits of agreement. If the calculated LOA fall within your pre-defined acceptable range, the methods can be considered interchangeable. This decision should be based on clinical judgment, not just statistical significance.
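Comparing the calculated LOA against a pre-defined limit can be sketched as below. The threshold `MAX_ALLOWED = 5.0` is purely hypothetical, and the helper assumes symmetric limits of ±`MAX_ALLOWED`; the real value must come from clinical judgment before the data are analyzed.

```python
import numpy as np


def within_clinical_limits(lower_loa, upper_loa, max_allowed):
    """True if both LOA lie within -max_allowed..+max_allowed (hypothetical helper)."""
    return bool(-max_allowed <= lower_loa and upper_loa <= max_allowed)


method_a = np.array([120, 118, 135, 140, 125, 130, 145, 138, 122, 128], dtype=float)
method_b = np.array([118, 120, 132, 138, 127, 128, 142, 140, 120, 130], dtype=float)
diffs = method_a - method_b
lower_loa = diffs.mean() - 1.96 * diffs.std(ddof=1)
upper_loa = diffs.mean() + 1.96 * diffs.std(ddof=1)

MAX_ALLOWED = 5.0  # hypothetical clinical threshold, fixed before the analysis
print(within_clinical_limits(lower_loa, upper_loa, MAX_ALLOWED))
```

For the sample data the upper limit is just above 5, so the check fails for a ±5 threshold even though most individual differences are small.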
The standard Bland-Altman plot compares exactly two methods. For multiple methods, create pairwise Bland-Altman plots for each combination, or consider using an intraclass correlation coefficient (ICC) for an overall agreement measure.
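Pairwise comparisons can be generated with `itertools.combinations`, giving k(k - 1)/2 pairs for k methods. In this sketch the helper name and the third method "C" (simply Method B shifted by 1) are hypothetical; differences are taken as first method minus second.

```python
from itertools import combinations

import numpy as np


def pairwise_agreement(readings, z=1.96):
    """Bias and LOA for every pair of methods.

    readings: dict mapping a method name to its measurements
    (same subjects, same order for every method).
    """
    results = {}
    for (name_1, x), (name_2, y) in combinations(readings.items(), 2):
        diffs = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        bias = diffs.mean()
        sd = diffs.std(ddof=1)
        results[(name_1, name_2)] = (bias, bias - z * sd, bias + z * sd)
    return results


method_b = [118, 120, 132, 138, 127, 128, 142, 140, 120, 130]
readings = {
    "A": [120, 118, 135, 140, 125, 130, 145, 138, 122, 128],
    "B": method_b,
    "C": [v + 1 for v in method_b],  # hypothetical third method
}
for (m1, m2), (bias, lo, hi) in pairwise_agreement(readings).items():
    print(f"{m1} vs {m2}: bias = {bias:.2f}, LOA = [{lo:.2f}, {hi:.2f}]")
```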