Generate ROC (Receiver Operating Characteristic) curves to evaluate binary classification model performance. Calculate AUC, find optimal thresholds using Youden's index, and view confusion matrix metrics including sensitivity, specificity, PPV, and NPV.
Not sure how to format your data? Try the sample dataset to see how it works, or upload your own data to get started!
A Receiver Operating Characteristic (ROC) curve is a graphical tool used to evaluate the performance of a binary classification model. It plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity) at various classification thresholds.
The curve shows the trade-off between correctly identifying positive cases and incorrectly classifying negative cases as positive. A perfect classifier would have a curve that passes through the top-left corner (100% sensitivity, 0% false positive rate).
The Area Under the ROC Curve (AUC) summarizes the overall diagnostic accuracy of a classification model in a single number between 0 and 1:
| AUC Range | Interpretation |
|---|---|
| 0.90 – 1.00 | Excellent discrimination |
| 0.80 – 0.90 | Good discrimination |
| 0.70 – 0.80 | Fair discrimination |
| 0.60 – 0.70 | Poor discrimination |
| 0.50 | No discrimination (random chance) |
AUC represents the probability that a randomly chosen positive case will have a higher predicted probability than a randomly chosen negative case.
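This rank-based interpretation can be verified directly. The sketch below (using a small hypothetical dataset, not this page's sample data) counts the fraction of positive–negative pairs where the positive case outranks the negative one, which equals the trapezoidal AUC when ties are counted as half:

```python
import numpy as np

# Hypothetical toy data: 3 positives, 4 negatives
y_true = np.array([1, 1, 1, 0, 0, 0, 0])
y_scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1])

pos = y_scores[y_true == 1]
neg = y_scores[y_true == 0]

# Pairwise comparison: fraction of (positive, negative) pairs where the
# positive case scores higher; ties contribute half a pair.
pairs = pos[:, None] - neg[None, :]
auc_rank = (np.sum(pairs > 0) + 0.5 * np.sum(pairs == 0)) / pairs.size
print(auc_rank)
```

Here 11 of the 12 positive–negative pairs are correctly ordered, so the rank AUC is 11/12 ≈ 0.917.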
This tool uses Youden's J statistic (J = Sensitivity + Specificity - 1) to find the optimal classification threshold. This point maximizes the vertical distance between the ROC curve and the diagonal reference line, balancing sensitivity and specificity equally.
In practice, the optimal threshold depends on the relative costs of false positives vs. false negatives for your specific application. For example:
- Prioritize sensitivity (e.g., medical screening) — use a lower threshold to minimize false negatives (missed cases), even if it increases false positives.
- Prioritize specificity (e.g., fraud detection) — use a higher threshold to reduce false positives (costly investigations), even if it means accepting some missed fraud.
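Once a threshold is chosen, the confusion matrix metrics mentioned above (sensitivity, specificity, PPV, NPV) follow directly from the four cell counts. A minimal sketch using the same label/score column format as this page (the data here is a small hypothetical example):

```python
import numpy as np

def confusion_metrics(y_true, y_scores, threshold):
    """Sensitivity, specificity, PPV, and NPV at a given threshold."""
    pred = y_scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    return {
        "sensitivity": tp / (tp + fn),  # TPR: positives correctly flagged
        "specificity": tn / (tn + fp),  # TNR: negatives correctly passed
        "ppv": tp / (tp + fp),          # precision of a positive call
        "npv": tn / (tn + fn),          # reliability of a negative call
    }

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_scores = np.array([0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1])
print(confusion_metrics(y_true, y_scores, threshold=0.5))
```

Lowering the threshold moves cases from FN to TP (raising sensitivity) and from TN to FP (lowering specificity), which is exactly the trade-off the bullet points describe.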
Using tidyverse and ggplot2 to create a ROC curve with the optimal threshold highlighted.
library(tidyverse)
# Sample data (same as this page's sampleData)
df <- tibble(
actual = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
predicted = c(0.95, 0.92, 0.88, 0.85, 0.82, 0.79, 0.76, 0.73, 0.71, 0.68, 0.65, 0.62, 0.58, 0.55, 0.52, 0.48, 0.45, 0.42, 0.38, 0.35, 0.32, 0.28, 0.25, 0.22, 0.18, 0.12, 0.15, 0.18, 0.22, 0.25, 0.28, 0.32, 0.35, 0.38, 0.08, 0.05, 0.42, 0.45, 0.1, 0.02, 0.48, 0.15, 0.2, 0.06, 0.3, 0.12, 0.08, 0.25, 0.04, 0.01)
)
# Calculate ROC curve exactly like backend
thresholds <- sort(unique(df$predicted), decreasing = TRUE)
tpr <- c(0)
fpr <- c(0)
threshold_values <- c(if (length(thresholds) > 0) thresholds[1] + 0.01 else 1.01)
total_positive <- sum(df$actual == 1)
total_negative <- sum(df$actual == 0)
for (thresh in thresholds) {
  pred_positive <- df$predicted >= thresh
  tp <- sum(pred_positive & (df$actual == 1))
  fp <- sum(pred_positive & (df$actual == 0))
  tpr <- c(tpr, tp / total_positive)
  fpr <- c(fpr, fp / total_negative)
  threshold_values <- c(threshold_values, thresh)
}
if (tail(tpr, 1) != 1 || tail(fpr, 1) != 1) {
  tpr <- c(tpr, 1)
  fpr <- c(fpr, 1)
  threshold_values <- c(threshold_values, 0)
}
# AUC (trapezoidal rule)
roc_auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
cat("AUC:", roc_auc, "\n")
# Optimal threshold (Youden's index) - same rule as backend
youden <- tpr - fpr
best_idx <- which.max(youden)
best_threshold <- threshold_values[best_idx]
best_sensitivity <- tpr[best_idx]
best_specificity <- 1 - fpr[best_idx]
cat("Optimal Threshold:", best_threshold, "\n")
cat("Sensitivity:", best_sensitivity, "\n")
cat("Specificity:", best_specificity, "\n")
best_point <- tibble(
fpr = 1 - best_specificity,
tpr = best_sensitivity,
threshold = best_threshold
)
# Plot with ggplot2
roc_df <- tibble(fpr = fpr, tpr = tpr)
ggplot(roc_df, aes(x = fpr, y = tpr)) +
geom_line(color = "#1565C0", linewidth = 1.2) +
geom_area(fill = "#1565C0", alpha = 0.1) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
geom_point(
data = best_point,
aes(x = fpr, y = tpr),
color = "red",
size = 3,
shape = 8,
inherit.aes = FALSE
) +
labs(
title = "ROC Curve",
subtitle = paste0("AUC = ", round(roc_auc, 4),
" | Optimal threshold = ", round(best_point$threshold, 3)),
x = "False Positive Rate (1 - Specificity)",
y = "True Positive Rate (Sensitivity)"
) +
coord_equal(xlim = c(0, 1), ylim = c(0, 1), expand = FALSE) +
theme_minimal(base_size = 12)

Using NumPy, Matplotlib, and seaborn to create a ROC curve with the optimal threshold highlighted.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Sample data (same as this page's sampleData)
y_true = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
y_scores = np.array([0.95, 0.92, 0.88, 0.85, 0.82, 0.79, 0.76, 0.73, 0.71, 0.68, 0.65, 0.62, 0.58, 0.55, 0.52, 0.48, 0.45, 0.42, 0.38, 0.35, 0.32, 0.28, 0.25, 0.22, 0.18, 0.12, 0.15, 0.18, 0.22, 0.25, 0.28, 0.32, 0.35, 0.38, 0.08, 0.05, 0.42, 0.45, 0.1, 0.02, 0.48, 0.15, 0.2, 0.06, 0.3, 0.12, 0.08, 0.25, 0.04, 0.01])
# Calculate ROC curve exactly like backend
thresholds = np.sort(np.unique(y_scores))[::-1]
tpr_list = [0.0]
fpr_list = [0.0]
threshold_list = [thresholds[0] + 0.01 if len(thresholds) > 0 else 1.01]
total_positive = np.sum(y_true == 1)
total_negative = np.sum(y_true == 0)
for thresh in thresholds:
    pred_positive = y_scores >= thresh
    tp = np.sum(pred_positive & (y_true == 1))
    fp = np.sum(pred_positive & (y_true == 0))
    tpr_list.append(tp / total_positive)
    fpr_list.append(fp / total_negative)
    threshold_list.append(thresh)
if tpr_list[-1] != 1.0 or fpr_list[-1] != 1.0:
    tpr_list.append(1.0)
    fpr_list.append(1.0)
    threshold_list.append(0.0)
fpr = np.array(fpr_list)
tpr = np.array(tpr_list)
thresholds = np.array(threshold_list)
# Trapezoidal AUC; np.trapezoid requires NumPy >= 2.0 (use np.trapz on older versions)
roc_auc = np.trapezoid(tpr, fpr)
# Optimal threshold (Youden's index) - same rule as backend
youden = tpr - fpr
best_idx = np.argmax(youden)
optimal_threshold = thresholds[best_idx]
# Plot with seaborn/matplotlib
sns.set_theme(style='whitegrid')
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(
fpr,
tpr,
color='#1565C0',
linewidth=2.5,
label=f'ROC Curve (AUC = {roc_auc:.4f})'
)
ax.fill_between(fpr, tpr, alpha=0.1, color='#1565C0')
ax.plot([0, 1], [0, 1], linestyle='--', color='gray', linewidth=1, label='Random Classifier')
ax.scatter(
fpr[best_idx],
tpr[best_idx],
color='red',
s=120,
marker='*',
label=f'Optimal (threshold={optimal_threshold:.3f})',
zorder=5
)
ax.set_title('ROC Curve')
ax.set_xlabel('False Positive Rate (1 - Specificity)')
ax.set_ylabel('True Positive Rate (Sensitivity)')
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.legend(loc='lower right')
plt.tight_layout()
plt.show()
print(f"AUC: {roc_auc:.4f}")
print(f"Optimal Threshold: {optimal_threshold:.4f}")
print(f"Sensitivity at optimal: {tpr[best_idx]:.4f}")
print(f"Specificity at optimal: {1 - fpr[best_idx]:.4f}")

You need two columns: (1) actual binary labels (0/1 or two categories like Yes/No) and (2) predicted probabilities (numeric values between 0 and 1). The predicted probabilities typically come from a logistic regression or other classification model.
AUC represents the probability that a randomly chosen positive case will have a higher predicted probability than a randomly chosen negative case. An AUC of 0.5 means the model is no better than random guessing, while an AUC of 1.0 indicates perfect discrimination.
ROC curves are ideal for evaluating binary classifiers, comparing multiple models, selecting classification thresholds, and assessing diagnostic test accuracy. They are particularly useful when class distributions are relatively balanced. For imbalanced datasets, consider also using Precision-Recall curves.
ROC curves plot TPR vs FPR and are robust to class imbalance in the evaluation metric. Precision-Recall curves plot precision vs recall and are more informative when the positive class is rare. Both are valuable diagnostic tools for classification models.
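For a rare positive class, a precision-recall curve can be built from the same label/score columns with the same threshold sweep used for the ROC curve. A minimal sketch on a small hypothetical imbalanced dataset:

```python
import numpy as np

# Hypothetical imbalanced data: 2 positives among 8 cases
y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y_scores = np.array([0.9, 0.4, 0.8, 0.3, 0.2, 0.15, 0.1, 0.05])

# Sweep thresholds from high to low, recording precision and recall
precision, recall = [], []
for thresh in np.sort(np.unique(y_scores))[::-1]:
    pred = y_scores >= thresh
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    precision.append(tp / (tp + fp))
    recall.append(tp / np.sum(y_true == 1))

print(list(zip(recall, precision)))
```

Unlike FPR, precision depends on the class balance, which is why the PR curve degrades visibly when positives are rare while the ROC curve can still look optimistic.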
Yes, overlaying ROC curves for different models on the same plot is a standard way to compare classifier performance. The model with the highest AUC (curve closest to the top-left corner) generally has the best discriminative ability.
An AUC below 0.5 suggests the model is performing worse than random chance, which typically means the positive and negative labels are swapped. Try inverting the positive label or checking your data for labeling errors.
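A quick sanity check for this situation: reversing the scores (e.g., using 1 minus the predicted probability) reverses the ranking, which maps an AUC of a to 1 − a. A hypothetical sketch using the rank-based AUC:

```python
import numpy as np

def rank_auc(y_true, y_scores):
    """AUC via the rank interpretation (ties count as half)."""
    pos = y_scores[y_true == 1]
    neg = y_scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Labels accidentally anti-correlated with the scores
y_true = np.array([1, 1, 0, 0, 0])
y_scores = np.array([0.1, 0.2, 0.7, 0.8, 0.9])

print(rank_auc(y_true, y_scores))      # far below 0.5
print(rank_auc(y_true, 1 - y_scores))  # flipping the scores fixes it
```

If flipping the scores produces a sensible AUC, the model's ranking is informative and only the label convention was inverted.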