This Logistic Regression Calculator helps you analyze binary outcome data and make classifications or predictions. It fits your data to the logistic model ln(p / (1 − p)) = β₀ + β₁X₁ + … + βₖXₖ and reports model coefficients, odds ratios, and performance metrics. Logistic regression is widely used in medicine (disease diagnosis), marketing (customer conversion), and finance (credit scoring). You can fit both simple and multiple logistic regression models, with one or more predictor variables.
Learn More
Definition
Logistic Regression is a statistical method used to model the probability of a binary outcome based on one or more predictor variables. Unlike linear regression, logistic regression models the log-odds of an event as a linear combination of predictors, which constrains the predicted probabilities between 0 and 1.
Key Formulas for One Predictor
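Logistic Model (Probability):
$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
Logit Transformation (Log-odds):
$$\ln\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X$$
Odds Ratio:
$$OR = e^{\beta_1}$$
Decision Boundary (for classification):
$$X = \frac{\ln\left(\frac{c}{1 - c}\right) - \beta_0}{\beta_1}$$
where c is the probability cutoff (typically 0.5, in which case the boundary simplifies to X = −β₀ / β₁)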
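The one-predictor formulas can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the coefficient values (`beta0 = -3.0`, `beta1 = 0.5`) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def predict_probability(x, beta0, beta1):
    """Logistic model: P(Y=1 | x) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def log_odds(p):
    """Logit transformation: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def odds_ratio(beta1):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return math.exp(beta1)

def decision_boundary(beta0, beta1, c=0.5):
    """Value of x at which the predicted probability equals the cutoff c."""
    return (math.log(c / (1.0 - c)) - beta0) / beta1

# Hypothetical coefficients, for illustration only
beta0, beta1 = -3.0, 0.5

x_star = decision_boundary(beta0, beta1)
print(x_star)                                     # 6.0
print(predict_probability(x_star, beta0, beta1))  # 0.5
```

Applying `log_odds` to the output of `predict_probability` recovers the linear predictor β₀ + β₁x (up to floating-point error), which is the sense in which the logit "linearizes" the model.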
Key Formulas for Multiple Predictors
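Logistic Model (Probability):
$$P(Y = 1 \mid X_1, \dots, X_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}$$
Logit Transformation (Log-odds):
$$\ln\left(\frac{P}{1 - P}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$
Odds Ratio:
$$OR_i = e^{\beta_i}$$
For the i-th predictor, representing the multiplicative change in odds when X_i increases by one unit, holding other predictors constant
Decision Boundary (for classification):
$$\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k = \ln\left(\frac{c}{1 - c}\right)$$
where c is the probability cutoff (typically 0.5)
$$\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k = 0$$
(simplified for c = 0.5)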
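The "holding other predictors constant" interpretation of the odds ratio can be checked numerically. The sketch below uses only the standard library; the coefficients and the observation `x` are hypothetical values, not output from any fitted model.

```python
import math

def linear_predictor(x, beta0, betas):
    """Log-odds: beta0 + beta_1 * x_1 + ... + beta_k * x_k."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))

def predict_probability(x, beta0, betas):
    """Logistic model for several predictors."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(x, beta0, betas)))

def classify(x, beta0, betas, c=0.5):
    """Predict 1 when the log-odds exceed ln(c / (1 - c)), else 0."""
    return int(linear_predictor(x, beta0, betas) > math.log(c / (1.0 - c)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients and observation, for illustration only
beta0, betas = -4.0, [0.8, -0.5]
x = [6.0, 2.0]
x_plus_one = [7.0, 2.0]   # X_1 increased by one unit, X_2 unchanged

# Ratio of the odds after and before the one-unit increase in X_1
ratio = odds(predict_probability(x_plus_one, beta0, betas)) / odds(
    predict_probability(x, beta0, betas)
)
print(abs(ratio - math.exp(betas[0])) < 1e-9)   # True: the ratio equals e^{beta_1}
print(classify(x, beta0, betas), classify(x_plus_one, beta0, betas))  # 0 1
```

The ratio does not depend on the starting values of the predictors, which is why a single odds ratio per predictor summarizes its effect.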
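Key Assumptions
Binary outcome: The dependent variable has exactly two categories (for example, coded 0 and 1).
Independent observations: Each observation is independent of the others.
Linearity of the logit: Continuous predictors are linearly related to the log-odds of the outcome.
No severe multicollinearity: Predictors are not highly correlated with one another.
Adequate sample size: There are enough observations in each outcome category to estimate the coefficients reliably.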
Practical Example of Logistic Regression with One Predictor
Step 1: Data
Consider a dataset of student exam scores and admission outcomes (1 = admitted, 0 = rejected):
| Exam Score (X) | Admitted (Y) |
|---|---|
| 35 | 0 |
| 42 | 0 |
| 57 | 0 |
| ⋮ | ⋮ |
| 78 | 1 |
| 93 | 1 |
Step 2: Fit Logistic Regression Model
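After fitting a logistic regression model, we get:
$$\ln\left(\frac{P}{1 - P}\right) = \beta_0 + 0.15 \times \text{Exam Score}$$
where the slope is β₁ = 0.15 and the intercept β₀ is roughly −10.7, the value consistent with the decision boundary near 71 found in Step 5.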
Step 3: Interpret the Coefficients
The coefficient β₁ = 0.15 means that for each one-point increase in exam score, the log-odds of admission increase by 0.15.
Converting to an odds ratio: OR = e^0.15 ≈ 1.16
This means that each one-point increase in exam score multiplies the odds of admission by about 1.16, an increase of roughly 16%.
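Because odds ratios act multiplicatively, the effect of a larger change in score is not simply 16% times the number of points. A short sketch, using the β₁ = 0.15 from this example (the `odds_ratio` helper and its `delta` parameter are illustrative names):

```python
import math

BETA1 = 0.15  # slope from the worked example

def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds when the predictor rises by `delta` units."""
    return math.exp(beta * delta)

or_1 = odds_ratio(BETA1)        # one-point increase
or_10 = odds_ratio(BETA1, 10)   # ten-point increase

print(f"{or_1:.2f}")                 # 1.16
print(f"{(or_1 - 1) * 100:.1f}%")    # 16.2%
print(f"{or_10:.2f}")                # 4.48
```

A ten-point increase therefore multiplies the odds by about 4.48 (that is, 1.16 compounded ten times), not by 1 + 10 × 0.16.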
Step 4: Calculate Probability for a New Student
For a student with an exam score of 70:
$$P(\text{admitted}) = \frac{1}{1 + e^{-(\beta_0 + 0.15 \times 70)}} \approx 0.45$$
This student has about a 45% probability of being admitted.
Step 5: Find the Decision Boundary
At what exam score is the probability of admission exactly 0.5?
$$\beta_0 + 0.15X = 0 \quad\Rightarrow\quad X = -\frac{\beta_0}{0.15} \approx 71.2$$
Students scoring above 71.2 are more likely to be admitted than rejected.
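Steps 4 and 5 can be reproduced with a short script. The intercept is not a reported fitted value here: it is an assumption, back-calculated from the stated slope (0.15) and decision boundary (71.2) as β₀ = −0.15 × 71.2 = −10.68. The function names are illustrative.

```python
import math

BETA1 = 0.15               # slope from Step 3
BOUNDARY = 71.2            # decision boundary from Step 5
BETA0 = -BETA1 * BOUNDARY  # assumption: intercept implied by the two values above (-10.68)

def admission_probability(score):
    """Predicted probability of admission for a given exam score."""
    return 1.0 / (1.0 + math.exp(-(BETA0 + BETA1 * score)))

def cutoff_score(c=0.5):
    """Exam score at which the predicted probability equals the cutoff c."""
    return (math.log(c / (1.0 - c)) - BETA0) / BETA1

p70 = admission_probability(70)
print(f"{p70:.3f}")              # 0.455
print(f"{cutoff_score():.1f}")   # 71.2
print(f"{cutoff_score(0.8):.1f}")
```

With this assumed intercept, a score of 70 gives a probability of about 0.455, in line with the roughly 45% quoted in Step 4. Raising the cutoff c above 0.5 moves the boundary to a higher score, so fewer students are classified as admitted.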
Performance Metrics
Confusion Matrix: A table comparing actual vs. predicted classifications:
| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | True Positive (TP) | False Positive (FP) |
| Predicted Negative | False Negative (FN) | True Negative (TN) |
Accuracy: Proportion of correct predictions: (TP + TN) / (TP + FP + FN + TN)
Sensitivity (Recall): Proportion of actual positives correctly identified: TP / (TP + FN)
Specificity: Proportion of actual negatives correctly identified: TN / (TN + FP)
AUC (Area Under the ROC Curve): Measures the model's ability to distinguish between classes; ranges from 0.5 (no discrimination) to 1 (perfect discrimination)
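These metrics are simple to compute by hand. The sketch below uses only the standard library and a small made-up set of labels and predicted probabilities. AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc(actual, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities, for illustration only
actual = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
predicted = [int(s > 0.5) for s in scores]  # classify with cutoff c = 0.5

counts = confusion_counts(actual, predicted)
metrics = classification_metrics(*counts)
print(counts)                 # (3, 1, 1, 3)
print(metrics)                # accuracy, sensitivity, specificity are all 0.75
print(auc(actual, scores))    # 0.9375
```

Accuracy, sensitivity, and specificity depend on the chosen cutoff, while AUC is computed from the scores alone and summarizes performance across all cutoffs.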
How to Perform Logistic Regression with R
```r
# Load required libraries
library(tidyverse)
library(pROC)

# Sample data - Predicting admission based on multiple factors
data <- tibble(
  exam_score = c(42, 48, 51, 55, 58, 60, 62, 65, 67, 69, 71, 73, 75, 77, 79,
                 81, 83, 85, 87, 89, 91, 93, 38, 45, 52, 64, 70, 76, 82, 88),
  gpa = c(2.3, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
          3.9, 3.7, 3.8, 3.9, 4.0, 3.9, 4.0, 2.1, 2.4, 2.8, 3.2, 3.5, 3.6, 3.8, 3.9),
  study_hours = c(5, 8, 10, 12, 14, 15, 16, 18, 20, 22, 24, 25, 26, 28, 30,
                  32, 28, 30, 32, 35, 33, 36, 3, 6, 11, 17, 23, 27, 31, 34),
  admitted = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
               1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
)

# Fit logistic regression model
model <- glm(admitted ~ exam_score + gpa + study_hours,
             data = data, family = binomial)

# Display model summary
summary(model)

# Calculate and display odds ratios
odds_ratios <- exp(coef(model))
print("Odds Ratios:")
print(round(odds_ratios, 3))

# Make predictions
data$predicted_prob <- predict(model, type = "response")
data$predicted_class <- ifelse(data$predicted_prob > 0.5, 1, 0)

# Calculate ROC and AUC
roc_obj <- roc(data$admitted, data$predicted_prob)
print(paste("AUC:", round(auc(roc_obj), 3)))

# Confusion matrix
conf_matrix <- table(Predicted = data$predicted_class, Actual = data$admitted)
print("Confusion Matrix:")
print(conf_matrix)

# Accuracy
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(paste("Accuracy:", round(accuracy, 3)))

# Create visualization
ggplot(data, aes(x = exam_score, y = admitted)) +
  geom_point(aes(color = factor(admitted)), size = 3) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"),
              se = TRUE, color = "blue") +
  labs(title = "Logistic Regression: Exam Score vs Admission",
       x = "Exam Score",
       y = "Probability of Admission",
       color = "Admitted") +
  scale_color_manual(values = c("0" = "red", "1" = "green")) +
  theme_minimal()
```
How to Perform Logistic Regression with Python
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit
from sklearn.metrics import roc_curve, roc_auc_score, confusion_matrix, accuracy_score

# Sample data - Predicting admission based on multiple factors
data = pd.DataFrame({
    'exam_score': [42, 48, 51, 55, 58, 60, 62, 65, 67, 69, 71, 73, 75, 77, 79,
                   81, 83, 85, 87, 89, 91, 93, 38, 45, 52, 64, 70, 76, 82, 88],
    'gpa': [2.3, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
            3.9, 3.7, 3.8, 3.9, 4.0, 3.9, 4.0, 2.1, 2.4, 2.8, 3.2, 3.5, 3.6, 3.8, 3.9],
    'study_hours': [5, 8, 10, 12, 14, 15, 16, 18, 20, 22, 24, 25, 26, 28, 30,
                    32, 28, 30, 32, 35, 33, 36, 3, 6, 11, 17, 23, 27, 31, 34],
    'admitted': [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
                 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
})

# Prepare data for model (add_constant adds the intercept column)
X = sm.add_constant(data[['exam_score', 'gpa', 'study_hours']])
y = data['admitted']

# Fit logistic regression model
model = Logit(y, X)
results = model.fit()

# Display model summary
print(results.summary())

# Calculate odds ratios
odds_ratios = np.exp(results.params)
print("Odds Ratios:")
print(odds_ratios.round(3))

# Make predictions
data['predicted_prob'] = results.predict(X)
data['predicted_class'] = (data['predicted_prob'] > 0.5).astype(int)

# Calculate ROC and AUC
fpr, tpr, _ = roc_curve(y, data['predicted_prob'])
auc_score = roc_auc_score(y, data['predicted_prob'])
print(f"AUC: {auc_score:.3f}")

# Confusion matrix (scikit-learn convention: rows = actual, columns = predicted)
conf_matrix = confusion_matrix(y, data['predicted_class'])
print("Confusion Matrix:")
print(conf_matrix)

# Accuracy
accuracy = accuracy_score(y, data['predicted_class'])
print(f"Accuracy: {accuracy:.3f}")

# Create visualization
plt.figure(figsize=(8, 6))
colors = ['red' if x == 0 else 'green' for x in data['admitted']]
plt.scatter(data['exam_score'], data['admitted'], c=colors, s=50, alpha=0.7)
sns.regplot(x='exam_score', y='admitted', data=data, logistic=True,
            scatter=False, color='blue', ci=95)
plt.title('Logistic Regression: Exam Score vs Admission')
plt.xlabel('Exam Score')
plt.ylabel('Probability of Admission')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```