This Logistic Regression Calculator helps you analyze binary outcome data and make classifications or predictions. It fits data to the model p = 1 / (1 + e^−(β₀ + β₁x₁ + … + βₖxₖ)), providing comprehensive analysis including model coefficients, odds ratios, and performance metrics. Logistic regression is widely used in fields including medicine (disease diagnosis), marketing (customer conversion), and finance (credit scoring). You can fit both simple (one predictor) and multiple (several predictors) logistic regression models.
Logistic Regression is a statistical method used to model the probability of a binary outcome based on one or more predictor variables. Unlike linear regression, logistic regression models the log-odds of an event as a linear combination of predictors, which constrains the predicted probabilities between 0 and 1.
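The core of the model is the sigmoid (inverse-logit) function, which maps any real-valued linear predictor to a probability strictly between 0 and 1. A minimal sketch in Python, using the coefficients from the worked example below (β₁ = 0.15, with β₀ ≈ −10.68 implied by its 0.5 boundary at a score of 71.2):

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued linear predictor z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Coefficients from the worked example below (beta0 implied by its 0.5 boundary)
beta0, beta1 = -10.68, 0.15

# The linear predictor can take any real value; the sigmoid squashes it into (0, 1)
scores = np.array([40.0, 60.0, 71.2, 90.0])
probs = sigmoid(beta0 + beta1 * scores)
print(probs.round(3))  # [0.009 0.157 0.5   0.944]
```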
Logistic Model (Probability): p = e^(β₀ + β₁x) / (1 + e^(β₀ + β₁x))
Logit Transformation (Log-odds): ln(p / (1 − p)) = β₀ + β₁x
Odds Ratio: OR = e^β₁
Decision Boundary (for classification): ŷ = 1 if p̂ ≥ c, otherwise ŷ = 0
where c is the probability cutoff (typically 0.5)
Logistic Model (Probability): p = 1 / (1 + e^−(β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ))
Logit Transformation (Log-odds): ln(p / (1 − p)) = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ
Odds Ratio: ORᵢ = e^βᵢ
For the i-th predictor, representing the change in odds when xᵢ increases by one unit, holding other predictors constant
Decision Boundary (for classification): β₀ + β₁x₁ + … + βₖxₖ = ln(c / (1 − c))
where c is the probability cutoff (typically 0.5)
β₀ + β₁x₁ + … + βₖxₖ = 0 (simplified for c = 0.5)
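For a cutoff other than 0.5, the boundary shifts by the log-odds of the cutoff. A short sketch converting a probability cutoff c into the corresponding threshold on the predictor, assuming a single predictor and, for concreteness, the coefficients from the worked example below:

```python
import math

def logit(c):
    """Log-odds of a probability cutoff c."""
    return math.log(c / (1 - c))

# Classify as 1 when beta0 + beta1 * x >= logit(c).
# For c = 0.5, logit(c) = 0, which gives the simplified boundary above.
beta0, beta1 = -10.68, 0.15  # illustrative, from the worked example
for c in (0.3, 0.5, 0.7):
    x_boundary = (logit(c) - beta0) / beta1  # solve beta0 + beta1 * x = logit(c)
    print(f"c = {c}: boundary at x = {x_boundary:.1f}")
```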
Consider a dataset of student exam scores and admission outcomes (1 = admitted, 0 = rejected):
| Exam Score (X) | Admitted (Y) |
|---|---|
| 35 | 0 |
| 42 | 0 |
| 57 | 0 |
| ⋮ | ⋮ |
| 78 | 1 |
| 93 | 1 |
After fitting a logistic regression model, we get ln(p / (1 − p)) = −10.68 + 0.15X, i.e. β₁ = 0.15 (the intercept is implied by the decision-boundary result below).
The coefficient β₁ = 0.15 means that for each one-point increase in exam score, the log-odds of admission increase by 0.15.
Converting to odds ratio: OR = e^0.15 = 1.16
This means that for each one-point increase in exam score, the odds of admission increase by 16%.
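Why a one-unit increase multiplies the odds by e^β₁ follows directly from the logit form:

```latex
\frac{\mathrm{odds}(x+1)}{\mathrm{odds}(x)}
  = \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}}
  = e^{\beta_1} = e^{0.15} \approx 1.16
```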
For a student with an exam score of 70: p = 1 / (1 + e^−(−10.68 + 0.15 × 70)) = 1 / (1 + e^0.18) ≈ 0.455.
This student has about a 45% probability of being admitted.
At what exam score is the probability of admission exactly 0.5? The probability is 0.5 exactly when the log-odds are zero: −10.68 + 0.15X = 0, so X = 10.68 / 0.15 = 71.2.
Students scoring above 71.2 are more likely to be admitted than rejected.
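Both results can be checked numerically. A quick sketch (again with β₀ ≈ −10.68 implied by the 0.5 boundary at 71.2):

```python
import math

beta0, beta1 = -10.68, 0.15   # beta0 implied by the 0.5 boundary at 71.2

# Probability of admission at an exam score of 70
p_70 = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * 70)))
print(f"P(admitted | score = 70) = {p_70:.3f}")  # ~0.455, i.e. about 45%

# Score where the probability is exactly 0.5: solve beta0 + beta1 * x = 0
print(f"P = 0.5 at score = {-beta0 / beta1:.1f}")  # 71.2
```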
The confusion matrix is a table comparing actual vs. predicted classifications:
| | Actual Positive | Actual Negative |
|---|---|---|
| Predicted Positive | True Positive (TP) | False Positive (FP) |
| Predicted Negative | False Negative (FN) | True Negative (TN) |
**Accuracy** — proportion of correct predictions: (TP + TN) / (TP + FP + FN + TN)
**Sensitivity (Recall)** — proportion of actual positives correctly identified: TP / (TP + FN)
**Specificity** — proportion of actual negatives correctly identified: TN / (TN + FP)
**AUC (Area Under the ROC Curve)** — measures the model's ability to distinguish between classes; ranges from 0.5 (no discrimination) to 1 (perfect discrimination)
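As a concrete reference, the first three metrics can be computed directly from the four confusion-matrix counts. A minimal sketch in Python (the counts are illustrative, not from the example dataset):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # recall, true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Illustrative counts
print(classification_metrics(tp=18, fp=2, fn=3, tn=7))
# {'accuracy': 0.833..., 'sensitivity': 0.857..., 'specificity': 0.777...}
```

AUC, by contrast, is computed from the ranked predicted probabilities rather than a single cutoff; the R and Python listings below obtain it with pROC::roc()/auc() and sklearn.metrics.roc_auc_score respectively.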
```r
# Load required libraries
library(tidyverse)
library(pROC)
# Sample data - Predicting admission based on multiple factors
data <- tibble(
  exam_score = c(42, 48, 51, 55, 58, 60, 62, 65, 67, 69, 71, 73, 75, 77, 79,
                 81, 83, 85, 87, 89, 91, 93, 38, 45, 52, 64, 70, 76, 82, 88),
  gpa = c(2.3, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
          3.9, 3.7, 3.8, 3.9, 4.0, 3.9, 4.0, 2.1, 2.4, 2.8, 3.2, 3.5, 3.6, 3.8, 3.9),
  study_hours = c(5, 8, 10, 12, 14, 15, 16, 18, 20, 22, 24, 25, 26, 28, 30,
                  32, 28, 30, 32, 35, 33, 36, 3, 6, 11, 17, 23, 27, 31, 34),
  admitted = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
               1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
)
# Fit logistic regression model
model <- glm(admitted ~ exam_score + gpa + study_hours,
             data = data, family = binomial)
# Display model summary
summary(model)
# Calculate and display odds ratios
odds_ratios <- exp(coef(model))
print("Odds Ratios:")
print(round(odds_ratios, 3))
# Make predictions
data$predicted_prob <- predict(model, type = "response")
data$predicted_class <- ifelse(data$predicted_prob > 0.5, 1, 0)
# Calculate ROC and AUC
roc_obj <- roc(data$admitted, data$predicted_prob)
print(paste("AUC:", round(auc(roc_obj), 3)))
# Confusion matrix
conf_matrix <- table(Predicted = data$predicted_class, Actual = data$admitted)
print("Confusion Matrix:")
print(conf_matrix)
# Accuracy
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(paste("Accuracy:", round(accuracy, 3)))
# Create visualization
ggplot(data, aes(x = exam_score, y = admitted)) +
  geom_point(aes(color = factor(admitted)), size = 3) +
  stat_smooth(method = "glm", method.args = list(family = "binomial"),
              se = TRUE, color = "blue") +
  labs(title = "Logistic Regression: Exam Score vs Admission",
       x = "Exam Score",
       y = "Probability of Admission",
       color = "Admitted") +
  scale_color_manual(values = c("0" = "red", "1" = "green")) +
  theme_minimal()
```

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Logit
from sklearn.metrics import roc_curve, roc_auc_score, confusion_matrix, accuracy_score
import seaborn as sns
# Sample data - Predicting admission based on multiple factors
data = pd.DataFrame({
    'exam_score': [42, 48, 51, 55, 58, 60, 62, 65, 67, 69, 71, 73, 75, 77, 79,
                   81, 83, 85, 87, 89, 91, 93, 38, 45, 52, 64, 70, 76, 82, 88],
    'gpa': [2.3, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
            3.9, 3.7, 3.8, 3.9, 4.0, 3.9, 4.0, 2.1, 2.4, 2.8, 3.2, 3.5, 3.6, 3.8, 3.9],
    'study_hours': [5, 8, 10, 12, 14, 15, 16, 18, 20, 22, 24, 25, 26, 28, 30,
                    32, 28, 30, 32, 35, 33, 36, 3, 6, 11, 17, 23, 27, 31, 34],
    'admitted': [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
                 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
})
# Prepare data for model
X = sm.add_constant(data[['exam_score', 'gpa', 'study_hours']])
y = data['admitted']
# Fit logistic regression model
model = Logit(y, X)
results = model.fit()
# Display model summary
print(results.summary())
# Calculate odds ratios
odds_ratios = np.exp(results.params)
print("Odds Ratios:")
print(odds_ratios.round(3))
# Make predictions
data['predicted_prob'] = results.predict(X)
data['predicted_class'] = (data['predicted_prob'] > 0.5).astype(int)
# Calculate ROC and AUC
fpr, tpr, _ = roc_curve(y, data['predicted_prob'])
auc_score = roc_auc_score(y, data['predicted_prob'])
print(f"AUC: {auc_score:.3f}")
# Confusion matrix
conf_matrix = confusion_matrix(y, data['predicted_class'])
print("Confusion Matrix:")
print(conf_matrix)
# Accuracy
accuracy = accuracy_score(y, data['predicted_class'])
print(f"Accuracy: {accuracy:.3f}")
# Create visualization
plt.figure(figsize=(8, 6))
colors = ['red' if x == 0 else 'green' for x in data['admitted']]
plt.scatter(data['exam_score'], data['admitted'], c=colors, s=50, alpha=0.7)
sns.regplot(x='exam_score', y='admitted', data=data, logistic=True,
            scatter=False, color='blue', ci=95)
plt.title('Logistic Regression: Exam Score vs Admission')
plt.xlabel('Exam Score')
plt.ylabel('Probability of Admission')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```