This calculator performs comprehensive Discriminant Analysis, including both Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA). These are powerful classification techniques used to predict group membership based on predictor variables and to understand which variables best discriminate between groups.
💡 Pro Tip: LDA assumes equal covariance matrices across groups (homogeneity of variance-covariance), while QDA allows each group its own covariance matrix. Use LDA when groups have similar spread and QDA when they differ. LDA is more stable with small samples; QDA is more flexible but needs more data per group because it estimates a separate covariance matrix for each one. If you would rather model class probabilities with a regression-style approach, consider Logistic Regression.
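One informal way to judge whether groups have "similar spread" is to compare the per-group covariance matrices directly. The sketch below does this for the Iris data using the log-determinant of each group's covariance matrix as a single-number summary. It is only an eyeball check under the assumption that NumPy and scikit-learn are available; it is not a formal test of covariance equality, and the names `class_covs` and `log_dets` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Sample covariance matrix of the four features within each species
class_covs = {
    str(name): np.cov(X[y == k], rowvar=False)
    for k, name in enumerate(iris.target_names)
}

# The log-determinant summarizes the overall spread of each group;
# large differences between groups hint that QDA may fit better than LDA
log_dets = {
    name: np.linalg.slogdet(cov)[1]
    for name, cov in class_covs.items()
}

for name, value in log_dets.items():
    print(f"{name:>12}: log|S| = {value:.2f}")
```

If the log-determinants are close, the equal-covariance assumption behind LDA is more plausible; widely different values suggest trying QDA as well.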
Ready to classify your data? Load the sample data (Iris flower species classification) to see discriminant analysis in action, or upload your own data to predict group membership and understand what distinguishes your groups.
The variable that defines the groups you want to classify (e.g., species, diagnosis, category)
Proportion of data to use for testing (0.1 - 0.5)
How to weight group sizes in classification
For reproducible results (default: 42)
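The settings above have natural counterparts in scikit-learn. The following is a minimal sketch of how they might map onto a train/test split and an LDA fit; it is not the calculator's actual implementation, and the variable names `test_proportion`, `use_equal_priors`, and `random_seed` are hypothetical stand-ins for the form fields.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Hypothetical settings mirroring the form fields
test_proportion = 0.3     # proportion of data held out for testing (0.1 - 0.5)
use_equal_priors = True   # how to weight group sizes in classification
random_seed = 42          # fixed seed for reproducible results

X, y = load_iris(return_X_y=True)
n_classes = len(set(y))

# Stratify so each group keeps its share in both train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=test_proportion,
    random_state=random_seed,
    stratify=y,
)

# priors=None makes scikit-learn estimate priors from the training proportions
priors = [1.0 / n_classes] * n_classes if use_equal_priors else None

model = LinearDiscriminantAnalysis(priors=priors).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Equal priors treat every group as equally likely regardless of its size, while proportional priors favor larger groups; the choice matters most when group sizes are unbalanced.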
Discriminant Analysis is a multivariate statistical technique used to classify observations into predefined groups based on predictor variables. LDA finds linear combinations of the features that best separate the groups, while QDA fits quadratic decision boundaries between them. This makes the method well suited both to prediction and to understanding which variables distinguish the groups.
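To make the idea of a "linear combination that separates groups" concrete, the sketch below computes the textbook LDA discriminant score for each class by hand with NumPy, using a pooled within-class covariance matrix, and assigns each observation to the class with the highest score. It assumes the standard Gaussian formulation of LDA and is meant as an illustration, not as a description of how this calculator or any particular library computes its results.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
classes = np.unique(y)
n_samples = X.shape[0]

# Class means and prior probabilities (here: observed class proportions)
means = np.array([X[y == k].mean(axis=0) for k in classes])
priors = np.array([np.mean(y == k) for k in classes])

# Pooled within-class covariance: the single covariance matrix LDA assumes
within_scatter = sum(
    (X[y == k] - means[i]).T @ (X[y == k] - means[i])
    for i, k in enumerate(classes)
)
pooled_cov = within_scatter / (n_samples - len(classes))
pooled_inv = np.linalg.inv(pooled_cov)

# Discriminant score for class k:
#   delta_k(x) = x' S^-1 m_k - 0.5 * m_k' S^-1 m_k + log(prior_k)
linear_term = X @ pooled_inv @ means.T
constant_term = (
    -0.5 * np.einsum("kp,pq,kq->k", means, pooled_inv, means)
    + np.log(priors)
)
scores = linear_term + constant_term

# Classify each observation into the class with the largest score
predicted = classes[np.argmax(scores, axis=1)]
accuracy = np.mean(predicted == y)
print(f"Resubstitution accuracy: {accuracy:.3f}")
```

Because the score is linear in x, the boundary between any two classes is a hyperplane. QDA replaces the pooled matrix with a separate covariance matrix per class, which adds a quadratic term in x and produces curved boundaries.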
Use Discriminant Analysis when you want to:
Linear Discriminant Analysis (LDA):
Quadratic Discriminant Analysis (QDA):
💡 Recommendation: Run both and compare! This calculator provides both analyses so you can see which performs better for your data.
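A fair comparison scores both models on the same cross-validation folds. The sketch below does this with scikit-learn on the Iris data; the 5-fold stratified split and the seed of 42 are assumptions chosen for illustration, not settings taken from the calculator.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Same folds for both models so the accuracies are directly comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
}

# One array of per-fold accuracies for each model
results = {
    name: cross_val_score(model, X, y, cv=cv)
    for name, model in models.items()
}

for name, fold_scores in results.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.3f} "
          f"(sd = {fold_scores.std():.3f})")
```

When the two means are within about one standard deviation of each other, the simpler LDA model is usually the safer choice.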
library(MASS)  # provides lda() and qda()
# Iris dataset (species classification)
data <- iris
# Linear Discriminant Analysis
lda_model <- lda(Species ~ Sepal.Length + Sepal.Width +
                   Petal.Length + Petal.Width, data = data)
# View results
print(lda_model)
# Predictions on the training data (resubstitution, so accuracy is optimistic)
predictions <- predict(lda_model, data)
table(Predicted = predictions$class, Actual = data$Species)
# Quadratic Discriminant Analysis
qda_model <- qda(Species ~ Sepal.Length + Sepal.Width +
                   Petal.Length + Petal.Width, data = data)
# View results
print(qda_model)
# Predictions on the training data (resubstitution, so accuracy is optimistic)
qda_predictions <- predict(qda_model, data)
table(Predicted = qda_predictions$class, Actual = data$Species)
# Plot LDA
plot(lda_model)

import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.metrics import classification_report, confusion_matrix
# Load iris dataset
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
# Split data (random_state for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
# Linear Discriminant Analysis
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
# Predictions
y_pred_lda = lda.predict(X_test)
# Accuracy
print(f"LDA Accuracy: {lda.score(X_test, y_test):.3f}")
# Cross-validation (with reproducible seed)
cv_scores = cross_val_score(
    lda, X_train, y_train,
    cv=KFold(n_splits=5, shuffle=True, random_state=42)
)
print(f"LDA CV Accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
# Confusion Matrix
print("\nLDA Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_lda))
# Classification Report
print("\nLDA Classification Report:")
print(classification_report(y_test, y_pred_lda,
                            target_names=iris.target_names))
# Quadratic Discriminant Analysis
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)
# Predictions
y_pred_qda = qda.predict(X_test)
# Accuracy
print(f"\nQDA Accuracy: {qda.score(X_test, y_test):.3f}")
# Confusion Matrix
print("\nQDA Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_qda))
# Plot the data projected onto the first two linear discriminants
X_lda = lda.transform(X)
plt.figure(figsize=(10, 6))
for i, target_name in enumerate(iris.target_names):
    plt.scatter(X_lda[y == i, 0], X_lda[y == i, 1],
                label=target_name, alpha=0.8)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.title('LDA Projection')
plt.legend()
plt.show()