This calculator performs comprehensive Canonical Correlation Analysis (CCA), a multivariate statistical method used to identify and measure the associations between two sets of variables. CCA finds linear combinations of variables in each set that have maximum correlation with each other.
Pro Tip: CCA is useful when you want to understand relationships between two groups of variables (e.g., academic performance vs. psychological factors, physiological measures vs. behavioral outcomes). Each set should contain at least 2 variables. Larger sample sizes (n ≥ 20 per variable) provide more reliable results.
Ready to explore multivariate relationships? Try the sample data (academic performance and psychological factors) to see CCA in action, or upload your own data to discover relationships between your variable sets.
Important: Assign variables to Set 1 and Set 2 based on your research question. Each variable can only belong to one set. Each set should contain at least 2 variables for meaningful analysis.
Select variables for the first set (e.g., academic performance measures)
Select variables for the second set (e.g., psychological factors)
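The assignment rules above (no variable in both sets, at least 2 variables per set) can be checked before running an analysis. The following is a minimal sketch; the function name `validate_variable_sets` and the message wording are hypothetical and are not part of the calculator itself.

```python
def validate_variable_sets(set1, set2):
    """Return a list of problems with a proposed Set 1 / Set 2 assignment.

    An empty list means the assignment satisfies both rules.
    """
    problems = []
    # Rule 1: each variable can belong to only one set
    overlap = sorted(set(set1) & set(set2))
    if overlap:
        problems.append(f"variables assigned to both sets: {overlap}")
    # Rule 2: each set should contain at least 2 distinct variables
    for name, members in (("Set 1", set1), ("Set 2", set2)):
        if len(set(members)) < 2:
            problems.append(f"{name} needs at least 2 variables")
    return problems


ok = validate_variable_sets(
    ["math", "reading", "writing"],
    ["motivation", "anxiety", "self_efficacy"],
)
bad = validate_variable_sets(["math", "anxiety"], ["anxiety"])
print(ok)
print(bad)
```

An empty result means the two sets can be passed on to the analysis as they are.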
Canonical Correlation Analysis (CCA) is a multivariate statistical technique that identifies and quantifies the associations between two sets of variables. It finds pairs of linear combinations (canonical variates) from each set that have maximum correlation with each other, allowing researchers to understand complex relationships between multiple predictors and multiple outcomes.
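To make "linear combinations with maximum correlation" concrete, here is a small NumPy sketch of one standard way to compute canonical correlations: orthonormalize each centered set with a QR decomposition, then take the singular values of the product of the two bases. It assumes both sets have full column rank and more rows than columns. The function name and the simulated data are illustrative only, and this is not necessarily how the calculator computes its results.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Return the canonical correlations between the columns of X and Y.

    Assumes full column rank and more observations than variables.
    """
    # Center each column so inner products behave like covariances
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column space of each centered set
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations,
    # returned in decreasing order
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


# Simulated (hypothetical) data: the first Y column is driven by X[:, 0]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
noise = rng.normal(size=(200, 2))
Y = np.column_stack([X[:, 0] + 0.1 * noise[:, 0], noise[:, 1]])

r = canonical_correlations(X, Y)
print(r)
```

With a single variable in each set, the result reduces to the absolute value of the ordinary Pearson correlation, which is a quick way to sanity-check the function.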
Use Canonical Correlation Analysis when you want to:
Minimum n ≥ 20 per variable in the larger set. For reliable results, aim for n ≥ 10 × (total number of variables). Smaller samples may produce unstable estimates.
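The two rules of thumb above are simple arithmetic, sketched below as a helper. The function name `cca_sample_size_guidance` and the returned keys are hypothetical, and the thresholds are the ones quoted in the text rather than hard statistical limits.

```python
def cca_sample_size_guidance(n, p, q):
    """Compare a sample size n against the two rules of thumb quoted above.

    p and q are the numbers of variables in Set 1 and Set 2.
    """
    minimum = 20 * max(p, q)    # n >= 20 per variable in the larger set
    recommended = 10 * (p + q)  # n >= 10 x total number of variables
    return {
        "minimum_n": minimum,
        "recommended_n": recommended,
        "meets_minimum": n >= minimum,
        "meets_recommended": n >= recommended,
    }


# The example dataset used in the code listings: 15 rows, 3 + 3 variables
guidance = cca_sample_size_guidance(n=15, p=3, q=3)
print(guidance)
```

For the 15-row example dataset with three variables per set, both thresholds come out at 60, so that dataset is best treated as a demonstration of the workflow rather than a source of stable estimates.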
Focus on the first 1-3 canonical functions, as later functions often explain trivial variance and may not be practically meaningful even if statistically significant.
library(CCA)
library(CCP)
library(tidyverse)
# Academic performance and psychological factors data
df <- tibble(
math = c(85, 78, 92, 88, 76, 90, 82, 87, 91, 79, 84, 89, 77, 86, 93),
reading = c(88, 82, 90, 85, 79, 92, 84, 89, 93, 81, 86, 91, 80, 87, 95),
writing = c(82, 76, 88, 84, 74, 89, 80, 85, 90, 77, 83, 87, 75, 84, 91),
motivation = c(7.5, 6.8, 8.2, 7.8, 6.5, 8.5, 7.2, 7.9, 8.8, 6.9, 7.6, 8.3, 6.7, 7.7, 8.9),
anxiety = c(3.2, 4.1, 2.5, 3.0, 4.5, 2.3, 3.8, 2.9, 2.1, 4.2, 3.3, 2.6, 4.3, 3.1, 2.0),
self_efficacy = c(8.1, 7.2, 8.9, 8.3, 7.0, 9.0, 7.8, 8.5, 9.2, 7.3, 8.0, 8.7, 7.1, 8.2, 9.3)
)
# Define variable sets
# Set 1: Academic Performance (math, reading, writing)
X <- df |> select(math, reading, writing)
# Set 2: Psychological Factors (motivation, anxiety, self_efficacy)
Y <- df |> select(motivation, anxiety, self_efficacy)
# Perform Canonical Correlation Analysis
cc_result <- cc(as.matrix(X), as.matrix(Y))
# Display canonical correlations
print("Canonical Correlations:")
print(cc_result$cor)
# Canonical coefficients (raw: they apply to the unstandardized variables)
print("Canonical Coefficients for Set 1 (X):")
print(cc_result$xcoef)
print("Canonical Coefficients for Set 2 (Y):")
print(cc_result$ycoef)
# Calculate canonical loadings (structure coefficients)
# Canonical variates: cc() returns raw coefficients, so they must be applied
# to the raw (centered) data, not to z-scores. cc() already stores the
# resulting scores, so use those directly.
U <- cc_result$scores$xscores
V <- cc_result$scores$yscores
# Loadings (correlations between original variables and canonical variates)
loadings_X <- cor(as.matrix(X), U)
loadings_Y <- cor(as.matrix(Y), V)
print("Canonical Loadings for Set 1 (X):")
print(loadings_X)
print("Canonical Loadings for Set 2 (Y):")
print(loadings_Y)
# Test significance using Wilks' Lambda
n <- nrow(df)
p <- ncol(X)
q <- ncol(Y)
# Use CCP package for significance testing
# (tstat = "Wilks" is the default; p.asym reports an F-approximation)
sig_test <- p.asym(cc_result$cor, n, p, q, tstat = "Wilks")
print("Significance Tests:")
print(sig_test)
# Redundancy analysis
redundancy_X <- colSums(loadings_X^2) / p * cc_result$cor^2
redundancy_Y <- colSums(loadings_Y^2) / q * cc_result$cor^2
print("Redundancy (variance in X explained by Y's canonical variates):")
print(redundancy_X)
print("Redundancy (variance in Y explained by X's canonical variates):")
print(redundancy_Y)
import pandas as pd
import numpy as np
from sklearn.cross_decomposition import CCA
from scipy import stats
import matplotlib.pyplot as plt
# Academic performance and psychological factors data
data = pd.DataFrame({
'math': [85, 78, 92, 88, 76, 90, 82, 87, 91, 79, 84, 89, 77, 86, 93],
'reading': [88, 82, 90, 85, 79, 92, 84, 89, 93, 81, 86, 91, 80, 87, 95],
'writing': [82, 76, 88, 84, 74, 89, 80, 85, 90, 77, 83, 87, 75, 84, 91],
'motivation': [7.5, 6.8, 8.2, 7.8, 6.5, 8.5, 7.2, 7.9, 8.8, 6.9, 7.6, 8.3, 6.7, 7.7, 8.9],
'anxiety': [3.2, 4.1, 2.5, 3.0, 4.5, 2.3, 3.8, 2.9, 2.1, 4.2, 3.3, 2.6, 4.3, 3.1, 2.0],
'self_efficacy': [8.1, 7.2, 8.9, 8.3, 7.0, 9.0, 7.8, 8.5, 9.2, 7.3, 8.0, 8.7, 7.1, 8.2, 9.3]
})
# Define variable sets
# Set 1: Academic Performance
X = data[['math', 'reading', 'writing']].values
# Set 2: Psychological Factors
Y = data[['motivation', 'anxiety', 'self_efficacy']].values
# Standardize the data
from sklearn.preprocessing import StandardScaler
scaler_X = StandardScaler()
scaler_Y = StandardScaler()
X_std = scaler_X.fit_transform(X)
Y_std = scaler_Y.fit_transform(Y)
# Perform Canonical Correlation Analysis
n_components = min(X.shape[1], Y.shape[1])
cca = CCA(n_components=n_components)
cca.fit(X_std, Y_std)
# Transform to canonical variates
X_c, Y_c = cca.transform(X_std, Y_std)
# Calculate canonical correlations
canonical_correlations = [np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]
                          for i in range(n_components)]
print("Canonical Correlations:")
print(canonical_correlations)
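# Canonical coefficients
# x_rotations_ / y_rotations_ are the matrices transform() applies to the
# (internally scaled) inputs to produce the canonical variates;
# x_weights_ / y_weights_ refer to the deflated matrices instead
print("\nCanonical Coefficients for Set 1 (X):")
print(cca.x_rotations_)
print("\nCanonical Coefficients for Set 2 (Y):")
print(cca.y_rotations_)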
# Calculate canonical loadings (structure coefficients)
loadings_X = np.corrcoef(X_std.T, X_c.T)[:X.shape[1], X.shape[1]:]
loadings_Y = np.corrcoef(Y_std.T, Y_c.T)[:Y.shape[1], Y.shape[1]:]
print("\nCanonical Loadings for Set 1 (X):")
print(loadings_X)
print("\nCanonical Loadings for Set 2 (Y):")
print(loadings_Y)
# Wilks' Lambda test for significance
def wilks_lambda_test(canonical_corrs, n, p, q):
    """
    Test significance of canonical correlations using Wilks' Lambda
    n: sample size
    p: number of variables in set 1
    q: number of variables in set 2
    """
    m = min(p, q)
    lambda_vals = []
    for k in range(m):
        # Product of (1 - r^2) for remaining correlations
        lambda_k = np.prod([1 - r**2 for r in canonical_corrs[k:]])
        lambda_vals.append(lambda_k)
        # Chi-square approximation
        df = (p - k) * (q - k)
        chi_sq = -(n - 1 - (p + q + 1) / 2) * np.log(lambda_k)
        p_value = 1 - stats.chi2.cdf(chi_sq, df)
        print(f"\nFunction {k+1}:")
        print(f" Wilks' Lambda: {lambda_k:.4f}")
        print(f" Chi-square: {chi_sq:.4f}")
        print(f" df: {df}")
        print(f" p-value: {p_value:.4f}")

print("\nSignificance Tests:")
wilks_lambda_test(canonical_correlations, len(data), X.shape[1], Y.shape[1])
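The R listing ends with a redundancy analysis that the Python listing does not include. A possible Python counterpart is sketched below, using the same formula as the R code (mean squared loading per canonical function, multiplied by the squared canonical correlation). The function name `redundancy_index` and the example numbers are hypothetical; in practice the loadings and canonical correlations computed earlier would be passed in.

```python
import numpy as np


def redundancy_index(loadings, canonical_corrs):
    """Redundancy per canonical function for one variable set.

    loadings: (n_variables, n_functions) canonical loadings of the set
    canonical_corrs: canonical correlation of each function
    """
    loadings = np.asarray(loadings, dtype=float)
    r_squared = np.asarray(canonical_corrs, dtype=float) ** 2
    # Share of the set's variance reproduced by its own canonical variate
    variance_extracted = (loadings ** 2).mean(axis=0)
    # Scale by the variance shared between the paired canonical variates
    return variance_extracted * r_squared


# Hypothetical loadings for a 3-variable set and 2 canonical functions
example_loadings = [[0.9, 0.1], [0.8, 0.3], [0.7, -0.5]]
example_corrs = [0.8, 0.3]
redundancy = redundancy_index(example_loadings, example_corrs)
print(redundancy)
```

Summing the returned values over all functions gives the total proportion of the set's variance accounted for by the other set's canonical variates.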