Canonical Correlation Analysis (CCA)

Created: December 24, 2025

Last Updated: December 24, 2025

This calculator performs comprehensive Canonical Correlation Analysis (CCA), a multivariate statistical method used to identify and measure the associations between two sets of variables. CCA finds linear combinations of variables in each set that have maximum correlation with each other.

What You'll Get:

Canonical Correlations: Correlation coefficients between canonical variates
Canonical Coefficients: Weights for creating canonical variates from original variables
Canonical Loadings: Structure coefficients showing relationships between original variables and canonical variates
Significance Tests: Wilks' Lambda test for each canonical function
Redundancy Analysis: Proportion of variance in one set explained by the other set
Visualizations: Heatmaps and bar charts showing canonical loadings and correlations
APA-Formatted Report: Professional statistical reporting ready for publication

Pro Tip: CCA is useful when you want to understand relationships between two groups of variables (e.g., academic performance vs. psychological factors, physiological measures vs. behavioral outcomes). Each set should contain at least 2 variables. Larger sample sizes (n ≥ 20 per variable) provide more reliable results.

Ready to explore multivariate relationships? Try the sample data (academic performance and psychological factors) to see CCA in action, or upload your own data to discover relationships between your variable sets.

Calculator

1. Load Your Data

2. Define Variable Sets

Important: Assign variables to Set 1 and Set 2 based on your research question. Each variable can only belong to one set. Each set should contain at least 2 variables for meaningful analysis.

Set 1 Variables

Select variables for the first set (e.g., academic performance measures)

Set 2 Variables

Select variables for the second set (e.g., psychological factors)

3. Analysis Options

Standardize Data (Recommended)

Related Calculators

Correlation Coefficient Calculator

Multiple Regression Calculator

Principal Component Analysis Calculator

Confirmatory Factor Analysis Calculator

Learn More

Definition

Canonical Correlation Analysis (CCA) is a multivariate statistical technique that identifies and quantifies the associations between two sets of variables. It finds pairs of linear combinations (canonical variates) from each set that have maximum correlation with each other, allowing researchers to understand complex relationships between multiple predictors and multiple outcomes.
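To make the definition concrete, here is a minimal NumPy sketch of one standard way to obtain canonical correlations: center each set, take an orthonormal basis of each set's columns via QR, and read the correlations off the singular values of the product of the two bases. The function name `canonical_correlations` and the synthetic data are illustrative assumptions, not this calculator's implementation, and the sketch assumes each set has full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return the canonical correlations between the columns of X and Y."""
    # Center each set; correlations are unaffected by centering
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered sets
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx' Qy are the canonical correlations (descending)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Guard against tiny floating-point overshoot beyond [0, 1]
    return np.clip(s, 0.0, 1.0)

# Synthetic data for illustration only: one shared signal plus noise
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([latent + rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])

r = canonical_correlations(X, Y)
print(r)  # min(2, 3) = 2 canonical correlations, largest first
```

With one variable in each set, the single canonical correlation reduces to the absolute value of the ordinary Pearson correlation, which is a quick sanity check for any implementation.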

When to Use CCA

Use Canonical Correlation Analysis when you want to:

Explore multivariate relationships: Understand how two sets of variables relate to each other
Reduce dimensionality: Summarize complex relationships using fewer canonical dimensions
Test theoretical models: Examine relationships between constructs measured by multiple indicators
Predict multiple outcomes: Analyze how multiple predictors relate to multiple dependent variables
Compare variable sets: Determine which variables contribute most to relationships between sets

Understanding CCA Output

Canonical Correlations: Range from 0 to 1, with higher values indicating stronger relationships between variable sets
Canonical Coefficients: Weights used to create canonical variates (similar to regression coefficients)
Canonical Loadings: Correlations between original variables and canonical variates (easier to interpret than coefficients)
Wilks' Lambda: Tests statistical significance; values closer to 0 indicate stronger relationships (p < .05 suggests significance)

Interpreting CCA Results

Canonical Correlations

≥ 0.70: Strong relationship
0.40 - 0.70: Moderate relationship
< 0.40: Weak relationship
First canonical correlation is always the largest; subsequent ones are progressively smaller

Canonical Loadings

|r| ≥ 0.45: Variable substantially contributes to the canonical variate
|r| 0.30 - 0.45: Moderate contribution
|r| < 0.30: Minimal contribution
Loadings are easier to interpret than canonical coefficients
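The cutoffs listed above for canonical correlations and loadings can be applied programmatically. The sketch below is only a convenience wrapper around those thresholds; the function names are hypothetical, and values exactly at 0.70 or 0.45 are assigned to the higher category.

```python
def correlation_strength(r):
    """Label a canonical correlation using the cutoffs 0.40 and 0.70."""
    if r >= 0.70:
        return "strong"
    if r >= 0.40:
        return "moderate"
    return "weak"

def loading_contribution(loading):
    """Label a canonical loading by its absolute size (cutoffs 0.30 and 0.45)."""
    size = abs(loading)
    if size >= 0.45:
        return "substantial"
    if size >= 0.30:
        return "moderate"
    return "minimal"

# Illustrative values, not output from a real analysis
for r in (0.82, 0.55, 0.12):
    print(f"Rc = {r:.2f}: {correlation_strength(r)}")
for loading in (-0.60, 0.35, 0.10):
    print(f"loading = {loading:+.2f}: {loading_contribution(loading)}")
```

These labels are rules of thumb; a loading's sign still matters for interpretation even though only its magnitude is classified here.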

Significance Testing

Wilks' Lambda: Values closer to 0 indicate stronger relationships
p < .05: Canonical function is statistically significant
Test canonical functions sequentially: the test for function k evaluates functions k through the last one jointly; stop at the first non-significant result
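For reference, the sequential test can be written out as follows. This is Bartlett's chi-square approximation, the same one used in the Python example on this page (which counts k from 0 rather than 1). Here r_i is the i-th canonical correlation, n the sample size, and p and q the numbers of variables in Set 1 and Set 2; other software may use a different approximation and report slightly different p-values.

```latex
\Lambda_k = \prod_{i=k}^{m} \left(1 - r_i^2\right),
\qquad m = \min(p, q), \quad k = 1, \dots, m

\chi^2_k = -\left[\, n - 1 - \frac{p + q + 1}{2} \,\right] \ln \Lambda_k,
\qquad \mathrm{df}_k = (p - k + 1)(q - k + 1)
```

Because every factor (1 - r_i^2) lies between 0 and 1, large canonical correlations push Lambda toward 0, which is why smaller values indicate stronger relationships.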

Redundancy Analysis

Indicates proportion of variance in one set explained by the other set
≥ 10%: Meaningful redundancy
Low redundancy suggests sets share little common variance
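A redundancy value can be computed from the canonical loadings and canonical correlations: average the squared loadings of a set on each of its own canonical variates, then multiply by the squared canonical correlation. The sketch below mirrors the formula in the R example on this page; the function name and the numbers are hypothetical.

```python
import numpy as np

def redundancy_index(loadings, canonical_corrs):
    """Redundancy per canonical function for one variable set.

    loadings: (n_variables, n_functions) correlations between the set's
              variables and its own canonical variates
    canonical_corrs: one canonical correlation per function
    """
    loadings = np.asarray(loadings, dtype=float)
    r_squared = np.asarray(canonical_corrs, dtype=float) ** 2
    # Mean squared loading = share of the set's variance captured by each variate
    variance_extracted = (loadings ** 2).mean(axis=0)
    # Scale by the variance shared between the paired variates
    return variance_extracted * r_squared

# Hypothetical loadings for a 3-variable set and two canonical functions
loadings_X = np.array([[0.9, 0.1],
                       [0.8, -0.3],
                       [0.7, 0.5]])
canonical_corrs = [0.8, 0.3]

redundancy = redundancy_index(loadings_X, canonical_corrs)
print(redundancy)        # redundancy per function
print(redundancy.sum())  # total redundancy across functions
```

Multiplying a result by 100 puts it on the percentage scale used in the 10% guideline above.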

Assumptions and Considerations

Key Assumptions:

Linear relationships between variables
Multivariate normality (especially for significance tests)
Homoscedasticity of residuals
No extreme multicollinearity within sets

Sample Size Requirements:

A common rule of thumb is n ≥ 20 per variable in the larger set; at a minimum, aim for n ≥ 10 × (total number of variables across both sets). Smaller samples may produce unstable estimates.
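The two rules of thumb above are simple arithmetic, sketched here for quick checks. The function name `sample_size_guidelines` is hypothetical. Note that the 15-row dataset used in the example code on this page falls well short of both targets, so it illustrates the workflow rather than a trustworthy analysis.

```python
def sample_size_guidelines(p, q):
    """Return rule-of-thumb sample sizes for sets with p and q variables."""
    return {
        "20_per_variable_in_larger_set": 20 * max(p, q),
        "10_per_variable_overall": 10 * (p + q),
    }

n_available = 15  # rows in this page's example dataset
targets = sample_size_guidelines(p=3, q=3)

for rule, n_needed in targets.items():
    status = "met" if n_available >= n_needed else "not met"
    print(f"{rule}: n >= {n_needed} ({status} with n = {n_available})")
```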

Note on Interpretation:

Focus on the first 1-3 canonical functions, as later functions often explain trivial variance and may not be practically meaningful even if statistically significant.

Example Code

R Code (CCA & CCP packages)

library(CCA)
library(CCP)
library(tidyverse)

# Academic performance and psychological factors data
df <- tibble(
  math = c(85, 78, 92, 88, 76, 90, 82, 87, 91, 79, 84, 89, 77, 86, 93),
  reading = c(88, 82, 90, 85, 79, 92, 84, 89, 93, 81, 86, 91, 80, 87, 95),
  writing = c(82, 76, 88, 84, 74, 89, 80, 85, 90, 77, 83, 87, 75, 84, 91),
  motivation = c(7.5, 6.8, 8.2, 7.8, 6.5, 8.5, 7.2, 7.9, 8.8, 6.9, 7.6, 8.3, 6.7, 7.7, 8.9),
  anxiety = c(3.2, 4.1, 2.5, 3.0, 4.5, 2.3, 3.8, 2.9, 2.1, 4.2, 3.3, 2.6, 4.3, 3.1, 2.0),
  self_efficacy = c(8.1, 7.2, 8.9, 8.3, 7.0, 9.0, 7.8, 8.5, 9.2, 7.3, 8.0, 8.7, 7.1, 8.2, 9.3)
)

# Define variable sets
# Set 1: Academic Performance (math, reading, writing)
X <- df |> select(math, reading, writing)

# Set 2: Psychological Factors (motivation, anxiety, self_efficacy)
Y <- df |> select(motivation, anxiety, self_efficacy)

# Perform Canonical Correlation Analysis
cc_result <- cc(as.matrix(X), as.matrix(Y))

# Display canonical correlations
print("Canonical Correlations:")
print(cc_result$cor)

# Canonical coefficients (raw, i.e., for the unstandardized variables)
print("Canonical Coefficients for Set 1 (X):")
print(cc_result$xcoef)

print("Canonical Coefficients for Set 2 (Y):")
print(cc_result$ycoef)

# Calculate canonical loadings (structure coefficients)
# cc() returns raw coefficients, so apply them to the original (unstandardized) data
X_mat <- as.matrix(X)
Y_mat <- as.matrix(Y)

# Canonical variates
U <- X_mat %*% cc_result$xcoef
V <- Y_mat %*% cc_result$ycoef

# Loadings (correlations between original variables and canonical variates)
loadings_X <- cor(X_mat, U)
loadings_Y <- cor(Y_mat, V)

print("Canonical Loadings for Set 1 (X):")
print(loadings_X)

print("Canonical Loadings for Set 2 (Y):")
print(loadings_Y)

# Test significance using Wilks' Lambda
n <- nrow(df)
p <- ncol(X)
q <- ncol(Y)

# Use CCP package for significance testing
sig_test <- p.asym(cc_result$cor, n, p, q)

print("Significance Tests:")
print(sig_test)

# Redundancy analysis
redundancy_X <- colSums(loadings_X^2) / p * cc_result$cor^2
redundancy_Y <- colSums(loadings_Y^2) / q * cc_result$cor^2

print("Redundancy (variance in X explained by Y's canonical variates):")
print(redundancy_X)

print("Redundancy (variance in Y explained by X's canonical variates):")
print(redundancy_Y)

Python Code (scikit-learn)

import pandas as pd
import numpy as np
from sklearn.cross_decomposition import CCA
from scipy import stats

# Academic performance and psychological factors data
data = pd.DataFrame({
    'math': [85, 78, 92, 88, 76, 90, 82, 87, 91, 79, 84, 89, 77, 86, 93],
    'reading': [88, 82, 90, 85, 79, 92, 84, 89, 93, 81, 86, 91, 80, 87, 95],
    'writing': [82, 76, 88, 84, 74, 89, 80, 85, 90, 77, 83, 87, 75, 84, 91],
    'motivation': [7.5, 6.8, 8.2, 7.8, 6.5, 8.5, 7.2, 7.9, 8.8, 6.9, 7.6, 8.3, 6.7, 7.7, 8.9],
    'anxiety': [3.2, 4.1, 2.5, 3.0, 4.5, 2.3, 3.8, 2.9, 2.1, 4.2, 3.3, 2.6, 4.3, 3.1, 2.0],
    'self_efficacy': [8.1, 7.2, 8.9, 8.3, 7.0, 9.0, 7.8, 8.5, 9.2, 7.3, 8.0, 8.7, 7.1, 8.2, 9.3]
})

# Define variable sets
# Set 1: Academic Performance
X = data[['math', 'reading', 'writing']].values

# Set 2: Psychological Factors
Y = data[['motivation', 'anxiety', 'self_efficacy']].values

# Standardize the data
from sklearn.preprocessing import StandardScaler
scaler_X = StandardScaler()
scaler_Y = StandardScaler()
X_std = scaler_X.fit_transform(X)
Y_std = scaler_Y.fit_transform(Y)

# Perform Canonical Correlation Analysis
n_components = min(X.shape[1], Y.shape[1])
cca = CCA(n_components=n_components)
cca.fit(X_std, Y_std)

# Transform to canonical variates
X_c, Y_c = cca.transform(X_std, Y_std)

# Calculate canonical correlations
canonical_correlations = [np.corrcoef(X_c[:, i], Y_c[:, i])[0, 1]
                         for i in range(n_components)]

print("Canonical Correlations:")
print(canonical_correlations)

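# Canonical coefficients: x_rotations_ / y_rotations_ are the projection
# matrices that transform() applies to the centered, scaled data
print("\nCanonical Coefficients for Set 1 (X):")
print(cca.x_rotations_)

print("\nCanonical Coefficients for Set 2 (Y):")
print(cca.y_rotations_)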

# Calculate canonical loadings (structure coefficients)
loadings_X = np.corrcoef(X_std.T, X_c.T)[:X.shape[1], X.shape[1]:]
loadings_Y = np.corrcoef(Y_std.T, Y_c.T)[:Y.shape[1], Y.shape[1]:]

print("\nCanonical Loadings for Set 1 (X):")
print(loadings_X)

print("\nCanonical Loadings for Set 2 (Y):")
print(loadings_Y)

# Wilks' Lambda test for significance
def wilks_lambda_test(canonical_corrs, n, p, q):
    """
    Test significance of canonical correlations using Wilks' Lambda
    n: sample size
    p: number of variables in set 1
    q: number of variables in set 2
    """
    m = min(p, q)
    lambda_vals = []

    for k in range(m):
        # Product of (1 - r^2) for remaining correlations
        lambda_k = np.prod([1 - r**2 for r in canonical_corrs[k:]])
        lambda_vals.append(lambda_k)

        # Bartlett's chi-square approximation
        df = (p - k) * (q - k)
        chi_sq = -(n - 1 - (p + q + 1) / 2) * np.log(lambda_k)
        p_value = stats.chi2.sf(chi_sq, df)  # survival function, i.e., 1 - CDF

        print(f"\nFunction {k+1}:")
        print(f"  Wilks' Lambda: {lambda_k:.4f}")
        print(f"  Chi-square: {chi_sq:.4f}")
        print(f"  df: {df}")
        print(f"  p-value: {p_value:.4f}")

    # Return the Wilks' Lambda values so callers can reuse them
    return lambda_vals

print("\nSignificance Tests:")
wilks_lambda_test(canonical_correlations, len(data), X.shape[1], Y.shape[1])
