This calculator performs comprehensive Confirmatory Factor Analysis (CFA), a statistical method used to test and validate hypothesized factor structures in measurement models. Unlike exploratory factor analysis, CFA requires you to specify which variables load on which factors based on theory or prior research.
What You'll Get:
- Model Fit Indices: Chi-square, CFI, TLI, RMSEA, and SRMR to evaluate model fit
- Standardized Factor Loadings: Strength of relationships between observed variables and latent factors
- Significance Tests: Statistical significance of all parameter estimates
- Factor Correlations: Relationships between latent factors
- Modification Indices: Suggestions for improving model fit
- APA-Formatted Report: Professional statistical reporting ready for publication
Pro Tip: CFA requires a theoretical basis for your factor structure. If you're exploring factor structures without prior hypotheses, use Exploratory Factor Analysis instead. Good model fit is indicated by CFI/TLI ≥ 0.90, RMSEA ≤ 0.08, and SRMR ≤ 0.08.
Ready to test your measurement model? Load our sample dataset (2-factor cognitive ability model) to see CFA in action, or upload your own data to validate your theoretical factor structure.
Software Implementation Differences
CFA models can be identified using different parameterization strategies, which produce equivalent models but different parameter estimates:
- Marker Variable Approach (default): First indicator of each factor fixed to 1.0, latent variances freely estimated. Used by lavaan (R) and semopy (Python) by default
- Standardized Latent Variable Approach: Latent variances fixed to 1.0, all loadings freely estimated. Use std.lv = TRUE in R lavaan
- Important: Both approaches produce identical fit indices (CFI, TLI, RMSEA, SRMR, χ²) and represent the same underlying model
- This calculator uses the marker variable approach. To match results in R, call cfa(model, data) without std.lv = TRUE
- SPSS Amos, Mplus, EQS: May use different default identification strategies
- SRMR Calculation: Minor differences (typically < 0.005) may occur across software due to different formulas for converting model-implied covariances to correlations. This does not affect model interpretation.
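The equivalence of the two parameterizations comes down to a simple rescaling: dividing every loading by the first loading and moving that scale into the latent variance leaves every model-implied covariance unchanged. A minimal sketch, using illustrative numbers (not output from any particular software run):

```python
# Hypothetical unstandardized loadings estimated under std.lv = TRUE
# (latent variance fixed to 1.0); the numbers are illustrative only.
std_loadings = [4.2, 3.9, 4.5]

# Marker-variable rescaling: fix the first loading to 1.0 and absorb the
# factor's scale into the latent variance instead. Every model-implied
# covariance (and hence every fit index) is unchanged.
marker_loadings = [l / std_loadings[0] for l in std_loadings]
latent_variance = std_loadings[0] ** 2

print(marker_loadings[0])  # exactly 1.0 by construction
# An implied covariance lambda_i * psi * lambda_j is preserved:
print(latent_variance * marker_loadings[1] * marker_loadings[2])  # equals 3.9 * 4.5 (up to float rounding)
```

This is why the choice of identification strategy changes the printed loadings and variances but never the chi-square or the derived fit indices.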
Calculator
1. Load Your Data
2. Select Variables
3. Define Factor Structure
Important: Assign each variable to exactly one factor based on your theoretical model. Each factor should have at least 2-3 indicators for model identification.
4. Analysis Options
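The "at least 2-3 indicators per factor" requirement can be checked by counting degrees of freedom (the t-rule): the model is testable only if the number of unique variances and covariances exceeds the number of free parameters. A minimal sketch for the sample 2-factor cognitive ability model:

```python
# t-rule identification check for the sample 2-factor model
# (verbal, numerical, logical -> Cognitive; spatial, memory -> Spatial_Memory)
p = 5                          # observed variables
known = p * (p + 1) // 2       # unique variances/covariances = 15

# Free parameters under marker-variable identification:
free_loadings = 3       # one loading per factor is fixed to 1.0 (verbal, spatial)
factor_variances = 2
factor_covariance = 1
residual_variances = 5
free = free_loadings + factor_variances + factor_covariance + residual_variances

df = known - free
print(df)  # positive df -> over-identified, so the model can be tested
```

Note that a two-indicator factor like Spatial_Memory is identified here only because it covaries with another factor; a standalone factor generally needs at least three indicators.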
Learn More
Definition
Confirmatory Factor Analysis (CFA) is a multivariate statistical technique used to test whether a hypothesized factor structure fits the observed data. Unlike EFA, CFA requires researchers to specify in advance which variables load on which factors based on theory, making it a hypothesis-testing approach ideal for scale validation and measurement model assessment.
When to Use CFA
Use Confirmatory Factor Analysis when you want to:
- Test theoretical models: Verify that your data supports a specific factor structure derived from theory
- Validate measurement instruments: Confirm that a questionnaire or scale measures the intended constructs
- Compare competing models: Evaluate which of several theoretical models best fits your data
- Assess construct validity: Demonstrate that variables cluster as expected based on theory
- Prepare for SEM: CFA is often the first step before structural equation modeling
EFA vs CFA vs PCA: Key Differences
| Aspect | EFA | CFA | PCA |
|---|---|---|---|
| Purpose | Seeks to explain correlations among variables using underlying latent factors. Separates shared variance from unique variance. | Tests a pre-specified factor structure based on theory. Confirms hypotheses about relationships between observed variables and latent factors. | Focuses on explaining total variance and creating orthogonal components. Does not distinguish shared from unique variance. |
| Approach | Exploratory - discovers underlying structure without prior hypotheses | Confirmatory - tests specific hypothesized factor structures | Descriptive - reduces dimensionality for data simplification |
| When to Use | Theory building, scale development, understanding construct validity | Theory testing, validating measurement models, assessing model fit to data | Data reduction, feature extraction, eliminating multicollinearity |
Interpreting Model Fit Indices
Comparative Fit Index (CFI) & Tucker-Lewis Index (TLI)
- ≥ 0.95: Excellent fit
- 0.90 - 0.95: Acceptable fit
- < 0.90: Poor fit
Root Mean Square Error of Approximation (RMSEA)
- ≤ 0.05: Excellent fit
- 0.05 - 0.08: Acceptable fit
- 0.08 - 0.10: Mediocre fit
- > 0.10: Poor fit
Standardized Root Mean Square Residual (SRMR)
- ≤ 0.08: Good fit
- > 0.08: Poor fit
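CFI, TLI, and RMSEA can all be reproduced from the chi-square statistics that any SEM program reports. A minimal sketch with hypothetical values (the classic RMSEA formula with N − 1 in the denominator is used; some software uses N instead):

```python
import math

# Hypothetical chi-square results; real software reports these directly.
chisq, df = 8.1, 4                 # fitted model
chisq_base, df_base = 250.0, 10    # baseline (independence) model
n = 200                            # sample size

# CFI: improvement in non-centrality over the baseline model
cfi = 1 - max(chisq - df, 0) / max(chisq_base - df_base, chisq - df, 0)

# TLI: per-df chi-square improvement relative to the baseline
tli = ((chisq_base / df_base) - (chisq / df)) / ((chisq_base / df_base) - 1)

# RMSEA: badness of fit per df, adjusted for sample size
rmsea = math.sqrt(max(chisq - df, 0) / (df * (n - 1)))

print(round(cfi, 3), round(tli, 3), round(rmsea, 3))
```

Against the cutoffs above, this hypothetical model would count as excellent by CFI/TLI (≥ 0.95) and acceptable by RMSEA (0.05 to 0.08).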
Factor Loadings & Model Modification
- Standardized loadings ≥ 0.7 indicate strong relationships
- Loadings 0.5-0.7 are moderate and generally acceptable
- Loadings < 0.5 may indicate weak indicators
- All loadings should be statistically significant (p < .05)
Modification index (MI) values > 3.84 suggest parameters that could improve model fit if freed. However, modifications should only be made when theoretically justified; data-driven modifications without theoretical rationale can lead to overfitting and non-generalizable results.
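The 3.84 cutoff is not arbitrary: a modification index estimates the chi-square drop (1 df) from freeing a single parameter, and a chi-square variate with 1 df is a squared standard normal, so the 5% critical value is the squared two-tailed z critical value. A quick check:

```python
from statistics import NormalDist

# Chi-square with 1 df is a squared standard normal, so the 5% critical
# value equals the 97.5th-percentile z value squared.
z_crit = NormalDist().inv_cdf(0.975)   # approximately 1.96
mi_threshold = z_crit ** 2
print(round(mi_threshold, 2))          # -> 3.84
```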
Example Code
R Code (lavaan)
library(lavaan)
library(tidyverse)
library(semPlot)
# Psychological test scores data
data <- tibble(
verbal = c(65, 72, 58, 68, 75, 62, 70, 66, 71, 69, 63, 74, 60, 67, 73),
numerical = c(62, 68, 55, 65, 71, 60, 68, 63, 70, 66, 61, 72, 58, 64, 70),
logical = c(68, 74, 60, 70, 78, 65, 73, 69, 75, 71, 66, 76, 62, 69, 77),
spatial = c(58, 65, 52, 62, 68, 56, 64, 60, 66, 63, 58, 67, 54, 61, 69),
memory = c(70, 76, 63, 72, 80, 68, 75, 71, 77, 73, 69, 79, 65, 72, 81)
)
# Define CFA model (2-factor structure)
# Factor 1: Cognitive reasoning (verbal, numerical, logical)
# Factor 2: Spatial-memory abilities (spatial, memory)
model <- '
# Latent factors
Cognitive =~ verbal + numerical + logical
Spatial_Memory =~ spatial + memory
'
# Fit the CFA model
fit <- cfa(model, data = data)  # marker-variable identification (lavaan default)
# View results
summary(fit, fit.measures = TRUE, standardized = TRUE)
# Get fit indices
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
# Standardized loadings
standardizedSolution(fit)
# Modification indices
modindices(fit, sort = TRUE, minimum.value = 3.84)
# Path diagram
semPaths(fit, what = "std", edge.label.cex = 0.8,
         curvePivot = TRUE, layout = "tree")
Python Code (semopy)
import pandas as pd
import numpy as np
from semopy import Model, semplot
import matplotlib.pyplot as plt
import semopy
# Psychological test scores data
data = pd.DataFrame({
'verbal': [65, 72, 58, 68, 75, 62, 70, 66, 71, 69, 63, 74, 60, 67, 73],
'numerical': [62, 68, 55, 65, 71, 60, 68, 63, 70, 66, 61, 72, 58, 64, 70],
'logical': [68, 74, 60, 70, 78, 65, 73, 69, 75, 71, 66, 76, 62, 69, 77],
'spatial': [58, 65, 52, 62, 68, 56, 64, 60, 66, 63, 58, 67, 54, 61, 69],
'memory': [70, 76, 63, 72, 80, 68, 75, 71, 77, 73, 69, 79, 65, 72, 81]
})
# Define CFA model (2-factor structure)
# Factor 1: Cognitive reasoning (verbal, numerical, logical)
# Factor 2: Spatial-memory abilities (spatial, memory)
model_spec = """
# Latent factors
Cognitive =~ verbal + numerical + logical
Spatial_Memory =~ spatial + memory
"""
# Fit the CFA model
model = Model(model_spec)
model.fit(data)
# Model fit statistics
stats = semopy.calc_stats(model)
print("Model Fit Statistics:")
print(stats)
# Parameter estimates
estimates = model.inspect()
print("Parameter Estimates:")
print(estimates)
# Factor loadings
print("Factor Loadings:")
loadings = estimates[estimates['op'] == '~'][['lval', 'rval', 'Estimate', 'Std. Err', 'z-value', 'p-value']]
print(loadings)