This calculator performs comprehensive Confirmatory Factor Analysis (CFA), a statistical method used to test and validate hypothesized factor structures in measurement models. Unlike exploratory factor analysis, CFA requires you to specify which variables load on which factors based on theory or prior research.
Pro Tip: CFA requires a theoretical basis for your factor structure. If you're exploring factor structures without prior hypotheses, use Exploratory Factor Analysis instead. Good model fit is indicated by CFI/TLI ≥ 0.90, RMSEA ≤ 0.08, and SRMR ≤ 0.08.
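As a minimal sketch, the cutoffs quoted above can be checked programmatically. The helper name `assess_fit` is illustrative, not part of any library:

```python
# Hypothetical helper: check fit indices against the conventional
# cutoffs quoted above (CFI/TLI >= 0.90, RMSEA <= 0.08, SRMR <= 0.08).
def assess_fit(cfi, tli, rmsea, srmr):
    """Return a dict mapping each fit index to True if it meets its cutoff."""
    return {
        "CFI": cfi >= 0.90,
        "TLI": tli >= 0.90,
        "RMSEA": rmsea <= 0.08,
        "SRMR": srmr <= 0.08,
    }

# A model with these values would pass all four conventional cutoffs
print(assess_fit(cfi=0.97, tli=0.95, rmsea=0.05, srmr=0.04))
```

Remember that these cutoffs are rules of thumb, not strict pass/fail tests; report the indices themselves alongside any verdict.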
Ready to test your measurement model? Load the sample data (2-factor cognitive ability model) to see CFA in action, or upload your own data to validate your theoretical factor structure.
CFA models can be identified using different parameterization strategies, which produce equivalent models but different parameter estimates:
- Marker variable: the first loading of each factor is fixed to 1, and factor variances are estimated freely. This is the default in lavaan (R) and semopy (Python), e.g. cfa(model, data) without std.lv = TRUE.
- Standardized latent variables: each factor's variance is fixed to 1, and all loadings are estimated freely (std.lv = TRUE in R lavaan).
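The equivalence of the two strategies can be sketched numerically: for a one-factor model, the implied covariance between indicators i and j is lam[i] * lam[j] * psi, and rescaling the loadings by sqrt(psi) while fixing psi to 1 leaves that product unchanged. The numbers below are illustrative, not taken from the example data:

```python
# Sketch: marker-variable vs. standardized-latent identification imply
# the same covariance structure for a toy one-factor model.
import math

def implied_cov(lam, psi):
    """Model-implied covariances lam[i] * lam[j] * psi (off-diagonal)."""
    n = len(lam)
    return [[lam[i] * lam[j] * psi for j in range(n)] for i in range(n)]

# Marker-variable identification: first loading fixed to 1, psi free
marker = implied_cov([1.0, 0.8, 1.2], psi=4.0)

# Standardized-latent identification: psi fixed to 1, loadings
# rescaled by sqrt(psi)
s = math.sqrt(4.0)
std_lv = implied_cov([1.0 * s, 0.8 * s, 1.2 * s], psi=1.0)

# Off-diagonal entries agree, so the two models fit identically
print(marker[0][1], std_lv[0][1])
```

Because the implied covariances match, the two parameterizations produce identical fit statistics; only the scale of the parameter estimates differs.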
Important: Assign each variable to exactly one factor based on your theoretical model. Each factor should have at least 2-3 indicators for model identification.
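The indicator-count rule comes down to degrees-of-freedom bookkeeping, which can be sketched as below (the function name is illustrative):

```python
# Sketch: a CFA model is identified only if the observed covariance
# matrix supplies at least as many unique elements as there are free
# parameters, i.e. df >= 0.
def cfa_df(n_indicators, n_loadings, n_residuals, n_factor_covs):
    """Degrees of freedom = known covariance elements - free parameters."""
    known = n_indicators * (n_indicators + 1) // 2
    free = n_loadings + n_residuals + n_factor_covs
    return known - free

# The 2-factor model below with std.lv=TRUE: 5 free loadings,
# 5 residual variances, 1 factor covariance (factor variances fixed)
print(cfa_df(5, 5, 5, 1))  # -> 4
```

A factor with a single indicator drives this count negative unless extra constraints are imposed, which is why 2-3 indicators per factor is the practical minimum.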
Note: Please avoid using spaces in factor names. Use underscores instead (e.g., Cognitive_Ability).
Confirmatory Factor Analysis (CFA) is a multivariate statistical technique used to test whether a hypothesized factor structure fits the observed data. Unlike EFA, CFA requires researchers to specify in advance which variables load on which factors based on theory, making it a hypothesis-testing approach ideal for scale validation and measurement model assessment.
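Concretely, CFA models each observed variable as a linear function of the latent factors, so the model-implied covariance matrix is Sigma = Lambda * Psi * Lambda^T + Theta. The sketch below builds this matrix for a toy 2-factor structure mirroring the example model; all parameter values are illustrative:

```python
# Sketch: model-implied covariance matrix Sigma = Lambda Psi Lambda^T + Theta
# for a 2-factor CFA with 5 indicators (plain nested lists, no libraries).
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

Lambda = [[0.8, 0.0],   # verbal    loads on factor 1
          [0.7, 0.0],   # numerical loads on factor 1
          [0.9, 0.0],   # logical   loads on factor 1
          [0.0, 0.8],   # spatial   loads on factor 2
          [0.0, 0.7]]   # memory    loads on factor 2
Psi = [[1.0, 0.5],      # factor variances fixed to 1, correlation 0.5
       [0.5, 1.0]]
Theta_diag = [0.36, 0.51, 0.19, 0.36, 0.51]  # residual variances

Sigma = matmul(matmul(Lambda, Psi), transpose(Lambda))
for i, th in enumerate(Theta_diag):
    Sigma[i][i] += th  # add unique variance on the diagonal

# covariance(verbal, numerical) = 0.8 * 0.7 (same factor, psi = 1)
print(Sigma[0][1])
```

Fitting a CFA means choosing Lambda, Psi, and Theta so that Sigma reproduces the observed covariance matrix as closely as possible; the fit indices quantify the remaining discrepancy.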
Use Confirmatory Factor Analysis when you want to test a theoretically specified measurement model rather than explore structure. The table below contrasts CFA with its neighboring techniques:
| Aspect | EFA | CFA | PCA |
|---|---|---|---|
| Purpose | Seeks to explain correlations among variables using underlying latent factors. Separates shared variance from unique variance. | Tests a pre-specified factor structure based on theory. Confirms hypotheses about relationships between observed variables and latent factors. | Focuses on explaining total variance and creating orthogonal components. Does not distinguish shared from unique variance. |
| Approach | Exploratory - discovers underlying structure without prior hypotheses | Confirmatory - tests specific hypothesized factor structures | Descriptive - reduces dimensionality for data simplification |
| When to Use | Theory building, scale development, understanding construct validity | Theory testing, validating measurement models, assessing model fit to data | Data reduction, feature extraction, eliminating multicollinearity |
MI values > 3.84 (the .05 critical value of a chi-square test with 1 degree of freedom) suggest parameters that could improve model fit if freed. However, modifications should only be made if theoretically justified. Data-driven modifications without theoretical rationale can lead to overfitting and non-generalizable results.
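As a quick derivation of where the 3.84 cutoff comes from: a 1-df chi-square variable is the square of a standard normal, so its 5% critical value is the square of the normal 97.5% quantile. This can be verified with the standard library alone:

```python
# Sketch: the 3.84 cutoff is the .05 critical value of chi^2(1),
# i.e. the square of the standard normal 97.5% quantile (~1.96).
from statistics import NormalDist

critical = NormalDist().inv_cdf(0.975) ** 2
print(round(critical, 2))  # -> 3.84
```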
library(lavaan)
library(tidyverse)
library(semPlot)
# Psychological test scores data
data <- tibble(
verbal = c(65, 72, 58, 68, 75, 62, 70, 66, 71, 69, 63, 74, 60, 67, 73),
numerical = c(62, 68, 55, 65, 71, 60, 68, 63, 70, 66, 61, 72, 58, 64, 70),
logical = c(68, 74, 60, 70, 78, 65, 73, 69, 75, 71, 66, 76, 62, 69, 77),
spatial = c(58, 65, 52, 62, 68, 56, 64, 60, 66, 63, 58, 67, 54, 61, 69),
memory = c(70, 76, 63, 72, 80, 68, 75, 71, 77, 73, 69, 79, 65, 72, 81)
)
# Define CFA model (2-factor structure)
# Factor 1: Cognitive reasoning (verbal, numerical, logical)
# Factor 2: Spatial-memory abilities (spatial, memory)
model <- '
# Latent factors
Cognitive =~ verbal + numerical + logical
Spatial_Memory =~ spatial + memory
'
# Fit the CFA model
fit <- cfa(model, data = data, std.lv = TRUE)
# View results
summary(fit, fit.measures = TRUE, standardized = TRUE)
# Get fit indices
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
# Standardized loadings
standardizedSolution(fit)
# Modification indices
modindices(fit, sort = TRUE, minimum.value = 3.84)
# Path diagram
semPaths(fit, what = "std", edge.label.cex = 0.8,
curvePivot = TRUE, layout = "tree")

import pandas as pd
from semopy import Model
import semopy
# Psychological test scores data
data = pd.DataFrame({
'verbal': [65, 72, 58, 68, 75, 62, 70, 66, 71, 69, 63, 74, 60, 67, 73],
'numerical': [62, 68, 55, 65, 71, 60, 68, 63, 70, 66, 61, 72, 58, 64, 70],
'logical': [68, 74, 60, 70, 78, 65, 73, 69, 75, 71, 66, 76, 62, 69, 77],
'spatial': [58, 65, 52, 62, 68, 56, 64, 60, 66, 63, 58, 67, 54, 61, 69],
'memory': [70, 76, 63, 72, 80, 68, 75, 71, 77, 73, 69, 79, 65, 72, 81]
})
# Define CFA model (2-factor structure)
# Factor 1: Cognitive reasoning (verbal, numerical, logical)
# Factor 2: Spatial-memory abilities (spatial, memory)
model_spec = """
# Latent factors
Cognitive =~ verbal + numerical + logical
Spatial_Memory =~ spatial + memory
"""
# Fit the CFA model
model = Model(model_spec)
model.fit(data)
# Model fit statistics
stats = semopy.calc_stats(model)
print("Model Fit Statistics:")
print(stats)
# Parameter estimates
estimates = model.inspect()
print("Parameter Estimates:")
print(estimates)
# Factor loadings
print("Factor Loadings:")
loadings = estimates[estimates['op'] == '~'][['lval', 'rval', 'Estimate', 'Std. Err', 'z-value', 'p-value']]
print(loadings)