This calculator helps you measure the relationship between two variables while controlling for the effect of one or more additional variables. Partial correlation is essential when you want to isolate the direct relationship between two variables by removing the influence of confounding factors. Perfect for research studies, statistical analysis, and understanding complex multivariate relationships.
For simple correlation between two variables without controls, use our Correlation Coefficient Calculator. Not sure when to use partial correlation? Try the sample dataset to see how controlling for prior GPA reveals the true relationship between study hours and exam scores (high-GPA students can score well with less studying, even though the raw correlation might suggest studying alone predicts scores).
Partial Correlation measures the relationship between two variables while controlling for the effect of one or more additional variables. It helps isolate the direct relationship between variables by removing the influence of confounding factors. The partial correlation coefficient ranges from -1 to +1, just like regular correlation.
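Conceptually, "controlling for" a variable means correlating only the parts of X and Y that are left over after the linear effect of Z has been removed. Here is a minimal Python sketch of that idea; the function name and the residual-based approach are for illustration only and assume NumPy is available:
import numpy as np
def partial_corr_via_residuals(x, y, z):
    """Correlate the residuals of x regressed on z and y regressed on z."""
    x, y, z = map(np.asarray, (x, y, z))
    resid_x = x - np.polyval(np.polyfit(z, x, 1), z)  # part of x not explained by z
    resid_y = y - np.polyval(np.polyfit(z, y, 1), z)  # part of y not explained by z
    return np.corrcoef(resid_x, resid_y)[0, 1]
This residual-based calculation gives the same result as the formula shown next.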
Partial Correlation (controlling for one variable Z):
rXY·Z = (rXY − rXZ·rYZ) / √[(1 − rXZ²)(1 − rYZ²)]
where rXY is the correlation between X and Y, rXZ is the correlation between X and Z, and rYZ is the correlation between Y and Z.
A partial correlation coefficient represents the strength and direction of the relationship between two variables after removing the linear effect of the control variable(s): values close to +1 or −1 indicate a strong direct positive or negative relationship, while values near 0 indicate little or no direct relationship once the controls are accounted for.
Let's say we want to examine the relationship between ice cream sales (X) and drowning deaths (Y), controlling for temperature (Z):
Simple Correlation:
rXY = 0.85 (strong positive correlation)
This suggests ice cream sales and drowning deaths are strongly related!
Partial Correlation (controlling for temperature):
rXY·Z = 0.05 (very weak correlation)
After controlling for temperature, the relationship nearly disappears!
Interpretation: The apparent relationship between ice cream sales and drowning deaths is spurious: it's actually driven by temperature. Hot weather increases both ice cream sales (people want cold treats) and drowning deaths (more people swimming). Once we control for temperature, there's no direct relationship between the two variables.
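To see the arithmetic behind this example, plug the numbers into the formula above. The two correlations with temperature (rXZ and rYZ) are hypothetical values chosen only so the result matches the illustration:
import math
r_xy = 0.85   # ice cream sales vs. drowning deaths (given above)
r_xz = 0.918  # ice cream sales vs. temperature (hypothetical)
r_yz = 0.918  # drowning deaths vs. temperature (hypothetical)
r_xy_given_z = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))
print(round(r_xy_given_z, 2))  # about 0.05: the direct relationship all but disappears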
Use the pcor.test() and pcor() functions from the ppcor package:
# Install and load the ppcor package
install.packages("ppcor")
library(ppcor)
# Load example data (same as sample dataset above)
data <- data.frame(
study_hours = c(5, 8, 3, 10, 6, 4, 9, 5, 7, 11, 6, 9, 4, 12, 7, 5, 10, 6, 8, 11),
exam_score = c(72, 85, 68, 92, 75, 70, 88, 73, 80, 95, 78, 90, 71, 96, 82, 74, 91, 77, 86, 94),
prior_gpa = c(2.8, 3.4, 2.5, 3.8, 3.0, 2.7, 3.5, 2.9, 3.2, 3.9, 3.1, 3.6, 2.6, 3.9, 3.3, 2.8, 3.7, 3.0, 3.4, 3.8),
sleep_hours = c(6.5, 7.0, 6.0, 7.5, 6.5, 6.0, 7.5, 6.5, 7.0, 8.0, 7.0, 7.5, 6.0, 8.0, 7.0, 6.5, 7.5, 6.5, 7.0, 8.0)
)
# Calculate partial correlation between study_hours and exam_score,
# controlling for prior_gpa
result <- pcor.test(data$study_hours, data$exam_score, data$prior_gpa)
print(result)
# Shows: estimate (partial correlation), p.value, statistic, n
# For multiple control variables:
# pcor(data[, c("study_hours", "exam_score", "prior_gpa", "sleep_hours")])$estimate[1,2]
Use pingouin.partial_corr() or calculate manually using correlation matrices:
import pandas as pd
import numpy as np
import pingouin as pg
from scipy import stats
# Load example data (same as sample dataset above)
data = pd.DataFrame({
'study_hours': [5, 8, 3, 10, 6, 4, 9, 5, 7, 11, 6, 9, 4, 12, 7, 5, 10, 6, 8, 11],
'exam_score': [72, 85, 68, 92, 75, 70, 88, 73, 80, 95, 78, 90, 71, 96, 82, 74, 91, 77, 86, 94],
'prior_gpa': [2.8, 3.4, 2.5, 3.8, 3.0, 2.7, 3.5, 2.9, 3.2, 3.9, 3.1, 3.6, 2.6, 3.9, 3.3, 2.8, 3.7, 3.0, 3.4, 3.8],
'sleep_hours': [6.5, 7.0, 6.0, 7.5, 6.5, 6.0, 7.5, 6.5, 7.0, 8.0, 7.0, 7.5, 6.0, 8.0, 7.0, 6.5, 7.5, 6.5, 7.0, 8.0]
})
# Method 1: Using pingouin
partial_corr = pg.partial_corr(
data=data,
x='study_hours',
y='exam_score',
covar='prior_gpa' # or covar=['prior_gpa', 'sleep_hours'] for multiple
)
print(partial_corr)
# Method 2: Manual calculation for one control variable
def partial_correlation(df, x, y, z):
    """Calculate partial correlation between x and y controlling for z"""
    # Calculate simple correlations
    r_xy = df[[x, y]].corr().iloc[0, 1]
    r_xz = df[[x, z]].corr().iloc[0, 1]
    r_yz = df[[y, z]].corr().iloc[0, 1]
    # Calculate partial correlation
    numerator = r_xy - (r_xz * r_yz)
    denominator = np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    partial_r = numerator / denominator
    # Calculate significance
    n = len(df)
    df_resid = n - 3  # degrees of freedom
    t_stat = partial_r * np.sqrt(df_resid) / np.sqrt(1 - partial_r**2)
    p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df_resid))
    return partial_r, p_value
r, p = partial_correlation(data, 'study_hours', 'exam_score', 'prior_gpa')
print(f"Partial correlation: {r:.4f}, p-value: {p:.4f}")Use the PARTIAL CORR command:
* GUI Method:
1. Click Analyze > Correlate > Partial...
2. Move variables X and Y to "Variables:" box
3. Move control variable(s) to "Controlling for:" box
4. Click Options to select significance tests and display options
5. Click OK
* Syntax Method:
PARTIAL CORR
/VARIABLES=ice_cream_sales drowning_deaths BY temperature
/SIGNIFICANCE=TWOTAIL
/MISSING=LISTWISE.