Covariance

Created: October 15, 2024

Last Updated: July 30, 2025

This calculator helps you measure how two variables change together by calculating their covariance. Covariance indicates the direction of the linear relationship between variables: positive values mean the variables tend to increase and decrease together, negative values mean they move in opposite directions, and values near zero suggest little linear relationship. Unlike correlation coefficients, covariance values are not standardized and depend on the units of measurement, making the magnitude less interpretable across different datasets. The calculator provides sample covariance along with a visual scatter plot to help you understand the relationship between your variables.


Learn More

Understanding Covariance

Definition

Covariance is a measure of the joint variability of two variables. Its sign indicates the direction of their linear relationship. Its magnitude depends on the units of measurement, so it is not a standardized measure of the relationship's strength.

Formula

Sample Covariance:

$$\text{cov}(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

Where:

$n$ = sample size
$x_i, y_i$ = individual values of variables X and Y
$\bar{x}, \bar{y}$ = sample means of X and Y
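
The formula translates almost line for line into code. The following is a minimal sketch in plain Python; the function name `sample_covariance` is hypothetical, chosen for illustration, and is not part of any library.

```python
def sample_covariance(x, y):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Sample means of X and Y
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of the products of deviations from the means
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    return total / (n - 1)


print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
```

The length checks are a design choice for this sketch: with fewer than two observations the divisor $n - 1$ is zero, so the sample covariance is undefined.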

Interpretation Guidelines

Positive covariance indicates variables tend to move in the same direction

Negative covariance indicates variables tend to move in opposite directions

Covariance near zero suggests little or no linear relationship between variables (a non-linear relationship may still exist)

Important Considerations

The magnitude of covariance depends on the units of measurement
Covariance is sensitive to outliers and scale changes
Only measures linear relationships; may miss non-linear patterns
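
These three considerations can be checked numerically. The sketch below uses small made-up data sets and a hypothetical helper named `cov` (defined here only so the example is self-contained) to show that changing units rescales the covariance, that a single outlier can shift it substantially, and that a perfect but non-linear relationship can still produce a covariance of zero.

```python
def cov(x, y):
    """Hypothetical helper: sample covariance with an n - 1 divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


# Illustrative data: study time in hours and exam scores
hours = [2, 3, 4, 5, 6]
scores = [75, 80, 85, 90, 95]

# 1. Units: expressing study time in minutes multiplies the covariance by 60
cov_hours = cov(hours, scores)
cov_minutes = cov([h * 60 for h in hours], scores)
print(cov_hours, cov_minutes)  # 12.5 750.0

# 2. Outliers: one unusual student (1 hour, score 100) changes the result a lot
cov_outlier = cov(hours + [1], scores + [100])
print(cov_outlier)  # 2.5

# 3. Non-linearity: y = x^2 on symmetric x is fully determined by x,
#    yet the positive and negative products cancel out
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
cov_nonlinear = cov(x, y)
print(cov_nonlinear)  # 0.0
```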

Step-by-Step Practical Example

Let's calculate the covariance between hours studied and exam scores for 5 students:

StudentId	Hours Studied (X)	Exam Score (Y)
1	2	75
2	3	80
3	4	85
4	5	90
5	6	95

Step 1: Calculate the means:

$\bar x = \frac{2 + 3 + 4 + 5 + 6}{5} = 4$
$\bar y = \frac{75 + 80 + 85 + 90 + 95}{5} = 85$

Step 2: Calculate $(x_i - \bar x)(y_i - \bar y)$ for each pair:

$(2 - 4)(75 - 85) = 20$
$(3 - 4)(80 - 85) = 5$
$(4 - 4)(85 - 85) = 0$
$(5 - 4)(90 - 85) = 5$
$(6 - 4)(95 - 85) = 20$

Step 3: Sum the results and divide by $(n - 1)$: $\text{cov}(X,Y) = \frac{20 + 5 + 0 + 5 + 20}{5 - 1} = \frac{50}{4} = 12.5$

Interpretation: The positive covariance $(12.5)$ indicates that there's a positive relationship between hours studied and exam scores. As the number of hours studied increases, exam scores tend to increase as well.
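
The three steps can be reproduced in a few lines of plain Python as a check on the arithmetic. This is only a sketch; the variable names are chosen for illustration.

```python
hours = [2, 3, 4, 5, 6]        # Hours Studied (X)
scores = [75, 80, 85, 90, 95]  # Exam Score (Y)
n = len(hours)

# Step 1: calculate the means
mean_hours = sum(hours) / n    # 4.0
mean_scores = sum(scores) / n  # 85.0

# Step 2: product of deviations for each student
products = [(x - mean_hours) * (y - mean_scores) for x, y in zip(hours, scores)]
print(products)  # [20.0, 5.0, 0.0, 5.0, 20.0]

# Step 3: sum the products and divide by n - 1
covariance = sum(products) / (n - 1)
print(covariance)  # 12.5
```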

How to Calculate Covariance in R

Use the cov() function to calculate covariance between two variables:

library(tidyverse)

tips <- read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Sample covariance (default)
cov(tips$total_bill, tips$tip) # 8.323502

# Population covariance 
cov(tips$total_bill, tips$tip) * (nrow(tips) - 1) / nrow(tips) # 8.289388

# Covariance matrix for multiple variables
cov(tips[c("total_bill", "tip", "size")])

# Visualize the relationship
ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point(color = "steelblue", alpha = 0.7) + 
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Scatter Plot: Total Bill vs. Tip",
    subtitle = paste("Covariance:", round(cov(tips$total_bill, tips$tip), 3)),
    x = "Total Bill ($)",
    y = "Tip Amount ($)"
  ) +
  theme_minimal()

How to Calculate Covariance in Python

Use numpy.cov() or the pandas Series.cov() and DataFrame.cov() methods to calculate covariance:

Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load data
tips = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Sample covariance using pandas
sample_cov = tips['total_bill'].cov(tips['tip'])
print(f"Sample covariance: {sample_cov:.6f}")  # 8.323502

# Using numpy (returns covariance matrix)
cov_matrix = np.cov(tips['total_bill'], tips['tip'])
print(f"Covariance matrix:\n{cov_matrix}")

# Population covariance
pop_cov = tips['total_bill'].cov(tips['tip']) * (len(tips) - 1) / len(tips)
print(f"Population covariance: {pop_cov:.6f}")

# Create scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(tips['total_bill'], tips['tip'], alpha=0.7, color='steelblue')
plt.plot(np.unique(tips['total_bill']), 
         np.poly1d(np.polyfit(tips['total_bill'], tips['tip'], 1))(np.unique(tips['total_bill'])), 
         color='red')
plt.title(f'Scatter Plot: Total Bill vs. Tip\nCovariance: {sample_cov:.3f}')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
plt.grid(True, alpha=0.3)
plt.show()

How to Calculate Covariance in Excel

Use COVARIANCE.S() for sample covariance or COVARIANCE.P() for population covariance:

Excel

# Assuming data in columns A (Total Bill) and B (Tip)

# Sample covariance (most common)
=COVARIANCE.S(A2:A245, B2:B245)
# Result: 8.323502

# Population covariance
=COVARIANCE.P(A2:A245, B2:B245)
# Result: 8.289388

# Alternative: Using the legacy COVAR function (equivalent to COVARIANCE.P)
=COVAR(A2:A245, B2:B245)

# Create descriptive statistics table:
Mean 1 (Total Bill):       =AVERAGE(A2:A245)
Mean 2 (Tip):              =AVERAGE(B2:B245)
Standard Dev 1:            =STDEV.S(A2:A245)
Standard Dev 2:            =STDEV.S(B2:B245)
Sample Covariance:         =COVARIANCE.S(A2:A245, B2:B245)

# To create a scatter plot:
1. Select both data columns (A2:B245)
2. Insert → Charts → Scatter Chart
3. Add trendline: Right-click points → Add Trendline → Linear
4. Format chart title to include covariance value
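
The R, Python, and Excel examples all report both a sample and a population covariance. As a brief aside, the two differ only in the divisor, which is why the R and Python code rescales the sample value by $(n - 1)/n$. The subscript notation $\text{cov}_{\text{pop}}$ is introduced here for illustration and does not appear elsewhere on this page:

```latex
\text{cov}_{\text{pop}}(X,Y)
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}
  = \frac{n-1}{n}\,\text{cov}(X,Y)
```

For the tips data, $n = 244$ (rows 2 through 245 in the Excel ranges), so the population value is $243/244$ of the sample value. That is consistent with the 8.289388 and 8.323502 shown in the examples.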