StatsCalculators.com

Scatter Plot Maker

Created: October 10, 2024
Last Updated: May 21, 2025

A scatter plot is a powerful visualization tool that displays the relationship between two numerical variables. Each point on the plot represents an observation in your dataset, with its position determined by the values of the selected X and Y columns. Scatter plots are ideal for spotting trends, patterns, clusters, and outliers, and are commonly used to assess correlation and linearity between variables.

To get started, upload your data or use our sample datasets, then select the columns you want to visualize. You can also color or size points by additional variables, add trend lines, or facet by categories for deeper insights. Not sure how to begin? See our step-by-step tutorial below.

Calculator

1. Load Your Data

Note: Column names will be converted to snake_case (e.g., "Product ID" → "product_id") for processing.
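As a rough illustration of the conversion described in the note, here is a minimal Python sketch. The helper name `to_snake_case` and the exact rules (collapse runs of non-alphanumeric characters into one underscore, then lowercase) are assumptions for illustration; the calculator's actual implementation is not shown on this page and may differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column header to snake_case (hypothetical helper)."""
    # Replace every run of non-alphanumeric characters with one underscore
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop leading/trailing underscores and lowercase the result
    return cleaned.strip("_").lower()


print(to_snake_case("Product ID"))  # product_id
```

With these assumed rules, a header such as "Total Bill ($)" would become `total_bill`.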

2. Select Columns & Options

Learn More About Scatter Plots

What is a Scatter Plot?

A scatter plot (also called a scattergram) is a graph that shows the relationship between two continuous variables. Each point represents an individual observation, with its position determined by its x and y values. Scatter plots are ideal for visualizing correlation between variables and identifying patterns in your data.

Figure: Scatter plot anatomy showing axes, points, trend line, and correlation

The diagram above illustrates the components of a scatter plot. Each point represents one (x, y) data pair, while the trend line shows the general direction of the relationship. How tightly the points cluster around the trend line indicates the strength of the correlation: the tighter the cluster, the stronger the correlation.

How to Read a Scatter Plot

Understanding scatter plots becomes intuitive once you know what to look for:

Key Components

  • X-axis: The horizontal axis representing one variable
  • Y-axis: The vertical axis representing another variable
  • Data Points: Each point represents a pair of values
  • Trend Line: Optional line showing the general relationship

What to Look For

  • Direction: Is the relationship positive, negative, or neutral?
  • Strength: How closely do the points follow a pattern?
  • Outliers: Are there any points far from the general pattern?
  • Clusters: Do points group together in certain areas?

How to Make a Scatter Plot with Our Calculator

  1. Click Sample Data and select Restaurant Tips
  2. For X-Axis column, select total_bill
  3. For Y-Axis column, select tip
  4. For Color By column, select day or leave it as None
  5. For Size By column, select size or leave it as None
  6. For Facet Column or Row, select day or leave it as None
  7. For Trend Line, check the box to show a trend line
  8. Click Generate Scatter Plot to visualize the data

Correlation in Scatter Plots

One of the main purposes of scatter plots is to visualize correlation between variables. Here's how to interpret different correlation patterns:

Positive Correlation

As X increases, Y tends to increase

Negative Correlation

As X increases, Y tends to decrease

No Correlation

No clear relationship between X and Y

Correlation Coefficient (r)

The correlation coefficient quantifies the strength and direction of a linear relationship:

  • r = 1: Perfect positive correlation
  • r = 0: No linear correlation (a non-linear relationship may still exist)
  • r = -1: Perfect negative correlation
  • 0.7 ≤ |r| ≤ 1.0: Strong correlation
  • 0.3 ≤ |r| < 0.7: Moderate correlation
  • 0.0 ≤ |r| < 0.3: Weak correlation
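The thresholds in the list above can be sketched as a small Python function. The name `describe_correlation` is hypothetical, and the cut-offs are the rule-of-thumb values from this page rather than a universal standard.

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the thresholds listed above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no linear correlation"

    strength = abs(r)
    if strength >= 0.7:
        label = "strong"
    elif strength >= 0.3:
        label = "moderate"
    else:
        label = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


print(describe_correlation(0.68))   # moderate positive
print(describe_correlation(-0.85))  # strong negative
```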

Advanced Scatter Plot Techniques

Multiple Groups

Use different colors or shapes to represent different groups or categories within your data, making it easier to spot group-specific patterns.

Bubble Charts

Vary the size of points to represent a third variable, transforming a scatter plot into a bubble chart that visualizes three dimensions of data.

Trend Lines

Add regression lines or curves (linear, logarithmic, exponential) to visualize the general relationship between variables and make predictions.

Matrix Scatter Plots

Create a grid of scatter plots showing relationships between multiple variables simultaneously for comprehensive multivariate analysis.

Scatter Plot Quick Reference

Key Formulas

Correlation (r) = Σ[(xi - x̄)(yi - ȳ)] / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]

Linear Regression: y = mx + b

where m = Σ[(xi - x̄)(yi - ȳ)] / Σ(xi - x̄)² and b = ȳ - m·x̄

Coefficient of Determination (R²) = r² (for simple linear regression)

Standard Error of the Estimate = √[Σ(yi - ŷi)² / (n-2)]
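To make the formulas concrete, here is a minimal sketch that evaluates them in plain Python. The function name `simple_linear_regression` and the sample numbers are made up for illustration, and the intercept is computed as b = ȳ - m·x̄, the standard least-squares intercept.

```python
import math


def simple_linear_regression(xs, ys):
    """Return r, slope m, intercept b, R^2 and the standard error."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")

    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    m = sxy / sxx                    # slope
    b = y_bar - m * x_bar            # intercept
    sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    std_error = math.sqrt(sse / (n - 2))
    return {"r": r, "m": m, "b": b, "r_squared": r ** 2, "std_error": std_error}


# Illustrative data only
result = simple_linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

For this toy data the slope works out to 0.6 and the intercept to 2.2, so the fitted line is y = 0.6x + 2.2.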

Frequently Asked Questions

When should I use a scatter plot instead of other charts?

Use scatter plots when you want to investigate the relationship between two continuous variables, identify potential correlations, detect outliers, or visualize the distribution of paired data points. They're especially useful when you want to see if one variable might be influencing another.

What's the difference between correlation and causation?

Correlation shows that two variables change together, but doesn't prove that one causes the other. Causation means one variable directly affects the other. A scatter plot can reveal correlation, but establishing causation requires controlled experiments and additional analysis.
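A small simulation can illustrate the point. In the sketch below, a hidden third variable drives two others that have no direct effect on each other. The variable names (`temperature`, `ice_cream_sales`, `sunburn_cases`) and all numbers are invented for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is repeatable

# The hidden driver (confounder) and two variables that both depend on it
temperature = [rng.gauss(0, 1) for _ in range(1000)]
ice_cream_sales = [t + rng.gauss(0, 0.5) for t in temperature]
sunburn_cases = [t + rng.gauss(0, 0.5) for t in temperature]

# Correlation between the two outcomes, ignoring the confounder
r_raw = pearson_r(ice_cream_sales, sunburn_cases)

# Correlation after subtracting the shared driver from both outcomes
r_adjusted = pearson_r(
    [s - t for s, t in zip(ice_cream_sales, temperature)],
    [c - t for c, t in zip(sunburn_cases, temperature)],
)

print(f"raw r = {r_raw:.2f}, adjusted r = {r_adjusted:.2f}")
```

The raw correlation is strong even though neither outcome influences the other; once the shared driver is removed, the correlation falls to roughly zero.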

How do I interpret outliers in a scatter plot?

Outliers are points that deviate significantly from the overall pattern. They may represent data errors, special cases, or important insights. Investigate outliers to determine if they should be removed (if they're errors) or highlighted (if they reveal something important about your data).
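One possible way to screen for such points is to fit a straight line and flag observations whose residual is large relative to the standard error of the estimate. The sketch below assumes a linear pattern and a cut-off of 2 standard errors; both the helper name `flag_outliers` and the threshold are illustrative choices, not a rule used by the calculator.

```python
import math


def flag_outliers(xs, ys, threshold=2.0):
    """Return indices of points far from the least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")

    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar

    # Vertical distance of each point from the fitted line
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    std_error = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if std_error == 0:
        return []  # every point lies exactly on the line

    return [i for i, e in enumerate(residuals) if abs(e) > threshold * std_error]


# Illustrative data: y = 2x except for one inflated value at index 5
xs = list(range(1, 11))
ys = [2, 4, 6, 8, 10, 40, 14, 16, 18, 20]
print(flag_outliers(xs, ys))  # [5]
```

A flagged point is only a candidate for review. Whether to remove or highlight it still depends on what the investigation shows.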

Can scatter plots show non-linear relationships?

Yes. While a simple linear trend line won't capture non-linear patterns, the scatter of points themselves will reveal curved or complex relationships. You can add non-linear trend lines (polynomial, logarithmic, exponential) to better fit such data patterns.
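As an example of a non-linear trend line, an exponential curve y = a·e^(bx) can be fitted by running ordinary linear regression on ln(y). The sketch below assumes all y values are positive; the function name `fit_exponential` and the sample data are hypothetical.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) via linear regression of ln(y) on x."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logarithms")

    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    x_bar = sum(xs) / n
    ly_bar = sum(log_ys) / n

    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (ly - ly_bar) for x, ly in zip(xs, log_ys))

    b = sxy / sxx                      # slope in log space = growth rate
    a = math.exp(ly_bar - b * x_bar)   # intercept in log space = ln(a)
    return a, b


# Illustrative data generated from y = 3 * exp(0.5 * x)
xs = [0, 1, 2, 3, 4]
ys = [3 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"y = {a:.2f} * exp({b:.2f} * x)")
```

Fitting in log space minimizes error on ln(y) rather than on y itself, so for noisy data the result can differ slightly from a direct non-linear least-squares fit.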

Creating Scatter Plots in Excel

Microsoft Excel is one of the most popular tools for creating scatter plots. Here's how to make a scatter plot in Excel:

  1. Select your data (two columns: X-values and Y-values)
  2. Go to the Insert tab
  3. In the Charts group, click Scatter
  4. Select your preferred scatter plot style
  5. To add a trend line, right-click on any data point and select Add Trendline
  6. For Excel 365 or Excel 2019, you can click on the chart, then use the Chart Design and Format tabs to customize further

Excel Tips:

  • Use CORREL(array1,array2) to calculate the correlation coefficient
  • Use LINEST(y_values,x_values,TRUE,TRUE) for detailed regression statistics (the final TRUE returns the additional statistics)
  • To handle dates on axes, ensure they're formatted as dates in your data
  • Add a secondary axis by right-clicking a data series → Format Data Series → Series Options → Secondary Axis

Creating Scatter Plots in Python

Python offers powerful libraries for creating scatter plots, with matplotlib and seaborn being the most popular choices.

Python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load the data
tips = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Set style for better-looking plots
# Note: 'seaborn-v0_8' requires matplotlib 3.6+; older versions name this style 'seaborn'
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Basic scatter plot with matplotlib
plt.figure(figsize=(10, 6))
plt.scatter(tips['total_bill'], tips['tip'], alpha=0.7)
plt.title('Basic Scatter Plot: Tips vs Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
plt.grid(True, alpha=0.3)
plt.show()

# Calculate correlation
correlation = np.corrcoef(tips['total_bill'], tips['tip'])[0, 1]
print(f"Correlation coefficient: {correlation:.4f}")

# Scatter plot with regression line using seaborn
plt.figure(figsize=(10, 6))
sns.regplot(x='total_bill', y='tip', data=tips, scatter_kws={'alpha':0.5}, line_kws={'color':'red'})
plt.title(f'Scatter Plot with Regression Line (r = {correlation:.4f})')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
plt.show()

# Create a scatter plot with time of day and party size
plt.figure(figsize=(12, 8))
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='time', size='size', sizes=(20, 200),
                palette='viridis', alpha=0.7)
plt.title('Restaurant Tips by Total Bill, Time, and Party Size')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip Amount ($)')
plt.legend(title='Time of Day')
plt.show()
Figure: Scatter plot in Python

Creating Scatter Plots in R

R is a statistical programming language that excels at creating publication-quality scatter plots, especially with the ggplot2 package.

R
# Load necessary libraries
library(tidyverse)

# Load tips dataset
tips <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/tips.csv")

# Basic scatter plot
ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(title = "Restaurant Tips vs Total Bill",
       subtitle = "Source: tips dataset",
       x = "Total Bill ($)",
       y = "Tip Amount ($)") +
  theme_minimal()

# Add a regression line and confidence interval
ggplot(tips, aes(x = total_bill, y = tip)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", color = "blue", fill = "lightblue") +
  labs(title = "Tips vs Total Bill with Linear Regression",
       x = "Total Bill ($)",
       y = "Tip Amount ($)") +
  theme_minimal()

# Calculate correlation coefficient
cor_value <- cor(tips$total_bill, tips$tip)
print(paste("Correlation coefficient:", round(cor_value, 4)))

# Scatter plot with time of day as color
ggplot(tips, aes(x = total_bill, y = tip, color = time)) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_viridis_d(name = "Time of Day") +
  labs(title = "Restaurant Tips by Time of Day",
       x = "Total Bill ($)",
       y = "Tip Amount ($)") +
  theme_minimal()

# Advanced scatter plot with multiple variables
ggplot(tips, aes(x = total_bill, y = tip, color = day)) +
  geom_point(aes(size = size), alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 0.5) +
  facet_wrap(~time) +
  scale_color_brewer(palette = "Set1", name = "Day") +
  scale_size_continuous(name = "Party Size") +
  labs(title = "Restaurant Tips Analysis",
       subtitle = "Grouped by time of day, colored by day of week, sized by party size",
       x = "Total Bill ($)",
       y = "Tip Amount ($)") +
  theme_minimal() +
  theme(legend.position = "bottom")
Figure: Scatter plot in R