Scatter plots are one of the most important tools in data visualization. They show the relationship between two variables by representing each observation as a point on the 2D plane.
In this chapter, we’ll cover:
- What scatter plots are and when to use them
- Creating basic scatter plots
- Customizing colors, sizes, and markers
- Adding multiple scatter plots
- Adding transparency and colormaps
- Creating bubble plots and annotating points
- Using scatter plots for data analysis
1. What is a Scatter Plot?
A scatter plot uses points to represent values of two different variables. By convention:
- x-axis → Independent variable
- y-axis → Dependent variable
- Each point → A single observation in the dataset
Scatter plots are useful for:
- Identifying trends, clusters, or outliers
- Showing correlations between variables
- Comparing groups of data
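To put a number on the correlation that a scatter plot suggests, one common option is the Pearson correlation coefficient. The sketch below uses NumPy's `np.corrcoef` on a small set of illustrative values (chosen only for demonstration). It measures linear association only, so it complements the plot rather than replacing it.

```python
import numpy as np

# Illustrative data: ten paired observations (values chosen for demonstration)
x = np.array([5, 7, 8, 7, 6, 9, 5, 6, 7, 8])
y = np.array([99, 86, 87, 88, 100, 86, 103, 87, 94, 78])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between x and y
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value close to 1 or -1 points to a strong linear relationship, while a value near 0 points to a weak one. For this data r is negative, meaning y tends to fall as x rises.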
2. Creating a Basic Scatter Plot
import matplotlib.pyplot as plt
# Data
x = [5, 7, 8, 7, 6, 9, 5, 6, 7, 8]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]
# Create scatter plot
plt.scatter(x, y)
# Labels and title
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Basic Scatter Plot")
plt.show()
✅ Each point represents an (x, y) pair.
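Once the points are drawn, a fitted line can make a trend easier to see. The following is a minimal sketch, assuming a straight-line (degree-1) least-squares fit is a reasonable summary of the data; it uses `np.polyfit`, and the names `x_line` and `y_line` are only illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same data as the basic scatter plot
x = np.array([5, 7, 8, 7, 6, 9, 5, 6, 7, 8])
y = np.array([99, 86, 87, 88, 100, 86, 103, 87, 94, 78])

# Fit a degree-1 polynomial (a straight line) by least squares
slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line across the observed x range
x_line = np.linspace(x.min(), x.max(), 100)
y_line = slope * x_line + intercept

plt.scatter(x, y, label="Observations")
plt.plot(x_line, y_line, color="red", label="Least-squares fit")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Scatter Plot with Trend Line")
plt.legend()
plt.show()
```

A straight line is only one possible model. If the points curve, a higher degree or a different method may describe them better.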
3. Customizing Markers and Colors
You can change the color, shape, and size of points:
plt.scatter(x, y, color='red', marker='o', s=100, edgecolor='black')
plt.title("Customized Scatter Plot")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()
- color → fill color of points
- marker → 'o', 's', '^', '*', 'D', etc.
- s → size of points
- edgecolor → outline color
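To compare the marker shapes listed above side by side, a quick sketch like the following draws one row of points per marker. The x values and the row layout are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]                   # arbitrary positions for illustration
markers = ['o', 's', '^', '*', 'D']   # marker codes to compare

fig, ax = plt.subplots()
for row, marker in enumerate(markers):
    # Draw one horizontal row of points per marker style
    ax.scatter(x, [row] * len(x), marker=marker, s=100,
               color='red', edgecolor='black')

# Label each row with the marker code that produced it
ax.set_yticks(range(len(markers)))
ax.set_yticklabels(markers)
ax.set_title("Marker Styles")
plt.show()
```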
Another Example: The Iris Dataset from the Seaborn Library
import seaborn as sns
# Load the iris dataset (seaborn downloads it on first use, so an internet connection is needed)
iris = sns.load_dataset("iris")
plt.scatter(iris.sepal_length, iris.sepal_width, color='orange', marker='*', s=100, edgecolor='black')
plt.title("sepal_length vs sepal_width")
plt.xlabel("sepal_length")
plt.ylabel("sepal_width")
plt.show()
4. Plotting Multiple Scatter Plots
You can compare different groups by plotting multiple scatter datasets:
x1 = [5, 7, 8, 7, 6]
y1 = [99, 86, 87, 88, 100]
x2 = [6, 9, 5, 6, 7]
y2 = [103, 87, 94, 78, 85]
plt.scatter(x1, y1, color='blue', label="Group 1")
plt.scatter(x2, y2, color='green', label="Group 2")
plt.title("Multiple Scatter Plots")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.legend()
plt.show()
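When the groups live in a single dataset with a label column, the same comparison can be done in a loop. The sketch below assumes a hypothetical `group` array holding one label per observation and uses a boolean mask to select each group's points.

```python
import numpy as np
import matplotlib.pyplot as plt

# All observations in one pair of arrays, plus one label per observation
x = np.array([5, 7, 8, 7, 6, 6, 9, 5, 6, 7])
y = np.array([99, 86, 87, 88, 100, 103, 87, 94, 78, 85])
group = np.array(["Group 1"] * 5 + ["Group 2"] * 5)  # hypothetical labels

# Color to use for each group
colors = {"Group 1": "blue", "Group 2": "green"}

fig, ax = plt.subplots()
for name, color in colors.items():
    mask = group == name  # True where the observation belongs to this group
    ax.scatter(x[mask], y[mask], color=color, label=name)

ax.set_xlabel("X values")
ax.set_ylabel("Y values")
ax.set_title("Groups Selected with Boolean Masks")
ax.legend()
plt.show()
```

This keeps the plotting code the same no matter how many groups there are; only the `colors` dictionary changes.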
5. Adding Transparency (alpha)
Transparency helps when points overlap:
plt.scatter(x, y, color='purple', alpha=0.5, s=120)
plt.title("Scatter Plot with Transparency")
plt.show()
alpha ranges from 0 (fully transparent) to 1 (opaque).
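Transparency reveals density because overlapping points look darker. As a rough illustration, assuming standard "over" alpha blending (exact pixel values depend on the rendering backend), the combined opacity of n stacked markers with the same alpha is 1 - (1 - alpha)^n. The helper name `stacked_opacity` is hypothetical.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping markers that share the same alpha."""
    return 1 - (1 - alpha) ** n

# With alpha=0.5, each extra overlapping point hides half of what still showed through
for n in (1, 2, 3, 5):
    print(n, stacked_opacity(0.5, n))
```

With alpha=0.5, two overlapping points already reach 0.75 opacity, so dense regions stand out quickly. A lower alpha leaves more room to distinguish heavier overlaps.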
6. Using Colormap to Show a Third Variable
Scatter plots can represent a third variable using color:
import numpy as np
# Example data
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50) # third variable
plt.scatter(x, y, c=colors, cmap='viridis', s=100)
plt.colorbar(label="Color scale")
plt.title("Scatter Plot with Colormap")
plt.show()
- c → array of values for color mapping
- cmap → colormap ('viridis', 'plasma', 'cool', etc.)
- plt.colorbar() → adds a color scale legend
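To see how numbers turn into colors, the mapping can be reproduced by hand in two steps: rescale the values to the range 0 to 1, then look each one up in the colormap. This is a sketch of roughly what happens when `c` is a numeric array and no custom normalization is supplied; the three values are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

values = np.array([10.0, 20.0, 30.0])  # illustrative third-variable values

# Step 1: rescale the values linearly so the minimum maps to 0 and the maximum to 1
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = norm(values)

# Step 2: look up an RGBA color for each scaled value
cmap = plt.get_cmap('viridis')
rgba = cmap(scaled)

for value, position, color in zip(values, scaled, rgba):
    print(value, position, np.round(color, 2))
```

Each row of `rgba` holds red, green, blue, and alpha components between 0 and 1.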
7. Bubble Plot (Using Size as a Variable)
You can use marker size to represent a fourth variable:
sizes = np.random.randint(50, 500, size=50)
plt.scatter(x, y, c=colors, s=sizes, cmap='plasma', alpha=0.6, edgecolor='black')
plt.colorbar(label="Color scale")
plt.title("Bubble Plot")
plt.show()
Here:
- s represents the bubble size
- Larger points mean higher values in the size variable
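Real data rarely arrives in a range that works directly as marker sizes. Matplotlib interprets `s` as marker area in points squared, so one option is to rescale the variable into a readable range first. The helper `scale_sizes` and the `population` values below are hypothetical and shown only as a sketch.

```python
import numpy as np

def scale_sizes(values, min_size=50, max_size=500):
    """Linearly map values to marker areas between min_size and max_size."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are equal: give every marker the midpoint size
        return np.full(values.shape, (min_size + max_size) / 2)
    return min_size + (values - values.min()) / span * (max_size - min_size)

population = [1200, 3400, 560, 9800]  # made-up values for illustration
sizes = scale_sizes(population)
print(sizes)
```

The resulting array can be passed as `s=sizes` to `plt.scatter`. Because `s` is an area, doubling a value doubles the marker's area rather than its width.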
8. Adding Annotations
You can highlight specific points:
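# x and y are the random arrays from section 6, so all values lie between 0 and 1
plt.scatter(x, y, color='orange', s=80)
# xy is the point to highlight; xytext is the label position (both in data coordinates)
plt.annotate("Important Point", xy=(x[0], y[0]), xytext=(0.2, 0.9),
             arrowprops=dict(facecolor='black', shrink=0.05))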
plt.title("Scatter Plot with Annotation")
plt.show()
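Instead of hard-coding which point to highlight, the point can be chosen from the data. The sketch below, an illustration rather than the only approach, picks the observation with the largest y value using `np.argmax` and places the label at a fixed offset with `textcoords="offset points"`, so the label position does not depend on the axis scale.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 7, 8, 7, 6, 9, 5, 6, 7, 8])
y = np.array([99, 86, 87, 88, 100, 86, 103, 87, 94, 78])

# Index of the observation with the largest y value
i = int(np.argmax(y))

fig, ax = plt.subplots()
ax.scatter(x, y, color='orange', s=80)
annotation = ax.annotate(
    f"Highest y ({x[i]}, {y[i]})",
    xy=(x[i], y[i]),             # point being annotated, in data coordinates
    xytext=(20, -30),            # label offset from that point...
    textcoords="offset points",  # ...measured in points, not data units
    arrowprops=dict(arrowstyle="->", color='black'),
)
ax.set_title("Annotating the Highest Point")
plt.show()
```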
✅ Summary
In this chapter, you learned how to:
- Create basic scatter plots
- Customize color, size, and markers
- Plot multiple groups
- Add transparency and colormaps
- Create bubble plots
- Annotate specific points
Scatter plots are essential for exploring relationships and correlations in data.