Scatter plots are among the most widely used tools in data visualization. They show the relationship between two variables by drawing each observation as a point on a 2D plane.

In this chapter, we’ll cover:

  • What scatter plots are and when to use them
  • Creating basic scatter plots
  • Customizing colors, sizes, and markers
  • Plotting multiple groups on the same axes
  • Adding transparency and colormaps
  • Creating bubble plots and annotating points

1. What is a Scatter Plot?

A scatter plot uses points to represent values of two different variables:

  • x-axis → usually the independent (explanatory) variable
  • y-axis → usually the dependent (response) variable
  • Each point → A single observation in the dataset

Scatter plots are useful for:

  • Identifying trends, clusters, or outliers
  • Showing correlations between variables
  • Comparing groups of data
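
To make the idea of correlation concrete, here is a minimal sketch that puts a number on what a scatter plot shows visually. It uses NumPy's np.corrcoef on the sample data from the next section; the variable name r is only illustrative.

```python
import numpy as np

# Sample data (the same values plotted in the next section)
x = [5, 7, 8, 7, 6, 9, 5, 6, 7, 8]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between x and y.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. For this data r is negative, so y tends to fall as x rises.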

2. Creating a Basic Scatter Plot

import matplotlib.pyplot as plt

# Data
x = [5, 7, 8, 7, 6, 9, 5, 6, 7, 8]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

# Create scatter plot
plt.scatter(x, y)

# Labels and title
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Basic Scatter Plot")

plt.show()

✅ Each point represents an (x, y) pair.


3. Customizing Markers and Colors

You can change the color, shape, and size of points:

plt.scatter(x, y, color='red', marker='o', s=100, edgecolor='black')
plt.title("Customized Scatter Plot")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.show()

  • color → fill color of points
  • marker → shape of points ('o', 's', '^', '*', 'D', etc.)
  • s → marker size, given as an area in points squared
  • edgecolor → outline color
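
To compare the marker shapes listed above, one option is to draw each on its own row. This is a small sketch using Matplotlib's object-oriented interface (fig, ax); the x values are made up purely for illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]                  # illustrative positions only
markers = ['o', 's', '^', '*', 'D']  # circle, square, triangle, star, diamond

fig, ax = plt.subplots()
for row, m in enumerate(markers):
    # Draw each marker style on its own horizontal row
    ax.scatter(x, [row] * len(x), marker=m, s=100, edgecolor='black')

# Label each row with the marker code that produced it
ax.set_yticks(range(len(markers)))
ax.set_yticklabels(markers)
ax.set_title("Marker styles")
```

Call plt.show() afterwards to display the figure, as in the other examples.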

Another example, using the Iris dataset from the Seaborn library:

import seaborn as sns

# Load the Iris dataset as a pandas DataFrame
# (downloaded on first use, so this needs an internet connection)
iris = sns.load_dataset("iris")

plt.scatter(iris.sepal_length, iris.sepal_width, color='orange', marker='*', s=100, edgecolor='black')
plt.title("Sepal length vs. sepal width")
plt.xlabel("Sepal length (cm)")
plt.ylabel("Sepal width (cm)")
plt.show()

4. Plotting Multiple Scatter Plots

You can compare different groups by plotting multiple scatter datasets:

x1 = [5, 7, 8, 7, 6]
y1 = [99, 86, 87, 88, 100]

x2 = [6, 9, 5, 6, 7]
y2 = [103, 87, 94, 78, 85]

plt.scatter(x1, y1, color='blue', label="Group 1")
plt.scatter(x2, y2, color='green', label="Group 2")

plt.title("Multiple Scatter Plots")
plt.xlabel("X values")
plt.ylabel("Y values")
plt.legend()
plt.show()
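
With more than a couple of groups, repeating plt.scatter by hand gets tedious. One possible approach, sketched here, is to keep the groups in a dictionary and loop over it. The groups dictionary is a hypothetical structure that reuses the values from the example above.

```python
import matplotlib.pyplot as plt

# Hypothetical grouped data: group label -> (x values, y values)
groups = {
    "Group 1": ([5, 7, 8, 7, 6], [99, 86, 87, 88, 100]),
    "Group 2": ([6, 9, 5, 6, 7], [103, 87, 94, 78, 85]),
}

fig, ax = plt.subplots()
for name, (gx, gy) in groups.items():
    # No color is given, so each call takes the next color
    # from Matplotlib's default color cycle
    ax.scatter(gx, gy, label=name)

ax.set_title("Multiple Scatter Plots")
ax.set_xlabel("X values")
ax.set_ylabel("Y values")
ax.legend()
```

Call plt.show() afterwards to display the figure. Adding a third group then only requires a new dictionary entry.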

5. Adding Transparency (alpha)

Transparency helps when points overlap:

plt.scatter(x, y, color='purple', alpha=0.5, s=120)
plt.title("Scatter Plot with Transparency")
plt.show()

  • alpha ranges from 0 (fully transparent) to 1 (opaque).
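
The effect of alpha is easiest to see with many overlapping points. The sketch below draws the same synthetic data twice, once opaque and once with alpha=0.2. The data is randomly generated for illustration and is not from a real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # seeded generator, so the figure is reproducible
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)

fig, (ax_opaque, ax_alpha) = plt.subplots(1, 2, figsize=(8, 4), sharex=True, sharey=True)

# Left panel: fully opaque points hide how many observations overlap
ax_opaque.scatter(x, y, s=20)
ax_opaque.set_title("alpha = 1 (default)")

# Right panel: semi-transparent points, so dense regions appear darker
pts = ax_alpha.scatter(x, y, s=20, alpha=0.2)
ax_alpha.set_title("alpha = 0.2")
```

Call plt.show() afterwards to display the figure.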

6. Using Colormap to Show a Third Variable

Scatter plots can represent a third variable using color:

import numpy as np

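# Example data (this replaces the x and y lists used in earlier sections)
np.random.seed(0)             # fixed seed so the plot is reproducible
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50)   # third variable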

plt.scatter(x, y, c=colors, cmap='viridis', s=100)
plt.colorbar(label="Color scale")
plt.title("Scatter Plot with Colormap")
plt.show()

  • c → array of values for color mapping
  • cmap → colormap ('viridis', 'plasma', 'cool', etc.)
  • plt.colorbar() → adds a color scale legend
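
By default the colormap is stretched between the smallest and largest value in c. When several plots should share one color scale, the vmin and vmax arguments of scatter can pin the limits. A minimal sketch follows, where temperature is a hypothetical third variable generated at random.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.random(50)
y = rng.random(50)
temperature = rng.uniform(15, 30, size=50)  # hypothetical third variable

fig, (ax_auto, ax_fixed) = plt.subplots(1, 2, figsize=(9, 4))

# Default: the color limits follow the data's own min and max
auto = ax_auto.scatter(x, y, c=temperature, cmap='viridis')
fig.colorbar(auto, ax=ax_auto, label="Temperature (auto limits)")

# Fixed limits: colors stay comparable across different plots
fixed = ax_fixed.scatter(x, y, c=temperature, cmap='viridis', vmin=0, vmax=40)
fig.colorbar(fixed, ax=ax_fixed, label="Temperature (fixed 0 to 40)")

print(auto.get_clim(), fixed.get_clim())
```

Call plt.show() afterwards to display the figure.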

7. Bubble Plot (Using Size as a Variable)

You can use marker size to represent a fourth variable:

sizes = np.random.randint(50, 500, size=50)

plt.scatter(x, y, c=colors, s=sizes, cmap='plasma', alpha=0.6, edgecolor='black')
plt.colorbar(label="Color scale")
plt.title("Bubble Plot")
plt.show()

Here:

  • s represents the bubble size
  • Larger points mean higher values in the size variable
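
Real data rarely arrives in a range that works directly as marker sizes. One way to handle this, sketched below, is to rescale the variable linearly into a chosen size range before passing it as s. The helper scale_sizes and the population values are hypothetical, introduced only for this example.

```python
import numpy as np

def scale_sizes(values, min_size=50, max_size=500):
    """Linearly rescale values to marker areas between min_size and max_size."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are equal: fall back to a single mid-range size
        return np.full(values.shape, (min_size + max_size) / 2)
    return min_size + (values - values.min()) / span * (max_size - min_size)

population = [1.2e6, 3.5e6, 8.0e6, 0.4e6]  # hypothetical raw values
sizes = scale_sizes(population)
print(sizes)
```

The result can be passed straight to the plot, for example plt.scatter(x, y, s=sizes), provided x and y have the same length as sizes.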

8. Adding Annotations

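You can highlight specific points with plt.annotate(). Both xy (the point) and xytext (the label position) are in data coordinates by default, so the example below relies on the random x and y from section 6, whose values lie between 0 and 1: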

plt.scatter(x, y, color='orange', s=80)
plt.annotate("Important Point", xy=(x[0], y[0]), xytext=(0.2, 0.9),
             arrowprops=dict(facecolor='black', shrink=0.05))
plt.title("Scatter Plot with Annotation")
plt.show()
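
A hard-coded label position only works for one particular data range. As an alternative, the sketch below finds the point with the largest y value and places the label at a fixed offset from it using textcoords="offset points". It reuses the sample data from section 2; the names i and note are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([5, 7, 8, 7, 6, 9, 5, 6, 7, 8])
y = np.array([99, 86, 87, 88, 100, 86, 103, 87, 94, 78])

i = int(np.argmax(y))  # index of the point with the largest y value

fig, ax = plt.subplots()
ax.scatter(x, y, color='orange', s=80)
note = ax.annotate(
    f"max y = {y[i]}",
    xy=(x[i], y[i]),             # point being annotated, in data coordinates
    xytext=(20, -20),            # label offset from that point...
    textcoords="offset points",  # ...measured in points, not data units
    arrowprops=dict(arrowstyle="->"),
)
ax.set_title("Annotating the Maximum")
```

Call plt.show() afterwards to display the figure. Because the offset is measured in points, the label stays next to the marker regardless of the axis scales.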

✅ Summary

In this chapter, you learned how to:

  • Create basic scatter plots
  • Customize color, size, and markers
  • Plot multiple groups
  • Add transparency and colormaps
  • Create bubble plots
  • Annotate specific points

Scatter plots are essential for exploring relationships and correlations in data.
