Histograms are a type of plot used to visualize the distribution of a dataset. Unlike bar plots, which show categorical comparisons, histograms group continuous numerical data into bins and show how many values fall into each bin.
In this chapter, we’ll cover:
- What histograms are and when to use them
- Creating basic histograms
- Customizing bins, colors, and ranges
- Adding labels and density plots
1. What is a Histogram?
A histogram is a graphical representation of the distribution of numerical data.
- The x-axis represents intervals (bins) of data values
- The y-axis represents the frequency (count) of values in each bin
Histograms are useful for:
- Understanding data distribution (normal, skewed, uniform)
- Detecting outliers
- Comparing datasets
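To make the binning idea concrete, here is a minimal pure-Python sketch of the equal-width rule that histogram functions commonly use: split the range from the minimum to the maximum into equal intervals and count how many values land in each. The function name count_into_bins is hypothetical and exists only for illustration. The last bin is treated as closed so that the maximum value is counted.

```python
def count_into_bins(values, n_bins):
    """Hypothetical helper: count values into n_bins equal-width bins.

    Assumes the values contain at least two distinct numbers.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in values:
        # Bin index from the distance to the lower edge; the maximum value
        # would land one past the end, so clamp it into the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts


data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]
edges, counts = count_into_bins(data, 5)
print(counts)  # [5, 1, 5, 1, 2]
```

Each count is the height of one bar; the edges mark where the bars start and end on the x-axis.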
2. Creating a Basic Histogram
import matplotlib.pyplot as plt
# Sample data
data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]
# Create histogram
plt.hist(data)
# Add labels and title
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Basic Histogram")
# Show plot
plt.show()
✅ If you don't pass bins, Matplotlib divides the data into 10 equal-width bins by default, spanning the smallest to the largest value.
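plt.hist also returns the numbers behind the picture: the count in each bin, the bin edges, and the drawn bar objects. The minimal sketch below inspects those return values for the same data. The Agg backend is selected only so the code runs without opening a display window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# counts: values per bin, edges: bin boundaries, patches: the drawn bars
counts, edges, patches = plt.hist(data)
print(len(counts), len(edges))  # 10 bins are bounded by 11 edges
print(edges[0], edges[-1])      # the edges span the data's min and max
plt.close()
```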
3. Customizing Number of Bins
You can control how many bins are used to group the data:
plt.hist(data, bins=5, color='skyblue', edgecolor='black')
plt.title("Histogram with 5 Bins")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
- bins determines how finely the data is divided
- edgecolor adds a border to each bar for clarity
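bins also accepts a sequence of bin edges instead of a count. The sketch below assumes you want one bar per whole number in data. Edges at 12, 13, ..., 20 give eight bins of width 1. Each bin is half-open except the last one, which also includes 20.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Explicit edges 12, 13, ..., 20: each bin [k, k+1) holds a single integer value
edges = list(range(12, 21))
counts, bin_edges, _ = plt.hist(data, bins=edges, color='skyblue', edgecolor='black')
plt.title("Histogram with One Bin per Integer")
plt.xlabel("Value")
plt.ylabel("Frequency")
print(counts)
plt.close()
```

Aligning edges with the data's natural units avoids bins that split the values unevenly. With bins=5 on this data, each bin is 1.4 wide, so one bin holds only 14 while the next holds both 15 and 16.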
4. Changing Range of Histogram
You can specify the range of values to include:
plt.hist(data, bins=5, range=(12, 18), color='orange', edgecolor='black')
plt.title("Histogram with Specified Range")
plt.show()
Only values between 12 and 18 (inclusive) are counted; values outside that range, such as 19, are ignored.
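Matplotlib uses NumPy to do the binning. You can therefore check which values survive the range filter without drawing anything by calling np.histogram, which accepts the same bins and range arguments. A minimal sketch on the same data:

```python
import numpy as np

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Five equal-width bins between 12 and 18; 19 falls outside and is dropped
counts, edges = np.histogram(data, bins=5, range=(12, 18))
print(counts)        # per-bin counts
print(counts.sum())  # 13 of the 14 values are counted
```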
Another Example: A Histogram from a CSV Dataset
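import pandas as pd
import matplotlib.pyplot as plt
# Read the fitness health tracking data
df = pd.read_csv('https://raw.githubusercontent.com/slidescope/Fitness-Health-Tracking-Dataset/refs/heads/main/fitness_health_tracking.csv')
print(df.head())  # preview the first five rows
ds = df.Daily_Steps
plt.hist(ds, bins=20, edgecolor="#eee")
plt.title("Distribution of Daily Steps")
plt.xlabel("Daily Steps")
plt.ylabel("Frequency")
plt.show()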
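pandas can also draw the histogram directly from a column with Series.plot.hist, which calls Matplotlib under the hood. The sketch below does not download the dataset. Instead, it builds a small synthetic Series of hypothetical step counts as a stand-in for df.Daily_Steps, generated with a fixed random seed, so the values are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.Daily_Steps: 500 synthetic, non-negative step counts
rng = np.random.default_rng(seed=0)
steps = pd.Series(np.clip(rng.normal(loc=8000, scale=2500, size=500), 0, None),
                  name="Daily_Steps")

# Series.plot.hist forwards bins and styling keywords to Matplotlib's hist
ax = steps.plot.hist(bins=20, edgecolor="#eee")
ax.set_title("Distribution of Daily Steps (synthetic data)")
ax.set_xlabel("Daily Steps")
```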
5. Multiple Histograms in One Plot
To compare two datasets, you can overlay histograms:
data1 = [12, 15, 13, 17, 19, 15, 12, 16]
data2 = [14, 16, 15, 18, 20, 14, 17, 19]
plt.hist(data1, bins=5, alpha=0.5, label='Dataset 1', color='blue')
plt.hist(data2, bins=5, alpha=0.5, label='Dataset 2', color='green')
plt.title("Multiple Histograms")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()
- alpha controls transparency so overlapping histograms can be seen
- label adds legend entries
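One caveat with overlaying histograms this way: passing bins=5 to each call lets Matplotlib compute separate edges for each dataset. Here data1 spans 12 to 19 and data2 spans 14 to 20, so the two sets of bars don't line up. One possible fix is to compute a single set of edges from the combined data with np.histogram_bin_edges and pass it to both calls:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

data1 = [12, 15, 13, 17, 19, 15, 12, 16]
data2 = [14, 16, 15, 18, 20, 14, 17, 19]

# One set of 5 equal-width edges covering both datasets (12 to 20)
shared_edges = np.histogram_bin_edges(data1 + data2, bins=5)

counts1, edges1, _ = plt.hist(data1, bins=shared_edges, alpha=0.5, label='Dataset 1', color='blue')
counts2, edges2, _ = plt.hist(data2, bins=shared_edges, alpha=0.5, label='Dataset 2', color='green')
plt.title("Multiple Histograms with Shared Bins")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.close()
```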
6. Normalized Histogram (Density Plot)
You can normalize the histogram to show probability density instead of frequency. The bar heights are then scaled so that the total area of the bars equals 1:
plt.hist(data, bins=5, density=True, color='purple', edgecolor='black')
plt.title("Normalized Histogram (Density Plot)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()
This is useful when comparing distributions of datasets with different sizes.
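The sketch below checks the density property with NumPy, which performs the binning inside plt.hist. The bars' total area (height times bin width, summed over the bins) comes out as 1. The heights alone need not sum to 1.

```python
import numpy as np

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# density=True divides each count by (total count * bin width)
density, edges = np.histogram(data, bins=5, density=True)
widths = np.diff(edges)
area = np.sum(density * widths)
print(area)           # 1.0, up to floating-point rounding
print(density.sum())  # about 0.714 here (1 / bin width of 1.4), not 1
```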
7. Customizing Histogram Appearance
You can customize:
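- histtype: 'bar' (the default), 'barstacked' (several datasets stacked on top of each other), 'step' (an unfilled outline) or 'stepfilled' (a filled outline)
- color and edgecolor
- linewidth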
plt.hist(data, bins=5, color='cyan', edgecolor='black', linewidth=1.2, histtype='stepfilled')
plt.title("Customized Histogram")
plt.show()
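The 'barstacked' option is meant for several datasets at once: in each bin, one dataset's bar is drawn on top of the previous one's. The sketch below reuses data1 and data2 from the overlay section. The edges 12, 14, 16, 18, 20 are chosen only for illustration, and both datasets share them.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

data1 = [12, 15, 13, 17, 19, 15, 12, 16]
data2 = [14, 16, 15, 18, 20, 14, 17, 19]

# Passing a list of datasets with histtype='barstacked' stacks them per bin
counts, edges, _ = plt.hist([data1, data2], bins=[12, 14, 16, 18, 20],
                            histtype='barstacked', edgecolor='black',
                            label=['Dataset 1', 'Dataset 2'])
plt.title("Stacked Histogram")
plt.legend()
print(counts[0])   # counts for data1 alone
print(counts[-1])  # stacked totals: data1 + data2 per bin
plt.close()
```

With stacking, the last returned array holds the cumulative totals, which are the heights of the top of each stack.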
✅ Summary
In this chapter, you learned how to:
- Create basic histograms
- Adjust bins and ranges
- Overlay multiple histograms for comparison
- Normalize data using density plots
- Customize colors, edges, and styles
Histograms are a powerful tool to understand data distribution, identify patterns, and prepare datasets for further analysis.