Histograms are a type of plot used to visualize the distribution of a dataset. Unlike bar plots, which show categorical comparisons, histograms group continuous numerical data into bins and show how many values fall into each bin.

In this chapter, we’ll cover:

  • What histograms are and when to use them
  • Creating basic histograms
  • Customizing bins, colors, and ranges
  • Adding labels and density plots

1. What is a Histogram?

A histogram is a graphical representation of the distribution of numerical data.

  • The x-axis represents intervals (bins) of data values
  • The y-axis represents the frequency (count) of values in each bin

Histograms are useful for:

  • Understanding data distribution (normal, skewed, uniform)
  • Detecting outliers
  • Comparing datasets
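To make the bin-and-count idea concrete, here is a minimal sketch that tallies values per bin without drawing anything. It assumes NumPy is installed and uses np.histogram; the sample values and the bin edges (12, 14, 16, 18, 20) are chosen only for illustration.

```python
import numpy as np

# Illustrative sample values
data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Four bins of width 2. Each bin includes its left edge and excludes its
# right edge, except the last bin, which includes both edges.
counts, edges = np.histogram(data, bins=[12, 14, 16, 18, 20])

# Print one line per bin: the interval and how many values fall inside it
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")
```

For this sample the four counts come out as 5, 4, 3 and 2, which are exactly the bar heights a histogram with these bins would show.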

2. Creating a Basic Histogram

import matplotlib.pyplot as plt

# Sample data
data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Create histogram
plt.hist(data)

# Add labels and title
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Basic Histogram")

# Show plot
plt.show()

✅ By default, plt.hist() divides the range of the data into 10 equal-width bins.
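One way to check the default is to look at what plt.hist() returns: the per-bin counts, the bin edges, and the drawn bar objects. The sketch below assumes a headless setup, so it selects the non-interactive "Agg" backend; drop that line and add plt.show() if you want to see the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# plt.hist returns (counts, bin edges, bar patches)
counts, edges, patches = plt.hist(data)

print("number of bins:", len(counts))
print("bin width:", edges[1] - edges[0])
print("empty bins:", int((counts == 0).sum()))
```

Because the sample values are integers spanning 12 to 19, ten bins are each only 0.7 wide, so some bins contain no values at all. That is one reason to pick the number of bins yourself.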


3. Customizing Number of Bins

You can control how many bins are used to group the data:

plt.hist(data, bins=5, color='skyblue', edgecolor='black')
plt.title("Histogram with 5 Bins")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

  • bins sets how many equal-width intervals the data range is divided into
  • edgecolor draws a border around each bar so adjacent bars are easier to tell apart
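To see how the choice of bins changes the grouping, the following sketch bins the same data three ways and prints the counts. It uses np.histogram (an assumption that NumPy is available) so the numbers can be compared without opening a plot; the bin counts 3, 5 and 10 are arbitrary illustrative choices.

```python
import numpy as np

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Try a coarse, a medium, and a fine binning of the same data
results = {}
for n_bins in (3, 5, 10):
    counts, edges = np.histogram(data, bins=n_bins)
    results[n_bins] = counts.tolist()
    width = edges[1] - edges[0]
    print(f"bins={n_bins:>2}  width={width:.2f}  counts={results[n_bins]}")
```

Fewer bins give a smoother but coarser picture, while more bins show finer detail at the cost of noisier bars.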

4. Changing Range of Histogram

You can specify the range of values to include:

plt.hist(data, bins=5, range=(12, 18), color='orange', edgecolor='black')
plt.title("Histogram with Specified Range")
plt.show()

Only values from 12 to 18 (inclusive) are binned. Values outside that range, such as 19 in the sample data, are ignored.
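The effect of range can be checked numerically. This sketch relies on np.histogram, which accepts the same bins and range arguments, and assumes NumPy is installed. It counts how many of the sample values actually land in a bin.

```python
import numpy as np

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Same bins and range as the plot above
counts, edges = np.histogram(data, bins=5, range=(12, 18))

# Values that fall outside the requested range are silently dropped
outside = [v for v in data if v < 12 or v > 18]

print(int(counts.sum()), "of", len(data), "values were binned")
print("ignored values:", outside)
```

The bins always span the requested range exactly, so the first edge is 12 and the last is 18 regardless of where the data lies.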

Another Example of Histogram Plot

import pandas as pd

# Load the fitness and health tracking dataset
df = pd.read_csv('https://raw.githubusercontent.com/slidescope/Fitness-Health-Tracking-Dataset/refs/heads/main/fitness_health_tracking.csv')

# Preview the first five rows (displayed automatically in a notebook)
df.head()

# Plot the distribution of daily step counts
ds = df["Daily_Steps"]
plt.hist(ds, bins=20, edgecolor="#eee")
plt.xlabel("Daily Steps")
plt.ylabel("Frequency")
plt.show()

5. Multiple Histograms in One Plot

To compare two datasets, you can overlay histograms:

data1 = [12, 15, 13, 17, 19, 15, 12, 16]
data2 = [14, 16, 15, 18, 20, 14, 17, 19]

plt.hist(data1, bins=5, alpha=0.5, label='Dataset 1', color='blue')
plt.hist(data2, bins=5, alpha=0.5, label='Dataset 2', color='green')

plt.title("Multiple Histograms")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()

  • alpha sets the bar transparency (0 is fully transparent, 1 is opaque) so overlapping bars remain visible
  • label provides the text shown for each dataset in the legend
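One caveat with the overlay above: each plt.hist() call computes its own bin edges from its own data, so the two sets of bars may not line up. A possible refinement, sketched here under the assumption that NumPy is available, is to compute one shared set of edges with np.histogram_bin_edges and pass it to both calls. The "Agg" backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

data1 = [12, 15, 13, 17, 19, 15, 12, 16]
data2 = [14, 16, 15, 18, 20, 14, 17, 19]

# One set of 5 bins spanning both datasets (12 to 20 for these values)
shared_edges = np.histogram_bin_edges(data1 + data2, bins=5)

# Passing the same edges to both calls makes the bars directly comparable
counts1, _, _ = plt.hist(data1, bins=shared_edges, alpha=0.5, label='Dataset 1', color='blue')
counts2, _, _ = plt.hist(data2, bins=shared_edges, alpha=0.5, label='Dataset 2', color='green')

plt.legend()
# Call plt.show() here when running interactively
```

With shared edges, a taller bar in one dataset means more values in exactly the same interval, which is usually what a comparison is meant to show.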

6. Normalized Histogram (Density Plot)

You can normalize the histogram to represent probability density instead of frequency:

plt.hist(data, bins=5, density=True, color='purple', edgecolor='black')
plt.title("Normalized Histogram (Density Plot)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.show()

With density=True, the bar heights are rescaled so that the total area of the bars equals 1. This is useful when comparing the distributions of datasets with different sizes.
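A normalized histogram has bars whose total area is 1, where each bar's height is its count divided by (number of values × bin width). The sketch below checks both facts with np.histogram, assuming NumPy is installed; it mirrors the bins=5 setting used above.

```python
import numpy as np

data = [12, 15, 13, 17, 19, 15, 12, 16, 18, 14, 15, 13, 12, 16]

# Raw counts and density-normalized heights over the same 5 bins
counts, edges = np.histogram(data, bins=5)
density, _ = np.histogram(data, bins=5, density=True)

widths = np.diff(edges)            # width of each bin
area = (density * widths).sum()    # total area under the bars

# Density computed by hand from the raw counts
manual = counts / (len(data) * widths)

print("total area:", area)
print("matches manual formula:", np.allclose(density, manual))
```

Note that the heights themselves do not sum to 1 unless every bin has width 1; it is the area that is normalized.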


7. Customizing Histogram Appearance

You can customize:

  • histtype: 'bar', 'barstacked', 'step', 'stepfilled'
  • color, edgecolor
  • linewidth

plt.hist(data, bins=5, color='cyan', edgecolor='black', linewidth=1.2, histtype='stepfilled')
plt.title("Customized Histogram")
plt.show()

✅ Summary

In this chapter, you learned how to:

  • Create basic histograms
  • Adjust bins and ranges
  • Overlay multiple histograms for comparison
  • Normalize data using density plots
  • Customize colors, edges, and styles

Histograms are a powerful tool to understand data distribution, identify patterns, and prepare datasets for further analysis.
