🧩 What is Crosstab?

pandas.crosstab() builds a frequency table (a contingency table) that summarizes the relationship between two or more categorical variables.
By default it counts how often each combination of categories occurs. It is similar to a pivot table (it is built on pivot_table), but it takes arrays or Series directly instead of a DataFrame plus column names.

Think of it as a quick way to analyze relationships between variables like gender vs. survival, class vs. embarkation point, or smoker vs. day in a restaurant dataset.


🔹 Syntax

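pd.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

Main parameters:

  • index: array, Series, or list of these; values to group by on the rows
  • columns: array, Series, or list of these; values to group by on the columns
  • values: array or Series of values to aggregate (optional; requires aggfunc)
  • aggfunc: aggregation function such as np.sum or np.mean, or a string such as 'mean' (requires values)
  • margins: True/False; adds row and column totals, labeled by margins_name ('All' by default)
  • normalize: False (default), True or 'all', 'index', or 'columns'; turns counts into proportions of the whole table, of each row, or of each column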
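
To make the parameters concrete, here is a minimal sketch on a few made-up rows. The data is hypothetical, not from the Titanic set, and the sketch assumes only that pandas is installed. Without values/aggfunc, crosstab simply counts each combination. The axis labels come from the Series names.

```python
import pandas as pd

# Hypothetical toy data, only to illustrate the mechanics
sex = pd.Series(["female", "male", "male", "female", "male"], name="sex")
survived = pd.Series([1, 0, 1, 1, 0], name="survived")

# Count how often each (sex, survived) pair occurs
counts = pd.crosstab(sex, survived)
print(counts)
```

No woman in this toy data has survived = 0. That combination is therefore reported as 0 rather than left missing.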

🧩 When Should You Use Crosstab?

Use crosstab() when you want to:

✅ Summarize categorical data
✅ Compare distributions between variables
✅ Analyze relationships between categories (e.g., gender and survival rate)
✅ Quickly compute two-way or multi-way frequency tables

Common use cases:

  • Gender vs. Survival: Were women more likely to survive?
  • Class vs. Embarkation Port: Which class boarded from which port?
  • Age Group vs. Survival: Which age group survived more?
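
The "Age Group vs. Survival" case needs an extra step because age is numeric. One common approach, among several, is to bin it with pd.cut first. The ages, outcomes and bin edges below are made up for illustration. On the real dataset, age has missing values, and those rows would drop out of the table.

```python
import pandas as pd

# Hypothetical ages and outcomes (not real Titanic rows)
df = pd.DataFrame({
    "age": [4, 15, 22, 35, 41, 58, 63, 30],
    "survived": [1, 1, 0, 1, 0, 0, 1, 0],
})

# Bin ages into labeled groups; these bin edges are an arbitrary illustrative choice
# (pd.cut bins are right-closed by default, e.g. (0, 12] -> "child")
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 12, 18, 60, 100],
    labels=["child", "teen", "adult", "senior"],
)

table = pd.crosstab(df["age_group"], df["survived"])
print(table)
```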

🧩 Example: Titanic Dataset

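Let's load the Titanic dataset with Seaborn's load_dataset() and explore it. The first call downloads the data from Seaborn's online example-data repository, so it needs an internet connection.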

import pandas as pd
import seaborn as sns

# Load dataset
titanic = sns.load_dataset('titanic')

# Display first few rows
print(titanic.head())

🧠 Dataset Overview

  • survived: 0 = No, 1 = Yes
  • pclass: Passenger class (1st, 2nd, 3rd)
  • sex: Gender
  • age: Age in years
  • sibsp: Number of siblings/spouses aboard
  • parch: Number of parents/children aboard
  • fare: Ticket fare
  • embarked: Port of embarkation (C, Q, S)
  • class: Class label (First, Second, Third)
  • who: Man, woman, or child
  • deck: Deck letter
  • embark_town: Town of embarkation
  • alone: True/False, whether the passenger traveled alone

🧩 Example 1: Count of Survivors by Gender

pd.crosstab(titanic['sex'], titanic['survived'])

Output:

survived    0    1
sex
female     81  233
male      468  109

✅ Interpretation:
233 of 314 women survived (81 did not), about 74%.
Among men, only 109 of 577 survived, about 19%.
→ Women had a much higher survival rate than men.
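
Under the hood, a plain crosstab is a count of rows per combination of keys. The sketch below uses hypothetical toy rows in place of the real columns. It checks that crosstab matches the equivalent groupby().size().unstack() chain, which can be handy when you are already working inside a DataFrame.

```python
import pandas as pd

# Hypothetical toy rows standing in for the 'sex' and 'survived' columns
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female", "male", "female"],
    "survived": [1, 0, 0, 1, 1, 0],
})

via_crosstab = pd.crosstab(df["sex"], df["survived"])

# Equivalent by hand: count rows per (sex, survived) pair, then pivot 'survived' into columns
via_groupby = df.groupby(["sex", "survived"]).size().unstack(fill_value=0)

print(via_crosstab)
print(via_groupby)
```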


🧩 Example 2: Add Margins (Totals)

pd.crosstab(titanic['sex'], titanic['survived'], margins=True)

Output:

survived    0    1  All
sex
female     81  233  314
male      468  109  577
All       549  342  891

✅ Adds totals for both rows and columns, similar to “Grand Totals” in Excel pivot tables.
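
The total row and column are labeled "All" by default. The margins_name parameter changes that label. A small sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female", "male"],
    "survived": [1, 0, 1, 1, 0],
})

# margins=True appends totals; margins_name relabels them (default is "All")
totals = pd.crosstab(df["sex"], df["survived"], margins=True, margins_name="Total")
print(totals)
```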


🧩 Example 3: Crosstab Between Class and Survival

pd.crosstab(titanic['class'], titanic['survived'])

Output:

survived    0    1
class
First      80  136
Second     97   87
Third     372  119

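✅ First-class passengers had the highest survival rate: 136 of 216 (about 63%), compared with 87 of 184 (about 47%) in second class and 119 of 491 (about 24%) in third class.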
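
The percentages above come from dividing each class's survivors by that class's total. The sketch below rebuilds the Example 3 counts by hand, so it needs no download, and does that division. The result matches column 1 of the same crosstab with normalize='index'.

```python
import pandas as pd

# Counts copied from the Example 3 output (class x survived)
counts = pd.DataFrame(
    {0: [80, 97, 372], 1: [136, 87, 119]},
    index=pd.Index(["First", "Second", "Third"], name="class"),
)

# Survival rate per class: survivors divided by all passengers in that class
rate = counts[1] / counts.sum(axis=1)
print(rate.round(3))
```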


🧩 Example 4: Normalized Crosstab (Proportions)

You can normalize results to get proportions instead of counts.

pd.crosstab(titanic['sex'], titanic['survived'], normalize='index')

Output:

survived       0       1
sex
female    0.2580  0.7420
male      0.8111  0.1889

✅ Interpretation:

  • About 74% of females survived
  • Only about 19% of males survived
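
normalize accepts a few options, and the sketch below compares two of them on hypothetical toy data. With 'index', each row sums to 1. With 'all' (equivalent to True), the whole table sums to 1. Multiplying by 100 turns proportions into percentages.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female", "male", "male"],
    "survived": [1, 0, 1, 1, 0, 0],
})

by_row = pd.crosstab(df["sex"], df["survived"], normalize="index")  # each row sums to 1
by_all = pd.crosstab(df["sex"], df["survived"], normalize="all")    # whole table sums to 1
as_percent = (by_row * 100).round(1)                                # proportions -> percentages

print(by_row)
print(by_all)
print(as_percent)
```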

🧩 Example 5: Multi-variable Crosstab

Let's analyze survival by gender and class together.

pd.crosstab([titanic['sex'], titanic['class']], titanic['survived'])

Output:

survived         0   1
sex    class
female First     3  91
       Second    6  70
       Third    72  72
male   First    77  45
       Second   91  17
       Third   300  47

✅ This gives a detailed breakdown. For example, 91 of 94 first-class females survived, while only 47 of 347 third-class males did.
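
Passing a list of keys as index produces a row MultiIndex. The sketch below uses hypothetical toy rows to show two ways to read it: select a whole outer level with .loc, or one cell with a tuple key.

```python
import pandas as pd

# Hypothetical toy rows
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "class": ["First", "Third", "First", "Third", "Third", "First"],
    "survived": [1, 0, 0, 0, 1, 1],
})

ct = pd.crosstab([df["sex"], df["class"]], df["survived"])

# Rows are a MultiIndex (sex, class); .loc on the outer level returns a sub-table
print(ct.loc["female"])

# A single cell: (sex, class) tuple for the row, survival outcome for the column
print(ct.loc[("male", "Third"), 1])
```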


🧩 Example 6: Crosstab with an Aggregation Function

You can include a numeric column (values) and an aggregation function (aggfunc) to compute statistics.

Example: Average fare by class and survival.

pd.crosstab(
    titanic['class'],
    titanic['survived'],
    values=titanic['fare'],
    aggfunc='mean'
)

Output:

survived      0      1
class
First     64.68  95.61
Second    19.41  22.06
Third     13.67  13.69

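✅ Interpretation:
Within every class, survivors paid a higher average fare than non-survivors. The gap is large in first class, however, while in third class the two averages are nearly identical.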
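
values and aggfunc have to be passed together; giving only one of them raises a ValueError. The sketch below uses hypothetical fares, not real Titanic values. It shows the pairing rule and checks the result against the equivalent groupby().mean().unstack() chain.

```python
import pandas as pd

# Hypothetical fares, not real Titanic values
df = pd.DataFrame({
    "class": ["First", "First", "Third", "Third", "Third"],
    "survived": [0, 1, 0, 1, 1],
    "fare": [50.0, 90.0, 8.0, 10.0, 12.0],
})

mean_fare = pd.crosstab(df["class"], df["survived"], values=df["fare"], aggfunc="mean")
print(mean_fare)

# Same numbers via groupby: mean fare per (class, survived), then pivot survived into columns
same = df.groupby(["class", "survived"])["fare"].mean().unstack()

# values without aggfunc is rejected
raised = False
try:
    pd.crosstab(df["class"], df["survived"], values=df["fare"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```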


🧩 Example 7: Crosstab with normalize='columns'

Normalize column-wise to see proportions per survival outcome.

pd.crosstab(titanic['sex'], titanic['survived'], normalize='columns')

Output:

survived       0       1
sex
female    0.1475  0.6813
male      0.8525  0.3187

✅ 68% of all survivors were female.
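
'columns' and 'index' answer different questions. 'columns' asks "what share of survivors were female?" and makes each column sum to 1. 'index' asks "what share of females survived?" and makes each row sum to 1. The sketch below contrasts the two on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "sex": ["female", "male", "male", "female", "male", "male"],
    "survived": [1, 0, 1, 1, 0, 0],
})

# Within each outcome, the female/male split (columns sum to 1)
share = pd.crosstab(df["sex"], df["survived"], normalize="columns")

# Within each sex, the survived/died split (rows sum to 1)
rate = pd.crosstab(df["sex"], df["survived"], normalize="index")

print(share)
print(rate)
```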


🧩 Example 8: Multiple Index Columns with Margins

Let's explore how class and embark_town together relate to survival.

pd.crosstab(
    [titanic['class'], titanic['embark_town']],
    titanic['survived'],
    margins=True
)
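
Output:

survived                0    1  All
class  embark_town
First  Cherbourg       26   59   85
       Queenstown       1    1    2
       Southampton     53   74  127
Second Cherbourg        8    9   17
       Queenstown       1    2    3
       Southampton     88   76  164
Third  Cherbourg       41   25   66
       Queenstown      45   27   72
       Southampton    286   67  353
All                   549  340  889

Note: embark_town is missing for two first-class passengers, both of whom survived. crosstab leaves rows with a missing key out of the table. With the default dropna=True, it also leaves them out of the margins, so the grand total is 889 rather than 891.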

✅ This gives a detailed breakdown of survival per class and boarding port.
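
The Titanic embark_town column has a couple of missing values. This sketch, on hypothetical rows, shows what happens to them by default. It also shows one way to keep them as an explicit category: filling the key before calling crosstab.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; one has a missing embark_town
df = pd.DataFrame({
    "embark_town": ["Cherbourg", "Southampton", np.nan, "Southampton"],
    "survived": [1, 0, 1, 1],
})

# Default: the NaN row is left out of the body and, with dropna=True, of the margins
default = pd.crosstab(df["embark_town"], df["survived"], margins=True)

# Filling missing keys first keeps those passengers as their own category
filled = pd.crosstab(df["embark_town"].fillna("Unknown"), df["survived"], margins=True)

print(default)
print(filled)
```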


🧩 Example 9: Crosstab Visualization

A normalized crosstab can be drawn as a heatmap, which makes large and small proportions easier to spot at a glance.

import seaborn as sns
import matplotlib.pyplot as plt

ct = pd.crosstab(titanic['class'], titanic['survived'], normalize='index')

sns.heatmap(ct, annot=True, cmap='coolwarm')
plt.title("Survival Rate by Passenger Class")
plt.ylabel("Passenger Class")
plt.xlabel("Survived")
plt.show()

✅ The heatmap visually shows which classes had higher survival rates.


🧩 Summary

Concept               Description                                 Example
pd.crosstab()         Create frequency tables                     pd.crosstab(df['A'], df['B'])
margins=True          Add row and column totals                   pd.crosstab(df['A'], df['B'], margins=True)
normalize='index'     Row-wise proportions (rows sum to 1)        pd.crosstab(df['A'], df['B'], normalize='index')
normalize='columns'   Column-wise proportions (columns sum to 1)  pd.crosstab(df['A'], df['B'], normalize='columns')
values + aggfunc      Aggregate numeric data                      values=df['fare'], aggfunc='mean'
Multiple levels       Multi-variable grouping                     pd.crosstab([df['A'], df['B']], df['C'])

✅ Key Takeaways

  • pd.crosstab() is perfect for categorical analysis and summarization.
  • Use it for comparing variables like gender, class, or survival.
  • Combine it with margins and normalize for deeper insights.
  • You can even aggregate numeric values (e.g., mean fare).
  • Great for exploratory data analysis (EDA) and reporting.
