Pivoting and Pivot Tables in Pandas

Chapters

Python Pandas Tutorial

Pivoting and pivot tables are powerful tools in Pandas that help reshape and summarize data. They allow us to transform rows into columns, calculate aggregates, and group by specific categories.

1. Pivot in Pandas

The pivot() function reshapes data by turning unique values from one column into multiple columns.

Example: Simple Pivot

data = {
    "date": ["6/1/2023", "6/2/2023", "6/3/2023",
             "6/1/2023", "6/2/2023", "6/3/2023",
             "6/1/2023", "6/2/2023", "6/3/2023"],
    "city": ["new york", "new york", "new york",
             "mumbai", "mumbai", "mumbai",
             "paris", "paris", "paris"],
    "temperature": [72, 74, 73, 85, 88, 90, 65, 67, 70],
    "humidity": [60, 62, 65, 75, 78, 80, 55, 58, 60]
}

df = pd.DataFrame(data)
print(df)
# Try 
df.pivot(index='city',columns='date')

# Then 
df.pivot(index='city',columns='date',values="humidity")

# then
df.pivot(index='date',columns='city')

# then 
df.pivot(index='humidity',columns='city')

Another Example

import pandas as pd

# Sample sales data
data = {
    "date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "city": ["New York", "Los Angeles", "New York", "Los Angeles"],
    "sales": [250, 200, 300, 220],
    "expenses": [150, 120, 180, 130]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Pivot sales by city
pivot_df = df.pivot(index="date", columns="city", values="sales")
print("\nPivot Table (Sales by City and Date):")
print(pivot_df)

2. Pivot Table with Aggregations

Unlike pivot(), pivot_table() allows aggregations (like sum, mean, count).

Example: Pivot Table

# Pivot table with sum of sales
pivot_tbl = df.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")
print("\nPivot Table with Sum of Sales:")
print(pivot_tbl)

# Pivot table with both sales and expenses
pivot_tbl_multi = df.pivot_table(index="date", columns="city", values=["sales", "expenses"], aggfunc="sum")
print("\nPivot Table with Sales & Expenses:")
print(pivot_tbl_multi)

3. Pivot Table with Margins (Totals)

Adding margins=True gives row and column totals.

pivot_with_margins = df.pivot_table(
    index="date", columns="city", values="sales", aggfunc="sum", margins=True
)
print("\nPivot Table with Margins (Total):")
print(pivot_with_margins)

4. Using Grouper with Pivot Tables

The pd.Grouper() function helps group data by time frequency (daily, monthly, yearly, etc.).

Example: Grouping by Month with Cities

# Sample extended dataset
data2 = {
    "date": pd.date_range(start="2023-01-01", periods=8, freq="D"),
    "city": ["New York", "Los Angeles"] * 4,
    "sales": [250, 200, 300, 220, 270, 210, 350, 230]
}

df2 = pd.DataFrame(data2)

# Pivot table grouped by month
pivot_grouped = df2.pivot_table(
    index=pd.Grouper(freq="M", key="date"),
    columns="city",
    values="sales",
    aggfunc="sum",
    margins=True
)
print("\nPivot Table Grouped by Month with Margins:")
print(pivot_grouped)

🔑 Key Takeaways

pivot() → Reshapes data, but does not allow aggregation.
pivot_table() → Adds flexibility with aggregations like sum, mean, count.
margins=True → Adds totals for rows and columns.
pd.Grouper() → Helps group time-series data by frequency (daily, monthly, yearly).