Pivoting and pivot tables are powerful tools in Pandas that help reshape and summarize data. They allow us to transform rows into columns, calculate aggregates, and group by specific categories.
1. Pivot in Pandas
The pivot()
function reshapes data by turning unique values from one column into multiple columns.
Example: Simple Pivot
data = {
"date": ["6/1/2023", "6/2/2023", "6/3/2023",
"6/1/2023", "6/2/2023", "6/3/2023",
"6/1/2023", "6/2/2023", "6/3/2023"],
"city": ["new york", "new york", "new york",
"mumbai", "mumbai", "mumbai",
"paris", "paris", "paris"],
"temperature": [72, 74, 73, 85, 88, 90, 65, 67, 70],
"humidity": [60, 62, 65, 75, 78, 80, 55, 58, 60]
}
df = pd.DataFrame(data)
print(df)
# Try
df.pivot(index='city',columns='date')
# Then
df.pivot(index='city',columns='date',values="humidity")
# then
df.pivot(index='date',columns='city')
# then
df.pivot(index='humidity',columns='city')
Another Example
import pandas as pd
# Sample sales data
data = {
"date": ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
"city": ["New York", "Los Angeles", "New York", "Los Angeles"],
"sales": [250, 200, 300, 220],
"expenses": [150, 120, 180, 130]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Pivot sales by city
pivot_df = df.pivot(index="date", columns="city", values="sales")
print("\nPivot Table (Sales by City and Date):")
print(pivot_df)
2. Pivot Table with Aggregations
Unlike pivot()
, pivot_table()
allows aggregations (like sum, mean, count).
Example: Pivot Table
# Pivot table with sum of sales
pivot_tbl = df.pivot_table(index="date", columns="city", values="sales", aggfunc="sum")
print("\nPivot Table with Sum of Sales:")
print(pivot_tbl)
# Pivot table with both sales and expenses
pivot_tbl_multi = df.pivot_table(index="date", columns="city", values=["sales", "expenses"], aggfunc="sum")
print("\nPivot Table with Sales & Expenses:")
print(pivot_tbl_multi)
3. Pivot Table with Margins (Totals)
Adding margins=True
gives row and column totals.
pivot_with_margins = df.pivot_table(
index="date", columns="city", values="sales", aggfunc="sum", margins=True
)
print("\nPivot Table with Margins (Total):")
print(pivot_with_margins)
4. Using Grouper with Pivot Tables
The pd.Grouper()
function helps group data by time frequency (daily, monthly, yearly, etc.).
Example: Grouping by Month with Cities
# Sample extended dataset
data2 = {
"date": pd.date_range(start="2023-01-01", periods=8, freq="D"),
"city": ["New York", "Los Angeles"] * 4,
"sales": [250, 200, 300, 220, 270, 210, 350, 230]
}
df2 = pd.DataFrame(data2)
# Pivot table grouped by month
pivot_grouped = df2.pivot_table(
index=pd.Grouper(freq="M", key="date"),
columns="city",
values="sales",
aggfunc="sum",
margins=True
)
print("\nPivot Table Grouped by Month with Margins:")
print(pivot_grouped)
🔑 Key Takeaways
pivot()
→ Reshapes data, but does not allow aggregation.pivot_table()
→ Adds flexibility with aggregations like sum, mean, count.margins=True
→ Adds totals for rows and columns.pd.Grouper()
→ Helps group time-series data by frequency (daily, monthly, yearly).