1. Introduction
In real-world datasets, information is often stored in a wide format, where values of the same kind are spread across several columns (for example, one column per subject). While this format is easy for people to read, it is not always the best shape for data analysis and visualization.
The melt() function in Pandas converts wide data into long (tidy) format, which makes it easier to analyze and to use with libraries like Seaborn or Matplotlib.
2. Syntax of melt()
pd.melt(frame,
        id_vars=None,
        value_vars=None,
        var_name=None,
        value_name='value')
- frame → DataFrame to reshape
- id_vars → Columns to keep fixed (identifiers)
- value_vars → Columns to unpivot (default: all columns except id_vars)
- var_name → Name of the “variable” column (default: variable)
- value_name → Name of the “value” column (default: value)
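The parameter list above leaves value_vars at its default. As a minimal sketch of how it narrows the result, the snippet below melts only two of three score columns. The table and the variable names (scores, subset) are illustrative assumptions, not part of the API.

```python
import pandas as pd

# Illustrative wide table: one row per student, one column per subject
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [85, 90, 95],
    'Science': [88, 92, 96],
    'English': [80, 85, 89],
})

# Unpivot only Math and Science; English is not listed in value_vars,
# so it does not appear in the long result at all
subset = pd.melt(scores,
                 id_vars=['Name'],
                 value_vars=['Math', 'Science'],
                 var_name='Subject',
                 value_name='Score')
print(subset)
```

The result has six rows (three students times two subjects) and three columns: Name, Subject, and Score.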
3. Example Dataset
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [85, 90, 95],
    'Science': [88, 92, 96],
    'English': [80, 85, 89]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Math  Science  English
0    Alice    85       88       80
1      Bob    90       92       85
2  Charlie    95       96       89
4. Applying melt()
(a) Basic Melt
df_melted = pd.melt(df, id_vars=['Name'])
print(df_melted)
Output:
      Name variable  value
0    Alice     Math     85
1      Bob     Math     90
2  Charlie     Math     95
3    Alice  Science     88
4      Bob  Science     92
5  Charlie  Science     96
6    Alice  English     80
7      Bob  English     85
8  Charlie  English     89
(b) Custom Column Names
df_melted = pd.melt(df,
                    id_vars=['Name'],
                    var_name='Subject',
                    value_name='Score')
print(df_melted)
Output:
      Name Subject  Score
0    Alice    Math     85
1      Bob    Math     90
2  Charlie    Math     95
3    Alice Science     88
4      Bob Science     92
5  Charlie Science     96
6    Alice English     80
7      Bob English     85
8  Charlie English     89
5. Why Use melt()?
- Makes datasets tidy (each row = 1 observation, each column = 1 variable).
- Easier to plot using Seaborn (sns.barplot, sns.lineplot).
- Simplifies statistical analysis.
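To illustrate the point about simpler analysis, here is a small sketch that computes the average score per subject with a single groupby once the data is in long format. The table and the names long_df and mean_by_subject are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative wide table of scores
wide = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Math': [85, 90, 95],
    'Science': [88, 92, 96],
    'English': [80, 85, 89],
})

# Long format: one row per (student, subject) observation
long_df = pd.melt(wide, id_vars=['Name'],
                  var_name='Subject', value_name='Score')

# Because every score sits in one column, one groupby covers all subjects
mean_by_subject = long_df.groupby('Subject')['Score'].mean()
print(mean_by_subject)
```

In the wide layout the same summary would require naming each subject column explicitly. In the long layout, adding a new subject needs no change to the analysis code.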
6. Example with Multiple Identifiers
data = {
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Class': ['A', 'B', 'A'],
    'Math': [85, 90, 95],
    'Science': [88, 92, 96]
}
df = pd.DataFrame(data)
df_melted = pd.melt(df,
                    id_vars=['Student', 'Class'],
                    var_name='Subject',
                    value_name='Score')
print(df_melted)
Output:
   Student Class Subject  Score
0    Alice     A    Math     85
1      Bob     B    Math     90
2  Charlie     A    Math     95
3    Alice     A Science     88
4      Bob     B Science     92
5  Charlie     A Science     96
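One practical reason to keep several identifiers is that each of them stays available as a grouping key afterwards. The sketch below rebuilds the same table and averages scores per class and subject. The name class_means is an assumption for illustration.

```python
import pandas as pd

# Same data as in the example above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Charlie'],
    'Class': ['A', 'B', 'A'],
    'Math': [85, 90, 95],
    'Science': [88, 92, 96],
})

df_melted = pd.melt(df,
                    id_vars=['Student', 'Class'],
                    var_name='Subject',
                    value_name='Score')

# Class survived the melt as an identifier, so it can be a grouping key
class_means = (df_melted
               .groupby(['Class', 'Subject'])['Score']
               .mean()
               .reset_index())
print(class_means)
```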
7. Conclusion
The melt() function is a powerful tool for reshaping wide-format data into long format. This transformation brings a dataset in line with tidy data principles and makes it compatible with common analysis and visualization tools such as Seaborn and Matplotlib.