Excel files (.xls
, .xlsx
) are one of the most common formats for sharing tabular data. Pandas makes it very simple to read from and write to Excel files.
1. Requirements
- Install
openpyxl
(for.xlsx
) orxlrd
(for.xls
):
pip install openpyxl xlrd
By default:
- pandas ≥ 1.2 uses openpyxl to read/write
.xlsx
. .xls
(old format) requires xlrd (but newer versions of xlrd support only.xls
, not.xlsx
).
2. Reading Excel Files
Basic read
import pandas as pd
df = pd.read_excel("data.xlsx") # reads first sheet by default
print(df.head())
Read a specific sheet
df = pd.read_excel("data.xlsx", sheet_name="Sales")
sheet_name
can be a string (sheet name), int (sheet index, starting from 0), or list of names/indices.
sheets = pd.read_excel("data.xlsx", sheet_name=["Sales","Expenses"])
print(sheets["Sales"].head())
Read only selected columns
df = pd.read_excel("data.xlsx", usecols=["A","C","F"]) # by Excel columns
# or by column names
df = pd.read_excel("data.xlsx", usecols=["Name","Salary"])
Read only specific rows
df = pd.read_excel("data.xlsx", skiprows=2, nrows=10)
skiprows=2
→ skip first 2 rows.nrows=10
→ read only 10 rows.
3. Writing to Excel
Write a DataFrame to Excel
df.to_excel("output.xlsx", index=False)
index=False
→ prevents writing row numbers.
Write to specific sheet
df.to_excel("output.xlsx", sheet_name="Summary", index=False)
Write multiple DataFrames to one Excel file
Use ExcelWriter:
with pd.ExcelWriter("report.xlsx") as writer:
df1.to_excel(writer, sheet_name="Sales", index=False)
df2.to_excel(writer, sheet_name="Expenses", index=False)
Append new sheet to an existing Excel file
with pd.ExcelWriter("report.xlsx", mode="a", engine="openpyxl") as writer:
df3.to_excel(writer, sheet_name="NewData", index=False)
4. Controlling Output
Change column order and formatting
df.to_excel("output.xlsx", columns=["Name","Salary","Age"], index=False)
Add styling (optional with Styler)
styled = df.style.highlight_max(color="lightgreen")
styled.to_excel("styled.xlsx", engine="openpyxl")
5. Reading/Writing Excel with multiple sheets
- Read all sheets into a dictionary:
sheets_dict = pd.read_excel("report.xlsx", sheet_name=None)
for sheet, data in sheets_dict.items():
print(f"Sheet: {sheet}")
print(data.head())
- Write dictionary of DataFrames to multiple sheets:
dfs = {"Sales": df1, "Expenses": df2, "Summary": df3}
with pd.ExcelWriter("output.xlsx") as writer:
for sheet, data in dfs.items():
data.to_excel(writer, sheet_name=sheet, index=False)
6. Small End-to-End Example
import pandas as pd
# Load data
df = pd.read_excel("employees.xlsx", sheet_name="EmployeeData")
# Clean: fill missing salaries with average
df['Salary'].fillna(df['Salary'].mean(), inplace=True)
# Transform: add yearly salary
df['YearlySalary'] = df['Salary'] * 12
# Save cleaned data to new Excel
df.to_excel("employees_cleaned.xlsx", sheet_name="CleanedData", index=False)
Quick Exercises
FIles to use: https://github.com/slidescope/Pandas-Read-or-write-to-excel
- Read the Excel file
sales.xlsx
from the sheet 2024_Q1 and display the first 5 rows. - Read only the
Name
andRevenue
columns fromsales.xlsx
. - Write a DataFrame of student marks to
marks.xlsx
under sheet Semester1. - Write two DataFrames (
sales
andexpenses
) into the same filereport.xlsx
in separate sheets. - Append new data (
df_new
) into an existing Excel file as a new sheet.