Python Pandas Tutorial — Chapter 3: DataFrames Basics
In the last chapter, we explored Series (1D labeled data). Now we move to the core structure of pandas: the DataFrame. This is where pandas shines — enabling us to work with tabular data (rows × columns), much like an Excel sheet or SQL table.
What is a DataFrame?
A DataFrame is a two-dimensional labeled data structure with:
- Rows (index labels)
- Columns (column labels)
- Data values (can be numeric, string, datetime, or mixed types)
Think of it as a dictionary of Series objects sharing the same index.
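A quick way to check the "dictionary of Series" idea is to pull out one column and inspect it. This is a minimal sketch; the column names `x` and `y` are made up for illustration.

```python
import pandas as pd

# A small DataFrame with the default integer index (0, 1, 2)
df = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})

# Selecting one column returns a Series...
col = df['x']
print(type(col))  # <class 'pandas.core.series.Series'>

# ...and that Series carries the same index as the DataFrame
print(col.index.equals(df.index))  # True
```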
Creating a DataFrame
1. From a dictionary of lists
import pandas as pd
data = {
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 22],
'score': [88, 92, 79]
}
df = pd.DataFrame(data)
print(df)
Output:
      name  age  score
0    Alice   25     88
1      Bob   30     92
2  Charlie   22     79
2. From a dictionary of Series
df = pd.DataFrame({
'Math': pd.Series([90, 85, 78], index=['Alice','Bob','Charlie']),
'Science': pd.Series([88, 95, 82], index=['Alice','Bob','Charlie'])
})
print(df)
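When the Series are built from a dictionary, pandas aligns them on their index labels. In the example above the labels match, so every cell is filled. The sketch below uses made-up scores where the labels only partly overlap, so the row index becomes the union of the labels and the gaps turn into NaN.

```python
import pandas as pd

# Two Series whose index labels only partly overlap
math = pd.Series([90, 85], index=['Alice', 'Bob'])
science = pd.Series([88, 82], index=['Alice', 'Charlie'])

df = pd.DataFrame({'Math': math, 'Science': science})

# Bob has no Science score and Charlie has no Math score, so both are NaN
print(df)
```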
3. From a list of dictionaries
data = [
{'name': 'Alice', 'age': 25},
{'name': 'Bob', 'age': 30, 'city': 'NY'}
]
df = pd.DataFrame(data)
print(df)
- Keys missing from a dictionary are filled with NaN (here, Alice has no 'city' value).
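To confirm which columns appear and where the gaps are, you can inspect the result directly. This sketch reuses the same two records.

```python
import pandas as pd

data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30, 'city': 'NY'},
]
df = pd.DataFrame(data)

# The union of all keys becomes the column set
print(list(df.columns))            # ['name', 'age', 'city']

# Alice had no 'city' key, so that cell is missing (NaN)
print(df['city'].isna().tolist())  # [True, False]
```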
4. From NumPy arrays
import numpy as np
arr = np.arange(9).reshape(3,3)
df = pd.DataFrame(arr, columns=['A','B','C'])
print(df)
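Just as `columns=` names the columns of the array, `index=` can name its rows. In this sketch the labels `r1` to `r3` are arbitrary, chosen only for illustration.

```python
import numpy as np
import pandas as pd

arr = np.arange(9).reshape(3, 3)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Row labels are supplied with index=, column labels with columns=
df = pd.DataFrame(arr, columns=['A', 'B', 'C'], index=['r1', 'r2', 'r3'])

print(df.loc['r2', 'B'])  # 4
print(df.dtypes)          # every column shares the array's integer dtype
```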
Inspecting a DataFrame
print(df.head()) # first 5 rows
print(df.tail(3)) # last 3 rows
df.info() # prints dtypes and non-null counts itself (returns None, so no print needed)
print(df.describe()) # summary statistics for numeric columns
print(df.shape) # (rows, cols) tuple
print(df.columns) # column labels (an Index object)
print(df.index) # row index labels
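The sketch below shows why `df.info()` is called on its own: it writes its summary directly and returns `None`. It also shows that `shape` is an ordinary tuple you can unpack. The data is the name/age/score example from earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

result = df.info()  # prints the summary itself
print(result)       # None

rows, cols = df.shape  # shape is a plain (rows, columns) tuple
print(rows, cols)      # 3 3

print(df.columns.tolist())  # ['name', 'age', 'score']
```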
Accessing Data
Columns
# These examples assume df is the name/age/score DataFrame from example 1
print(df['name']) # single column (Series)
print(df[['name','age']]) # multiple columns (DataFrame)
Rows by index label / position
print(df.loc[0]) # row by label (index 0)
print(df.iloc[1]) # row by position (2nd row)
Row + column selection
print(df.loc[1, 'name']) # value at row 1, column 'name'
print(df.iloc[2, 1]) # value at 3rd row, 2nd column
Slicing
print(df[0:2]) # first 2 rows
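A common surprise is that `loc` and `iloc` treat slice ends differently. `iloc` slices by position and excludes the end, like Python lists. `loc` slices by label and includes the end label. This is a minimal sketch on the default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
})

# Position-based: end position 2 is excluded
print(len(df.iloc[0:2]))  # 2 rows (positions 0 and 1)

# Label-based: end label 2 is included
print(len(df.loc[0:2]))   # 3 rows (labels 0, 1 and 2)

# loc also accepts a column slice, again inclusive of the end label
print(df.loc[0:1, 'name':'age'])
```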
Adding, Modifying, Deleting Data
Add new column
df['passed'] = df['score'] >= 80
Modify column
df['age'] = df['age'] + 1
Delete column
df = df.drop(columns='passed') # remove the 'passed' column added above (this df has no 'city' column)
Add new row
new_row = {'name':'David', 'age':28, 'score':85}
df.loc[len(df)] = new_row # len(df) is a new label only with the default 0..n-1 index
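If the index is not the default 0..n-1 range, `len(df)` may collide with an existing label and overwrite that row. One alternative, sketched below, is `pd.concat` with `ignore_index=True`, which appends the row and renumbers the index.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

new_row = {'name': 'David', 'age': 28, 'score': 85}

# Wrap the dict in a one-row DataFrame and concatenate;
# ignore_index=True rebuilds a fresh 0..n-1 index
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df.tail(1))
print(len(df))  # 4
```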
Filtering Data
print(df[df['score'] > 85]) # condition
print(df[(df['age'] > 25) & (df['score'] > 80)]) # multiple conditions
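Besides `&` (AND), boolean masks combine with `|` (OR) and `~` (NOT), and `isin` tests membership. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==` in Python. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

either = df[(df['age'] > 28) | (df['score'] < 80)]  # OR
not_bob = df[~(df['name'] == 'Bob')]                # NOT
chosen = df[df['name'].isin(['Alice', 'Charlie'])]  # membership

print(either['name'].tolist())   # ['Bob', 'Charlie']
print(not_bob['name'].tolist())  # ['Alice', 'Charlie']
print(chosen['name'].tolist())   # ['Alice', 'Charlie']
```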
Sorting
print(df.sort_values('age')) # ascending
print(df.sort_values('score', ascending=False)) # descending
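`sort_values` also accepts a list of columns, with one `ascending` flag per column for breaking ties. The ages and scores below are made up so that the ties are visible.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 25, 30],
    'score': [88, 92, 79, 95],
})

# Sort by age ascending, then break ties by score descending
ordered = df.sort_values(['age', 'score'], ascending=[True, False])
print(ordered['name'].tolist())  # ['Alice', 'Charlie', 'David', 'Bob']

# sort_values returns a new DataFrame; df itself is unchanged
print(df['name'].tolist())       # ['Alice', 'Bob', 'Charlie', 'David']
```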
Handling Missing Data
df = pd.DataFrame({
'name': ['Alice','Bob','Charlie'],
'age': [25, None, 22],
'score': [88, 92, None]
})
print(df.isna()) # check missing
print(df.dropna()) # drop rows with NaN
print(df.fillna(0)) # replace NaN with 0
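Filling every gap with 0 is not always sensible. `fillna` also accepts a dict that maps each column to its own replacement, for example the column mean for `age`. `mean()` skips NaN by default. This is a minimal sketch on the same data.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, None, 22],
    'score': [88, 92, None],
})

# Count missing values per column
print(df.isna().sum())

# age -> mean of the non-missing ages (23.5), score -> 0
filled = df.fillna({'age': df['age'].mean(), 'score': 0})
print(filled)
```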
Useful Methods
- Basic info
df.dtypes # data types
df.count() # non-null values per column
df.nunique() # unique count per column
- Statistics
df.mean(numeric_only=True) # mean of numeric columns (skips text columns such as 'name')
df.corr(numeric_only=True) # correlation matrix of numeric columns
df['score'].max() # max value in a column
- Value counts
print(df['age'].value_counts())
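Note that `value_counts()` ignores NaN unless you pass `dropna=False`. This matters for a column with gaps, such as `age` above. The sketch below uses a small made-up DataFrame with one missing age and one missing score.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, None, 25, 30],
    'score': [88, 92, None, 85],
})

print(df.count())                  # non-null values per column
print(df.mean(numeric_only=True))  # skips the text 'name' column

counts = df['age'].value_counts()                  # NaN excluded by default
counts_all = df['age'].value_counts(dropna=False)  # NaN gets its own entry

print(counts)
print(counts_all)
```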
Example: Small Analysis
import pandas as pd
# Sales data
data = {
'Product': ['A','B','A','C','B','A'],
'Units': [10,20,15,5,7,12],
'Price': [100, 200, 100, 150, 200, 100]
}
df = pd.DataFrame(data)
# Add revenue column
df['Revenue'] = df['Units'] * df['Price']
# Filter: only Product A
product_a = df[df['Product'] == 'A']
# Group by Product and sum revenue
summary = df.groupby('Product')['Revenue'].sum()
print("Product A Sales:\n", product_a)
print("Revenue by Product:\n", summary)
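As a follow-up, the grouped totals can be ranked, and each product's share of the overall revenue can be computed from the same `summary` Series. This sketch extends the example above. The ranking and share calculation are additions for illustration.

```python
import pandas as pd

data = {
    'Product': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Units': [10, 20, 15, 5, 7, 12],
    'Price': [100, 200, 100, 150, 200, 100],
}
df = pd.DataFrame(data)
df['Revenue'] = df['Units'] * df['Price']

# A: 1000 + 1500 + 1200 = 3700, B: 4000 + 1400 = 5400, C: 750
summary = df.groupby('Product')['Revenue'].sum()

# Rank products by revenue and compute each product's share of the total
ranked = summary.sort_values(ascending=False)
share = (summary / summary.sum()).round(3)

print(ranked.index.tolist())  # ['B', 'A', 'C']
print(share)
```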
Quick Exercises
- Create a DataFrame with 5 students (Name, Age, Math Score, English Score).
- Add a new column Total Score = sum of both subjects.
- Show students who scored more than 80 in English.
- Sort students by Total Score in descending order.
- Replace any missing Age values with the average age.
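If you want to check your work, here is one possible solution sketch. The student names, ages and scores are hypothetical, invented only for practice. One Age is left missing on purpose so that the last exercise has something to fill.

```python
import pandas as pd

# Hypothetical student data; Cara's Age is missing on purpose
students = pd.DataFrame({
    'Name': ['Ana', 'Ben', 'Cara', 'Dev', 'Eli'],
    'Age': [20, 21, None, 22, 19],
    'Math Score': [78, 92, 85, 66, 90],
    'English Score': [82, 75, 91, 88, 79],
})

# 1. Total Score = Math + English
students['Total Score'] = students['Math Score'] + students['English Score']

# 2. Students with more than 80 in English
strong_english = students[students['English Score'] > 80]

# 3. Sort by Total Score, highest first
by_total = students.sort_values('Total Score', ascending=False)

# 4. Replace missing Age values with the average of the known ages
students['Age'] = students['Age'].fillna(students['Age'].mean())

print(strong_english['Name'].tolist())  # ['Ana', 'Cara', 'Dev']
print(by_total[['Name', 'Total Score']])
print(students)
```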
✅ In the next chapter, we’ll cover DataFrame Operations — grouping, merging, reshaping, and applying functions to data.