Pandas DataFrames Basics

Python Pandas Tutorial — Chapter 3: DataFrames Basics

In the last chapter, we explored Series (1D labeled data). Now we move to the core structure of pandas: the DataFrame. This is where pandas shines — enabling us to work with tabular data (rows × columns), much like an Excel sheet or SQL table.

What is a DataFrame?

A DataFrame is a two-dimensional labeled data structure with:

Rows (index labels)
Columns (column labels)
Data values (can be numeric, string, datetime, or mixed types)

Think of it as a dictionary of Series objects sharing the same index.
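
A minimal sketch of that mental model, using made-up names and values: selecting one column gives back a Series, and every column carries the same row index as the DataFrame.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame(
    {'age': [25, 30], 'score': [88, 92]},
    index=['Alice', 'Bob'],
)

age = df['age']                              # selecting one column returns a Series
print(type(age).__name__)                    # Series
print(age.index.equals(df.index))            # True: the column shares the DataFrame's index
print(df['score'].index.equals(age.index))   # True: so does every other column
```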

Creating a DataFrame

1. From a dictionary of lists

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79]
}
df = pd.DataFrame(data)
print(df)

Output:

      name  age  score
0    Alice   25     88
1      Bob   30     92
2  Charlie   22     79

2. From a dictionary of Series

df = pd.DataFrame({
    'Math': pd.Series([90, 85, 78], index=['Alice','Bob','Charlie']),
    'Science': pd.Series([88, 95, 82], index=['Alice','Bob','Charlie'])
})
print(df)

3. From a list of dictionaries

data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30, 'city': 'NY'}
]
df = pd.DataFrame(data)
print(df)

The columns are the union of all keys. Where a dictionary lacks a key, that cell is filled with NaN (here, Alice has no 'city').
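
To see which cells were left empty, the same list of dictionaries can be checked with isna(). This is a small sketch that reuses the data above:

```python
import pandas as pd

data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30, 'city': 'NY'},
]
df = pd.DataFrame(data)

# Columns are the union of all keys, in order of first appearance
print(list(df.columns))             # ['name', 'age', 'city']

# Alice's dictionary has no 'city' key, so that cell is missing
print(df['city'].isna().tolist())   # [True, False]
```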

4. From NumPy arrays

import numpy as np

arr = np.arange(9).reshape(3,3)
df = pd.DataFrame(arr, columns=['A','B','C'])
print(df)

Inspecting a DataFrame

print(df.head())       # first 5 rows
print(df.tail(3))      # last 3 rows
df.info()              # summary (dtypes, non-null counts); prints directly and returns None
print(df.describe())   # stats for numeric columns
print(df.shape)        # (rows, cols)
print(df.columns)      # column labels (an Index object)
print(df.index)        # row index labels

Accessing Data

The examples in this and the next few sections assume df is the name/age/score DataFrame from the first creation example (the dictionary of lists).

Columns

print(df['name'])       # single column (Series)
print(df[['name','age']]) # multiple columns (DataFrame)

Rows by index label / position

print(df.loc[0])   # row by label (index 0)
print(df.iloc[1])  # row by position (2nd row)

Row + column selection

print(df.loc[1, 'name'])   # value at row 1, column 'name'
print(df.iloc[2, 1])       # value at 3rd row, 2nd column
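
With the default index, labels and positions are both 0, 1, 2, so .loc and .iloc can look interchangeable. The sketch below uses hypothetical string row labels to make the difference visible:

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default 0, 1, 2
df = pd.DataFrame(
    {'age': [25, 30, 22], 'score': [88, 92, 79]},
    index=['Alice', 'Bob', 'Charlie'],
)

print(df.loc['Bob', 'score'])   # 92  (row label 'Bob', column label 'score')
print(df.iloc[1, 1])            # 92  (2nd row, 2nd column, by position)

# With string labels, the integer 1 is not a valid row label for .loc
try:
    df.loc[1]
except (KeyError, TypeError):
    print("no row labelled 1")
```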

Slicing

print(df[0:2])   # first 2 rows
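
Slices behave differently depending on the accessor: df[0:2] and .iloc slice by position and exclude the end, while .loc slices by label and includes both endpoints. A short sketch on the example data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
})

by_position = df.iloc[0:2]   # positions 0 and 1; the end (2) is excluded
by_label = df.loc[0:2]       # labels 0 through 2; the end label is included

print(len(df[0:2]))       # 2
print(len(by_position))   # 2
print(len(by_label))      # 3
```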

Adding, Modifying, Deleting Data

Add new column

df['passed'] = df['score'] >= 80

Modify column

df['age'] = df['age'] + 1

Delete column

df = df.drop(columns='passed')   # remove the column added above; drop() raises KeyError if the column does not exist

Add new row

new_row = {'name':'David', 'age':28, 'score':85}
df.loc[len(df)] = pd.Series(new_row)   # values align to columns by name; assumes the default 0..n-1 index
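
The four operations in this section can be run end to end as a sketch. It starts from the name/age/score data of the first example, deletes the 'passed' column it has just added, and appends the row with pd.concat, which is one alternative to assigning through .loc:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

df['passed'] = df['score'] >= 80      # add a column
df['age'] = df['age'] + 1             # modify a column
df = df.drop(columns='passed')        # delete a column (drop returns a new DataFrame)

# Append a row by concatenating a one-row DataFrame
new_row = {'name': 'David', 'age': 28, 'score': 85}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df.shape)             # (4, 3)
print(df['age'].tolist())   # [26, 31, 23, 28]
```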

Filtering Data

print(df[df['score'] > 85])       # condition
print(df[(df['age'] > 25) & (df['score'] > 80)]) # multiple conditions
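
Conditions combine with & (and), | (or) and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons, and Python's plain and/or do not work on Series. A sketch on the example data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

either = df[(df['age'] > 25) | (df['score'] > 85)]   # OR: either condition holds
not_high = df[~(df['score'] > 85)]                   # NOT: invert a condition
named = df[df['name'].isin(['Alice', 'Charlie'])]    # membership test

print(either['name'].tolist())     # ['Alice', 'Bob']
print(not_high['name'].tolist())   # ['Charlie']
print(named['name'].tolist())      # ['Alice', 'Charlie']
```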

Sorting

print(df.sort_values('age'))              # ascending
print(df.sort_values('score', ascending=False))  # descending
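
sort_values returns a new DataFrame and leaves the original untouched, and the sorted rows keep their original index labels. A brief sketch of both points, with reset_index as one way to renumber the result:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

by_age = df.sort_values('age')

print(by_age['name'].tolist())   # ['Charlie', 'Alice', 'Bob']
print(df['name'].tolist())       # ['Alice', 'Bob', 'Charlie']  (original order unchanged)
print(by_age.index.tolist())     # [2, 0, 1]  (rows keep their original labels)

# reset_index(drop=True) renumbers the sorted rows from 0
renumbered = by_age.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2]
```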

Handling Missing Data

df = pd.DataFrame({
    'name': ['Alice','Bob','Charlie'],
    'age': [25, None, 22],
    'score': [88, 92, None]
})

print(df.isna())        # check missing
print(df.dropna())      # drop rows with NaN
print(df.fillna(0))     # replace NaN with 0
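
Filling with 0 can distort averages, so a common alternative is to fill each numeric column with its own mean. The sketch below applies that to the same data; the variable name filled is only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, None, 22],
    'score': [88, 92, None],
})

print(df.isna().sum())    # number of missing values per column

# Fill each numeric column with its own mean (NaN is skipped when averaging)
filled = df.copy()
filled['age'] = filled['age'].fillna(filled['age'].mean())        # (25 + 22) / 2 = 23.5
filled['score'] = filled['score'].fillna(filled['score'].mean())  # (88 + 92) / 2 = 90.0

print(filled)
```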

Useful Methods

Basic info

df.dtypes      # data types
df.count()     # non-null values per column
df.nunique()   # unique count per column

Statistics

df.mean(numeric_only=True)   # mean of numeric columns (without numeric_only, string columns raise an error in pandas 2.x)
df.corr(numeric_only=True)   # correlation matrix of numeric columns
df['score'].max()            # max value in a column

Value counts

print(df['age'].value_counts())
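
A short sketch of a few of these methods on a small DataFrame. The fourth row ('Dana') is a hypothetical addition so that one age repeats, and numeric_only=True is passed so the string column is skipped:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],   # 'Dana' is made up for this example
    'age': [25, 30, 25, 22],
    'score': [88, 92, 79, 85],
})

print(df.dtypes)                          # one dtype per column
print(df.nunique().tolist())              # [4, 3, 4]
print(df.mean(numeric_only=True))         # age 25.5, score 86.0
print(df['age'].value_counts().iloc[0])   # 2  (age 25 appears twice)
```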

Example: Small Analysis

import pandas as pd

# Sales data
data = {
    'Product': ['A','B','A','C','B','A'],
    'Units': [10,20,15,5,7,12],
    'Price': [100, 200, 100, 150, 200, 100]
}
df = pd.DataFrame(data)

# Add revenue column
df['Revenue'] = df['Units'] * df['Price']

# Filter: only Product A
product_a = df[df['Product'] == 'A']

# Group by Product and sum revenue
summary = df.groupby('Product')['Revenue'].sum()

print("Product A Sales:\n", product_a)
print("Revenue by Product:\n", summary)
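
As a check on the grouped totals, the revenue per product can be worked out by hand and compared with the groupby result. The sketch rebuilds the same data; idxmax is used here only to pick out the top product:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Units': [10, 20, 15, 5, 7, 12],
    'Price': [100, 200, 100, 150, 200, 100],
})
df['Revenue'] = df['Units'] * df['Price']

summary = df.groupby('Product')['Revenue'].sum()

# Worked by hand:
# A: (10 + 15 + 12) * 100 = 3700
# B: (20 + 7) * 200       = 5400
# C: 5 * 150              = 750
print(summary.index.tolist())   # ['A', 'B', 'C']
print(summary.tolist())         # [3700, 5400, 750]
print(summary.idxmax())         # B
```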

Quick Exercises

Create a DataFrame with 5 students (Name, Age, Math Score, English Score).
Add a new column Total Score = sum of both subjects.
Show students who scored more than 80 in English.
Sort students by Total Score in descending order.
Replace any missing Age values with the average age.

✅ In the next chapter, we’ll cover DataFrame Operations — grouping, merging, reshaping, and applying functions to data.