Python Pandas Tutorial — Chapter 3: DataFrames Basics

In the last chapter, we explored Series (1D labeled data). Now we move to the core structure of pandas: the DataFrame. This is where pandas shines — enabling us to work with tabular data (rows × columns), much like an Excel sheet or SQL table.


What is a DataFrame?

A DataFrame is a two-dimensional labeled data structure with:

  • Rows (index labels)
  • Columns (column labels)
  • Data values (each column has its own dtype, so one DataFrame can hold numeric, string, and datetime columns side by side)

Think of it as a dictionary of Series objects sharing the same index.
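To make the "dictionary of Series" idea concrete, the sketch below selects one column from a small DataFrame and checks that it is a Series carrying the DataFrame's own row index. The data and the 'a'/'b' labels are made up for illustration.

```python
import pandas as pd

# A tiny DataFrame with explicit row labels (illustrative data)
df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30]},
                  index=['a', 'b'])

col = df['age']                      # selecting one column gives a Series
print(type(col).__name__)            # Series
print(col.index.equals(df.index))    # True: the column shares the row index
```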


Creating a DataFrame

1. From a dictionary of lists

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79]
}
df = pd.DataFrame(data)
print(df)

Output:

      name  age  score
0    Alice   25     88
1      Bob   30     92
2  Charlie   22     79

2. From a dictionary of Series

df = pd.DataFrame({
    'Math': pd.Series([90, 85, 78], index=['Alice','Bob','Charlie']),
    'Science': pd.Series([88, 95, 82], index=['Alice','Bob','Charlie'])
})
print(df)
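When the Series carry different index labels, pandas aligns them on the union of the labels and fills the gaps with NaN. A minimal sketch with made-up scores, where Bob has no Science score and Charlie has no Math score:

```python
import pandas as pd

df = pd.DataFrame({
    'Math': pd.Series([90, 85], index=['Alice', 'Bob']),
    'Science': pd.Series([88, 82], index=['Alice', 'Charlie']),
})

# Rows are Alice, Bob, Charlie; missing cells are NaN, so both
# columns become float64
print(df)
```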

3. From a list of dictionaries

data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30, 'city': 'NY'}
]
df = pd.DataFrame(data)
print(df)
  • Keys missing from a record (here, Alice has no 'city') are filled with NaN.
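A quick check of where the NaN lands, plus the constructor's `columns=` argument, which chooses which keys become columns and in what order. Same two records as above:

```python
import pandas as pd

data = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30, 'city': 'NY'},
]

df = pd.DataFrame(data)
print(df.isna())   # True only in Alice's 'city' cell

# columns= keeps only the listed keys, in the listed order
df2 = pd.DataFrame(data, columns=['city', 'name'])
print(df2)
```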

4. From NumPy arrays

import numpy as np

arr = np.arange(9).reshape(3,3)
df = pd.DataFrame(arr, columns=['A','B','C'])
print(df)
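Rows can be labelled too: `index=` plays the same role for rows that `columns=` plays for columns. A sketch with hypothetical row labels 'x', 'y', 'z':

```python
import numpy as np
import pandas as pd

arr = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# index= labels the rows the same way columns= labels the columns
df = pd.DataFrame(arr, columns=['A', 'B', 'C'], index=['x', 'y', 'z'])
print(df)
print(df.loc['y', 'B'])   # 4: row 'y' is [3, 4, 5]
```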

Inspecting a DataFrame

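# Rebuild the student DataFrame from example 1 (the sections below use it)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79]
})
print(df.head())       # first 5 rows (all 3 here)
print(df.tail(3))      # last 3 rows
df.info()              # summary (dtypes, non-null counts); prints itself, returns None
print(df.describe())   # summary statistics for numeric columns
print(df.shape)        # (rows, cols) -> (3, 3)
print(df.columns)      # column labels (an Index object, not a list)
print(df.index)        # row index labels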

Accessing Data

Columns

print(df['name'])       # single column (Series)
print(df[['name','age']]) # multiple columns (DataFrame)
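The bracket count decides the return type: a single label gives a Series, while a list of labels gives a DataFrame, even when the list holds just one label. A sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30], 'score': [88, 92]})

one = df['name']       # single label -> Series
many = df[['name']]    # list of labels -> DataFrame, even with one label

print(type(one).__name__, one.shape)    # Series (2,)
print(type(many).__name__, many.shape)  # DataFrame (2, 1)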

Rows by index label / position

print(df.loc[0])   # row by label (index 0)
print(df.iloc[1])  # row by position (2nd row)
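With the default 0..n-1 index, labels and positions coincide, which hides the difference between `loc` and `iloc`. The sketch below uses made-up labels 10, 20, 30 so the two accessors visibly diverge:

```python
import pandas as pd

# Index labels deliberately differ from positions
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie']}, index=[10, 20, 30])

print(df.loc[20, 'name'])   # Bob: the row whose label is 20
print(df.iloc[0, 0])        # Alice: first row, first column
# df.loc[0] would raise KeyError, because no row is labelled 0
```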

Row + column selection

print(df.loc[1, 'name'])   # value at row 1, column 'name'
print(df.iloc[2, 1])       # value at 3rd row, 2nd column

Slicing

print(df[0:2])   # first 2 rows
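`df[0:2]` slices by position and, like a Python list, excludes the stop. `.iloc` slices the same way, but `.loc` slices by label and includes the stop label. A sketch using the default 0..n-1 index and illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'David']})

print(len(df.iloc[0:2]))   # 2 rows: positions 0 and 1 (stop excluded)
print(len(df.loc[0:2]))    # 3 rows: labels 0, 1 and 2 (stop included)
```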

Adding, Modifying, Deleting Data

Add new column

df['passed'] = df['score'] >= 80

Modify column

df['age'] = df['age'] + 1

Delete column

df.drop('passed', axis=1, inplace=True)   # remove the 'passed' column added above
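By default `drop` returns a new DataFrame and leaves the original untouched; `inplace=True` modifies the DataFrame itself instead. Reassigning the returned copy is a common alternative style. A sketch with made-up columns:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'score': [88, 92], 'passed': [True, True]})

# Without inplace=True, drop returns a new DataFrame
trimmed = df.drop(columns='passed')
print(list(df.columns))       # ['name', 'score', 'passed'] (original unchanged)
print(list(trimmed.columns))  # ['name', 'score']
```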

Add new row

new_row = {'name':'David', 'age':28, 'score':85}
df.loc[len(df)] = new_row
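`df.loc[len(df)] = ...` works because the index is 0..n-1, so `len(df)` is an unused label. For adding several rows at once, `pd.concat` is a common alternative. A sketch assuming the same three columns, with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [25, 30], 'score': [88, 92]})

new_rows = pd.DataFrame([
    {'name': 'David', 'age': 28, 'score': 85},
    {'name': 'Eve', 'age': 24, 'score': 91},
])

# ignore_index=True renumbers the result 0..n-1 instead of repeating 0, 1
df = pd.concat([df, new_rows], ignore_index=True)
print(df)
```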

Filtering Data

print(df[df['score'] > 85])       # condition
print(df[(df['age'] > 25) & (df['score'] > 80)]) # multiple conditions
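Boolean masks combine with `&`, `|` and `~` rather than Python's `and`/`or`/`not` (those try to reduce a whole Series to one True/False and raise a ValueError), and each condition needs its own parentheses because `&` binds tighter than `>`. The sketch also shows `isin` for membership tests and `query` for a string form of the same filters; the data mirrors the student example:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 22],
    'score': [88, 92, 79],
})

either = df[(df['age'] < 23) | (df['score'] > 90)]   # Bob, Charlie
picked = df[df['name'].isin(['Alice', 'Charlie'])]   # membership test
same = df.query('age > 24 and score > 80')           # string form of a filter

print(either['name'].tolist())   # ['Bob', 'Charlie']
print(picked['name'].tolist())   # ['Alice', 'Charlie']
print(same['name'].tolist())     # ['Alice', 'Bob']
```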

Sorting

print(df.sort_values('age'))              # ascending
print(df.sort_values('score', ascending=False))  # descending
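`sort_values` also accepts a list of columns, with a matching list for `ascending`, and returns a new DataFrame rather than sorting in place. A sketch with made-up ages chosen so that ties occur:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 25, 30],
    'score': [88, 92, 79, 85],
})

# Sort by age ascending, then by score descending within each age
out = df.sort_values(['age', 'score'], ascending=[True, False])
print(out['name'].tolist())   # ['Alice', 'Charlie', 'Bob', 'David']
```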

Handling Missing Data

df = pd.DataFrame({
    'name': ['Alice','Bob','Charlie'],
    'age': [25, None, 22],
    'score': [88, 92, None]
})

print(df.isna())        # check missing
print(df.dropna())      # drop rows with NaN
print(df.fillna(0))     # replace NaN with 0
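`isna().sum()` counts missing values per column, and `fillna` accepts a dict so each column gets its own fill value. Like `dropna`, it returns a new DataFrame by default. A sketch on the same data, where filling age with the column mean is just one illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, None, 22],
    'score': [88, 92, None],
})

print(df.isna().sum())   # missing values per column: name 0, age 1, score 1

# One fill value per column; mean() skips NaN, so the age mean is 23.5
filled = df.fillna({'age': df['age'].mean(), 'score': 0})
print(filled)
```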

Useful Methods

  • Basic info
    df.dtypes                    # data type of each column
    df.count()                   # non-null values per column
    df.nunique()                 # number of unique values per column
  • Statistics
    df.mean(numeric_only=True)   # mean of numeric columns (skips 'name')
    df.corr(numeric_only=True)   # correlation matrix of numeric columns
    df['score'].max()            # max value in a column
  • Value counts
    print(df['age'].value_counts())

Example: Small Analysis

import pandas as pd

# Sales data
data = {
    'Product': ['A','B','A','C','B','A'],
    'Units': [10,20,15,5,7,12],
    'Price': [100, 200, 100, 150, 200, 100]
}
df = pd.DataFrame(data)

# Add revenue column
df['Revenue'] = df['Units'] * df['Price']

# Filter: only Product A
product_a = df[df['Product'] == 'A']

# Group by Product and sum revenue
summary = df.groupby('Product')['Revenue'].sum()

print("Product A Sales:\n", product_a)
print("Revenue by Product:\n", summary)
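`groupby` can compute several summaries in one pass. The sketch below uses named aggregation (`agg` with `new_column=(source_column, function)` keyword arguments) on the same sales data; the output column names `total_units`, `total_revenue` and `orders` are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['A', 'B', 'A', 'C', 'B', 'A'],
    'Units': [10, 20, 15, 5, 7, 12],
    'Price': [100, 200, 100, 150, 200, 100],
})
df['Revenue'] = df['Units'] * df['Price']

# Named aggregation: new_column=(source_column, function)
summary = df.groupby('Product').agg(
    total_units=('Units', 'sum'),
    total_revenue=('Revenue', 'sum'),
    orders=('Revenue', 'count'),
)
print(summary.sort_values('total_revenue', ascending=False))
```

Product B has fewer orders than A but the highest revenue (5400 vs 3700), which a single summed column would not show alongside the order count.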

Quick Exercises

  1. Create a DataFrame with 5 students (Name, Age, Math Score, English Score).
  2. Add a new column Total Score = sum of both subjects.
  3. Show students who scored more than 80 in English.
  4. Sort students by Total Score in descending order.
  5. Replace any missing Age values with the average age.

✅ In the next chapter, we’ll cover DataFrame Operations — grouping, merging, reshaping, and applying functions to data.
