Python Pandas Tutorial — Chapter 2: Series Basics

In the previous chapter, we introduced pandas and its two main data structures: Series and DataFrame. In this chapter, we’ll focus on the Series, the simpler of the two and a core building block of pandas.


What is a Series?

A Series is a one-dimensional labeled array in pandas. Think of it as:

  • A column in a spreadsheet.
  • A NumPy array with labels (called the index).
  • A dictionary where keys are the index, and values are the data.

Syntax:

import pandas as pd

s = pd.Series(data, index=index, dtype=dtype)
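
As a small illustration of the signature above, the following sketch passes all three parameters at once. The values, labels, and the variable name prices are made up for this example.

```python
import pandas as pd

# Illustrative values only: data is a list, index supplies the labels,
# and dtype forces the integers to be stored as floats.
prices = pd.Series([1, 2, 3], index=["a", "b", "c"], dtype="float64")

print(prices.dtype)           # float64
print(prices.index.tolist())  # ['a', 'b', 'c']
```

Leaving out index gives the default 0, 1, 2, ... labels, and leaving out dtype lets pandas infer the type from the data.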

Creating a Series

1. From a list

import pandas as pd

data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64
  • Default index starts at 0.

2. With custom index

s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
print(s)

Output:

a    100
b    200
c    300
dtype: int64

3. From a dictionary

data = {'apples': 3, 'oranges': 5, 'bananas': 2}
s = pd.Series(data)
print(s)

Output:

apples     3
oranges    5
bananas    2
dtype: int64
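
A dictionary can also be combined with an explicit index. In that case pandas looks up each requested label in the dictionary, so keys can be reordered or dropped, and labels with no matching key become NaN. A minimal sketch, where "kiwis" is a made-up label that is deliberately absent from the data:

```python
import pandas as pd

data = {"apples": 3, "oranges": 5, "bananas": 2}

# Only the labels listed in index are kept, in that order.
# "kiwis" has no entry in data, so its value is NaN.
s = pd.Series(data, index=["bananas", "apples", "kiwis"])
print(s)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.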

4. From a scalar value

s = pd.Series(7, index=['x','y','z'])
print(s)

Output:

x    7
y    7
z    7
dtype: int64

Accessing Data in a Series

By index label

print(s['y'])    # 7

By position

# s[1] on a label-indexed Series is deprecated in recent pandas; use .iloc
print(s.iloc[1])      # 7

Slicing

s = pd.Series([10, 20, 30, 40, 50], index=['a','b','c','d','e'])
print(s['b':'d'])   # inclusive of 'd'
print(s.iloc[1:4])  # positional slice, excludes position 4
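
Plain square brackets can be ambiguous, especially when the index itself contains integers. One common way to be explicit is to use .loc for label-based access and .iloc for position-based access. The sketch below reuses the Series from the slicing example; the variable names are only for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_label = s.loc["b"]          # label lookup
by_position = s.iloc[1]        # position lookup
label_slice = s.loc["b":"d"]   # labels b, c, d (end label included)
pos_slice = s.iloc[1:4]        # positions 1, 2, 3 (end position excluded)

print(by_label, by_position)   # 20 20
print(label_slice.tolist())    # [20, 30, 40]
print(pos_slice.tolist())      # [20, 30, 40]
```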

Series Attributes

  • s.index → returns index labels
  • s.values → underlying NumPy array
  • s.dtype → data type of elements
  • s.shape → tuple of dimensions; for a Series this is (n,), where n is the number of elements

Example:

print(s.index)   # Index(['a','b','c','d','e'], dtype='object')
print(s.values)  # [10 20 30 40 50]
print(s.dtype)   # int64
print(s.shape)   # (5,)
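
A few related attributes are often useful alongside the ones listed above. The sketch below shows size, ndim, name, and the to_numpy() method; the name "score" is a hypothetical label chosen for this example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50],
              index=["a", "b", "c", "d", "e"],
              name="score")  # "score" is an illustrative name

print(s.size)    # 5, the number of elements as a plain integer
print(s.ndim)    # 1, a Series is always one-dimensional
print(s.name)    # score

arr = s.to_numpy()         # the data as a NumPy array, without the index
print(type(arr).__name__)  # ndarray
```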

Vectorized Operations

A Series behaves like a NumPy array: arithmetic operations are applied element-wise.

s = pd.Series([1, 2, 3, 4])

print(s + 10)      # add scalar
print(s * 2)       # multiply scalar
print(s ** 2)      # square each element

Output of the first print (s + 10):

0    11
1    12
2    13
3    14
dtype: int64
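
Element-wise behaviour is not limited to arithmetic. Comparisons return a boolean Series that can be used as a filter, and NumPy functions can be applied to a Series directly. A short sketch, with variable names chosen for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])

mask = s > 2         # element-wise comparison gives a boolean Series
roots = np.sqrt(s)   # NumPy function applied to every element

print(mask.tolist())         # [False, False, True, True]
print(s[mask].tolist())      # [3, 4]
print(type(roots).__name__)  # Series
```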

Alignment by Index

When performing operations between two Series, pandas aligns them by index.

s1 = pd.Series([1,2,3], index=['a','b','c'])
s2 = pd.Series([10,20,30], index=['b','c','d'])

print(s1 + s2)

Output:

a     NaN
b    12.0
c    23.0
d     NaN
dtype: float64
  • Notice how a and d are NaN because each of those labels exists in only one of the two Series. The result is float64 because NaN is a floating-point value.
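
If NaN is not the desired result for unmatched labels, one option is the add() method with a fill_value, which substitutes that value for a label missing from one side before adding. A minimal sketch using the same two Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels missing from one Series are treated as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total.to_dict())  # {'a': 1.0, 'b': 12.0, 'c': 23.0, 'd': 30.0}
```

Whether 0 is a sensible stand-in depends on the data, so this is a choice to make per case rather than a default.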

Handling Missing Values

s = pd.Series([1, None, 3, None, 5])

print(s.isna())       # check for missing
print(s.fillna(0))    # replace NaN with 0
print(s.dropna())     # remove NaN
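
It can help to see what the missing values look like internally. In the Series above, None is converted to NaN and the dtype becomes float64. The sketch below also shows notna() and ffill(), which carries the last valid value forward, as one alternative to a constant fill.

```python
import pandas as pd

s = pd.Series([1, None, 3, None, 5])

print(s.dtype)             # float64, because None was stored as NaN
print(s.notna().tolist())  # [True, False, True, False, True]
print(s.ffill().tolist())  # [1.0, 1.0, 3.0, 3.0, 5.0]
```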

Useful Methods

Descriptive stats

s = pd.Series([5, 10, 15, 20, 25])
print(s.mean())   # 15.0
print(s.max())    # 25
print(s.min())    # 5
print(s.std())    # 7.905...

Unique & value counts

s = pd.Series(['apple','banana','apple','orange'])
print(s.unique())        # ['apple' 'banana' 'orange']
print(s.value_counts())  # frequency count

Apply a function

print(s.apply(str.upper))
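
For string data, the .str accessor offers a vectorized alternative to apply(str.upper). The sketch below repeats the fruit example with it and reads a single count out of value_counts(); the variable names are illustrative.

```python
import pandas as pd

fruits = pd.Series(["apple", "banana", "apple", "orange"])

upper = fruits.str.upper()      # vectorized string method
counts = fruits.value_counts()  # Series indexed by the distinct values

print(upper.tolist())    # ['APPLE', 'BANANA', 'APPLE', 'ORANGE']
print(counts["apple"])   # 2
print(fruits.nunique())  # 3
```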

Small End-to-End Example

import pandas as pd

# Student marks
marks = pd.Series([85, 92, 78, 90, 88], 
                  index=['Alice','Bob','Charlie','David','Eva'])

# Find who scored above 85
top_students = marks[marks > 85]

# Calculate mean score
mean_score = marks.mean()

print("Top Students:\n", top_students)
print("Average Score:", mean_score)
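
The same marks Series can answer a couple of follow-up questions. As a sketch, idxmax() returns the label of the highest value and sort_values() ranks the students; the variable names best and ranked are chosen for this example.

```python
import pandas as pd

marks = pd.Series([85, 92, 78, 90, 88],
                  index=["Alice", "Bob", "Charlie", "David", "Eva"])

best = marks.idxmax()                        # label of the highest mark
ranked = marks.sort_values(ascending=False)  # highest mark first

print(best)                   # Bob
print(ranked.index.tolist())  # ['Bob', 'David', 'Eva', 'Alice', 'Charlie']
```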

Quick Exercises

  1. Create a Series of 5 cities with custom indices (use city codes as index).
  2. Given sales = pd.Series([250, 400, 150, 300], index=['Q1','Q2','Q3','Q4']):
    • Find total sales.
    • Increase each sales value by 10%.
  3. Create a Series with some missing values, then:
    • Count missing values.
    • Replace them with the average.

✅ In the next chapter, we’ll explore DataFrame basics: working with tabular data, multiple columns, and more powerful operations.
