Python Pandas Tutorial — Chapter 2: Series Basics
In the previous chapter, we introduced pandas and its two main data structures: Series and DataFrame. In this chapter, we’ll focus on the Series, the simplest but most fundamental building block of pandas.
What is a Series?
A Series is a one-dimensional labeled array in pandas. Think of it as:
- A column in a spreadsheet.
- A NumPy array with labels (called the index).
- A dictionary where keys are the index, and values are the data.
Syntax:
import pandas as pd
s = pd.Series(data, index=index, dtype=dtype)  # index and dtype are optional
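The syntax line above is a template. As a small illustration, here is one concrete call. It also passes `name`, an optional `pd.Series` argument that the template does not list; the values and the name `'scores'` are arbitrary examples.

```python
import pandas as pd

# dtype forces the element type; name labels the Series itself (optional)
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype='float64', name='scores')

print(s)
print(s.dtype)  # float64
print(s.name)   # scores
```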
Creating a Series
1. From a list
import pandas as pd
data = [10, 20, 30, 40]
s = pd.Series(data)
print(s)
Output:
0 10
1 20
2 30
3 40
dtype: int64
- When no index is given, pandas assigns a default integer index starting at 0.
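To see the default index and dtype inference directly, the short sketch below inspects the index of the list-based Series and shows that mixing integers and floats upcasts the whole Series to `float64`. The `mixed` Series is just an illustrative example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# With no index argument, pandas builds a RangeIndex from 0 to n-1
print(s.index)  # RangeIndex(start=0, stop=4, step=1)

# pandas infers a single dtype for the whole Series:
# mixing ints and floats upcasts every element to float64
mixed = pd.Series([1, 2.5, 3])
print(mixed.dtype)  # float64
```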
2. With custom index
s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
print(s)
Output:
a 100
b 200
c 300
dtype: int64
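A custom index needs exactly one label per value. The sketch below shows that a length mismatch raises a `ValueError`, and that duplicate labels are allowed but make a label lookup return more than one element. The data is made up for illustration.

```python
import pandas as pd

# The index must have exactly one label per value
raised = False
try:
    pd.Series([100, 200, 300], index=['a', 'b'])
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Labels need not be unique, but a duplicated label selects several elements
dup = pd.Series([1, 2, 3], index=['a', 'a', 'b'])
print(dup['a'])  # a Series with two elements
```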
3. From a dictionary
data = {'apples': 3, 'oranges': 5, 'bananas': 2}
s = pd.Series(data)
print(s)
Output:
apples 3
oranges 5
bananas 2
dtype: int64
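When a dictionary is combined with an explicit `index`, pandas uses the index to pick and order the keys. As a sketch (the `'kiwis'` label is an arbitrary example): keys not listed are dropped, and listed labels missing from the dictionary become `NaN`, which turns the dtype into `float64`.

```python
import pandas as pd

data = {'apples': 3, 'oranges': 5, 'bananas': 2}

# index selects and orders the dict keys; 'bananas' is left out,
# and 'kiwis' is not in the dict, so its value is NaN
s = pd.Series(data, index=['oranges', 'apples', 'kiwis'])
print(s)
```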
4. From a scalar value
s = pd.Series(7, index=['x','y','z'])
print(s)
Output:
x 7
y 7
z 7
dtype: int64
Accessing Data in a Series
By index label
print(s['y']) # 7
By position
print(s.iloc[1]) # 7 (positional lookup with plain s[1] is deprecated in recent pandas versions)
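Because every element of the scalar example equals 7, label and position lookups look the same there. The sketch below reuses the custom-index Series from earlier so the difference is visible: `.loc` always uses labels, `.iloc` always uses positions, `.get` returns a default for a missing label, and `in` checks the index, not the values.

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

print(s.loc['b'])      # 200 (by label)
print(s.iloc[1])       # 200 (by position)
print(s.iloc[-1])      # 300 (negative positions count from the end)

# .get returns a default instead of raising KeyError
print(s.get('z', 0))   # 0

# Membership tests look at the index labels, not the values
print('b' in s)        # True
print(200 in s)        # False
```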
Slicing
s = pd.Series([10, 20, 30, 40, 50], index=['a','b','c','d','e'])
print(s.loc['b':'d']) # label slice: includes the end label 'd'
print(s.iloc[1:4]) # position slice: excludes position 4
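Both slices above select the same three elements, even though one includes its end point and the other does not. The sketch below confirms this and adds a boolean mask, another common way to select from a Series (the threshold 25 is arbitrary).

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['b':'d']    # end label 'd' is included
by_position = s.iloc[1:4]    # end position 4 is excluded

print(by_label.tolist())     # [20, 30, 40]
print(by_position.tolist())  # [20, 30, 40]

# Boolean mask: keep only values greater than 25
print(s[s > 25].tolist())    # [30, 40, 50]
```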
Series Attributes
- s.index → the index labels
- s.values → the underlying NumPy array
- s.dtype → the data type of the elements
- s.shape → a one-element tuple holding the number of elements
Example:
print(s.index) # Index(['a','b','c','d','e'], dtype='object')
print(s.values) # [10 20 30 40 50]
print(s.dtype) # int64
print(s.shape) # (5,)
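A few related attributes and methods are handy alongside these. In the sketch below, `size` and `len()` give the element count as a plain integer, and `to_numpy()` is the method the pandas documentation recommends over `.values`. The name `'points'` is an arbitrary example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'], name='points')

print(s.size)            # 5 (number of elements as an int)
print(len(s))            # 5
print(s.shape)           # (5,)
print(s.to_numpy())      # [10 20 30 40 50]
print(s.name)            # points
print(s.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
```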
Vectorized Operations
A Series behaves like a NumPy array: arithmetic operations are applied element-wise.
s = pd.Series([1, 2, 3, 4])
print(s + 10) # add scalar
print(s * 2) # multiply scalar
print(s ** 2) # square each element
Output (of s + 10):
0 11
1 12
2 13
3 14
dtype: int64
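The output above shows only the first of the three operations. The sketch below prints all three as plain lists. It also shows that comparison operators are element-wise too and return a boolean Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

print((s + 10).tolist())  # [11, 12, 13, 14]
print((s * 2).tolist())   # [2, 4, 6, 8]
print((s ** 2).tolist())  # [1, 4, 9, 16]

# Comparisons are also element-wise and return a boolean Series
print((s > 2).tolist())   # [False, False, True, True]
```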
Alignment by Index
When performing operations between two Series, pandas aligns them by index.
s1 = pd.Series([1,2,3], index=['a','b','c'])
s2 = pd.Series([10,20,30], index=['b','c','d'])
print(s1 + s2)
Output:
a NaN
b 12.0
c 23.0
d NaN
dtype: float64
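- Notice how a and d are NaN: each of these labels exists in only one of the two Series, so there is no pair of values to add. Because NaN is a floating-point value, the result dtype becomes float64.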
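If you would rather treat a missing label as 0 than get NaN, the arithmetic methods such as `add` accept a `fill_value` argument. In the sketch below, which uses the same two Series, the result should be 1.0, 12.0, 23.0 and 30.0 for labels a to d.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# fill_value=0 stands in for a label that is missing on one side
total = s1.add(s2, fill_value=0)
print(total)
```

`sub`, `mul` and `div` accept the same `fill_value` argument.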
Handling Missing Values
s = pd.Series([1, None, 3, None, 5])
print(s.isna()) # check for missing
print(s.fillna(0)) # replace NaN with 0
print(s.dropna()) # remove NaN
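The three calls above return new Series and leave `s` unchanged. The sketch below shows a few more details. Summing the `isna()` mask counts the missing values, `dropna()` keeps the original index labels, and `ffill()` fills each gap with the previous valid value as an alternative to a constant.

```python
import pandas as pd

s = pd.Series([1, None, 3, None, 5])  # None is stored as NaN, so dtype is float64

print(s.isna().sum())              # 2 (each True counts as 1)
print(s.dropna().index.tolist())   # [0, 2, 4] (original labels are kept)
print(s.ffill().tolist())          # [1.0, 1.0, 3.0, 3.0, 5.0]
```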
Useful Methods
- Descriptive Stats
s = pd.Series([5, 10, 15, 20, 25])
print(s.mean()) # 15.0
print(s.max()) # 25
print(s.min()) # 5
print(s.std()) # 7.905... (sample standard deviation, ddof=1)
- Unique & Value Counts
s = pd.Series(['apple','banana','apple','orange'])
print(s.unique()) # ['apple' 'banana' 'orange']
print(s.value_counts()) # frequency count
- Apply a function
print(s.apply(str.upper))
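For a quick overview, `describe()` bundles several of these statistics into one Series. The sketch below also reads a single count out of `value_counts()`. It uses the `.str` accessor as a vectorized alternative to `apply(str.upper)` for string data.

```python
import pandas as pd

nums = pd.Series([5, 10, 15, 20, 25])
summary = nums.describe()  # count, mean, std, min, quartiles and max
print(summary)

fruits = pd.Series(['apple', 'banana', 'apple', 'orange'])
counts = fruits.value_counts()
print(counts['apple'])  # 2

# The .str accessor applies string methods element-wise
print(fruits.str.upper().tolist())  # ['APPLE', 'BANANA', 'APPLE', 'ORANGE']
```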
Small End-to-End Example
import pandas as pd
# Student marks
marks = pd.Series([85, 92, 78, 90, 88],
                  index=['Alice','Bob','Charlie','David','Eva'])
# Find who scored above 85
top_students = marks[marks > 85]
# Calculate mean score
mean_score = marks.mean()
print("Top Students:\n", top_students)
print("Average Score:", mean_score)
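The example can be extended with a few more one-liners on the same data. The sketch below finds the top scorer with `idxmax()`, ranks students with `sort_values()`, and counts how many beat the class average of 86.6.

```python
import pandas as pd

marks = pd.Series([85, 92, 78, 90, 88],
                  index=['Alice', 'Bob', 'Charlie', 'David', 'Eva'])

print(marks.idxmax())  # Bob (label of the highest mark)

ranked = marks.sort_values(ascending=False)
print(ranked.index.tolist())  # ['Bob', 'David', 'Eva', 'Alice', 'Charlie']

# Summing a boolean mask counts the True values
above_avg = (marks > marks.mean()).sum()
print(above_avg)  # 3
```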
Quick Exercises
- Create a Series of 5 cities with custom indices (use city codes as index).
- Given sales = pd.Series([250, 400, 150, 300], index=['Q1','Q2','Q3','Q4']):
  - Find total sales.
  - Increase each sales value by 10%.
- Create a Series with some missing values, then:
  - Count missing values.
  - Replace them with the average.
✅ In the next chapter, we’ll explore DataFrame basics: working with tabular data, multiple columns, and more powerful operations.