Working with data stored in relational databases is a crucial skill in data analysis. MySQL is one of the most popular relational databases used worldwide. Python's Pandas library makes it easy to connect to MySQL, run SQL queries, and import data directly into DataFrames for analysis.
In this chapter, we'll cover how to:
- Install the required libraries
- Connect to a MySQL database
- Read data from MySQL into Pandas
- Write Pandas DataFrames back to MySQL
- Handle connection errors safely
🧩 Step 1: Install Required Libraries
You need the following Python libraries:
pip install pandas mysql-connector-python sqlalchemy
- pandas – for data manipulation and analysis
- mysql-connector-python – the official MySQL driver
- sqlalchemy – provides a clean interface between Pandas and SQL databases
🧩 Step 2: Create a Sample MySQL Database
Before connecting, let's assume you have a MySQL database named company_db with a table called employees:
CREATE DATABASE company_db;
USE company_db;
CREATE TABLE employees (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50),
    department VARCHAR(50),
    salary FLOAT,
    hire_date DATE
);
INSERT INTO employees (name, department, salary, hire_date)
VALUES
('Alice Johnson', 'Finance', 72000, '2020-05-15'),
('Bob Smith', 'Marketing', 65000, '2019-07-23'),
('Charlie Brown', 'IT', 85000, '2018-03-10'),
('Diana King', 'Finance', 78000, '2021-01-05'),
('Ethan Clark', 'HR', 56000, '2017-11-18');
🧩 Step 3: Connect MySQL to Pandas Using mysql.connector
import mysql.connector
import pandas as pd

# Establish connection
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='your_password',
    database='company_db'
)

# Read table into DataFrame
query = "SELECT * FROM employees"
df = pd.read_sql(query, conn)
print(df.head())

conn.close()
✅ Explanation
- mysql.connector.connect() establishes the connection.
- pd.read_sql() runs the SQL and loads the result into a DataFrame.
- conn.close() closes the connection safely.
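The connect → query → close pattern above works with any DB-API driver, not just mysql.connector. As a minimal sketch that runs without a live MySQL server, here is the same pattern using Python's built-in sqlite3 module as a stand-in (the table name and columns mirror the Step 2 example):

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the MySQL server here;
# with MySQL you would call mysql.connector.connect(...) instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Alice Johnson", 72000), ("Bob Smith", 65000)],
)

# Same call as with MySQL: run SQL, get a DataFrame back
df = pd.read_sql("SELECT * FROM employees", conn)
print(df.shape)  # (2, 3)

conn.close()  # always release the connection
```

Only the connect call changes between databases; the pd.read_sql usage is identical.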
🧩 Step 4: Using SQLAlchemy for Easier Integration
SQLAlchemy provides a more flexible way to manage connections and is preferred for production code.
from sqlalchemy import create_engine
import pandas as pd
# Create SQLAlchemy engine
engine = create_engine("mysql+mysqlconnector://root:your_password@localhost/company_db")
# Read entire table
df = pd.read_sql("SELECT * FROM employees", con=engine)
print(df)
Benefits of SQLAlchemy
- Works with multiple databases (MySQL, PostgreSQL, SQLite, etc.)
- Can handle connection pooling
- Easier integration with to_sql() for writing DataFrames back to MySQL
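The multi-database benefit is concrete: only the engine URL changes between backends, while the Pandas calls stay the same. A small sketch, using an in-memory SQLite URL so it runs without a server (the MySQL URL shown in the comment is the one from Step 4):

```python
import pandas as pd
from sqlalchemy import create_engine

# Only the URL changes per backend; the Pandas code below does not.
# MySQL:  create_engine("mysql+mysqlconnector://root:your_password@localhost/company_db")
# SQLite (in-memory, used here so the sketch is self-contained):
engine = create_engine("sqlite://")

df = pd.DataFrame({"name": ["Alice"], "salary": [72000]})
df.to_sql("employees", con=engine, index=False)

out = pd.read_sql("SELECT name, salary FROM employees", con=engine)
print(out)
```

Swapping the URL back to the mysql+mysqlconnector form targets your MySQL database with no other code changes.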
🧩 Step 5: Writing Pandas DataFrame to MySQL
Suppose you have a new DataFrame:
new_data = {
    'name': ['Frank Miller', 'Grace Lee'],
    'department': ['IT', 'Marketing'],
    'salary': [88000, 63000],
    'hire_date': ['2022-06-15', '2023-02-10']
}
df_new = pd.DataFrame(new_data)
# Write to MySQL table
df_new.to_sql('employees', con=engine, if_exists='append', index=False)
print("New records inserted successfully!")
The if_exists parameter:
- 'fail' → raises an error if the table already exists
- 'replace' → drops the table and recreates it
- 'append' → adds new rows to the existing table
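The three modes are easy to compare side by side. A sketch using an in-memory SQLite engine as a stand-in for the MySQL engine, so the effect of each mode is visible:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory stand-in for the MySQL engine

df1 = pd.DataFrame({"name": ["Alice"], "salary": [72000]})
df2 = pd.DataFrame({"name": ["Bob"], "salary": [65000]})

df1.to_sql("employees", con=engine, index=False)                       # creates the table
df2.to_sql("employees", con=engine, if_exists="append", index=False)   # adds rows -> 2 total
df2.to_sql("employees", con=engine, if_exists="replace", index=False)  # drops + recreates -> 1 row

n = pd.read_sql("SELECT COUNT(*) AS n FROM employees", con=engine)["n"].iloc[0]
print(n)  # 1
```

Note that 'replace' discards the existing schema and data, so prefer 'append' for routine inserts.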
🧩 Step 6: Querying and Filtering Data
Once your data is loaded, you can query and filter it using SQL or Pandas directly.
Using SQL Query
query = "SELECT * FROM employees WHERE department = 'Finance'"
df_finance = pd.read_sql(query, con=engine)
print(df_finance)
Using Pandas
df_finance = df[df['department'] == 'Finance']
print(df_finance)
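Both routes yield the same rows; the difference is where the filtering happens (in the database for SQL, in memory for Pandas). A self-contained sketch, using an in-memory SQLite engine in place of MySQL, showing the two results agree:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory stand-in for MySQL
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Diana"],
    "department": ["Finance", "Marketing", "Finance"],
})
df.to_sql("employees", con=engine, index=False)

# SQL-side filter: only matching rows leave the database
sql_side = pd.read_sql(
    "SELECT * FROM employees WHERE department = 'Finance'", con=engine
)

# Pandas-side filter: the whole table is already in memory
pandas_side = df[df["department"] == "Finance"].reset_index(drop=True)

print(sql_side.equals(pandas_side))  # True
```

For large tables, filtering in SQL is usually preferable because only the matching rows are transferred.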
🧩 Step 7: Handling Errors Gracefully
When dealing with databases, always include exception handling.
import mysql.connector
from mysql.connector import Error
import pandas as pd

conn = None  # so the finally block is safe even if connect() fails
try:
    conn = mysql.connector.connect(
        host='localhost',
        user='root',
        password='your_password',
        database='company_db'
    )
    if conn.is_connected():
        print("✅ Connected to MySQL database")
        df = pd.read_sql("SELECT * FROM employees", conn)
        print(df.head())
except Error as e:
    print(f"❌ Error: {e}")
finally:
    if conn is not None and conn.is_connected():
        conn.close()
        print("🔌 Connection closed.")
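The try/finally dance can also be delegated to a context manager, which guarantees cleanup even if the query raises. A sketch of the pattern using contextlib.closing with sqlite3 as a stand-in for the MySQL connection:

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() calls conn.close() on exit, success or failure,
# mirroring the finally block above (sqlite3 stands in for MySQL).
try:
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE employees (name TEXT)")
        conn.execute("INSERT INTO employees VALUES ('Alice')")
        df = pd.read_sql("SELECT * FROM employees", conn)
    print("Connection closed automatically")
except sqlite3.Error as e:
    print(f"Error: {e}")

print(len(df))  # 1
```

The same wrapper works around mysql.connector.connect(...), since its connection objects also expose close().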
🧩 Step 8: Writing Query Results to Excel or CSV
Once you fetch data from MySQL, you can easily export it.
df.to_excel('employees_data.xlsx', index=False)
df.to_csv('employees_data.csv', index=False)
Now, you have data from MySQL saved locally for further analysis or sharing.
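A quick round-trip check confirms nothing is lost on export. A sketch using a temporary directory and the CSV path (to_excel additionally requires the openpyxl package, so only the CSV route is shown here):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Alice", "Bob"], "salary": [72000, 65000]})

# Write to a temporary CSV and read it back to verify the round trip
path = os.path.join(tempfile.mkdtemp(), "employees_data.csv")
df.to_csv(path, index=False)
restored = pd.read_csv(path)

print(restored.equals(df))  # True
```

index=False keeps the DataFrame's row index out of the file, so the restored frame matches the original column-for-column.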
🧩 Step 9: Use Case – Analyzing Employee Salaries
You can perform analysis directly on the MySQL data loaded into Pandas:
avg_salary = df.groupby('department')['salary'].mean().reset_index()
print(avg_salary)
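With the five sample employees from Step 2, the groupby works out as in this self-contained sketch (Finance averages (72000 + 78000) / 2 = 75000; the other departments have one row each):

```python
import pandas as pd

# The five sample rows from Step 2
df = pd.DataFrame({
    "department": ["Finance", "Marketing", "IT", "Finance", "HR"],
    "salary": [72000, 65000, 85000, 78000, 56000],
})

# One row per department, with the mean salary of its employees
avg_salary = df.groupby("department")["salary"].mean().reset_index()
print(avg_salary)
```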
Or visualize it using Matplotlib:
import matplotlib.pyplot as plt
avg_salary.plot(kind='bar', x='department', y='salary', legend=False)
plt.title('Average Salary by Department')
plt.ylabel('Salary ($)')
plt.show()
🧩 Step 10: Summary
| Task | Function/Method | Description |
|---|---|---|
| Connect to MySQL | mysql.connector.connect() | Establishes the connection |
| Read SQL Query | pd.read_sql() | Loads a SQL result into a DataFrame |
| Write to MySQL | DataFrame.to_sql() | Inserts data back into a SQL table |
| Handle Errors | try-except block | Manages connection failures |
| Export Data | to_csv() / to_excel() | Saves data locally |
🧩 Step 11: Best Practices
- Use environment variables to store credentials securely.
- Close connections after every operation.
- Use SQLAlchemy for scalability and cleaner syntax.
- Avoid loading very large tables at once: use LIMIT, or chunking with pd.read_sql(query, con=engine, chunksize=10000).
- Always validate and sanitize input to prevent SQL injection.
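The chunking advice deserves a concrete look: passing chunksize makes pd.read_sql return an iterator of DataFrames instead of one large frame, so only one chunk is in memory at a time. A sketch using an in-memory SQLite engine (stand-in for MySQL) and a small chunk size to make the behavior visible:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory stand-in for MySQL
pd.DataFrame({"id": range(25)}).to_sql("big_table", con=engine, index=False)

# chunksize turns read_sql into an iterator of 10-row DataFrames (10, 10, 5)
total = 0
for chunk in pd.read_sql("SELECT * FROM big_table", con=engine, chunksize=10):
    total += len(chunk)

print(total)  # 25
```

Processing chunk by chunk keeps memory bounded regardless of table size, at the cost of holding the connection open for the duration of the loop.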
✅ Summary
Connecting Pandas with MySQL lets data professionals bridge data storage and analytics seamlessly. You can query live databases, analyze them in Pandas, visualize results, and write insights back, all within Python.
This integration forms the foundation for advanced workflows in ETL, data analytics, and machine learning pipelines.