Split Excel Data into Multiple Files by Column Values using Python (fast & easy)

Category: Python + Excel

Introduction

In this tutorial, I will show you how to separate Excel data into multiple files using Python and the pandas library. My sample workbook contains sales records for different segments, countries, and products. The goal is to create a separate workbook for each country, which includes only the data specific to that country.

Sample workbook with sales records

Manually, you would need to filter the column, copy the data, paste it into a new workbook, and save it. This process can be tedious, especially with large datasets. Python simplifies this task significantly. However, real-world data isn’t always neat, so I will also cover how to handle messy data towards the end of this tutorial.

Getting Started

First, I have an empty output folder ready in my workbooks directory. I’ll start with a blank Python file. The first step is to import the required libraries:

import pandas as pd
import os

If you haven’t installed pandas yet, you can do so by using the command:

pip install pandas

You might also need the openpyxl library, which is an optional dependency for pandas. Install it using:

pip install openpyxl

Installing pandas and openpyxl

Creating the DataFrame

Next, create a pandas DataFrame by loading your Excel file:

df = pd.read_excel('your_file.xlsx')

Make sure your Python script is in the same directory as your workbook to avoid specifying the full path. If it’s not, you’ll need to provide the exact path to the file.

Set a variable called column_name to the column you want to split:

column_name = 'Country'

Now, create a list of unique values from that column:

countries = df[column_name].unique()

Creating a list of unique countries

Iterating Over Each Country

With the list of countries, I can iterate through each one using a for-loop.

This code checks if each country is contained within the specified column and creates a separate DataFrame for each country, exporting it to an Excel file. The output path is defined as the output folder.

Exporting DataFrames to Excel

Running the Script

To execute the script, navigate to the script directory in your command prompt and run the Python file. You should now see the separated Excel files in the output folder.

Output folder with exported files

Handling Messy Data

The real-world data can be messy. For instance, country names might have inconsistencies like different cases or extra spaces. Let’s modify our script to handle such scenarios:

df['Country'] = df['Country'].str.strip().str.title()

This line removes any leading or trailing whitespace and converts the country names to title case, ensuring consistency.

Cleaning data with strip and title case

Dealing with Special Characters

In some cases, country names may include special characters that can cause issues when creating filenames. For this, I will create a list of characters to remove:

symbols_to_remove = ['/', '\\', '?', '*', ':', '<', '>', '|']

Next, I will apply the replace method to clean the country names: Removing special characters from country names

Conclusion

In this tutorial, I demonstrated how to split Excel data into multiple files using Python. We started with a clean dataset and moved on to handling messier data with inconsistencies and special characters. This approach can save a lot of time and effort, especially when dealing with large datasets. If you have any questions, feel free to ask in the comments!

Further Links

Download the script and sample files