Copy Data From Multiple Files to Master File | Read/Write Closed Excel Files Using Openpyxl
Introduction
In this guide, I will walk you through automating the process of copying data from multiple Excel files into a master file using Python. Specifically, I will use the openpyxl library, which reads and writes Excel files without opening them in Excel. The task involves compiling daily sales reports from over 300 individual Excel files into a single master workbook, which is faster and less error-prone than doing it by hand.
Problem Statement and Final Outcome
My folder contains over 300 Excel files, each representing sales records by category for specific dates. The format is consistent across these files: column A contains product categories, while column B holds the corresponding sales amounts. My goal is to automate the tedious task of copying and pasting values from these daily reports into a master workbook, which has dedicated worksheets for each month.
Running the Python script I will demonstrate takes only about 2 seconds and produces a new Excel file that consolidates all values from the individual reports. This file can then be validated by comparing it against the original sales reports.
[Screenshot: over 300 single Excel files]
[Screenshot: consolidated file]
Solution Approach
The solution is structured into three main steps:
- List all Excel files from the specified source directory.
- Retrieve values from each worksheet and store them in a dictionary.
- Iterate through the master workbook’s worksheets to match dates and paste the corresponding sales values.
Step 1
To start, I will import the necessary libraries. The pathlib module is included by default in Python, while openpyxl must be installed using the command pip install openpyxl.
Next, I will obtain the paths to all Excel files using Path and glob. In my case, the source directory is named Daily Reports.
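A minimal sketch of this step could look like the following. It assumes the reports use the .xlsx extension and that the Daily Reports folder sits next to the script; the helper name list_excel_files is hypothetical and only used for illustration. Only pathlib is needed here, since no workbook is opened yet.

```python
from pathlib import Path

# Assumption: the source directory sits next to the script.
SOURCE_DIR = Path("Daily Reports")


def list_excel_files(source_dir: Path) -> list[Path]:
    """Return all .xlsx files in source_dir, sorted by path.

    Files starting with "~$" are skipped: Excel creates these temporary
    lock files while a workbook is open, and they are not real workbooks.
    """
    return sorted(
        path
        for path in source_dir.glob("*.xlsx")
        if not path.name.startswith("~$")
    )


if __name__ == "__main__":
    excel_files = list_excel_files(SOURCE_DIR)
    print(f"Found {len(excel_files)} Excel files in {SOURCE_DIR}")
```

Sorting is optional, but it keeps the processing order predictable from run to run.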
Step 2
Now that I have the paths to the Excel files, I will iterate over each file. The data of interest is located in the first worksheet, specifically within the cell range B2:B19. The process involves retrieving the values from these cells and appending them to a list.
Once I have the list of values and the corresponding reporting date, I will store this information in a dictionary where the date serves as the key. The execution of this step is efficient, taking around 2 seconds, and results in a dictionary containing all the sales data.
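The loop described above can be sketched roughly as follows. Where the reporting date comes from is an assumption on my side: this sketch takes it from the file name (for example, a hypothetical 2021-01-01.xlsx gives the key "2021-01-01"). If your reports store the date in a cell instead, read that cell and use its value as the key. The function names are also hypothetical.

```python
from pathlib import Path

from openpyxl import load_workbook


def read_sales_values(excel_file, cell_range="B2:B19"):
    """Return the values of cell_range from the first worksheet as a list."""
    # data_only=True returns cached results instead of formula strings.
    wb = load_workbook(filename=excel_file, data_only=True)
    ws = wb.worksheets[0]
    # ws["B2:B19"] yields one tuple of cells per row; flatten it.
    values = [cell.value for row in ws[cell_range] for cell in row]
    wb.close()
    return values


def collect_reports(excel_files):
    """Build a dictionary that maps each reporting date to its sales values."""
    sales_by_date = {}
    for excel_file in excel_files:
        # Assumption: the file name (without extension) is the reporting date.
        report_date = Path(excel_file).stem
        sales_by_date[report_date] = read_sales_values(excel_file)
    return sales_by_date
```

Calling collect_reports with the list of paths from Step 1 returns one entry per file, each holding the 18 values from B2:B19.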
Step 3
With the values stored in a dictionary, I will now open the master template. I will iterate over each worksheet and check for matching dates in column B. This involves determining the range of cells to check, which varies depending on the month.
For each date found, I will write the corresponding sales values from the dictionary into the appropriate row in the master file. Each value from the list will be inserted into a different column, ensuring that the data is organized correctly.
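Here is a hedged sketch of the matching and writing logic. It assumes that the dictionary keys have exactly the same type and format as the date cells in column B (a text date will not match a real Excel date, so convert one side if needed), and that the sales values belong in the columns directly to the right of the date, starting at column C. The function name fill_master and its parameters are hypothetical.

```python
from openpyxl import load_workbook


def fill_master(template_path, output_path, sales_by_date, date_column=2):
    """Write the sales values next to each matching date in every worksheet.

    Returns the number of rows that were filled.
    """
    wb = load_workbook(filename=template_path)
    filled_rows = 0
    for ws in wb.worksheets:
        # ws.max_row differs per sheet, so months of any length are covered.
        for row in range(1, ws.max_row + 1):
            key = ws.cell(row=row, column=date_column).value
            values = sales_by_date.get(key)
            if values is None:
                continue  # empty cell, header, or a date without a report
            # Each value goes into its own column, right of the date cell.
            for offset, value in enumerate(values, start=1):
                ws.cell(row=row, column=date_column + offset, value=value)
            filled_rows += 1
    # Save under a new name so the template itself stays untouched.
    wb.save(output_path)
    wb.close()
    return filled_rows
```

Comparing the returned count with the number of source files is a quick sanity check: if the two differ, some dates were not found in the master file.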
Wrap Up
After saving the master file, the new Excel document will contain all the consolidated data from the daily reports. This example illustrates how to automate a repetitive task, significantly reducing the time and effort required to compile data manually.
While there are other Python libraries available for interacting with Excel, such as xlwings, openpyxl has an advantage here: it reads the workbook files directly without needing to start an Excel instance, which makes it faster when processing many files.
Conclusion
In this tutorial, I demonstrated how to automate the process of consolidating data from multiple Excel files into a master file using Python and the openpyxl library. By following the outlined steps, you can efficiently manage large datasets and streamline your workflow. This approach not only saves time but also minimizes the risk of errors associated with manual data entry.
Thanks for reading.