Automate Excel Reporting Using Python (Real-Life-Example) | Pandas, Plotly, Xlwings Tutorial
Introduction
In this post, I will be sharing my favourite ways to automate Excel using Python. In particular, I will be using the open-source tools: Pandas, xlwings & plotly. After this, you will be able to create a custom Python script that allows you to combine Excel files & create charts out of them.
First things first, I need to install the required dependencies. I will upload this Jupyter Notebook to my GitHub page. Alternatively, you can install the libraries by typing pip install followed by the package’s name in your command prompt or terminal.
Merge All Excel Files
The main purpose of this example is to merge multiple Excel files, perform some calculations on the data, and visualize the result. I have stored five Excel files in the input folder, each containing financial data for a respective country.
To find out which country has generated the highest Sales and Profit Margin, I will import the libraries, declare the path of the example files, and create an empty dataframe. I will iterate over the input directory and append each Excel file to the dataframe.
After validating the dataframe, I will group the data and calculate the sum of the Sales & Profit columns. To find the Profit Margin, I will insert a new column, dividing the Profit by Sales and multiplying by 100.
To visualize the analysis, I will use a bar chart from the plotly express library. The chart will display countries on the x-axis and Sales on the y-axis, with the margin represented by color. After running this cell, we can see that the US generated the highest sales but has the worst profit margin, while France appears quite profitable.
This was a simplified example. In real life, datasets may not be that well-structured. Let’s move on to a more realistic business scenario where we want to merge multiple Excel files while preserving the original formatting and formulas.
Again, I will import the libraries and declare the input and output paths. Instead of creating an empty dataframe, I will generate an empty workbook and rename the first worksheet to summary. I will then iterate over each Excel file, suppress any potential warnings, and specify the worksheet with our data.
Once the data is copied to the summary workbook, I will insert the headers, set the font to bold & white, and fill the background with a grey color. One of the cool features of xlwings is that it allows for easy creation of a pandas dataframe from any cell range. This enables us to perform calculations and export the dataframe back to our Excel sheet.
After aggregating the sales by country, I will create a chart object and set the source data to the sales by country. I can also utilize matplotlib or pandas plotting capabilities. Finally, I will adjust the column width and save the workbook before closing it.
VBA vs Python
Before wrapping up, let’s discuss Excel automation and the comparison between VBA and Python. While Python offers powerful capabilities, consider if spending hours coding is worth the time saved on manual tasks. There are many great tools available, such as PowerQuery, for merging and cleaning up Excel data.
VBA is intuitive for Excel users and comes pre-installed with Office applications, allowing for easy sharing of macros. Python is more versatile, enabling you to create whatever you want, and is rapidly growing in popularity.
In summary, while Python may be more powerful than VBA, the right tool depends on your specific needs. There’s no need to overcomplicate a simple task, so use the tool that best fits the job at hand.
Conclusion
In this post, I discussed how to automate Excel reporting using Python, focusing on merging Excel files, performing calculations, and visualizing results. I also compared VBA and Python in the context of Excel automation, highlighting the strengths and weaknesses of each approach.