How to Merge Multiple PDF Files in Excel Using Python (fast & easy)
Introduction
In this tutorial, I will show you how to merge PDF files in Excel by using Python. This guide will take you step-by-step through the process of building a simple application that combines multiple PDF files into one. By the end of this tutorial, you’ll have a functioning Excel workbook that allows you to merge PDFs with ease.
Final Application Overview
Let me show you the final workbook that we are going to build from scratch. I have two folders:
- In the first folder, I have a mix of different files, including Excel workbooks, PDF documents, and a text file.
- In the other folder, I have 300 single PDF files.
The goal is to create one PDF file for each folder that combines all the PDFs. In the final application, you will be able to:
- Select the source directory containing the PDF files.
- Provide an output name for the merged PDF.
- Click a button to merge the PDFs, with a status message confirming the operation.
Prerequisites
Before getting started, ensure that you have Python installed on your machine. If not, check out my quick tutorial on how to install Python.
Setting Up Your Environment
First, open your terminal or command prompt. For this tutorial, we will need the PyPDF2 and xlwings libraries. Install them by executing the following commands:
pip install PyPDF2
pip install xlwings
Once installed, run the following command to create a new project:
xlwings quickstart pdf_merger --standalone
This command will create a new folder in your current directory with the necessary Excel and Python files.
Building the Excel Interface
Open the Excel file generated by xlwings and navigate to the Developer tab. If the Developer tab is not visible, you can enable it under File > Options > Customize Ribbon.
Next, insert a button into your worksheet and assign the macro called sample_call to the button. If everything is working, you should see “Hello xlwings” in cell A1 after clicking the button.
Setting Up Named Ranges
To make it easier to reference cells, set up named ranges for the source directory, output name, and status. Click on a cell to input the folder path for the PDF files, then go to the name box and type source_dir. Do the same for the output name and status cells.
Optional: VBA for File Selection
Instead of manually entering the file path, you can use a short VBA script to open a file picker dialog. Open the Visual Basic Editor and paste the following code into Module 1:
Sub SelectFolder()
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select a Folder"
If .Show = -1 Then
Range("source_dir").Value = .SelectedItems(1)
End If
End With
End Sub
Assign this macro to a button named “Select Folder”. When clicked, the file picker dialog will appear, allowing you to select a folder and return the path to your worksheet.
Writing the Python Code
Now, let’s move to the Python file. Open it and delete any unnecessary code. First, import the required libraries:
from pathlib import Path
from PyPDF2 import PdfFileMerger, PdfFileReader
Main Function
Inside the main function, create an instance of the PDF merger and clear the status cell:
Next, get the source directory and output name, ensuring to concatenate the output name with “.pdf”.
Getting PDF Files
Now, create a list of all PDF file paths in the source directory:
pdf_files = list(Path(source_dir).glob("*.pdf"))
Merging PDF Files
Iterate over each file in the list and append it to the merger:
for pdf_file in pdf_files:
merger.append(PdfFileReader(str(pdf_file), "rb"))
Writing the Output
Finally, write the merged PDF to the specified path and update the status cell:
merger.write(output_name)
sheet.range("status").value = f"Files have been saved to: {output_name}"
Testing the Application
After saving your script, head back to Excel. Click the button to execute the Python script, which will merge all PDF files from the specified source directory into a single PDF. You can also test the other folder with 300 single PDF pages.
Conclusion
This tutorial has shown you how to merge PDF files in Excel using Python, covering everything from setting up the environment to building the application interface and writing the Python code. If you have any questions or need further support, feel free to leave a comment below.