For the past eight years Schalk Neethling has been working as a freelance developer under the pseudonym Volume4 and is now the president of Overt Strategy Consulting. During this period he has completed over 300 projects, ranging from full web application development to complete branding. As president and lead developer of Overt Strategy Consulting, Schalk and his team have released AlliedBridge, a 100% Java standards-based content management system, and Doc-Central, a business document exchange and review system. Schalk is also actively involved on a daily basis in the open source, web standards and accessibility communities and is an active member of the Web Standards Group. He is also the co-founder and president of the non-profit The South Web Standards and Accessibility Group, which aims to educate and raise awareness of web standards and accessibility among developers as well as businesses large and small. Schalk also has a long relationship with DZone and is currently zone leader for both the web builder zone, css.dzone.com, and the .NET zone, dotnet.dzone.com; you can find a lot of his writing there as well as on his blog at schalkneethling.alliedbridge.com. Schalk is constantly expanding his knowledge of various aspects of technology and loves to keep up with the latest developments. For Schalk, web development and the internet are not just a job; they are a love, a passion and a lifestyle. Schalk has published 173 posts at DZone.

Merging PDFs with PDFBox

06.21.2008

Merging Portable Document Format (PDF) documents using PDFBox couldn’t be simpler. The developers of PDFBox have taken care of all the hard work and encapsulated it in a single class of their Application Programming Interface (API). All you need to do is use it.

The class I am referring to is PDFMergerUtility. This class provides everything you need to take multiple single- or multi-page PDF documents and merge them into one PDF document. Below I will go over the simple steps of using this class to merge all the PDFs located in a directory without having to pass each file as an argument.

The first step is to create an instance of the class:

PDFMergerUtility mergePdf = new PDFMergerUtility();

With the class instantiated we can start using it to merge our PDFs. The next step is to read and store the two arguments that get passed into our application for later use. When invoking our utility from the command line we expect two arguments: first, the folder that contains the documents, and second, the file name of the final merged PDF. We store these arguments in two String variables:

String folder = args[0];              // folder containing the PDFs to merge
String destinationFileName = args[1]; // file name of the merged PDF
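The two lines above assume both arguments are always supplied; if the second one is missing, args[1] throws an ArrayIndexOutOfBoundsException. A minimal sketch of an up-front check follows, assuming you want to reject anything other than exactly two non-blank arguments. MergeArgs and parse() are hypothetical names, not part of the original utility or of PDFBox.

```java
/**
 * Hypothetical holder for the two command-line arguments (not part of the
 * original utility). parse() rejects anything other than exactly two
 * non-blank arguments before they are used.
 */
final class MergeArgs {
    final String folder;              // folder containing the PDFs to merge
    final String destinationFileName; // file name of the merged PDF

    private MergeArgs(String folder, String destinationFileName) {
        this.folder = folder;
        this.destinationFileName = destinationFileName;
    }

    static MergeArgs parse(String[] args) {
        // Fail early with a usage message instead of an index exception later on
        if (args == null || args.length != 2
                || args[0].trim().isEmpty() || args[1].trim().isEmpty()) {
            throw new IllegalArgumentException(
                    "Usage: <folder containing PDFs> <merged file name>");
        }
        return new MergeArgs(args[0], args[1]);
    }
}
```

In main() you might catch the IllegalArgumentException, print its message and exit with a non-zero status rather than letting a stack trace reach the user.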

The next step is to get hold of all the files in the directory that was passed to our utility, which we stored in the String variable folder. For this I wrote a small method that uses the java.io.File class.

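// Requires: import java.io.File; import java.io.IOException;
private static String[] getFiles(String folder) throws IOException
{
    File _folder = new File(folder);
    String[] filesInFolder;

    if(_folder.isDirectory())
    {
        // list() returns the names (not full paths) of every entry in the directory
        filesInFolder = _folder.list();
        // list() returns null if the directory cannot be read
        if(filesInFolder == null)
        {
            throw new IOException("Could not read directory: " + folder);
        }
        return filesInFolder;
    }
    else
    {
        throw new IOException("Path is not a directory");
    }
}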

The first thing we check is that the path passed to us is in fact a directory. If it is not, we throw an IOException with the message "Path is not a directory". Once we have verified that it is a directory, we use the list() method of the java.io.File class to get its contents. The list() method returns an array with the names (not the full paths) of all the files and subdirectories in the directory. We store this in a String array and return it to the caller.

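Because list() returns every entry in the directory, a stray text file or a subdirectory would also be handed to addSource(), and the Javadoc for File.list() makes no promise about the order of the names, so the pages may not come out in the order you expect. Below is a minimal sketch of a stricter variant of getFiles(), assuming you only want regular files whose names end in .pdf (ignoring case) and that alphabetical order is an acceptable merge order. PdfFolder and listPdfFiles() are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical variant of getFiles() that keeps only PDF files, sorted by name. */
final class PdfFolder {
    private PdfFolder() {}

    static String[] listPdfFiles(String folder) throws IOException {
        File dir = new File(folder);
        if (!dir.isDirectory()) {
            throw new IOException("Path is not a directory");
        }
        // Keep regular files ending in ".pdf" (any case); skip subdirectories and other files
        File[] entries = dir.listFiles((File f) -> f.isFile()
                && f.getName().toLowerCase(Locale.ROOT).endsWith(".pdf"));
        // listFiles() returns null if the directory cannot be read
        if (entries == null) {
            throw new IOException("Could not read directory: " + folder);
        }
        String[] names = new String[entries.length];
        for (int i = 0; i < entries.length; i++) {
            names[i] = entries[i].getName(); // bare names, same as list() returns
        }
        Arrays.sort(names); // deterministic merge order
        return names;
    }
}
```

Since it returns bare names just like getFiles(), it can be swapped in without changing the folder + File.separator + files[i] loop. Arrays.sort() compares strings by character code, so upper-case names sort before lower-case ones; if page order matters, name the source files consistently.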

Because the final steps of our utility can throw one of two exceptions, we enclose them in a try/catch block. The first thing we do inside the try block is store the size of the array returned by getFiles(folder) as an int variable called numberOfFiles; we will use this in our for loop a little later. Next we store the files themselves in a String[] called, you guessed it, files. Armed with this information we can loop through our array of files. We need to loop through the files because each one has to be added to the sources of PDFMergerUtility using its addSource() method.

The for loop is where we make use of the first of these two variables, numberOfFiles:

for(int i = 0; i < numberOfFiles; i++)

Inside the loop we add each file to the sources of PDFMergerUtility using the following line of code:

mergePdf.addSource(folder + File.separator + files[i]);
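Concatenating with File.separator works, but it produces a doubled separator if the folder argument already ends with one (for example pdfs/). As an alternative, the two-argument java.io.File constructor joins a parent and a child path for you. The sketch below builds the full source paths up front; SourcePaths and build() are hypothetical names, and each resulting string could then be passed to addSource() inside the loop.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns the bare names from getFiles() into full paths. */
final class SourcePaths {
    private SourcePaths() {}

    static List<String> build(String folder, String[] files) {
        List<String> paths = new ArrayList<>();
        for (String name : files) {
            // new File(parent, child) normalizes a trailing separator on the parent
            // and joins the two parts with the platform separator
            paths.add(new File(folder, name).getPath());
        }
        return paths;
    }
}
```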

The only steps left are to set the file name and location of the merged document and then call the mergeDocuments() method of PDFMergerUtility.

mergePdf.setDestinationFileName(folder + File.separator + destinationFileName);
mergePdf.mergeDocuments();
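Because the merged document is written into the same folder that getFiles() reads, running the utility a second time with the same destination name would pick up the previous output as one of the sources. A minimal sketch of one guard, assuming exact (case-sensitive) file-name matching, skips the destination name when collecting the sources; MergeSources and excluding() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: drops the merged output file from the list of sources. */
final class MergeSources {
    private MergeSources() {}

    static List<String> excluding(String[] files, String destinationFileName) {
        List<String> sources = new ArrayList<>();
        for (String name : files) {
            // A merged file from an earlier run lives in the same folder; skip it
            if (!name.equals(destinationFileName)) {
                sources.add(name);
            }
        }
        return sources;
    }
}
```

The loop would then iterate over this list rather than the raw array. On a case-insensitive file system you might compare with equalsIgnoreCase() instead.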

To close off our try block we catch the two exceptions that could be thrown by the methods used inside it: COSVisitorException and IOException. With this done, our utility is complete! I hope you enjoyed this tutorial and find the utility useful. You can download the complete source here and use it as you see fit. Please feel free to post your comments on how this utility can be improved and expanded upon.

Published at DZone with permission of its author, Schalk Neethling.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

David Voo replied on Sun, 2008/06/22 - 4:50am

Very good article

Schalk Neethling replied on Sun, 2008/06/22 - 5:24am in response to: David Voo

Thanks David, I am very glad you enjoyed it.

Harshada Joshi replied on Wed, 2009/05/06 - 3:17am

Hi, the following code throws the error "PDDocument is not closed" at the line String text = pdfText.getText(pdfdoc);

PDFMergerUtility mergePdf = new PDFMergerUtility();
mergePdf.addSource(src1);
mergePdf.addSource(src2);
mergePdf.setDestinationFileName(destFile);
mergePdf.mergeDocuments();
PDDocument pdfdoc = PDDocument.load(destFile);
PDFTextStripper pdfText = new PDFTextStripper();
String text = pdfText.getText(pdfdoc);

Your assistance is highly appreciated. Thanks
