I need a Linux-based script to replace text in a PDF document on the fly.

I would like to serve customized PDF documents from templates. The templates contain tokens like {firstname} that would be replaced with actual data.

Unfortunately, internal compression in the PDF makes it impossible to write a simple search/replace Perl script. What toolkit or existing script would allow me to do this on a Linux web server?
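
For context, the compression in question is typically the FlateDecode filter, which is plain zlib/deflate applied to each stream in the file. The following is a minimal Python sketch, not part of the original question, that extracts every stream zlib can inflate so the page content can at least be inspected. The function name `inflate_streams` is made up for illustration, and the regex-based scan is an assumption that only suits simple files; it is not a real PDF parser.

```python
import re
import zlib

# Matches the bytes between the "stream" and "endstream" keywords.
# The lookbehind keeps the tail of "endstream" from matching as "stream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed bytes of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and the "endstream" keyword.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Uncompressed, or encoded with a different filter: skip it.
            continue
        if data:
            results.append(data)
    return results
```

Even once inflated, a token such as {firstname} may not appear as one contiguous string, because PDF writers can split text across several text-showing operators or encode it with subset-font character codes. A plain search/replace on the inflated data is therefore not guaranteed to work.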

Update:

I tried saving to PostScript, editing the text within the PostScript file, and then reopening the document. The changed text was garbled, so there must be other metric information within the file that is no longer accurate once the text is changed.

I am looking at the CAM::PDF Perl module, which claims it can do a search and replace on text in a PDF document, but it depends on other modules that have their own dependencies. It is a nightmare to install. Still trying.

Update 2:

FINAL UPDATE: I installed the CAM::PDF library from CPAN and it got the job done. Most importantly, I had to use a trick to ensure all the letters of all the fonts were in the PDF document.

I created a first page containing the entire alphabet and character set for each font used in the document. That way, every glyph would be embedded in the PDF and available later when I started substituting text. With CAM::PDF, I did the text replacement and then removed the first page that contained the font alphabets. The commands I used are:

changepagestring.pl 1973_2pg.pdf "{YourName}" "Roger Smith" 1973m.pdf

deletepdfpage.pl 1973m.pdf 1 1973_final.pdf
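
For serving these from a web application, the two commands above could be wrapped roughly as follows. This is a sketch under assumptions rather than the setup actually used: it assumes both CAM::PDF scripts are on the PATH, and the function names `build_commands` and `personalize` are hypothetical.

```python
import os
import subprocess
import tempfile


def build_commands(template_pdf, token, value, work_pdf, final_pdf):
    """Return the two CAM::PDF command lines from the post as argv lists."""
    replace_cmd = ["changepagestring.pl", template_pdf, token, value, work_pdf]
    # Page 1 is the font-alphabet page added to the template; remove it last.
    delete_cmd = ["deletepdfpage.pl", work_pdf, "1", final_pdf]
    return [replace_cmd, delete_cmd]


def personalize(template_pdf, token, value, final_pdf):
    """Run both steps, keeping the intermediate PDF in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        work_pdf = os.path.join(tmp, "replaced.pdf")
        for cmd in build_commands(template_pdf, token, value, work_pdf, final_pdf):
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, check=True)
```

Passing each command as an argument list, rather than as one shell string, means a value such as a customer name is never interpreted by the shell.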

Installing CAM::PDF and its dependencies from CPAN was lengthy and, in my case, required root access on the Linux server.

3 Answers

  • Anonymous
    1 decade ago
    Favorite Answer

    Hmmm... that is tricky. I'm guessing that if you could properly decompress it to a file, you could run a sed substitution that would (hopefully) replace what you need. There is a tiny chance that it would screw up an image or other element, but you could try it. The only hard part would be decompressing it, and I have no idea how you could do that. There must be some sort of PDF library that is used in open source PDF readers and web browsers.

  • Anonymous
    5 years ago

    Replace Text In Pdf

  • 1 decade ago

    This may not be the answer you are looking for, but you could keep your templates in PostScript format and perform the search and replace on those files. Then you could use ps2pdf to convert the PostScript file to PDF for delivery.
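
The PostScript-template suggestion could be sketched in Python as below. It assumes the template is clean PostScript in which each token sits inside a `(...)` string drawn with a complete font, not PostScript exported from an existing PDF, which the asker found came out garbled. It also assumes Ghostscript's `ps2pdf` is installed. The names `ps_escape`, `fill_template`, and `render_pdf` are made up for illustration.

```python
import re
import subprocess

# A token is a word in braces, e.g. {firstname}.
TOKEN_RE = re.compile(r"\{(\w+)\}")


def ps_escape(text):
    """Escape characters that are special inside a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def fill_template(ps_source, values):
    """Replace known {token} markers in one pass; leave other braces alone."""
    def substitute(match):
        name = match.group(1)
        if name in values:
            return ps_escape(values[name])
        # Unknown names, such as a PostScript procedure {pop}, stay unchanged.
        return match.group(0)
    return TOKEN_RE.sub(substitute, ps_source)


def render_pdf(template_path, values, ps_path, pdf_path):
    """Fill the template, write the result, and convert it with ps2pdf."""
    # latin-1 is an assumption; values outside it will fail to encode.
    with open(template_path, encoding="latin-1") as f:
        filled = fill_template(f.read(), values)
    with open(ps_path, "w", encoding="latin-1") as f:
        f.write(filled)
    subprocess.run(["ps2pdf", ps_path, pdf_path], check=True)
```

Escaping matters because an unbalanced parenthesis or a backslash in a substituted value would otherwise break the PostScript string it is placed in.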
