It is a fairly common business requirement to render an invoice, a ticket or a receipt in a read-only format which users may save and print. This can be done in HTML, but HTML is not read-only and isn't very print optimised. PDF is a better choice than MS Word or similar formats with a smaller range of supported platforms, as PDF is better supported on mobile and works better on the expanding markets of Mac OS X and Linux. Compared with the earlier versions of MS Word, an internet-optimised PDF was also smaller, so it downloaded faster. This article is targeted at the PHP/LAMP stack; I will add extra text for external scripts, so other languages and platforms may benefit from the article. This is another boring summary article, to save my time later.

Whilst it is not PDF rendering, PDFs can be displayed inside a page in most browsers, via the Object tag 1. This has a good level of support in older browsers. When making PDFs, care needs to be taken over resources, as missing resources will cause problems.
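
As a rough illustration of the Object tag approach, the sketch below builds the embedding markup as a string. It is written in Python only to keep it self-contained; the function name `pdf_object_tag`, the default dimensions and the example URL are assumptions made for illustration, not taken from any of the referenced projects. The paragraph inside the tag is the fallback shown by browsers that cannot display the PDF inline.

```python
from html import escape


def pdf_object_tag(url: str, width: str = "100%", height: str = "600") -> str:
    """Return <object> markup that embeds a PDF, with a download link as fallback."""
    safe_url = escape(url, quote=True)  # protect & and quotes inside attributes
    return (
        f'<object data="{safe_url}" type="application/pdf" '
        f'width="{escape(width, quote=True)}" height="{escape(height, quote=True)}">\n'
        f'  <p>This browser cannot display the PDF inline. '
        f'<a href="{safe_url}">Download it instead</a>.</p>\n'
        f'</object>'
    )


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(pdf_object_tag("/invoices/example.pdf"))
```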

Technical Requirements

There are a couple of different technical requirement profiles inside this general article. To list:

  • Make “print grade” digital goods.
    • Ensure all source data images are high quality (abort the render on absence).
    • Ensure all fonts can be bound into the PDF.
    • Be able to render all the characters in the source data (trademarks, copyrights, names).
    • Ensure the Apple Inc. default paper size is corrected to A4. Maybe target other document sizes (e.g. posters).
    • Be able to present a test render to the user on screen, to verify the layout (the designer may not have been working to A4).
    • Convert colours to the print colour spaces (normally CMYK).
    • Render all the artifacts to at least 600 DPI.
    • Execution time isn't a top priority.
    • Normally requires a level of control over object placement inside the resulting PDF.
  • Make read-only transaction documents, to be used outside of the business website.
    • Speed of render is critical, due to work volume.
    • Create all necessary keywords, indexes and metadata.
    • Ensure tabular data is correctly presented.
    • Be able to render all the characters in the source data (trademarks, copyrights, names).
  • A data dump ~ this is likely to be rare, as there are more relevant formats.
    • High volume of data per document.
    • Resource management on the renderer is critical.
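
To put numbers on the “print grade” profile, here is a minimal sketch of the arithmetic behind the A4 and 600 DPI requirements, with a guard that aborts when a source image is too coarse. The function names and the 600 DPI default are assumptions for illustration; a real renderer would read the pixel size from the image file itself.

```python
import math

MM_PER_INCH = 25.4
A4_MM = (210.0, 297.0)  # ISO 216 A4, width x height in millimetres


def required_pixels(size_mm, dpi=600):
    """Pixel dimensions needed to cover a paper size at the given DPI."""
    return tuple(math.ceil(side / MM_PER_INCH * dpi) for side in size_mm)


def effective_dpi(width_px, printed_width_mm):
    """DPI an image achieves when printed at the given physical width."""
    return width_px / (printed_width_mm / MM_PER_INCH)


def check_image(width_px, printed_width_mm, min_dpi=600):
    """Raise instead of rendering when a source image is too coarse."""
    dpi = effective_dpi(width_px, printed_width_mm)
    if dpi < min_dpi:
        raise ValueError(
            f"image is only {dpi:.0f} DPI at {printed_width_mm} mm wide"
        )
    return dpi


if __name__ == "__main__":
    print(required_pixels(A4_MM))  # full-bleed A4 at 600 DPI
```

A full-page A4 image at 600 DPI works out at 4961 x 7016 pixels, which gives a feel for why execution time and memory are secondary concerns in this profile.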

Review on published libraries

There are several different implementations for making PDFs purely inside PHP. Then there are several options written in other languages. Your server environment will determine which you can use. We want headless rendering for server environments, as the user won't be looking at it on the server (this, for example, eliminates the Adobe plugin for MS Word). As PDF rendering is quite a large problem, I can see busy projects using several different renderers.

The ancestor tool is Ghostscript (and its tool suite). It was originally written to assist printing on Unix. The wiki 2 supplies details, and states that ps2pdf is just a wrapper. This tool is big, thorough and written in C, and uses a lot of RAM. It runs on an impressive range of operating systems, so it is most useful for older businesses. Its use requires font management, or it can't render anything. I have used it in the past.
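
For external-script use, a headless Ghostscript call can be assembled roughly as below. This is a sketch, not a recommended configuration: the function name and file names are made up for illustration, and the `/prepress` preset is just one of the standard `-dPDFSETTINGS` values. The code only builds the argument list; actually running it needs Ghostscript on the PATH and fonts set up as described above.

```python
import shlex


def ghostscript_pdf_command(source, target, quality="/prepress"):
    """Build the argv list for a headless PostScript/PDF to PDF conversion."""
    return [
        "gs",
        "-q",                      # suppress the startup banner
        "-dSAFER",                 # restrict file access from the input
        "-dBATCH", "-dNOPAUSE",    # exit when done, no pause between pages
        "-sDEVICE=pdfwrite",       # the PDF output device
        f"-dPDFSETTINGS={quality}",
        f"-sOutputFile={target}",
        source,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = ghostscript_pdf_command("invoice.ps", "invoice.pdf")
    print(shlex.join(cmd))
    # To run it for real:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when file names come from user data.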

If you need to render complex layouts (not necessarily starting with HTML), Apache FOP 3 is a good candidate. It is written in Java and is evolving 4 ~ i.e. API changes ~ so take care to use the newer versions. Unfortunately this project doesn't do floated object layouts 5 ~ this is also a problem for some of the PHP processors. Unless you have Java installed on your host, try 6, if your data matches the strengths of this tool.


In PHP, there are mPDF, TCPDF, FPDF, DOMPDF, PDFlib, and Zend_Pdf. I have used mPDF. FPDF is old and not maintained. There are comparisons between TCPDF and mPDF 7. The same reference states that Zend_Pdf isn't useful for PNG with transparency but is strict/compliant on UTF-8. A note in 8 states that mPDF supports normal CSS such as curves (see the examples) ~ this makes it better than many versions of MSIE. The mPDF documentation discusses fonts and typography for non-European languages 9; it must also support font-specific CSS3 and UTF-8 to be able to render these. The actual features are documented on the main website 10, with API docs for the library at 11. Practical use of mPDF is discussed in 12; the author rewrote the HTML to work better with floats. Another resource 13 says mPDF needs to know the target width of elements to be able to compute floats (this limit may be resolved, as the reference is old). TCPDF is reviewed 14 as not working well with current HTML (i.e. DIVs don't work, but TABLEs do). The same article mentions mPDF supporting PDF templates, which can save time if you aren't a web designer. The mPDF project is quite well documented, for example 15, the list of supported HTML tags.

There is a project called html2ps (to be used with ps2pdf). The literature 16 states that this tool doesn't read CSS, so I would avoid it. I think I used another library to convert the HTML. As another library in the odd category, FreePDF is recommended 17 for large volumes. It is based on cursors, and can't use HTML.

Review on services

There is a service based on Node 18 ~ this is a GitHub project but is also available as a REST API. As you may need to edit the HTML to work better with the renderer, there are docs 19.

An expensive option is PrinceXML. If all open source solutions fail, try 20. This is priced out of most people's reach, and the open source options are much better than when this product was initially sold.

This option 21 is mostly focussed on wikis, although it should work for other document sources. I don't know what range of HTML and CSS they support.

A MSFT-focussed solution is available 22. They use sweat equity 23 to have a good software service.

A .NET service called Aspose.pdf has good documentation 24, and is branded up for “cloud”. The pricing tool is quite complex 25. I think the “cloud” branding is rather vacuous, but the API seems coherent and well managed.

A PHP-based solution is 26 (pricing 27, features 28).

Another is 29

Performance

In practical terms, no resource should take longer than two seconds to respond, or users may abort or re-request. There are a couple of solutions for making PDFs, and each has different execution costs. The required number of parallel requests is important when designing the architecture. The process of making a PDF is mostly CPU bound. I haven't memory profiled any of the renderers. Given recent hardware, an otherwise low-load machine should be able to make several PDFs concurrently.

On previous projects, the biggest issue that I had was variable numbers of pages in the data sources. Obviously, more pages imply a slower render. Many of the earlier systems would mail you the PDF as soon as it was available (which, if this is a 50-page document, would be a bit long to wait for). If this is an important business function ~ i.e. high use ~ it could be made into a parallel feature, probably going via queueing software. These days I would look for a rentable micro-service. When I looked previously, these didn't exist ~ but this was before AWS. Defraying the spiking processing requirements across multiple businesses is sensible, although it would be smart to be in a different timezone to most of the other customers of the service.
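
The queue idea can be sketched as follows. Everything here is an assumption for illustration: `render_pdf` is a stand-in that just sleeps in proportion to page count, the per-page cost passed to `should_queue` is invented, and the two-second budget is the figure quoted above. As rendering is CPU bound, real workers would more likely be separate processes or machines fed from queueing software; threads are used only to keep the example short.

```python
import queue
import threading
import time


def should_queue(pages, seconds_per_page, budget_seconds=2.0):
    """True when the estimated render time exceeds the response budget."""
    return pages * seconds_per_page > budget_seconds


def render_pdf(job):
    """Stand-in for a real renderer: cost grows with page count."""
    time.sleep(0.001 * job["pages"])
    return f"{job['id']}.pdf"


def run_workers(jobs, worker_count=2):
    """Render jobs on a small pool fed from a queue; return {job id: filename}."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:          # sentinel: no more work for this worker
                pending.task_done()
                return
            filename = render_pdf(job)
            with lock:
                results[job["id"]] = filename
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)            # one sentinel per worker
    pending.join()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    jobs = [{"id": "invoice-1", "pages": 2}, {"id": "report-7", "pages": 50}]
    print([j["id"] for j in jobs if should_queue(j["pages"], 0.2)])
    print(run_workers(jobs))
```

With an assumed 0.2 seconds per page, a two-page invoice fits the budget and can be rendered inline, while a 50-page report would be queued and delivered later.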


PDF generation (via PHP)
