Wednesday, August 11, 2010

PDF Size and Performance...

This is a topic which comes up frequently (and no, this is not Viagra spam).

People say "This PDF is too large and it will take too long to RIP." Basically, most people draw a direct link between the size of the PDF and the performance they are going to get RIPing the file.

In general this is wrong, for several reasons. First of all, until very recently PDF was always turned into PostScript before being rasterized on a RIP. Now PostScript is a programming language, which means that the RIP must execute the language statements in order to create the raster. All of this takes time, especially when you have a lot of nested PDF forms. So any PDF file was effectively processed twice: once to translate the PDF into PostScript, and again to execute the PostScript.

There isn't a one-to-one correspondence between PDF operators and PostScript operators, particularly in terms of complexity, so seemingly simple and short PDF might not be simple PostScript as far as the rasterizer is concerned. Transparency is the extreme case: PostScript has no transparency model at all, so transparent PDF content must be flattened into opaque objects before conversion.

PDFs can be very large due to embedded images. The most profound effect I have seen on performance (and I mostly work on non-plating devices) is extra resolution, i.e., too many bits for the job. Sheer volume is the first problem: a 2,400 dpi CMYK image takes a long time for the RIP to consume because there are a lot of bytes. RIPs process images quickly but can be overwhelmed by sheer volume, so if you only need 600 dpi, don't put a 2,400 dpi image into the file.
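To put numbers on that, here is a back-of-the-envelope sketch in Python, assuming an uncompressed 8-bit-per-channel CMYK image covering a US Letter page, of how much raw data the RIP has to consume at each resolution:

    # Raw image data for a full US Letter page, 8 bits per channel, CMYK.
    def raw_image_bytes(width_in, height_in, dpi, channels=4):
        pixels = (width_in * dpi) * (height_in * dpi)
        return pixels * channels  # one byte per channel

    for dpi in (600, 2400):
        size = raw_image_bytes(8.5, 11, dpi)
        print(f"{dpi:>5} dpi: {size / 1024**2:,.0f} MiB")

    #   600 dpi:   128 MiB
    #  2400 dpi: 2,054 MiB  (16x the data for the same page)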

So even a file containing lots of images may still RIP quickly, as long as those images carry no more resolution than the output needs.

Font usage is a weakness in many RIPs, particularly using a lot of fonts. There are many cases of PDFs having a completely new set of fonts on each page in, say, a thousand-page document. RIPs don't deal well with this, and for me it's been a problem over the years. This issue compounds another common font issue: bitmap fonts. Many programs, particularly those that convert other formats to PDF, tend to convert source bitmap fonts to PDF bitmap fonts. My experience is that the higher the percentage of bitmap fonts in a file, the slower, in general, it will RIP as the page count goes up.
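If you suspect a file has the fonts-per-page problem, a quick way to check is to count the distinct base fonts on each page. Here is a minimal sketch using the pypdf Python library; the file name is hypothetical, and it only inspects page-level /Font resources (fonts nested inside Form XObjects are ignored), so treat the counts as a lower bound:

    # Count distinct base fonts per page to spot documents that
    # carry a fresh set of fonts on every page.
    from pypdf import PdfReader

    reader = PdfReader("suspect.pdf")  # hypothetical file name
    seen = set()
    for number, page in enumerate(reader.pages, start=1):
        resources = page.get("/Resources")
        fonts = resources.get_object().get("/Font") if resources else None
        names = set()
        if fonts:
            for ref in fonts.get_object().values():
                names.add(str(ref.get_object().get("/BaseFont")))
        print(f"page {number}: {len(names)} fonts, {len(names - seen)} new")
        seen |= names

    print(f"distinct fonts in document: {len(seen)}")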

Applications go to great lengths to obfuscate fonts so that font licensors can't have their intellectual property stolen. Unfortunately, you may be paying the price for this with crappy RIP performance. So you're paying twice: once for the font, if you need it, and again to RIP the hundreds or thousands of obfuscated copies meant to prevent you from stealing it.

The last problem area is transparency. There are two types, masks and true PDF transparency, and both create performance issues. (Most of what I will say here is more or less invisible to a user, because "A" list applications try very hard to "hide" the fact that transparency is being used.) Basically, any situation in a PDF file where you can "see through" a transparent part to some underlying PDF element is a problem. For transparency masks, which can occur in any PDF, the problem increases as the size of the mask increases. For true PDF transparency (controlled by the graphics state), any use is a problem for performance.

The issue is simple: a printer cannot render "transparent" ink. So if a blue line crosses over a black line and we want to see some of the black line through the blue one, the RIP has to calculate what colors the transparency effect would produce and print a gray-blue color to present that effect. The calculation requires the RIP to rasterize the transparent areas, compute colors for them, and then merge that rasterization result with the result of rasterizing everything else.
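For the simple case (normal blend mode over an opaque backdrop), the arithmetic is just a weighted average of the source and backdrop colors, done per pixel and per channel. A small sketch, using RGB for readability; a real RIP does this in the device color space:

    # Normal "over" compositing of a source color onto an opaque backdrop:
    #   result = (1 - alpha) * backdrop + alpha * source, per channel.
    def blend_over(src, backdrop, alpha):
        return tuple(round((1 - alpha) * b + alpha * s)
                     for s, b in zip(src, backdrop))

    blue, black = (0, 0, 255), (0, 0, 0)
    # A 50%-opaque blue line crossing a black line prints as a darkened blue:
    print(blend_over(blue, black, alpha=0.5))  # -> (0, 0, 128)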

The bottom line is that transparent areas are rasterized twice - which slows things down.

So very large PDF files, as long as they avoid these and other pitfalls, can RIP very fast. At the same time, very small files using many of these bad constructs will RIP slowly.
