Wednesday, February 15, 2012

Fun with Microsoft Word-made PDFs

A potential client came to me the other day looking for a replacement for something called "adlib."  I am not sure exactly what that is, but what it does is munge PDF files to handle certain aspects of digital signatures.

Basically the company uses Windows-based technology and Word to create PDFs, and they needed some help with that, particularly with tracking objects through the process.

The idea is that Word makes it hard to do certain things, like control the placement of objects in the PDFs it generates.  For example, if you have a large Word document and you want, say, a signature in a certain place in the resulting PDF (created by Word), you can't do it.

So people instead come up with novel solutions using these other products, like "adlib."

So this potential customer asked: how could you put something into Word and later come up with the exact location of that "thing" in the Word-generated PDF?  (Word flows content along as it lays out the document, so you cannot easily know in advance where anything will land.)  Further, they asked, what could you tell us about the location (page, x, y, width, height) of the object in the final PDF?

So I came up with a feature for the pdfExpress Manufacturing product line that lets someone place an image in Word, generate a PDF, and locate the image along with its exact page, position, height and width.

Basically the idea is this: create a "placeholder" image (though it's easy to support non-image objects like lines, curves and circles as well).  Set certain attributes in that image.  Place it into Word using one of the myriad Word APIs from something like .NET.
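To make the placeholder idea concrete, here is a minimal sketch in Python.  It is only an illustration, not how pdfExpress marks its images: I am assuming the "attribute" is a solid fill color that encodes a 24-bit marker ID, and the names marker_color and solid_png are made up for this example.  A flat color is used because it has a decent chance of surviving resizing, though a lossy re-encode could shift it slightly, so a real matcher would probably need some tolerance.

```python
import struct
import zlib


def marker_color(marker_id: int) -> tuple[int, int, int]:
    """Map a hypothetical 24-bit marker ID to an RGB fill color."""
    if not 0 <= marker_id < 2 ** 24:
        raise ValueError("marker_id must fit in 24 bits")
    return (marker_id >> 16) & 0xFF, (marker_id >> 8) & 0xFF, marker_id & 0xFF


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def solid_png(width: int, height: int, rgb: tuple[int, int, int]) -> bytes:
    """Return the bytes of an 8-bit RGB PNG filled with a single color."""
    # Each scanline starts with filter type 0 (None), followed by the pixels.
    row = b"\x00" + bytes(rgb) * width
    raw = row * height
    # IHDR: width, height, bit depth 8, color type 2 (truecolor),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )


if __name__ == "__main__":
    # Write a 200x50 placeholder for the (hypothetical) marker ID 0x12AB34.
    with open("placeholder.png", "wb") as fh:
        fh.write(solid_png(200, 50, marker_color(0x12AB34)))
```

The resulting file can then be inserted into the Word document like any other picture.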

Now Word does bad things to images which appear to be beyond the control of mere mortals, e.g., re-sizing them, re-rasterizing them, etc.  But, with a specially crafted image, you can make it work very well.

Once Word spits out the PDF you pass it through pdfExpress Manufacturing.  pdfExpress removes all these specially marked images (or other objects) and emits corresponding page, position, height and width data to tell you where the removed objects were.
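For a feel of where the position data comes from, here is a rough Python sketch.  It is not the pdfExpress implementation.  It relies on one fact about PDF: an image is painted into the unit square, which the current transformation matrix (the six numbers a b c d e f set by the cm operator) maps onto the page.  The Placement record and placement_from_ctm are hypothetical names, and accumulating the effective matrix from the content stream is assumed to be done elsewhere by a PDF parser.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Placement:
    page: int      # 1-based page number
    x: float       # lower-left corner in PDF user space (1/72 inch by default)
    y: float
    width: float
    height: float


def placement_from_ctm(page, ctm):
    """Bounding box of an image painted with transformation matrix `ctm`.

    `ctm` is the (a, b, c, d, e, f) tuple in effect when the image is drawn.
    """
    a, b, c, d, e, f = ctm
    # Transform the four corners of the unit square:
    # x' = a*u + c*v + e, y' = b*u + d*v + f.
    corners = [
        (a * u + c * v + e, b * u + d * v + f)
        for u in (0, 1)
        for v in (0, 1)
    ]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return Placement(page, min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


if __name__ == "__main__":
    # A 200 x 50 point image whose lower-left corner sits at (72, 600) on page 3.
    record = placement_from_ctm(3, (200, 0, 0, 50, 72, 600))
    print(json.dumps(asdict(record)))
```

Taking the bounding box of all four corners means the same function still gives a sensible rectangle if the image was rotated on the page.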

Now you can post-process the PDF to add annotations and other Word/PDF junk that apparently many government agencies and big corporations like to see in PDFs, e.g., digital signature junk.  Of course, pdfExpress and friends make this easy as well.

pdfExpress can do this on huge PDFs, say 100K pages, as well as smaller PDFs.

1 comment:

  1. Interesting article. The scenario that you have mentioned above is somehow unique and strange to me. The solution that you have suggested is highly convincing to me. The product that is used to solve this issue is very useful and is also an effective solution that can easily manage huge PDFs.
