The PDF Outsider

Wednesday, February 15, 2012

Fun with Microsoft Word-made PDFs

A potential client came to me the other day looking for a solution to replace something called "adlib." I am not sure what that is but what it does is munge PDF files to handle certain aspects of digital signatures.

Basically the company uses Windows-based technology and Word to create PDFs and they needed some help with that, particularly in the area of tracking objects through that process.

So the idea is that Word makes it hard to do certain things, like control placement of objects in PDFs it generates. So, for example, if you have a large Word document and you want, say, a signature in a certain place in the resulting PDF (created by Word) you can't do it.

So people instead come up with novel solutions using these other products, like "adlib."

So this potential customer asked how could you put something into Word and later come up with the exact location of that "thing" in the Word-generated PDF (Word sort of flows things along and you cannot easily know exact things about how this works). Further, they asked, what could you tell us about the location (page, x, y, width, height) of the object in the final PDF.

So I came up with a feature for the pdfExpress Manufacturing product line to allow someone to place an image in Word, generate a PDF, and locate the image along with its exact position, page, height and width.

Basically the idea is this: create a "placeholder" image (though it's easy to support non-image objects like lines and curves and circles as well). Set certain attributes in that image. Place it into Word using one of the myriad Word APIs from something like .NET.

Now Word does bad things to images which appear to be beyond the control of mere mortals, e.g., re-sizing them, re-rasterizing them, etc. But, with a specially crafted image, you can make it work very well.

Once Word spits out the PDF you pass it through pdfExpress Manufacturing. pdfExpress removes all these specially marked images (or other objects) and emits corresponding page, position, height and width data to tell you where the removed objects were.
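To make the "position, height and width" part concrete, here is a minimal sketch in Python. It is not how pdfExpress does it; it only shows the standard PDF geometry involved. A PDF image is painted into a unit square that is mapped onto the page by the current transformation matrix [a b c d e f], so the box follows from transforming the four corners of that square. The names `Placement` and `placement_from_ctm` are made up for illustration, and the sketch assumes some PDF parser has already handed you the matrix in effect when the placeholder image was painted.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    page: int      # 1-based page number where the placeholder was found
    x: float       # lower-left corner, in PDF user-space units (points by default)
    y: float
    width: float
    height: float


def placement_from_ctm(page, ctm):
    """Return the bounding box of an image painted with the given matrix.

    ctm is the six-number PDF matrix (a, b, c, d, e, f). A point (u, v)
    in the image's unit square maps to:
        x = a*u + c*v + e
        y = b*u + d*v + f
    """
    a, b, c, d, e, f = ctm
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * u + c * v + e for u, v in corners]
    ys = [b * u + d * v + f for u, v in corners]
    return Placement(page, min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


# A 2in x 0.5in placeholder whose lower-left corner sits at (72, 500) on page 3.
box = placement_from_ctm(3, (144, 0, 0, 36, 72, 500))
print(box)
```

Taking the min and max over all four corners means the box still comes out right if Word has flipped or rotated the image, though for a rotated image it is the enclosing rectangle and not the image outline itself.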

Now you can post process the PDF to add annotations and other Word/PDF junk that apparently many government agencies and big corporations like to see in PDFs, e.g., digital signature junk, etc. Of course, pdfExpress and friends make this easy as well.

pdfExpress can do this on huge PDFs, say 100K pages, as well as smaller PDFs.

Monday, July 11, 2011

PDF Structure

I have spent 15 years working with PDF structure. I am impressed with how little various "hackers" and "security" people really know about its internal format - how little the tools really do.

Most tools simply produce superficial dumps of streams and other content structures.

The real vulnerability of PDF is the CosObj structure and how it's processed. Most constructs are poorly defined in the "standard" and therefore subject to "issues". Acrobat has been a source of consistent issues for me for the last decade and a half or so...

Sunday, July 10, 2011

PDF at the Forefront of iOS/iPhone Hacks...

I have neglected this blog for many, many months. But today I have some startling news about PDF.

Virtually every version of iPhone/iOS/iPad from Apple appears to have a serious PDF security issue.

I found this due to the site jailbreakme - not that I am interested in jailbreaking per se - but it just showed up like ~~chloroform~~ chlorophyll on Casey Anthony's web browser.

The issue has to do with JavaScript (surprise, surprise). The problem can also surface in things like Safari and other iOS applications (presumably) that use browser technology. You just go to this site, click on some links and voilà - your iDevice is jailbroken. (No messy hacking or mucking about with redsn0w and so forth.)

I have tracked down several sites (this, this, and this) with gory technical details but basically it works like this:

1) Inside a PDF you embed some malicious JavaScript. There are a number of ways to do this. As an expert I am duty bound not to spill the beans.

2) When the PDF is loaded the JavaScript either calls some specific JS functions known to corrupt memory or uses something called HeapSpray to fill memory with things you would like to execute.

3) Once the JavaScript memory in the PDF interpreter is corrupted and local ARM code has been inserted by the JavaScript, that ARM code is executed. The code then uses an exploit to access "root" space which is the highest-privileged operating mode for iOS.

Typically this exploit (as of July 9th, 2011) is the iOS IOSurface exploit - which creates bogus parameters for the IOSurface API call. Since IOSurfaces work in kernel, aka root, space, the hack is basically complete.

4) The IOSurface, now corrupted, executes code for the hack as root. It checks for a previously installed hack (via the existence of bash). If none is found it then downloads something called 'wad.bin' which contains more executable code as well as a Cydia installer for jailbreaking.

Interestingly enough these PDF JavaScript problems have been around for years.

My guess is that there is a huge variety of PDF hacks one could create by carefully manipulating a PDF's structure to cause any number of hacks equivalent to the JavaScript ones.

I have believed for many years that Adobe no longer cares about PDF or print. And, because of this, their policies for upgrading the CS suites, and the fact that PDF and PS are outsourced to a foreign land where no one cares about print - we have holes that go unpatched for a long, long time.

Now certainly there are more consequences to jailbreaking than I am implying here - irreversible things, at least for iPad 2's - so don't just rush out to try this.

However, I am confused by some of this. Certainly it would seem that the right thing to do on an iPhone would be to install some App that runs other apps in kernel mode - rather than mucking up the security of the iOS in general by jailbreaking. That is, create an App that is root-owned, that runs in kernel mode, but is otherwise a normal app. Then fire up that app when you wanted to do things.

That way you're not breaking security....

At any rate... Some day Adobe will fix all this. There have been problems like this before (see the links) and there will surely be more in the future.

At any rate it's nice to see that PDF, Adobe's legacy, is getting a whole new lease on life in the iOS world.

Friday, October 15, 2010

Where's the PDF Development?

Though Adobe has passed PDF off into the public domain there are still active pockets of PDF development going on around the world.

Google's Chrome seems to have some active PDF development activity though the focus seems to be on making Chrome use non-Adobe plug-ins for PDF rendering.

There is also "PDF Quick View" when you do a Google search that turns up a PDF (probably in gmail and other Google products as well - but I don't use those...). It's covered in this blog. This seems to work well overall - it's generally a lot nicer than fooling around with the Adobe plug-in to see something.

The sad part of all this, though, is that Google is too stupid to understand the full picture of PDF, the one that includes print. (See this from Lone Wolf.) They only care about PDFs that fit into their model. A little digging will demonstrate that they don't fully support PDF rendering yet. I suppose they are working on it and one day will. But as we have seen they don't get color or many other things that many of us in the printing world know, love, and, most importantly, NEED from PDF files.

Google is the king of ad placement - though I don't know if Adobe's exit from "PDF Ads" is good or bad in that regard.

More PDF activity is going on relative to Silverlight, PDF editors, and other tools.

There is also the ubiquitous http://www.planetpdf.com/ - though the forums and things there have not had much activity for years.

My fear is that without any sort of leadership PDF will become fully bastardized over the next couple of years. By that I mean that no one will step in to fill the leadership vacuum left by Adobe's exit from that position.

Each company with its own ax to grind will pick up the parts they care about and ignore the rest. Companies like Google will tear PDF apart faster than the rest because of their ubiquity.

Thursday, October 7, 2010

PDF - Technology to Live Without?

I found an interesting post here. Basically from an enterprise perspective PDF is a no-no - at least on the web - and it comes in at #3 on the list. I have seen this in the real world - many corporate types cannot receive PDFs in emails because they are blocked by the corporate firewall. My belief is that IT types don't like PDF for a couple of reasons.

First, though it is relatively secure there are some clumsy problems like JavaScript that make it seem like a risk.

There are various hack-schemes associated with PDF, JavaScript hacking chief among them from what I can see. This basically involves some mechanism to run or get you to run a nefarious JavaScript that has either been embedded in the PDF or is somehow linked to it via web browsing.

Adobe offers fixes for the elements that involve using the PDF to display a dialog that tricks the user into running a malicious app from the PDF as documented here. This is all linked together via the Zeus BotNet.
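For anyone on the IT side who wants a first look at whether a PDF carries this kind of baggage, here is a rough triage sketch in Python. It simply counts PDF name tokens that commonly go with scripts and auto-run actions (`/JavaScript`, `/JS`, `/OpenAction`, `/AA`, `/Launch`), decoding the `#xx` hex escapes that PDF allows inside names. The function names are made up for illustration. Be warned that this is exactly the superficial sort of scan: it reads raw bytes only, so anything tucked inside a compressed stream is missed, and a name that merely appears inside a string still gets counted.

```python
import re

# Names that commonly accompany embedded scripts or automatic actions.
SUSPECT_NAMES = ("JavaScript", "JS", "OpenAction", "AA", "Launch")

# A PDF name is "/" followed by characters up to whitespace or a delimiter.
_NAME = re.compile(rb"/([^\s()<>\[\]{}/%]+)")
# Inside a name, "#xx" stands for the byte with hex value xx.
_HEX = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Undo #xx escapes so that /J#61vaScript is read as JavaScript."""
    decoded = _HEX.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_suspect_names(data: bytes) -> dict:
    """Count occurrences of each suspect name in the raw bytes of a PDF."""
    counts = {name: 0 for name in SUSPECT_NAMES}
    for match in _NAME.finditer(data):
        name = decode_name(match.group(1))
        if name in counts:
            counts[name] += 1
    return counts


sample = (
    b"1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
    b"3 0 obj << /S /J#61vaScript >> endobj\n"
)
print(count_suspect_names(sample))
```

A non-zero count is a reason to look closer, not a verdict. Plenty of legitimate forms use JavaScript, and a zero count proves nothing about objects hidden in compressed object streams.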

Second, the machinery of PDF is opaque to IT types. This is kind of an interesting point. I tracked down a Black Hat document on PDF threats (itself a PDF!). Eric Filiol, the author, is the Head Scientist Officer of the Virology and Cryptology Laboratory at the French Army Signals Academy.

Basically this document outlines some of the attacks I describe above as well as covers some PDF basics.

What is of interest to me is that it's relatively shallow in the nature of what it covers. PDFs are relatively complex files and there are quite a few malicious holes in them. But this analysis stops short of doing much more than a superficial inspection.

They do cover the various Forms actions you can associate with elements of a PDF and they also cover some about registry settings and what they can allow or not allow in terms of security.

I suspect the reasons for this are that to process the guts of a PDF you need some relatively sophisticated technology. The paper describes the PDFStructAzer which is a tool they wrote to monkey with PDF files for hacking purposes.

I sent this guy an email offering to discuss PDF with him - but so far I have not received a response.

Third, and probably most importantly, is that the Adobe Acrobat and Flash worlds are relatively closed. What I mean by this is that on the IT side of the world there is a lot of activity and interaction between the developers and the corporate folks. There is back and forth on the Microsoft side over formats, developer kits, and so on. IT folks don't like closed because it makes their jobs harder to do.

Silverlight, for example, is kind of a Flash/PDF replacement for web use. This went through a long beta period with lots of user input from developers.

Try that with an Adobe product.

From the AFP perspective there is much to learn here. AFP is much less complex security-wise than PDF so I doubt you will have nearly the issues coming from that side of things.

Thursday, September 2, 2010

Patent war update...

Follow it here.

Monday, August 30, 2010

Off topic at Lone Wolf and some interesting history...

Post at Lone Wolf.

From a historical perspective readers might find this interesting: Argon.