TaxGeek
Accessing the field names in Linux is relatively easy using pdftk (see here). After downloading and installing it, one can dump the field names to a file using the command:
% pdftk file.pdf dump_data_fields output fieldsout.txt
Where % is the prompt, file.pdf is the fillable PDF form whose data fields you are interested in, and fieldsout.txt is the file the field names are dumped to. Everything after the % prompt is part of the command. When this is run on a tax form such as f1040.pdf, the output fields look something like this:
---
FieldType: Text
FieldName: f1_01(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
---
FieldType: Text
FieldName: f1_02(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
---
FieldType: Text
FieldName: f1_03(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 2
---
FieldType: Text
FieldName: f1_04(0)
FieldFlags: 0
FieldJustification: Left
. . .
As you will notice, the IRS does not use sensible field names, so some tedious matching of field names may be necessary for each form. Additionally, it is not possible to just replace text inside a PDF file, since each object inside a PDF file is cross-referenced by its byte offset from the start of the file. Essentially the cross-reference table and the trailer (which is a pointer to the cross-reference table) act as checksums on the file. So even if you do have clear text that you could replace in theory, in reality, once you do so, you also need to recalculate the cross-reference table and the trailer. Oh yeah...and did I mention that PDFs can be compressed? See here for more PDF information.
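Since the field names have to be matched by hand anyway, it may help to load the dump_data_fields output into something easier to search. The following is only a minimal sketch in Python, assuming the dump has the shape shown above (blocks separated by --- lines, one Key: Value pair per line). The function name parse_field_dump and the SAMPLE text are made up for illustration, and a key that repeats within one block keeps only its last value.

```python
def parse_field_dump(text):
    """Split a pdftk dump_data_fields listing into one dict per field.

    Assumes each field block starts with a '---' line and contains
    'Key: Value' lines, as in the f1040.pdf excerpt above.
    """
    fields = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "---":
            # A separator closes the previous block, if there was one.
            if current:
                fields.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            # Skip blank lines or anything that is not 'Key: Value'.
            continue
        current[key.strip()] = value.strip()
    if current:
        fields.append(current)
    return fields


# Hypothetical sample: the first two blocks from the excerpt above.
SAMPLE = """\
---
FieldType: Text
FieldName: f1_01(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
---
FieldType: Text
FieldName: f1_02(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
"""

if __name__ == "__main__":
    for field in parse_field_dump(SAMPLE):
        print(field["FieldName"], field.get("FieldMaxLength"))
```

To use it on a real dump, read fieldsout.txt into a string and pass it to parse_field_dump; the resulting list of dicts can then be annotated with the form line each field corresponds to.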
So on the plus side, implementing a JavaScript API to read and write PDF forms and then adapting them to TaxGeek would be really handy. On the other hand, it could probably be its own Summer of Code project.