TaxGeek



TaxGeek 2006 Ideas Page

Idea 2: Direct writing to PDFs

TaxGeek currently depends on PDF::Reuse to write data to the PDF files. Because PDF::Reuse cannot work properly with the fillable PDFs supplied by the IRS, each PDF must first be flattened, one page at a time, and the data must then be placed manually at specific x-y coordinates on the page. Being able to write directly to the fillable PDF files using the XPCOM capabilities of Mozilla-based browsers would make TaxGeek truly platform independent (most Windows users will not be able to install ActivePerl), and would provide a tool that users of other web-based applications would find valuable.

Accessing the field names in Linux is relatively easy using pdftk (see here). After downloading and installing it, one can dump the field names to a file using the command:

% pdftk file.pdf dump_data_fields output fieldsout.txt
Here % is the prompt, file.pdf is the fillable PDF form whose data fields you are interested in, and fieldsout.txt is the file the field names are dumped to. Everything after the prompt is part of the command. When this is run on a tax form such as f1040.pdf, the output looks something like this:
---
FieldType: Text
FieldName: f1_01(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
---
FieldType: Text
FieldName: f1_02(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
---
FieldType: Text
FieldName: f1_03(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 2
---
FieldType: Text
FieldName: f1_04(0)
FieldFlags: 0
FieldJustification: Left
.
.
.
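To make a dump like the one above easier to work with, it can be read into one record per field. The following is only a minimal sketch: the function name parse_field_dump is made up for illustration, and it assumes each record starts with a line of three dashes and that every other line has the form "Key: value", as in the listing above. If a key repeats within one record, the later value overwrites the earlier one.

```python
def parse_field_dump(text):
    """Split pdftk dump_data_fields output into one dict per field.

    Hypothetical helper: assumes records are separated by '---' lines
    and every other line looks like 'Key: value'.
    """
    fields = []
    current = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line == "---":
            # A line of dashes starts a new field record.
            if current:
                fields.append(current)
            current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        fields.append(current)
    return fields


# Sample text copied from the listing above (two of the fields).
sample = """---
FieldType: Text
FieldName: f1_01(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 50
---
FieldType: Text
FieldName: f1_03(0)
FieldFlags: 0
FieldJustification: Center
FieldMaxLength: 2
"""

fields = parse_field_dump(sample)
for field in fields:
    print(field["FieldName"], field.get("FieldMaxLength"))
```

With the fields in this form, the tedious part that remains is deciding by hand which form entry each generated name corresponds to.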
As you will notice, the IRS does not use sensible field names, so some tedious matching of field names to form entries may be necessary for each form. Additionally, it is not possible to simply replace text inside a PDF file: the cross-reference table records each object's position as a byte offset from the start of the file, and stream objects declare their own length in bytes. In effect, the cross-reference table and the trailer (which contains the pointer to the cross-reference table) act as checksums on the file. So even if the file contains clear text that you could in theory replace, in practice, once you do so and the length changes, you also need to recalculate the cross-reference table and the trailer. Oh yeah... and did I mention that PDFs can be compressed? See here for more PDF information.
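The byte-offset problem can be seen on a toy file. The sketch below is an illustration under stated assumptions, not a real PDF writer: build_pdf is a hypothetical helper, and the three objects it assembles are not a complete, viewable document (there are no pages). It writes just enough structure (header, objects, xref table, trailer, startxref) to show that lengthening one string shifts every later object, so a plain search-and-replace leaves the recorded offsets stale.

```python
def build_pdf(objects):
    """Assemble a toy PDF-like file; return (file bytes, object offsets).

    Hypothetical helper for illustration only.
    """
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_pos = len(out)  # byte offset of the xref table itself
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # one 20-byte entry per object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


short = [b"<< /Type /Catalog >>", b"(Smith)", b"<< /Note (after) >>"]
longer = [b"<< /Type /Catalog >>", b"(Smithsonian)", b"<< /Note (after) >>"]

pdf_a, off_a = build_pdf(short)
pdf_b, off_b = build_pdf(longer)

# Rebuilding the file moves object 3 by the number of bytes added.
print(off_a, off_b)

# A naive text replacement keeps the old xref table, so the offset it
# records for object 3 no longer lands on "3 0 obj".
naive = pdf_a.replace(b"(Smith)", b"(Smithsonian)")
print(pdf_a[off_a[2]:off_a[2] + 7])   # where the original xref points
print(naive[off_a[2]:off_a[2] + 7])   # same offset after naive replace
```

A reader that trusts the xref table would look for object 3 at the old offset in the naively edited file and find something else there, which is why the table and the startxref pointer have to be rewritten whenever a length changes.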

So on the plus side, implementing a JavaScript API to read and write PDF forms and then adapting it to TaxGeek would be really handy. On the other hand, it could probably be its own Summer of Code project.
