Creating PDF files using Apache FOP

Introduction

Because Tridion R5 separates content from layout, it can publish pages in many different formats. PDF, a popular document format, remains difficult to produce, however, because of its tangled structure: a PDF file contains many elements in an almost random order, all referring to each other. One solution is to combine Tridion with a third-party tool. This article describes how to use Apache FOP for that purpose.

Prerequisites

Apache FOP is free, open-source software from the Apache Software Foundation.
It can be found at xml.apache.org/fop/.

This example was developed against FOP 0.20 but may be compatible with newer versions too.

1. Mechanism

The PDF publishing mechanism involves a small chain of processing actors:

  1. Pages are published using templates in XSL-FO format.
  2. The published .fo files are picked up by a daemon and forwarded to Apache FOP.
  3. FOP creates .pdf files from the .fo files.       

The attached package consists of two Tridion templates and the source code for the daemon.
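
Step 3 of the chain can be illustrated with a minimal Python sketch that builds the command line for a single conversion. It is not the code from the attached package. The "-fo" and "-pdf" options follow FOP's documented command-line usage, but should be checked against the installed version; the function name fop_command is hypothetical.

```python
from pathlib import Path


def fop_command(fo_file, fop_launcher="fop.bat"):
    """Build the argument list that asks FOP to render one .fo file as PDF.

    Assumption: the PDF is written next to the .fo file, with the same
    base name and a .pdf extension.
    """
    fo_path = Path(fo_file)
    pdf_path = fo_path.with_suffix(".pdf")
    # -fo names the XSL-FO input file, -pdf names the PDF output file.
    return [fop_launcher, "-fo", str(fo_path), "-pdf", str(pdf_path)]
```

The resulting list could be handed to subprocess.run to start the actual conversion.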

2. Page template

The page template "PDF.fo" is relatively straightforward. Most of it is XSL-FO that defines the layout. It ends with a short piece of VBScript code that outputs all components on the page.

3. Component template

The component template "PDF.tcts" assumes that the schema contains the fields "Heading", "Date", "Author" and "FullText". The first three fields can be text, numbers or dates; the last field is assumed to be formatted text. 

To keep the XSL-FO valid, text marked up with <P> and <BR> tags in the "FullText" field is wrapped in blocks by the "StripHTMLCode" function that is defined in the template.
Do not use < and > characters in the component fields, because they would break the XSL-FO syntax.
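
The idea behind this conversion can be shown with a small Python sketch. It is a hypothetical analogue, not the VBScript "StripHTMLCode" function from the template, whose exact behaviour is not reproduced here. The sketch assumes that the text between the tags is plain text, and it goes one step further than the template by escaping stray <, > and & characters.

```python
import re
from xml.sax.saxutils import escape

# Matches <P>, </P>, <BR> and <BR/> in any letter case, with optional attributes.
_BREAK_TAG = re.compile(r"</?\s*(?:p|br)\b[^>]*>", re.IGNORECASE)


def html_to_fo_blocks(full_text):
    """Turn each run of text between <P>/<BR> tags into one fo:block."""
    blocks = []
    for chunk in _BREAK_TAG.split(full_text):
        text = chunk.strip()
        if text:
            # escape() replaces &, < and > with XML entities.
            blocks.append("<fo:block>" + escape(text) + "</fo:block>")
    return "\n".join(blocks)
```

Other HTML tags are not handled in this sketch; they would be escaped and appear literally in the PDF.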

4. PDF daemon

The daemon is a command-line tool that runs in the background. It is not a Windows service, though it can be run as such using other tools. 

At startup, the daemon reads directory names from its command-line parameters and scans those directories for files with the .fo extension. It stores the files it finds in an array of dictionary objects, recording the name and the modification date and time of each file.
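
The start-up scan can be pictured with the following Python sketch. It is an illustration under assumptions, not the daemon's source: the function name scan_fo_files and the keys "name" and "modified" are hypothetical.

```python
import os


def scan_fo_files(directories):
    """Return one record per .fo file: its path and last-modified timestamp."""
    records = []
    for directory in directories:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            # Only regular files with a .fo extension are tracked.
            if name.lower().endswith(".fo") and os.path.isfile(path):
                records.append({"name": path, "modified": os.path.getmtime(path)})
    return records
```

In a command-line tool the directory list would typically come from sys.argv[1:].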

It then enters an infinite loop, in which it spends most of its time waiting. Every 100 milliseconds the daemon wakes up to signal to Windows that it is still active. Every 5 seconds it scans its directories for changes.

New or updated files trigger the program into action. For each of these files it launches Apache FOP, using the Shell command. FOP is not thread-safe, so after every launch the daemon waits 2 seconds before starting the next one.
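
The polling loop and the change detection described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the daemon's code: all function names are hypothetical, the file bookkeeping is simplified to a mapping from path to modification time, and the 100 millisecond keep-alive signal is left out.

```python
import time

LAUNCH_PAUSE = 2  # seconds to wait after each FOP launch, as described above


def changed_files(known, current):
    """Return the paths in `current` that are new or have a new timestamp.

    Both arguments map a .fo file path to its modification time.
    """
    return [path for path, mtime in sorted(current.items())
            if known.get(path) != mtime]


def process_changes(known, current, launch, pause=time.sleep):
    """Launch FOP once per changed file, pausing between launches."""
    for path in changed_files(known, current):
        launch(path)                 # hypothetical callback that starts FOP
        known[path] = current[path]  # remember the processed version
        pause(LAUNCH_PAUSE)


def run(scan, known, launch, poll_seconds=5):
    """Poll forever; `scan` is a hypothetical callable returning the mapping."""
    while True:
        process_changes(known, scan(), launch)
        time.sleep(poll_seconds)
```

Passing the launcher and the pause function as parameters keeps the change detection easy to test without starting FOP or actually waiting.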

Processed files have their entries in the array updated. If a .fo file that is known to the daemon disappears, the daemon also deletes the corresponding .pdf file. FOP is started through the batch file fop.bat, which must be in the same directory as the daemon. In practice this means installing the daemon in the FOP directory.
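
The clean-up of .pdf files whose .fo source has disappeared could look like the Python sketch below. It is a hypothetical illustration, not the daemon's code, and it assumes that each .pdf file sits next to its .fo file under the same base name.

```python
import os


def remove_orphaned_pdfs(known, current):
    """Delete the .pdf of every tracked .fo file that no longer exists.

    `known` and `current` map .fo paths to modification times; `known` is
    updated in place. Returns the list of deleted .pdf paths.
    """
    removed = []
    for fo_path in sorted(set(known) - set(current)):
        pdf_path = os.path.splitext(fo_path)[0] + ".pdf"
        if os.path.exists(pdf_path):
            os.remove(pdf_path)
            removed.append(pdf_path)
        del known[fo_path]  # stop tracking the vanished .fo file
    return removed
```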

The program writes all important actions to a log file so that users can keep track of its activities. If an error occurs, it is written to the log and also shown in a message box.
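
A log of this kind can be kept with a few lines of code, as in the Python sketch below. The line format, the function name log_action and the log file name are assumptions, not taken from the daemon; the message box for errors is omitted.

```python
from datetime import datetime


def log_action(log_path, message, is_error=False):
    """Append one timestamped line to the log file and return that line."""
    level = "ERROR" if is_error else "INFO"
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    line = f"{stamp} {level} {message}"
    # Append mode keeps earlier entries, so the log shows the full history.
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
    return line
```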

Related Links