Installing an XSL-FO processor

This section describes how to install and use the free XSL-FO processor, FOP. The commercial processors are assumed to provide their own documentation and support, so installation instructions for commercial processors are not provided in this book.

Note

For a long time, version 0.20.5 of FOP was the only stable version while the code was being refactored. The refactored code has now been released, and its first stable version is 0.93. Because of the limitations of version 0.20.5, it is highly recommended that you no longer use it.

Installing FOP

FOP is also a Java program, so it is easy to install, especially if you are already using Java programs such as Saxon or Xalan.

  1. Update your Java

    Since FOP requires a Java runtime environment, you might need to obtain or update your Java setup before FOP will work. See the step on updating Java in the section “Installing Saxon”.

  2. Download FOP

    To download FOP, go to http://xml.apache.org and locate the latest stable version for download (currently version 0.93). You probably want the binary version rather than the source version. The distribution comes as a compressed zip file with everything you need. That site will also provide you with detailed instructions for getting started with FOP.

  3. Unpack the archive

    FOP is distributed as a zip file, which can be opened on almost all systems. Linux users can also download a gzipped tar file (.tar.gz suffix).

  4. Locate the FOP .jar files

    Although most people will run FOP using its included convenience scripts, it is useful to know where the files are. The main file is build/fop.jar in the directory you unpacked FOP into. The lib directory has other .jar files that may be used by the FOP convenience scripts. The version numbers shown here may differ from the ones in your distribution.

    avalon-framework-4.2.0.jar

    A software framework that allows software components to work together. It is used internally by FOP.

    batik-all-1.6.jar

    Provides the support library for SVG graphics.

    xalan-2.7.0.jar

    The Xalan XSLT processor that may be used by the FOP convenience scripts. The scripts have an option to convert your XML to XSL-FO using Xalan, and then process the XSL-FO, all with one command.

    xercesImpl-2.7.1.jar

    The XML parser used to parse the XSL-FO file.

    xml-apis-1.3.02.jar

    Provides the SAX, DOM, and JAVAX interfaces used by Xalan.

  5. Download the graphics library files

    You will most likely want to process bitmap graphics in your document. FOP has built-in support for some graphics formats, but some popular formats such as PNG are not supported natively. To process other graphics formats, FOP supports the use of Sun's Java Advanced Imaging (JAI) library, although it does not include the files. You can download the JAI files from http://java.sun.com/products/java-media/jai/current.html (you do not need the Image IO Tools download). If you do the CLASSPATH installation, you can put the files wherever you like. The easiest way to get JAI included is to copy the jai_core.jar and the jai_codec.jar files from the JAI installation area to the lib subdirectory of the FOP installation. Then they will automatically be included in the CLASSPATH for FOP processing.

  6. Download the hyphenation .jar file

    If you are processing languages other than English, then you need to download an additional file named fop-hyph.jar from http://offo.sourceforge.net/hyphenation/index.html. Copy it to the lib subdirectory of the FOP installation. Then it will automatically be included in the CLASSPATH for FOP processing.
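The copying described in steps 5 and 6 can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name install_fop_extras is hypothetical, and it assumes you have already placed the downloaded .jar files together in one directory of your choosing.

```shell
# Copy the optional JAI and hyphenation .jar files into FOP's lib directory.
# install_fop_extras is a hypothetical helper name; pass your own paths.
#   $1 = directory where FOP was unpacked (must contain lib/)
#   $2 = directory holding jai_core.jar, jai_codec.jar, and/or fop-hyph.jar
install_fop_extras() {
    fop_home=$1
    extras_dir=$2

    if [ ! -d "$fop_home/lib" ]; then
        echo "No lib directory under $fop_home" >&2
        return 1
    fi

    for jar in jai_core.jar jai_codec.jar fop-hyph.jar; do
        if [ -f "$extras_dir/$jar" ]; then
            cp "$extras_dir/$jar" "$fop_home/lib/" || return 1
            echo "copied $jar"
        else
            # Missing files are skipped; not every setup needs all three.
            echo "skipped $jar (not found in $extras_dir)" >&2
        fi
    done
}

# Example (both paths are assumptions about your system):
#   install_fop_extras /usr/java/fop-0.93 "$HOME/downloads"
```

Because the convenience scripts pick up whatever is in the lib subdirectory, no CLASSPATH change should be needed after the copy when you run FOP through those scripts.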

Using FOP

FOP will convert a .fo file generated by one of the above processors into a .pdf file. FOP is a Java application, so to use the FOP Java command line, you need to set the CLASSPATH environment variable as described in the section “FOP Java command”. However, if you use one of the FOP convenience scripts, they will set the CLASSPATH for the duration of the script.

Before you run the FOP command, you need to process your DocBook file with the fo/docbook.xsl stylesheet to generate a .fo file. The .fo file is the input to the FOP processor. The stylesheet will tune the XSL-FO output for FOP when you set the stylesheet parameter fop1.extensions to 1.

Note

Use the stylesheet parameter fop1.extensions with FOP version 0.93 and later. The old fop.extensions parameter should only be used with FOP version 0.20.5 and earlier.

The following is an example command line using xsltproc to generate an XSL-FO file suitable for input to FOP:

xsltproc  \
    --output myfile.fo  \
    --stringparam fop1.extensions 1  \
    docbook-xsl/fo/docbook.xsl  \
    myfile.xml

See Chapter 6, Using stylesheet parameters for more information on using stylesheet parameters.

You will know that it is working if you see a message like “Making portrait pages on US letter paper”. That message comes from a template named root.messages in the stylesheet file fo/docbook.xsl. You can change what the message says in a customization layer, or you could define it as an empty template there to turn off the message entirely. Once you have generated the XSL-FO file, you are ready to use FOP.

FOP convenience scripts

The FOP distribution includes some convenience scripts that set the CLASSPATH for you and run the Java command. Which script you use depends on the operating system: fop is a shell script for Linux or Unix, and fop.bat is a batch file for Windows. The scripts can optionally run the XSLT process on your XML source file to produce the XSL-FO file before generating PDF. That may save you a step, but you will not be able to examine the XSL-FO output when you do that. The following are some examples of using the scripts:

Convert a .fo file on Unix or Linux:
fop -fo myfile.fo -pdf myfile.pdf

Convert an XML source file on Unix or Linux:
fop -xsl /docbook-xsl/fo/docbook.xsl -xml myfile.xml -pdf myfile.pdf

Convert a .fo file on Windows:
fop.bat -fo myfile.fo -pdf myfile.pdf

Convert an XML source file on Windows:
fop.bat -xsl /docbook-xsl/fo/docbook.xsl -xml myfile.xml -pdf myfile.pdf

All of the arguments to the command are in the form of options, and they can be presented in any order. The options for FOP are listed at http://xml.apache.org/fop/running.html. One option you will not find is the ability to set DocBook stylesheet parameters on the command line when you use the -xsl option that processes the stylesheet. If you need to use parameters, you should use a separate XSLT processor first to generate the XSL-FO file for FOP to process.
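The two-step approach can be sketched as a single shell function. This is a minimal illustration, not a supplied script: the name docbook_to_pdf is hypothetical, paper.type is used only as an example of a stylesheet parameter, and the stylesheet path is assumed to be relative to the current directory. Substitute the name of the convenience script that your FOP distribution actually provides.

```shell
# Two-step build: XSLT with parameters first, then FOP on the resulting .fo.
# docbook_to_pdf is a hypothetical helper; the stylesheet path and the
# paper.type value are only examples.
#   $1 = DocBook XML source, $2 = output PDF
docbook_to_pdf() {
    xml_in=$1
    pdf_out=$2
    # Keep the intermediate .fo file next to the PDF so it can be examined.
    fo_file="${pdf_out%.pdf}.fo"

    # Step 1: generate XSL-FO, passing stylesheet parameters to xsltproc.
    xsltproc \
        --output "$fo_file" \
        --stringparam fop1.extensions 1 \
        --stringparam paper.type A4 \
        docbook-xsl/fo/docbook.xsl \
        "$xml_in" || return 1

    # Step 2: convert the .fo file to PDF with the FOP convenience script.
    fop -fo "$fo_file" -pdf "$pdf_out"
}

# Example:
#   docbook_to_pdf myfile.xml myfile.pdf
```

Keeping the intermediate .fo file has the side benefit that you can inspect the XSL-FO output when the PDF does not look right.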

FOP Java command

You may want to set your CLASSPATH yourself to run the FOP Java command. See the section “Installing FOP” for information on what files need to be included in the CLASSPATH. The safest approach is to include everything in the lib directory of the FOP distribution as well as build/fop.jar. The following example assumes the FOP .jar files are installed into /usr/java. Replace any version strings in the example below with the actual version numbers on the files in your FOP distribution.

Setting CLASSPATH:
CLASSPATH="/usr/java/fop-0.93/build/fop.jar:\
/usr/java/fop-0.93/lib/avalon-framework-version.jar:\
/usr/java/fop-0.93/lib/batik-version.jar:\
/usr/java/fop-0.93/lib/commons-io-version.jar:\
/usr/java/fop-0.93/lib/commons-logging-version.jar:\
/usr/java/fop-0.93/lib/fop-hyph.jar:\
/usr/java/fop-0.93/lib/jai_core.jar:\
/usr/java/fop-0.93/lib/jai_codec.jar:\
/usr/java/fop-0.93/lib/serializer-version.jar:\
/usr/java/fop-0.93/lib/xalan-version.jar:\
/usr/java/fop-0.93/lib/xercesImpl-version.jar:\
/usr/java/fop-0.93/lib/xml-apis-version.jar:\
/usr/java/fop-0.93/lib/xmlgraphics-commons-version.jar"
export CLASSPATH
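
As an alternative to listing each file by hand, a short loop can collect build/fop.jar and every .jar file found in the lib directory. The following is a sketch only: the function name fop_classpath is hypothetical, and it assumes the directory layout described in this section.

```shell
# Build a CLASSPATH value from build/fop.jar plus every .jar in lib/.
# fop_classpath is a hypothetical helper name; $1 is the FOP directory.
fop_classpath() {
    fop_home=$1
    cp_value="$fop_home/build/fop.jar"
    for jar in "$fop_home"/lib/*.jar; do
        # Skip the unexpanded pattern when lib/ contains no .jar files.
        [ -f "$jar" ] || continue
        cp_value="$cp_value:$jar"
    done
    printf '%s\n' "$cp_value"
}

# Example, assuming FOP was unpacked into /usr/java/fop-0.93:
#   CLASSPATH=$(fop_classpath /usr/java/fop-0.93)
#   export CLASSPATH
```

This approach avoids editing version strings whenever the bundled libraries change, at the cost of including any .jar file that happens to be in lib.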

General syntax:
java  org.apache.fop.cli.Main  [options]  \
    [-fo|-xml] infile  \
    [-xsl stylesheet-path]   \
    -pdf  outfile.pdf

Convert a .fo file to pdf:
java  org.apache.fop.cli.Main  \
    -fo  myfile.fo  \
    -pdf myfile.pdf 

Convert an XML source file directly to pdf:
java  org.apache.fop.cli.Main  \
    -xml myfile.xml  \
    -xsl docbook-xsl/fo/docbook.xsl  \
    -pdf myfile.pdf

This form of the command takes the same set of options as the FOP convenience scripts.

FOP java.lang.OutOfMemoryError

Depending on the memory configuration of your machine, your FOP process may fail on large documents with a java.lang.OutOfMemoryError. It may be that your system is not allocating enough memory to the Java Virtual Machine. You can increase the memory allocation by adding a -Xmx option to any Java command. You can make the change permanent by adding it in the FOP convenience script, such as fop.bat:

java -Xmx256m -cp "%LOCALCLASSPATH%" ...
 

In this example, the memory allocation is 256 MB. The value you use should be less than the installed memory on the system, and should leave enough memory for other processes that may be running.
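
On Linux or Unix, the same option can be added to the FOP Java command directly. The sketch below wraps it in a function so the heap size is easy to vary; the name fop_with_heap is hypothetical, and it assumes your CLASSPATH is already set as described in the section “FOP Java command”.

```shell
# Run the FOP Java command with a larger heap.
# fop_with_heap is a hypothetical helper; CLASSPATH must already be set.
#   $1 = heap size for -Xmx (for example 256m)
#   $2 = .fo input file, $3 = PDF output file
fop_with_heap() {
    java -Xmx"$1" org.apache.fop.cli.Main -fo "$2" -pdf "$3"
}

# Example:
#   fop_with_heap 512m myfile.fo myfile.pdf
```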

Using other XSL-FO processors

The number of XSL-FO processors is growing. Most of them are commercial products, but they are in serious competition on price and features, which benefits the user community. They also differ in the features they offer. Here is a quick description of some of the features:

  • Some products like Antenna House's XSL Formatter provide a graphical interface that previews the formatted output.

  • Some products provide a command line interface or convenience script. These are useful for automated batch processing of many documents, so you do not have to open them one at a time in a graphical interface.

  • Some provide a programming API, so that you can incorporate the XSL-FO processing into larger applications.

  • Some provide extension elements and processing instructions to enable features that are not covered in the XSL-FO 1.0 standard. Many of those extensions were incorporated into the recently finalized XSL-FO 1.1 standard.

  • Some products can generate multiple output types, such as PDF and PostScript.

Because these products are undergoing rapid development, and because they provide their own documentation and support, this book will not provide general instructions on how to use them. But the DocBook XSL stylesheets include support for some of the extensions provided by a few of the processors, and those will be described in this book.

Processor extensions

As of the current writing, the DocBook stylesheets support extensions in PTC's Arbortext, RenderX's XEP, and Antenna House's XSL Formatter products. When the extensions for one of these processors are turned on, extra code is written by the stylesheet into the XSL-FO file. That extra code is understood only by a specific processor, so this feature is controlled by stylesheet parameters.

If you are using XEP, then set the xep.extensions parameter to 1. If you are using Antenna House's product, then set the axf.extensions parameter to 1. If you are using the Arbortext processor, then set the arbortext.extensions parameter to 1. You should never turn on the extensions for a processor you are not using, or you will likely get a lot of error messages from the XSL-FO processor that does not understand the extra code.
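
One way to make sure only a single processor's extensions are turned on is to select the parameter from the processor name. The following is a sketch under stated assumptions: the short names fop, xep, axf, and arbortext and the function names extension_param and make_fo are hypothetical labels chosen for this example, and the stylesheet path is assumed to be relative to the current directory.

```shell
# Map a processor name to the DocBook stylesheet parameter that enables its
# extensions. The short names and function names are hypothetical labels.
extension_param() {
    case $1 in
        fop)       echo fop1.extensions ;;
        xep)       echo xep.extensions ;;
        axf)       echo axf.extensions ;;
        arbortext) echo arbortext.extensions ;;
        *)         echo "unknown processor: $1" >&2; return 1 ;;
    esac
}

# Generate XSL-FO tuned for exactly one processor.
#   $1 = processor name, $2 = XML source, $3 = .fo output
make_fo() {
    param=$(extension_param "$1") || return 1
    xsltproc \
        --output "$3" \
        --stringparam "$param" 1 \
        docbook-xsl/fo/docbook.xsl \
        "$2"
}

# Example:
#   make_fo xep myfile.xml myfile.fo
```

Because the function sets exactly one extensions parameter per run, the resulting .fo file should not contain extension code meant for a different processor.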

Not all extension functions in each product are used by the DocBook stylesheets. If you find an extension in a product's documentation that you want to use, you can write a customization layer that implements it.

Here are the XSL-FO processor extensions that the stylesheets currently implement:

  • PDF bookmarks. When you open a PDF file in a PDF reader, the left window pane may show a table of contents. Those links are PDF bookmarks inserted into the PDF file by the stylesheet using the processor's extension elements. In XEP, the extension element is rx:bookmark. In Antenna House, an extension attribute named axf:outline-level is used. In Arbortext, the element is fo:bookmark, which is part of the XSL 1.1 standard that is recognized by the Arbortext processor.

  • PDF document information. When you view a PDF file's document properties in the reader, it may show title, author, subject, and keywords information. That information is inserted by the stylesheet as extension elements in the XSL-FO file. In XEP, the extension element is rx:meta-info. In Antenna House, the extension element is axf:document-info.

  • Index cleanup. The XSL-FO 1.0 standard has no way of specifying how page numbers in a book's index should be cleaned up. The cleanup process entails removing duplicate page numbers on an entry, and converting a sequence of consecutive numbers to a page range. This produces a more usable index. In XEP, the extension element is rx:page-index. In Antenna House, the extension is an attribute named axf:suppress-duplicate-page-number.