Processing your modular documents

Chapter 23. Modular DocBook files

You can run XSLT processing on individual modules or on whole documents assembled from modules. If you process whole documents, you will need a processor that can resolve any XIncludes. The following is an example with xsltproc and its --xinclude option:

xsltproc  \
      --xinclude \
      --stringparam base.dir htmlout/  \
      docbook-xsl/html/chunk.xsl  bookfile.xml

DocBook modules that have a DOCTYPE declaration are valid mini-documents, so they can be processed individually. That is useful for quick unit testing, but it will not produce well-integrated output. You will generally want to produce output from larger master documents that assemble the modules, for several reasons:

Numbering of chapters and appendixes depends on the content being processed together in the correct order.
Complete tables of contents require all the content to be processed together.
The olink target dataset for a document should be generated for the whole document so all potential olink targets are included and labeled properly.

When a modular file is processed on its own, certain context information is missing. For example, the third chapter in a book does not know it is the third chapter when processed by itself, so its chapter number appears as "1". Likewise, each chapter printed on its own begins on page 1. To process your content in modules and still have each piece of output fit into the whole, you would have to create a customization that feeds the processor context information such as the chapter number and the starting page number.
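Numbering is a property of the assembled tree, not of the module. As an illustration only (this is not how the DocBook stylesheets are written), the following minimal Java sketch numbers a chapter by counting the chapter elements that precede it, using the JDK's built-in DOM and XPath APIs. The class name ChapterNumbering and the id values are hypothetical, and the sketch assumes DocBook 4 style id attributes on elements with no namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper for illustration: numbers a chapter by its position in the tree it is given. */
final class ChapterNumbering {
    private ChapterNumbering() {}

    /**
     * Returns 1 + the number of chapter elements that precede the chapter with the given id.
     * The id is assumed to be trusted and free of quote characters, since it is pasted into the XPath.
     */
    static int chapterNumber(String xml, String chapterId) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath count() yields a number, which JAXP returns as a java.lang.Double.
        Double preceding = (Double) xpath.evaluate(
                "count(//chapter[@id='" + chapterId + "']/preceding::chapter)",
                doc, XPathConstants.NUMBER);
        return preceding.intValue() + 1;
    }
}
```

Given a master document containing three chapters, chapterNumber(master, "c3") returns 3, while the same chapter parsed alone as a module returns 1, which is exactly the numbering problem described above.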

If you decide to process individual modules for testing, you might want to output the results to a directory separate from where you output the whole document. That way you do not mix up partial builds with complete builds.

Java processors and XIncludes

Some XML parsers used with the Java XSLT processors Saxon and Xalan do not handle XIncludes. Although xsltproc handles XIncludes, you may need to use Saxon or Xalan to take advantage of their extension functions. You have three choices for handling XIncludes with Saxon or Xalan:

Use the Xerces parser with Saxon or Xalan.
Use xmllint as a preprocessor to resolve XIncludes.
Use XIncluder as a preprocessor to resolve XIncludes.

Using Xerces to resolve XIncludes

You can use Xerces-J as the XML parser for your XSLT processing. Xerces-J added XInclude support in version 2.5.0, and later versions have more complete support. Xerces currently handles inclusion of whole files, or of a selected subset identified by an ID reference or by numbered element positions.

The biggest advantage of Xerces is that it integrates completely with Saxon or Xalan. You will need to download the latest Xerces-J from http://xml.apache.org/xerces2-j/index.html, add the xercesImpl.jar file to your Java CLASSPATH, and add three -D options to your java command. The following is an example using Saxon:

Example 23.4. XIncludes with Saxon and Xerces

# Keep each -D option on a single line: a line break after "=" would split the value into a separate argument.
java -cp "/xml/saxon653/saxon.jar:/xml/xerces-2_6_2/xercesImpl.jar" \
    -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
    -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
    -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
    com.icl.saxon.StyleSheet \
    -o bookfile.html \
    bookfile.xml \
    ../docbook-xsl-1.73.1/html/docbook.xsl

The first two -D options set up Xerces as the XML parser in Saxon, and the third one turns on the XInclude feature.
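If Saxon or Xalan is called from a Java program rather than from the command line, the standard JAXP API offers an equivalent of these options: SAXParserFactory.setXIncludeAware(true), available since Java 5. The following minimal sketch assumes the JDK's built-in parser (which is derived from Xerces) and the JDK's default XSLT processor rather than Saxon 6.5.3; the class name XIncludeTransform is hypothetical. The key step is handing the transformer a SAXSource built on an XInclude-aware XMLReader, so the stylesheet sees the resolved document.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical helper: runs an XSLT transform on input parsed with or without XInclude processing. */
final class XIncludeTransform {
    private XIncludeTransform() {}

    static String transform(File input, File stylesheet, boolean resolveXIncludes) throws Exception {
        // Configure the input parser; this plays the role of the -D options on the command line.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);             // XInclude processing requires namespace awareness
        spf.setXIncludeAware(resolveXIncludes);  // resolve xi:include elements while parsing
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // The system ID lets relative href values resolve against the input file's location.
        InputSource in = new InputSource(input.toURI().toString());
        SAXSource source = new SAXSource(reader, in);

        Transformer t = TransformerFactory.newInstance().newTransformer(new StreamSource(stylesheet));
        StringWriter out = new StringWriter();
        t.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With resolveXIncludes set to false, the stylesheet receives the raw xi:include elements instead, so a chapter count over a master document that includes its chapters comes out as zero.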

If you are also using an XML catalog, you will need to add the catalog resolver options to the command line. They go after com.icl.saxon.StyleSheet because they are options for that class, not for the Java interpreter. You must also add the resolver.jar file and the directory containing the CatalogManager.properties file to your Java CLASSPATH, as described in Chapter 5, XML catalogs. The following example shows the full command line:

Example 23.5. XIncludes and XML catalogs with Saxon and Xerces

java -cp "/xml/saxon653/saxon.jar:/xml/xerces-2_6_2/xercesImpl.jar:resolver.jar:." \
    -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
    -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
    -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
    com.icl.saxon.StyleSheet \
    -x org.apache.xml.resolver.tools.ResolvingXMLReader \
    -y org.apache.xml.resolver.tools.ResolvingXMLReader \
    -r org.apache.xml.resolver.tools.CatalogResolver \
    -o bookfile.html \
    bookfile.xml \
    ../docbook-xsl-1.73.1/html/docbook.xsl

To add XInclude processing to Xalan, you need only the third -D option, because Xalan is already set up to use the Xerces XML parser. Xalan has bundled an XInclude-capable version of Xerces since Xalan version 2.6.0. If you are using an older version, you will need at least Xalan-J 2.5.1 and Xerces 2.5.0. The following is an example command:

Example 23.6. XIncludes with Xalan and Xerces

java \
   -Djava.endorsed.dirs="/xml/xerces-2_6_2:/xml/xalan-2_6_0/bin" \
   -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
   org.apache.xalan.xslt.Process \
   -out bookfile.html \
   -in bookfile.xml \
   -xsl ../docbook-xsl-1.73.1/html/docbook.xsl

This example uses the java.endorsed.dirs option to make sure Java uses the newer version of Xalan rather than the older one installed with Java. See the section “Bypassing the old Xalan installed with Java” for more information. The option lists the directories that contain the necessary jar files, separated by a colon (a semicolon on Windows). Put the Xerces directory first so that its xercesImpl.jar is used instead of the possibly older one distributed with Xalan.

Using Xerces-J to validate XIncludes

Starting with version 2.5.0, the Xerces-J XML parser can validate files that contain XIncludes. Validation is done with the sample program sax.Counter, which is included in the xercesSamples.jar file that comes with the Xerces-J distribution. The following example shows how to use it; the -v option turns on validation.

Example 23.7. Validating XIncludes with Xerces

java \
        -cp "xerces-2_6_2/xercesSamples.jar:xerces-2_6_2/xercesImpl.jar" \
        -Dorg.apache.xerces.xni.parser.XMLParserConfiguration=org.apache.xerces.parsers.XIncludeParserConfiguration \
        sax.Counter -v myfile.xml

Here myfile.xml contains one or more XIncludes. The file must also be valid before its XIncludes are resolved, which means the xi:include element must be declared in the DTD. See the section “DTD customizations for XIncludes” for more information.

Using xmllint to resolve XIncludes

You can use xmllint's --xinclude option to generate a version of the document with all the XIncludes resolved, and then process the output with Saxon and the DocBook XSL stylesheets. The xmllint tool is included with libxml2 and is available for most platforms. The following example shows how it can be used.

xmllint  --xinclude  bookfile.xml  >  resolved.xml

java  com.icl.saxon.StyleSheet  resolved.xml \
      docbook-xsl/html/chunk.xsl  base.dir="htmlout/"

The result file resolved.xml is a copy of the input file bookfile.xml with the XIncludes resolved. You can validate resolved.xml as a second step. xmllint implements the XInclude fallback feature, and it supports the same XPointer syntax as xsltproc.
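If you prefer to stay in Java for this preprocessing step, the JDK's own Xerces-derived parser can do a similar job to xmllint --xinclude. The following minimal sketch, using the hypothetical class name XIncludeResolver, parses the master file with XInclude processing turned on and writes the resolved tree to a new file with an identity transform. It is not a drop-in replacement for xmllint; it simply relies on the JDK parser's XInclude support, which includes xi:fallback.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Hypothetical helper: writes a copy of a document with its XIncludes resolved. */
final class XIncludeResolver {
    private XIncludeResolver() {}

    static void resolve(File input, File output) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // required for XInclude
        dbf.setXIncludeAware(true);   // replace xi:include elements (or their xi:fallback) while parsing
        // Parsing from a File gives the parser a base URI, so relative href values resolve correctly.
        Document doc = dbf.newDocumentBuilder().parse(input);

        // An identity transform (no stylesheet) serializes the resolved DOM unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        try (OutputStream out = new FileOutputStream(output)) {
            identity.transform(new DOMSource(doc), new StreamResult(out));
        }
    }
}
```

For example, XIncludeResolver.resolve(new File("bookfile.xml"), new File("resolved.xml")) produces a resolved.xml that can then be validated or passed to Saxon as shown above.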

Using XIncluder in XOM to resolve XIncludes

If you want a Java tool to preprocess XIncludes, you can try XIncluder, written by Elliotte Rusty Harold. An earlier standalone version was available at ftp://ftp.ibiblio.org/pub/languages/java/javafaq/, but the latest version is part of his XOM package, available at http://www.ibiblio.org/xml/XOM/. The package supports the xpointer attribute for selecting content, and it includes several tools for integrating the engine into applications. If you just want to resolve a document so you can validate or process it, you can use commands like the following:

Resolve XIncludes:
java \
    -cp "xom-1.0b8.jar:xom-samples.jar" \
    nu.xom.samples.XIncludeDriver \
    bookfile.xml  >  resolved.xml

Process the resolved file:
java  com.icl.saxon.StyleSheet  resolved.xml \
      docbook-xsl/html/chunk.xsl  base.dir="htmlout/"

Your CLASSPATH must include the xom-version.jar file from the distribution (xom-1.0b8.jar in this example) as well as the xom-samples.jar file. This example uses the -cp option to specify the CLASSPATH. On Windows systems, replace the colon in the CLASSPATH with a semicolon.
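If you launch this step from a Java program or build tool instead of a shell, java.io.File.pathSeparator supplies the correct separator for the platform: a colon on Unix-like systems and a semicolon on Windows. The following minimal sketch only builds the command for XOM's XIncludeDriver without running it; the jar names come from the example above, and the class name XIncludeDriverCommand is hypothetical.

```java
import java.io.File;
import java.util.List;

/** Hypothetical helper: builds (but does not start) the XOM XIncludeDriver command. */
final class XIncludeDriverCommand {
    private XIncludeDriverCommand() {}

    static ProcessBuilder build(List<String> jars, File input, File output) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        String classpath = String.join(File.pathSeparator, jars);
        ProcessBuilder pb = new ProcessBuilder(
                "java", "-cp", classpath, "nu.xom.samples.XIncludeDriver", input.getPath());
        // Equivalent of "> resolved.xml" in the shell: send standard output to a file.
        pb.redirectOutput(output);
        return pb;
    }
}
```

Calling start() on the returned ProcessBuilder runs the command, and calling waitFor() on the resulting Process waits until resolved.xml has been written.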

The result file resolved.xml is a copy of the input file bookfile.xml but with the XIncludes resolved. You can validate resolved.xml as a second step. The current version of XIncluder in XOM implements almost all the features of XInclude, including the fallback feature and selection of content using the xpointer attribute (but not the xpointer() scheme within the attribute).

Using an XSL-FO processor with XIncludes

If you are generating print output from your DocBook files, you may be using one of the convenience scripts that are supplied with the XSL-FO processor. These convenience scripts can perform the XSLT transformation to XSL-FO and convert that to PDF in one step. They are described in the section “Installing an XSL-FO processor”.

As shipped, the convenience scripts are not configured to use an XInclude-aware processor for the XSLT transformation to XSL-FO. Because the xi:include elements are not resolved before the transformation, the stylesheet reports errors saying it has no template to handle them.

If the convenience script is in an editable form, and if you understand its scripting language, you can modify it to use an XInclude-aware XSLT processor in the first step. If you do that, you will have to remember to repeat the change each time the XSL-FO processor and its convenience script are updated. Fortunately, the convenience scripts do not have to perform the XSLT transformation step; they can be used just to convert the XSL-FO to PDF.

The easiest solution is to break the processing into two steps. First use an XInclude-aware processor to generate the XSL-FO file, and then process that file with the convenience script supplied with your XSL-FO processor. That approach lets you use whichever of the XInclude processors described earlier best meets your needs. It also spares you from having to understand and edit the convenience script to splice in an XInclude-aware processor.

For example, if you are using FOP, you can change from single-step processing:

fop -xml mybook.xml -xsl docbook.xsl -pdf mybook.pdf

to two-step processing:

xsltproc --xinclude -o mybook.fo docbook.xsl mybook.xml
fop -fo mybook.fo -pdf mybook.pdf

