Chapter 23. Modular DocBook files


Modular DocBook means your content collection is broken up into smaller file modules that are recombined for publication. The advantages of modular documentation include:

Reusable content units.
Smaller file units to load into an editing program.
Distributed authoring.
Finer-grained version control.

The best tools for modular documentation are XIncludes and olinks. XIncludes replace the old way of doing modular files using system entities. System entities were always a problem because they cannot have a DOCTYPE declaration, and therefore cannot be valid documents on their own. This creates problems when you try to load a system entity file into a structured editor that expects to be able to validate the document. With the introduction of the XInclude feature of XML, the modular files can be valid mini documents, complete with DOCTYPE declaration. Conveniently, the module's DOCTYPE does not generate an error when its content is pulled in using the XInclude mechanism.

Olinks enable you to form cross references among your modular files. If you try to use xref or link to cross reference to another file module, then your mini document is no longer valid. That is because those elements use an IDREF-type attribute to form the link, and the ID it points to must be in the same document. They will be together when you assemble your modules into a larger document, but the individual mini documents will be incomplete. When you try to open such a module in a structured editor, it will complain that the document is not valid. Olinks get around this problem by not using IDREF attributes to form the cross reference. Olinks are resolved by the stylesheet at runtime, whether you are processing a single module or the assembled document. See Chapter 24, Olinking between documents for general information about using olinks, and the section “Modular cross referencing” for using olinks with modular files.
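
As a rough illustration of the difference, the following sketch shows the same cross reference written both ways inside a module. The targetdoc value userguide is a hypothetical document identifier; it would have to match whatever olink setup you configure as described in Chapter 24.

```xml
<!-- An xref uses an IDREF, so the target id must be in this same file. -->
<para>See <xref linkend="Installing"/> for details.</para>

<!-- An olink is resolved by the stylesheet, so the target can be in another module. -->
<para>See <olink targetdoc="userguide" targetptr="Installing"/> for details.</para>
```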

Using XInclude

You can divide your content up into many individual valid file modules, and use XInclude to assemble them into larger valid documents. For example, you could put each chapter of a book into a separate chapter document file for writing and editing. Then you can assemble the chapters into a book for processing and publication.

Here is an annotated example of a chapter file, and a book file that includes the chapter file.

Chapter file intro.xml:
<?xml version="1.0"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"> 
<chapter id="intro"> 
<title>Getting Started</title>
<section id="Installing">
...
</chapter>

Book file:
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"> 
<book>
<title>User Guide</title>
<xi:include   
    xmlns:xi="http://www.w3.org/2001/XInclude"  href="intro.xml" /> 
...
</book>

	The chapter file has a complete DOCTYPE declaration that identifies the mini document as a `chapter`.
	Unless otherwise specified, an XInclude gets the root element and all of its children.
	The book file also has a DOCTYPE declaration, which says this document is a `book`.
	The syntax for the inclusion is an empty element whose name is `include`, with the name augmented with a namespace prefix `xi:`.
	The namespace declaration for XInclude. The URI in quotes must exactly match this string for it to work. But the namespace prefix `xi:` can be any name, as long as it matches what is used on the `include` element.
	The `href` contains a URI that points to the file you want to include. If no `xpointer` attribute is added, then it will pull in the whole file starting with its root element.

When the XInclude is resolved during the processing, the <xi:include> element will be replaced by the included chapter element and all of its children. It is the author's responsibility to make sure the included content is valid in the location where it is pulled in.

Note

In one of the draft XInclude standards, the namespace URI was changed to use 2003 instead of 2001 in the name, but it was changed back to 2001 for the final standard. Some XInclude processors may not have caught the change. For example, Xerces version 2.6.2 expects the XInclude namespace to use the incorrect 2003 value. Later versions work with 2001 in the namespace.

Here are some other nifty features of XInclude:

You can nest XIncludes. That means an included file can contain XIncludes to further modularize the content. This might be useful when keeping a collection of section modules that can be assembled into several different versions of a chapter. Then the chapter file is included in the larger book file.
The href value in an XInclude can be an absolute path, a relative path, an HTTP URL that accesses a web server, or any other URI. As such, it can be mapped with XML catalog entries, as described in the section “XIncludes and XML catalogs”. A relative path is taken as relative to the document that contains the XInclude element (the including document). That is true for each of any nested includes as well, even when they are in different directories.
You can select parts of an included document instead of the whole content. See the section “Selecting part of a file” for more information.
You can include parts of the including document in order to repeat part of its content in the same document, if you do it carefully. When you omit the href attribute, and add an xpointer attribute, then it is interpreted as selecting from the current document. You cannot select the whole document or that part of the document that has the XInclude element, because that would be a circular reference. You also do not want to repeat content that has any id attributes, because duplicate id values are invalid.
A document's root element can be an XInclude element. In that case, there can be only one, since a well-formed document can only have a single root element. Likewise, the included content must resolve to a single element, with its children.
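
As a sketch of the nesting feature, a chapter file could itself be assembled from section modules and then be included in the book file. The sections directory and the filenames install.xml and configure.xml are hypothetical, and each file is assumed to be a valid document with a section root element.

```xml
<?xml version="1.0"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter id="intro">
<title>Getting Started</title>
<!-- Each href is relative to this chapter file, not to the book file. -->
<xi:include  href="sections/install.xml"
     xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include  href="sections/configure.xml"
     xmlns:xi="http://www.w3.org/2001/XInclude" />
</chapter>
```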

Selecting part of a file

The XInclude standard permits you to select part of a file for inclusion instead of the whole file. That is something that system entities were never able to do. In a modular source setup, that means you do not have to break out into a separate file every single piece of text that you want to include somewhere. You can organize your modules into logical units for writing and editing, and then select from within a file if you need just a piece of a module.

The simplest syntax just has an id value in an xpointer attribute. The following is an example.

<xi:include  
     href="intro.xml"
     xpointer="Installing"  
     xmlns:xi="http://www.w3.org/2001/XInclude" />

If the following chapter file is named intro.xml, then this XInclude will select the section element because it has id="Installing":

<?xml version="1.0"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"> 
<chapter id="intro"> 
<title>Getting Started</title>
<section id="Installing">
  <title>Running the installation</title>
  ...
</section>
</chapter>

For selections based on id, the included document must have a DOCTYPE declaration that correctly points to the DocBook DTD. It is the DTD that declares that id attributes are of the ID type (the name id is not sufficient). If the file does not have the DOCTYPE or if the DTD cannot be opened, then such references will not resolve.

Note

Earlier draft versions of the XInclude standard used a URI fragment syntax to select part of a document, as in href="intro.xml#Installing". That syntax is no longer supported. Now the href must point to a file, and you must use an xpointer attribute to select part of it.

More complex selections can be made using the full XPointer syntax. Several XPointer schemes are defined, not all of which are supported by every XInclude processor. Each scheme has a fixed name followed in parentheses by an expression appropriate to that scheme. The following are several examples that are supported by the xsltproc processor.

xpointer="element(Installing)", xpointer="xpointer(id('Installing'))": These two examples of the schemes named element() and xpointer() are equivalent to xpointer="Installing". They all select a single element with an id attribute. Be careful not to confuse the xpointer attribute with the xpointer() scheme name (see the Note below).
xpointer="element(/1/3/2)": This example selects the second child of the third child of the root element of the included document. For example, an included document could consist of a book root element, which contains only chapter elements that contain only section elements. This inclusion takes the second section of the third chapter of the book. The element() scheme always selects a single element for inclusion.
xpointer="element(Installing/2)": This example selects the second child of the element that has id="Installing" in the included document. With the element() scheme, you cannot refer to elements by element name, only by position number or id.
xpointer="xpointer(/book/chapter[3]/*)": The xpointer() scheme uses a subset of XPath in its expressions. In this case, it selects all of the child elements of the third chapter in the book, but it does not include the chapter element itself. The xpointer() scheme can select more than one element to be included.
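
For context, here is how one of these expressions might look in a complete XInclude element. The filename userguide.xml is hypothetical, and the example assumes the book structure described above and a processor that supports the element() scheme.

```xml
<!-- Pull in the second section of the third chapter of userguide.xml. -->
<xi:include
     href="userguide.xml"
     xpointer="element(/1/3/2)"
     xmlns:xi="http://www.w3.org/2001/XInclude" />
```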

Note

Not all processors support all XPointer syntax in XIncludes. One confusing aspect of the XInclude standard is the use of the term xpointer. The standard specifies an xpointer attribute that supports several schemes for selecting content. The element() scheme shown above is one example. Another scheme is named xpointer(), hence the confusion. The xpointer() scheme includes a variant on the XPath language for selecting content, but it never went past the Working Draft stage. While all XInclude processors support the xpointer attribute, only xsltproc supports part of the xpointer() scheme. Check the documentation of your processor to see what parts of XInclude it supports.

Including plain text

You can use XInclude to include plain text files as examples in your DocBook document. The XInclude element permits a parse="text" attribute that tells the XInclude processor to treat the incoming content as plain text instead of the default XML. To ensure that it is treated as text, any characters in the included content that are special to XML are converted to their respective entities:

&   becomes   &amp;
<   becomes   &lt;
>   becomes   &gt;
"   becomes   &quot;

All you need to do is point the href attribute to the filename, and add the parse="text" attribute:

<programlisting><xi:include  href="codesample.c"  parse="text"  
      xmlns:xi="http://www.w3.org/2001/XInclude" /></programlisting>

If you forget the parse="text" attribute, you will get validation errors if the included text has any of the XML special characters.

Since the included text is not XML, you cannot use an xpointer attribute with XPointer syntax to select part of it. You can only select the entire file's content.

But you can specify the encoding of the incoming text by adding an encoding attribute to the XInclude element. In general a processor cannot detect what encoding is used in a text file, so be sure to indicate the encoding if it is not UTF-8. The encoding attribute is not permitted when parse="xml", because the XML prolog already indicates the encoding of an XML file.
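
For instance, if codesample.c were saved in ISO-8859-1 rather than UTF-8 (an assumption made only for this sketch), the include might look like this:

```xml
<programlisting><xi:include  href="codesample.c"  parse="text"  encoding="ISO-8859-1"
      xmlns:xi="http://www.w3.org/2001/XInclude" /></programlisting>
```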

XInclude fallback

An XInclude can contain some fallback content. This permits processing to continue if an include cannot be resolved, maybe because the file does not exist or because of download problems. To use the fallback mechanism, instead of an empty xi:include element you put a single xi:fallback child element in it. The content of the child is used if the XInclude cannot be resolved at run time.

<xi:include  href="intro.xml" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback>
    <para><emphasis>FIXME:  MISSING XINCLUDE CONTENT</emphasis></para>
  </xi:fallback>
</xi:include>

The fallback content must be equally valid when inserted into the document for it to work. In fact, the xi:fallback element can contain another xi:include, which the processor will try to resolve as a secondary resource. The secondary include can also contain a secondary fallback, and so on.
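
A nested fallback might be sketched as follows. The path backup/intro.xml is a hypothetical secondary location; the inner include is tried only if the outer one fails, and the innermost fallback is used only if both fail.

```xml
<xi:include  href="intro.xml" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback>
    <!-- Tried only if intro.xml cannot be resolved. -->
    <xi:include  href="backup/intro.xml">
      <xi:fallback>
        <para><emphasis>FIXME:  MISSING XINCLUDE CONTENT</emphasis></para>
      </xi:fallback>
    </xi:include>
  </xi:fallback>
</xi:include>
```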

Keep in mind that processing of the document does not stop when an XInclude cannot be resolved and it has a fallback child, even if that child is empty. If you want your processing to always continue regardless of how the includes resolve, then add a fallback element to all of your XInclude elements. If, on the other hand, your XIncludes must be resolved, then do not use fallback elements on the innermost includes and let the processing fail.

XIncludes and entities for filenames

Although XIncludes are intended to replace SYSTEM entities, it is still possible to use regular entities with XInclude. You can declare regular entities for filenames in a file's DOCTYPE declaration, and then use an entity reference in the href attribute of an XInclude element. That lets you declare all the pathname information at the top of the file, where it can be more easily managed than scattered throughout the file in various includes. The example above could be reworked in the following way:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY intro              "part1/intro.xml">
<!ENTITY basics             "part1/getting_started.xml">
<!ENTITY config             "admin/configuring_the_server.xml">
<!ENTITY advanced           "admin/advanced_user_moves.xml">
]> 
<book>
<title>User Guide</title>
<para>This guide shows you how to use the software.</para>
<xi:include  href="&intro;"    xmlns:xi="http://www.w3.org/2001/XInclude"/> 
<xi:include  href="&basics;"   xmlns:xi="http://www.w3.org/2001/XInclude"/> 
<xi:include  href="&config;"   xmlns:xi="http://www.w3.org/2001/XInclude"/> 
<xi:include  href="&advanced;" xmlns:xi="http://www.w3.org/2001/XInclude"/> 
...
</book>

You could also declare all the entities in a central file, and then use a parameter system entity to pull the declarations into all of your documents. See the section “Shared text entities” for an example.
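
As a minimal sketch of that approach, the filename declarations could be moved into a separate file and pulled in with a parameter entity. The filename filenames.ent and the entity name filenames are hypothetical.

```xml
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!-- filenames.ent is assumed to hold declarations such as
     <!ENTITY intro "part1/intro.xml"> -->
<!ENTITY % filenames SYSTEM "filenames.ent">
%filenames;
]>
<book>
<title>User Guide</title>
<xi:include  href="&intro;"    xmlns:xi="http://www.w3.org/2001/XInclude"/>
...
</book>
```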

XIncludes and XML catalogs

Since the href attribute of an XInclude element contains a URI, it can be remapped with an XML catalog. That setup would let you enter somewhat generic references in your XIncludes, and then let the catalog resolve them to specific locations on a given system. See Chapter 5, XML catalogs for more information on setting up catalogs.

For example, the following XIncludes use mythical pathnames that do not exist in the file system as they are written.

Example 23.1. XInclude and XML catalog

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                    "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book>
<title>User Guide</title>
<para>This guide shows you how to use the software.</para>
<xi:include  href="file:///basics/intro.xml"  
             xmlns:xi="http://www.w3.org/2001/XInclude" /> 
<xi:include  href="file:///basics/getting_started.xml"  
             xmlns:xi="http://www.w3.org/2001/XInclude" /> 
<xi:include  href="file:///admin/configuring_the_server.xml"  
             xmlns:xi="http://www.w3.org/2001/XInclude" /> 
<xi:include  href="file:///user/advanced_user_moves.xml"  
             xmlns:xi="http://www.w3.org/2001/XInclude" /> 
...
</book>

This XML catalog can be used to map these mythical pathnames to real file locations on either the local system or a remote system using a URL.

<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <rewriteURI
    uriStartString="file:///basics/"
    rewritePrefix="file:///usr/share/docsource/modules/IntroMaterial/" />
  <rewriteURI
    uriStartString="file:///admin/"
    rewritePrefix="http://myhost.mydomain.net:1482/library/administration/" />
  <rewriteURI
    uriStartString="file:///user/"
    rewritePrefix="http://myhost.mydomain.net:1482/cgi-bin/getmodule?" />

</catalog>

The resource being included could even be the output of a CGI request, as in the last example above. The href value would resolve in the catalog to http://myhost.mydomain.net:1482/cgi-bin/getmodule?advanced_user_moves.xml. Here getmodule could be a CGI script that pulls content from a database or version control system based on the query string submitted. Of course, processing a file that includes such content from the network relies on the resource being available at the time of processing.
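
If you need to remap only a single file rather than a whole prefix, a catalog uri entry is one possible alternative to rewriteURI. This is a sketch; the target path simply reuses the first rewritePrefix from the catalog above.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Maps one exact URI rather than every URI that shares a prefix. -->
  <uri
    name="file:///basics/intro.xml"
    uri="file:///usr/share/docsource/modules/IntroMaterial/intro.xml" />
</catalog>
```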

XIncludes and directories

Once you break your content up into modules, you may find it desirable to create a hierarchy of directories to organize the modules. You no longer have to organize the source files according to the content flow in a publication. Rather, you are free to organize the modules on your file system in any way that facilitates the management of the content, such as by chapter, subject matter, user level, author, or whatever. Your publications can use XInclude to pick and choose from among your directory hierarchy to assemble the content.

An XIncluded file can contain other XIncludes, which allows you to create a nested hierarchy of XIncludes to build a publication. They can be nested to whatever depth is necessary. The nesting of XIncludes can be completely independent of the nesting of directories. When the master document is assembled by the parser, each sequence of XIncludes is followed through the directory hierarchy to locate the content.

The href in an XInclude can be either an absolute URI or a relative URI. Absolute URIs are unambiguous, but not very portable. That is, if you move the content to another machine that has a different base location, all of the addresses will be wrong. But absolute URIs can be made portable by using an XML catalog as described in the previous section. The catalog can map the absolute URIs in the hrefs to a different location on the new system.

If you use relative URIs in your XInclude hrefs, each path is taken relative to the location of the document that contains the XInclude. So each module only has to keep track of its own XIncludes, and does not have to worry about how it might be used elsewhere in the hierarchy. This means you can process an individual module for testing from its own location, and its XIncludes will work. And it means when you process the module as referenced from another XInclude, its own XIncludes will still work. This is how a modular system should work, and it does.

Relative URIs can use the ".." syntax to indicate a parent or higher directory. The following example will XInclude a file that is located two directory levels up and one level down relative to the current file's location:

<xi:include  href="../../userguide/chapter2.xml"  
             xmlns:xi="http://www.w3.org/2001/XInclude" />

Relative paths work best when they are kept simple. A complicated path like the preceding example indicates how flexible XIncludes can be, but do not get carried away. Remember, you have to maintain these files. If you decide to rearrange your directory hierarchy, you could end up having to fix a lot of XIncludes. You might be better off using an XML catalog with absolute URIs that the catalog can resolve. Then if you rearrange your directories, you just need to rewrite your catalog file.

XIncludes and graphics files

The previous section describes how XIncludes with relative URIs are resolved relative to the current file. The XInclude processor can do that because it fully recognizes each XInclude element from its unique namespace attribute.

But what about relative graphics file references? An XInclude-aware parser does not automatically know that the fileref attribute in an imagedata element is a path that needs to be resolved relative to the current file's location. It is the stylesheet's responsibility to do that. Fortunately, the XInclude standard helps the stylesheet do that automatically by requiring the XInclude processor to insert xml:base attributes when needed.

Here is how it works:

When the XInclude processor encounters an XInclude element, it replaces the XInclude element with the content pulled from the other file.
As it is copying the root element from the included content, it will add an xml:base attribute to that included root element if its directory differs from the location of the current file. The xml:base value indicates the location of the XIncluded file. Any XIncluded file from the same directory as the current file does not need an xml:base attribute.
When the XSL stylesheet processes the document with all of its XIncludes resolved, the stylesheet uses the xml:base attributes to help resolve any relative paths in a graphic element's fileref. It does that by scanning back through the graphic's ancestor elements to find an xml:base attribute. The stylesheet then prepends that to the fileref path.
If you have used nested XIncludes in different directories, the stylesheet will continue tracing backwards through the graphic element's ancestors, looking for xml:base attributes. The stylesheet combines them into one final path for the fileref, which ends up being the path from the master document to the graphics file.
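
To make these steps concrete, the following sketch shows roughly what a resolved document might look like after the XInclude processor has run. The directory part1 and the image path are hypothetical, and the exact form of the xml:base value (relative or absolute) depends on the processor.

```xml
<book>
<title>User Guide</title>
<!-- xml:base was added by the XInclude processor because intro.xml is in part1/. -->
<chapter id="intro" xml:base="part1/intro.xml">
<title>Getting Started</title>
<mediaobject>
  <imageobject>
    <!-- In the module, this path was written relative to part1/intro.xml. -->
    <imagedata fileref="images/install.png"/>
  </imageobject>
</mediaobject>
</chapter>
</book>
```

In this sketch, the stylesheet would be expected to combine the two values and use part1/images/install.png as the path from the master document to the graphics file.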

If a fileref is an absolute URI, then it is used as it is written, and xml:base attributes are not added to it. If for some reason you want all of your fileref attributes to be left “as is”, then set the stylesheet parameter keep.relative.image.uris to 1. The default value is 0 in XSL-FO output, and 1 in HTML output.

To summarize:

A fileref or an entityref containing an absolute path is always copied without change.
An entityref containing a relative path is interpreted as relative to the file declaring the entity, without regard to any xml:base attributes.
A relative fileref processed without using XInclude is always copied without change.
A relative fileref processed with XInclude and with the parameter keep.relative.image.uris="0" is changed to account for any xml:base attributes (this is the default setting for print output).
A relative fileref processed with XInclude and with keep.relative.image.uris="1" is always copied without change (this is the default setting for HTML output).

XIncludes and external code files

The xml:base attributes are also used to resolve relative paths in fileref attributes in textdata elements. See the section “External code files” for an example.

Entity references in included text

You might be wondering what happens to any entity references that appear in the included content. An entity reference such as &companyname; must have an entity declaration in the DTD to be resolved. If your entities are all declared in an extension to the external DocBook DTD, then your main document and the modules that use that DTD will all share the same entity declarations and there is no problem.

But what if you declare an entity in the DOCTYPE of your included file? Does the declaration go along with the included content? The answer is basically yes, with some caveats.

If your main document has a DOCTYPE declaration at the top, then any entity declarations needed for the included content are copied to that DOCTYPE from the included file.
If the DOCTYPE in the main document already has an entity declaration for that name, then the declaration in the included file must match it, or else an error will be generated. There is no overriding or substitution of entity values when using XIncludes.
If there are any entity references in the included content that are not declared in the included file, then the include will fail. In other words, you cannot rely on the entity declarations in the main document to expand entity references in the included text. The text in the included document is parsed before it is included, and any entity references must resolve there.
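
For example, a module that declares its own entity in the internal subset of its DOCTYPE might look like the following sketch. The entity value Example Corp. is a placeholder.

```xml
<?xml version="1.0"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY companyname "Example Corp.">
]>
<chapter id="intro">
<title>Getting Started</title>
<para>Welcome to the &companyname; documentation.</para>
</chapter>
```

Because the declaration is in the module itself, the entity reference resolves when the module is parsed, before its content is included.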

See the section “Shared text entities” for a good strategy on managing entities in a modular doc setup.

XIncludes in XML editors

XIncludes are a fairly recent addition to the set of XML standards, so XIncludes are not uniformly supported in XML editing software. Certainly any XML editor can create and edit the modular XML files that go into a modular doc set. But an XML editor can go beyond that basic function and help you build those modules into complete documents.

As of early 2007, here is a summary of what some XML editors provide:

Serna from Syntext, Inc. has extensive support for XIncludes. This graphical editor provides menu options for inserting an XInclude, converting selected content into an XInclude, and converting an XInclude to local content. Even better, it formats and displays the XIncluded content inline in editable form, with the boundaries marked by icons. If you make changes to the content between the icons, the editor writes the changes out to the included file.
Arbortext Editor from PTC also has extensive support for XIncludes. You can insert an XInclude that will be validated in context, edit the content inline, and save the changes back to the XIncluded file.
XXE from XMLmind recognizes XIncludes and displays the content inline. The included content is not editable inline, but you can use a menu option to open the included document in another window for editing.
Oxygen from SyncRO Soft Ltd is an XML code editor that can recognize XIncludes. It supports validation of the master file and also the validation of the individual include files. However, it will not validate the master document with all XIncludes resolved.
XMetal from JustSystems, Inc. has no direct support for XIncludes, but it has extensive customization features that support modular content reuse.

