Chapter 4. DocBook 5 tools

Table of Contents

DocBook 5 differences
DocBook 5 namespace
DocBook 5 schemas
Universal linking in DocBook 5
Uniform metadata elements
Annotations
Entities with DocBook 5
Separate DocBook 5 entities file
DocBook character entities
Processing DocBook 5
DocBook 5 validation
DocBook 5 XSLT processing

DocBook 5 is the next generation of DocBook. While it is not a radical change in terms of element names and structures, it significantly changes the foundation on which DocBook is based. The changes allow DocBook 5 to interact with other modern XML standards and practices.

DocBook 5 differences

These are the major changes included in DocBook 5. They are each described in more detail in the following sections.

  • DocBook namespace. The biggest change for DocBook 5 is that its elements are all defined in a DocBook namespace http://docbook.org/ns/docbook. This allows elements from other namespaces to be mixed into DocBook documents without creating element name conflicts. For example, MathML can be embedded using the MathML namespace. Likewise, DocBook fragments can more easily be embedded in other compound document types.

  • RelaxNG schema. For the first time, the DocBook standard is defined using the RelaxNG schema language. RelaxNG is more powerful than DTDs, and easier to customize than XML Schemas. RelaxNG permits an element in different contexts to have different content models. For convenience (or necessity in some cases), versions of DocBook 5 are also available in DTD and XML Schema form, but those are considered non-normative and do not match all the features of the RelaxNG version.

  • Universal linking. In DocBook 4, only a few elements like link and xref were used to form links. In DocBook 5, most elements that generate some output can be made into a link. The link can go to an internal or external destination. Also, the id attribute in DocBook 4 is replaced with the xml:id attribute.

  • Uniform metadata elements. In DocBook 5, elements from DocBook 4 such as bookinfo, chapterinfo, sectioninfo, etc. are all replaced by a single info element. The element may have different content models in different contexts, to manage titled and non-titled elements, for example.

  • Annotations. DocBook 5 introduces a general purpose annotation mechanism that allows you to associate information with any element.

DocBook 5 namespace

A DocBook 5 XML file will look a lot like a DocBook 4 XML file. The main difference is that the document's root element must have the DocBook namespace attribute and a schema version attribute. For example:

<?xml version="1.0"?>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
...

A namespace attribute is the special XML attribute xmlns that identifies a unique URI that is the namespace name. In this case, the URI is http://docbook.org/ns/docbook, which was defined by the OASIS DocBook Technical Committee as the official namespace name for DocBook.

A namespace attribute may optionally define a namespace prefix, and then the elements in that namespace must use the prefix on the element name. In the following example, the prefix is d:

<?xml version="1.0"?>
<d:book xmlns:d="http://docbook.org/ns/docbook" version="5.0">
...

Note that the root element is now d:book, and all other DocBook elements in the document must also have the d: prefix on their names (in both opening tags and closing tags of elements). When a namespace attribute has no prefix, the namespace becomes the default namespace.

A namespace attribute on an element means that the namespace is in scope for that element and all of its descendants. That does not necessarily mean those elements are in the namespace, just that the namespace is recognized. An element within that scope is actually in the namespace only if the element's prefix matches the namespace attribute's prefix. That includes the special case of the default namespace when the attribute does not define a prefix, in which case any element that is in scope and without a prefix is in that namespace.
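
The following is a minimal sketch of how scope and membership differ, assuming a small article that embeds MathML. The article content and the mml prefix are illustrative choices. Both namespaces are in scope for every element, but each element belongs only to the namespace that matches its prefix (or to the default namespace if it has no prefix).

```xml
<?xml version="1.0"?>
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:mml="http://www.w3.org/1998/Math/MathML"
         version="5.0">
  <title>Namespace scope example</title>
  <!-- No prefix: this para is in the default (DocBook) namespace. -->
  <para>The following equation uses MathML elements.</para>
  <informalequation>
    <!-- The mml: prefix puts these elements in the MathML namespace,
         even though the DocBook default namespace is also in scope. -->
    <mml:math>
      <mml:mi>E</mml:mi>
      <mml:mo>=</mml:mo>
      <mml:mi>m</mml:mi>
      <mml:msup>
        <mml:mi>c</mml:mi>
        <mml:mn>2</mml:mn>
      </mml:msup>
    </mml:math>
  </informalequation>
</article>
```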

Setting the namespace as the default namespace is usually more convenient when creating an entire document in a single namespace, as is typically done with DocBook. If you put a default namespace attribute on the root element, then the namespace is in scope for all elements, and all elements without a prefix are in the namespace. Compare these two equivalent documents, one using the default namespace and the other using a prefix:

Default namespace:
<?xml version="1.0"?>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>My book title</title>
    <subtitle>My subtitle</subtitle>
  </info>
  ...

Namespace prefix:
<?xml version="1.0"?>
<d:book xmlns:d="http://docbook.org/ns/docbook" version="5.0">
  <d:info>
    <d:title>My book title</d:title>
    <d:subtitle>My subtitle</d:subtitle>
  </d:info>
  ...

If a file does not have the DocBook namespace declaration on its root element, then the DocBook XSL stylesheets will try to process it as a DocBook 4 document.

Of course, just adding a namespace declaration may not make a DocBook 4 document into a valid DocBook 5 document. There are differences in elements and content models between the two versions, so some fixup may be required.

Fortunately, a guide and conversion stylesheet exist to help transition DocBook 4 documents to DocBook 5. First consult DocBook V5.0: The Transition Guide. It provides guidelines for conversion and describes the db4-upgrade.xsl stylesheet that can upgrade a version 4 document to version 5.

Included files

If you are assembling modular DocBook files into larger documents as described in Chapter 23, Modular DocBook files, then each of the included files must also have the DocBook namespace declaration on its root element. If not, then the stylesheet will report that the module's root element has no matching template.
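
As a rough sketch, assuming the modules are assembled with XInclude, a main file and one included module might look like the following. The file names book.xml and intro.xml and their content are hypothetical. The point is that the module's root element carries its own DocBook namespace declaration.

```xml
<?xml version="1.0"?>
<!-- book.xml: the main file (hypothetical name) -->
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      version="5.0">
  <title>My book title</title>
  <!-- The included file must declare the DocBook namespace itself. -->
  <xi:include href="intro.xml"/>
</book>
```

```xml
<?xml version="1.0"?>
<!-- intro.xml: an included module (hypothetical name) -->
<chapter xmlns="http://docbook.org/ns/docbook"
         version="5.0" xml:id="intro">
  <title>Introduction</title>
  <para>...</para>
</chapter>
```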

DocBook 5 schemas

Another major difference between DocBook 4 and DocBook 5 is the schema language. An XML schema defines the element and attribute names, and the rules for how they are combined into documents. DocBook 4 was created when the only XML schema language was the Document Type Definition (DTD), so the official DocBook 4 schemas are all DTDs. But DTDs predate namespaces, so a DTD is not suitable as a namespace-aware schema.

The DocBook 5 reference schema is written in RelaxNG, a relatively new XML schema language. Its major advantages for use as the official DocBook schema include:

  • It handles namespaces.

  • It allows the content model of an element to be different when that element is in different contexts.

  • It is relatively easy to read in its compact form.

  • It is quite easy to customize in order to extend or subset the DocBook schema. See the section “Adding attributes to RelaxNG” for an example of customizing DocBook's RelaxNG schema to add attributes.

Although the normative version of the DocBook 5 schema is written in RelaxNG, there are non-normative versions generated from it in DTD form and in the W3C XML Schema language. These other versions contain the same element and attribute names. However, in each of these other versions, certain features of the schema are lost. For example, the DTD version does not permit an element to have different content when the element appears in different contexts.
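
To illustrate what a context-dependent content model means, here is a deliberately simplified RelaxNG grammar in XML syntax. It is not an excerpt from the DocBook schema: it ignores attributes and most elements, and the choice of releaseinfo as sample content is an assumption made for the sketch. It only shows that one element name, info, can be given two different definitions depending on its parent.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         ns="http://docbook.org/ns/docbook">
  <start>
    <ref name="book"/>
  </start>

  <!-- Inside book, the info element must contain a title. -->
  <define name="book">
    <element name="book">
      <element name="info">
        <element name="title"><text/></element>
      </element>
      <oneOrMore>
        <ref name="para"/>
      </oneOrMore>
    </element>
  </define>

  <!-- Inside para, an element with the same name has a different
       content model that does not allow a title. -->
  <define name="para">
    <element name="para">
      <optional>
        <element name="info">
          <optional>
            <element name="releaseinfo"><text/></element>
          </optional>
        </element>
      </optional>
      <text/>
    </element>
  </define>
</grammar>
```

A DTD cannot express this, because a DTD permits only one declaration for each element name.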

The disadvantages of using RelaxNG include the following:

  • No support for entity declarations.

  • Fewer tools for validation.
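
One common way to work around the lack of entity declarations is to declare entities in the internal subset of a DOCTYPE declaration that does not reference any DTD. The following is a minimal sketch of that approach. The entity name product and its value are made up for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Declared in the internal subset; no DocBook DTD is referenced. -->
  <!ENTITY product "WidgetPro">
]>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>&product; User Guide</title>
  <chapter>
    <title>Installing &product;</title>
    <para>...</para>
  </chapter>
</book>
```

The XML parser expands the entity references before a RelaxNG validator sees the document, so the schema itself never needs to know about them.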

Universal linking in DocBook 5

In DocBook 4, only specialized elements are used for creating links within and between documents: xref or link with a linkend attribute forms a link within a DocBook document, olink forms a link between DocBook documents, and ulink forms an arbitrary URL link.

In DocBook 5, almost all elements can be used as the basis for a link. That's because almost all elements have a set of attributes that are defined in the XLink namespace, such as xlink:href. For example, you can turn a command element into a link that targets the reference page for the command.

<para>Use the
<command xlink:href="#ref-preview">Preview</command>
command to generate a preview.</para>

The XML Linking Language (XLink) has been a W3C standard since 2001. That standard says that any XML element can become the source or target of a link if it has the universal XLink attributes on it. These attributes are in their own namespace named http://www.w3.org/1999/xlink. Because these attributes are in their own namespace, they do not interfere with any native attributes declared for an element.
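
Because the XLink attributes are in their own namespace, a document that uses them needs to declare that namespace, typically on the root element. The following sketch assumes the conventional xlink prefix. The section with xml:id="ref-preview" is a hypothetical link target added to make the example complete.

```xml
<?xml version="1.0"?>
<article xmlns="http://docbook.org/ns/docbook"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         version="5.0">
  <title>Previewing output</title>
  <!-- The xlink: prefix is usable here because it is declared on the root. -->
  <para>Use the
  <command xlink:href="#ref-preview">Preview</command>
  command to generate a preview.</para>
  <section xml:id="ref-preview">
    <title>Preview command reference</title>
    <para>...</para>
  </section>
</article>
```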

An xlink:href attribute value can have several different forms:

  • An attribute such as xlink:href="#intro" refers to an xml:id attribute that exists in the current document. This is similar to the DocBook 4 link and xref elements. The link and xref elements were retained in DocBook 5.

  • An attribute such as xlink:href="http://docbook.org" refers to an arbitrary URL. This is similar to the DocBook 4 ulink element, which was removed in DocBook 5. Instead of ulink, use a link element with a URL in its xlink:href attribute.

  • An olink-style link from any element can be formed using two attributes. If there is an xlink:role="http://docbook.org/xlink/role/olink" attribute present, then a link attribute of the form xlink:href="targetdoc#targetptr" is interpreted as the two parts of an olink. The olink element itself is retained in DocBook 5. See Chapter 24, Olinking between documents to learn more about DocBook olinks.

At the same time, the familiar DocBook linking attribute linkend has also been added anywhere an XLink can be used. The linkend attribute is limited to linking to an xml:id target within the same document.

The universal linking mechanism enables you to create logical links between any two DocBook elements. However, such logical links may or may not be expressible in formatted output. For example, if you put an xlink:href on an inline element, then the text of the inline element can become clickable link text in the output. However, if you put an xlink:href attribute on a block element such as section, then it is doubtful that making all the text in the section into a clickable link will be useful. The DocBook stylesheets currently only handle xlink:href on inline elements for this reason. If you want to express linking from a block element, you will have to customize the stylesheet to do so, perhaps by putting a clickable icon in the margin.

Table 4.1, “DocBook 5 linking examples” shows the range of linking syntax in DocBook 5. The middle column shows DocBook 4 syntax, and the right column shows DocBook 5 syntax. In DocBook 5, many links can be done in more than one way.

Table 4.1. DocBook 5 linking examples

| Type | DocBook 4 example | DocBook 5 examples |
|---|---|---|
| Internal link with generated text | <xref linkend="preview"/> | <xref linkend="preview"/> |
| | | <xref xlink:href="#preview"/> |
| Internal link with literal text | <link linkend="preview">previewing</link> | <link linkend="preview">previewing</link> |
| | | <link xlink:href="#preview">previewing</link> |
| Element as internal link | <link linkend="preview"><command>Preview</command></link> | <command linkend="preview">Preview</command> |
| | | <command xlink:href="#preview">Preview</command> |
| URL link with generated text | <ulink url="http://docbook.org"/> | <link xlink:href="http://docbook.org"/> |
| URL link with literal text | <ulink url="http://docbook.org">DocBook</ulink> | <link xlink:href="http://docbook.org">DocBook</link> |
| Element as URL link | <ulink url="http://docbook.org"><sgmltag>simplelist</sgmltag></ulink> | <tag xlink:href="http://docbook.org">simplelist</tag> |
| Olink with generated text | <olink targetdoc="reference" targetptr="more.1"/> | <olink targetdoc="reference" targetptr="more.1"/> |
| | | <olink xlink:href="reference#more.1" xlink:role="http://docbook.org/xlink/role/olink"/> |
| Olink with literal text | <olink targetdoc="reference" targetptr="more.1">more(1)</olink> | <olink targetdoc="reference" targetptr="more.1">more(1)</olink> |
| | | <olink xlink:href="reference#more.1" xlink:role="http://docbook.org/xlink/role/olink">more</olink> |
| Element as olink | <olink targetdoc="reference" targetptr="more.1"><command>more</command></olink> | <command xlink:href="reference#more.1" xlink:role="http://docbook.org/xlink/role/olink">more</command> |

Uniform metadata elements

DocBook 5 introduces two major changes to the handling of metadata:

  • All hierarchical elements and many block elements can have a metadata container.

  • A single info element name is used as the metadata container for all elements.

In DocBook 4, only elements that defined the document hierarchy had a container element for metadata, and each hierarchical element had its own name for its metadata element. For example, book had bookinfo, chapter had chapterinfo, etc. Using separate element names permitted each of them to have a different content model in the DTD, if it was needed.

In DocBook 5, only a single metadata element is needed because it uses RelaxNG as the schema language. RelaxNG permits an element to have a different content model when the element appears in a different context. For example, the DocBook 5 info element in book can contain a title, but the info element in para cannot.

RelaxNG also solves some other problems that DTDs had for managing titles. In DocBook 4, a title element is permitted as a child of chapter, but also as a child of chapterinfo. It was never intended that title elements be used in both locations, because it is not clear which title should be used in the output. But in DTD syntax there was no way to write a content model to prevent that combination. In RelaxNG there is, so you can have only one title on a chapter if you validate with RelaxNG. The title can be in either location, but not both.

DocBook 5 also has consistent placement of the info element relative to a separate title element. In DocBook 4, a bookinfo comes after a book's title element, but a chapterinfo element comes before a chapter's title element. In DocBook 5, the info element always comes after any separate title element and before any other content. Of course, you can always put the title inside the info element so you do not have to remember the order at all.
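
The two permitted arrangements can be sketched as follows. The chapter content and author name are placeholders. In the first form the title is a separate element that precedes info. In the second form the title is inside info.

```xml
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0">
  <!-- Separate title first, then info, then the other content. -->
  <title>Setting up the system</title>
  <info>
    <author><personname>A. Writer</personname></author>
  </info>
  <para>...</para>
</chapter>
```

```xml
<chapter xmlns="http://docbook.org/ns/docbook" version="5.0">
  <!-- Title inside info, so there is no ordering to remember. -->
  <info>
    <title>Setting up the system</title>
    <author><personname>A. Writer</personname></author>
  </info>
  <para>...</para>
</chapter>
```

Putting a title in both places at once would fail RelaxNG validation.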

Annotations

DocBook 5 has a new system for associating annotations with elements. It adds the following two new elements and defines the semantics of associating an annotation with an element.

  • The alt element for a short text description.

  • The annotation element for an arbitrarily complex description.

These are described in more detail in the following sections.

The alt element

The alt element lets you attach a short text description to an element. In HTML, an alt attribute lets you describe an IMG with text. In DocBook 5, the alt element serves a similar function except that it is an element and it can be applied to many elements, not just images. It permits as content only text and inlinemediaobject (which is only included to support characters not in the current font).

An alt element is placed as a child of the element it is describing. So an alt element is always describing its parent element. The following are some examples:

Alt text for a mediaobject
<mediaobject>
  <alt>mouse buttons</alt>
  <imageobject>
    <imagedata fileref="mouse.png"/>
  </imageobject>
</mediaobject>

An equation
<equation>
  <title>Computing energy use</title>
  <alt>Integral of power over time</alt>
  <mediaobject>
    <imageobject>
      <imagedata fileref="power-time.svg"/>
    </imageobject>
  </mediaobject>
</equation>

The text in an alt element may not appear in the output, depending on the application. For example, in HTML output an alt element in a mediaobject will become an alt attribute on an IMG element. But it is not used at all in PDF output, unless a customization does so.

The annotation element

The annotation element takes over when the alt element is too limited. It is a general purpose element that can be used for a wide variety of annotation semantics. It has these features:

  • An annotation element's content can be any mix of DocBook block elements. Its content model is like section but without any nested sections. So it can contain any number of paragraphs, lists, admonitions, etc. Plain text without a container element is not permitted (use alt for such cases).

  • An annotation is associated with an element using attributes, not by placement, and the association can go in either or both directions. Specifically:

    • An annotates attribute on an annotation element matches the value of the xml:id of the element it is annotating.

    • An annotations attribute on any element matches the value of the xml:id of an annotation element associated with it.

  • The annotation element's annotates attribute accepts multiple space-separated values, so any annotation can be associated with more than one annotated element.

  • An element's annotations attribute accepts multiple space-separated values, so any element can be associated with more than one annotation.

  • Because the association is by attributes, an annotation element can be located close to or far from the element it is annotating. In fact, there is no implied association based on element position, proximity, or lineage, such as parent-child.

  • You can assign a role attribute to an annotation to identify it as a certain kind of annotation. There are no predefined role values.

An example is probably easier to understand than the explanations. The following is an example of an annotation element associated with a chapter element:

<chapter xml:id="setup" annotations="setup-background">
  <title>Setting up the system</title>
  <info>
    <annotation xml:id="setup-background" annotates="setup">
      <title>Background information for setup</title>
      <para>...</para>
    </annotation>
  </info>
  ...

This example shows how the association of an annotation and its target element can be formed in both directions. The chapter's xml:id value is referenced in the annotation's annotates attribute. Likewise, the annotation's xml:id value is referenced in the chapter's annotations attribute. Either direction is sufficient to establish the association. Using both directions makes it easier to find and maintain your annotations.

Placing this annotation element in the chapter's info element is simply a convenience. It could just as well have been placed in the book element's info element, in an appendix element, or anywhere else in the document, and it would have the same association.
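
As a further sketch, the following hypothetical book stores one annotation in the book's info element and associates it with two chapters through a space-separated annotates value. The role value review and all ids are made up for illustration. Only the annotates direction is used here, which is sufficient to establish the association.

```xml
<?xml version="1.0"?>
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <info>
    <title>My book title</title>
    <!-- One annotation associated with two elements by their xml:id values. -->
    <annotation xml:id="review-note" role="review"
                annotates="setup install">
      <para>Both chapters need a technical review before release.</para>
    </annotation>
  </info>
  <chapter xml:id="setup">
    <title>Setting up the system</title>
    <para>...</para>
  </chapter>
  <chapter xml:id="install">
    <title>Installing the software</title>
    <para>...</para>
  </chapter>
</book>
```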

To make them more flexible for modular documents, the annotates and annotations attributes are declared as attribute type CDATA. This means they are plain text, and not of attribute type IDREF. If they were of type IDREF, then the elements would have to be in the same file as the associated xml:id attributes to be validated. As CDATA, the annotations can be stored off in a separate file and used as needed. The disadvantage is that they will not be validated by the parser, but then they will also not generate validation errors if they are not stored in the same file. If your application requires annotations to work, then be sure your stylesheets check the integrity of the associations.
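
One possible shape for such an integrity check is sketched below in XSLT 1.0. It assumes the document has already been assembled into a single tree (for example, after resolving any inclusions). The template name check.annotation.ids and the d prefix are illustrative choices, not part of the DocBook XSL stylesheets. It reports each id listed in an annotates or annotations attribute that does not match any xml:id in the document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="http://docbook.org/ns/docbook">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Check ids referenced from annotation elements. -->
    <xsl:for-each select="//d:annotation[@annotates]">
      <xsl:call-template name="check.annotation.ids">
        <xsl:with-param name="ids"
            select="concat(normalize-space(@annotates), ' ')"/>
      </xsl:call-template>
    </xsl:for-each>
    <!-- Check ids referenced from annotated elements. -->
    <xsl:for-each select="//*[@annotations]">
      <xsl:call-template name="check.annotation.ids">
        <xsl:with-param name="ids"
            select="concat(normalize-space(@annotations), ' ')"/>
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <!-- Recursively take the first id off a space-terminated list
       and warn if no element in the document has that xml:id. -->
  <xsl:template name="check.annotation.ids">
    <xsl:param name="ids"/>
    <xsl:variable name="first" select="substring-before($ids, ' ')"/>
    <xsl:if test="$first != ''">
      <xsl:if test="not(//*[@xml:id = $first])">
        <xsl:message>Warning: no element has xml:id '<xsl:value-of
            select="$first"/>'</xsl:message>
      </xsl:if>
      <xsl:call-template name="check.annotation.ids">
        <xsl:with-param name="ids" select="substring-after($ids, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The recursion is needed because XSLT 1.0 has no built-in function for splitting a space-separated list. A stricter check might also confirm that the ids in annotations attributes point specifically at annotation elements.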

The DocBook XSL stylesheets do not output the content of annotation elements by default. That is because the semantics of a particular annotation are defined by the application, not the DocBook schema. You will need to develop a stylesheet customization if you want to include annotation information in your output. See the section “Annotations customization” for an example.