Chapter 25. Other output forms

Table of Contents

XHTML
Generating XHTML
Validating XHTML
Customizing XHTML
HTML Help
Generating HTML Help
Processing options
Formatting options
Additional resources
JavaHelp
Eclipse Platform help system
Formatted plain text
Refentry to man
Man to refentry
Microsoft Word

The DocBook XSL stylesheets have a flexible design that permits them to be customized for specific applications. These customizations are included with the DocBook XSL distribution or are available from the DocBook SourceForge website at http://sourceforge.net/projects/docbook/.

XHTML

For generating XHTML (Extensible HyperText Markup Language) from DocBook XML files.

HTML Help

For generating HTML files suitable for use in the Microsoft HTML Help online documentation system.

JavaHelp

For generating HTML files suitable for use in Sun Microsystems' JavaHelp™ online documentation system.

Eclipse Platform help system

For generating HTML files suitable for use in the Eclipse Platform help system.

Formatted plain text

A method, not a stylesheet, for generating formatted plain text output.

Unix man pages

For generating manual pages in troff markup from refentry elements.

Microsoft Word

For generating WordML files that can be loaded into Microsoft Word. You can also create Word files that can be converted to DocBook.

XHTML

XHTML is HTML reformulated in XML. XHTML is a W3C Recommendation, and is described in XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition). One of the goals of the reformulation was to move all formatting out of the HTML and into the CSS stylesheet, thus separating content from style. But the authors also recognized the need to transition from existing HTML, so there are actually three DTDs for XHTML 1.0:

Strict (xhtml1-strict.dtd)

The strictest version of XHTML, with no formatting elements or attributes.

Transitional (xhtml1-transitional.dtd)

Retains most of the formatting elements of HTML 4.

Frameset (xhtml1-frameset.dtd)

Same as Transitional but adds HTML frameset elements.

XHTML uses XML syntax, so it differs from HTML. The original HTML was written in SGML, and uses features that are only in SGML and not XML:

  • Omission of end tags is permitted in HTML, so you will often see a <P> tag without its closing </P> tag. Such omissions are not allowed in XML, so all start tags in XHTML must have a closing tag (except empty elements).

  • Empty elements in HTML do not use the trailing slash character. Thus HTML uses <HR> while XHTML uses <hr/>, or sometimes <hr />. The extra space is often included for backwards compatibility with browsers that are not XHTML-aware. XHTML element names must also be written in lowercase, because XML is case-sensitive.

There are several other differences between HTML and Transitional XHTML. The W3C included HTML Compatibility Guidelines in the XHTML specification to help create XHTML that is compatible with existing HTML browsers.

Strict XHTML is very different from HTML. It permits no elements or attributes that are intended for formatting, on the assumption that a CSS stylesheet will handle all formatting. So attributes such as type on an ol list (which indicates the number format for a numbered list) or width on a table cell are not permitted.
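As a minimal sketch of that separation, the following hand-written Strict XHTML file moves the number format of a list out of the markup and into CSS. The class name loweralpha is a hypothetical choice for illustration and is not something the DocBook stylesheets generate.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Strict list example</title>
    <!-- Hypothetical class name; the number format lives in the CSS -->
    <style type="text/css">
      ol.loweralpha { list-style-type: lower-alpha; }
    </style>
  </head>
  <body>
    <!-- Transitional XHTML would also accept <ol type="a"> here -->
    <ol class="loweralpha">
      <li>First item</li>
      <li>Second item</li>
    </ol>
  </body>
</html>
```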

Generating XHTML

The DocBook XSL stylesheet distribution includes a set of stylesheets that generate XHTML. These stylesheets are in the xhtml subdirectory of the distribution, and include versions of docbook.xsl for single-file output and chunk.xsl for chunked output. These stylesheets are derived from the HTML stylesheets, so they have all the same features and parameters.

Keep in mind that not all browsers support XHTML. Some people are still using web browsers that predate XHTML. If you want the largest number of people to be able to read your output without problems, then you will have to use HTML for a while yet.

XHTML using xsltproc

To generate XHTML output using xsltproc, you can use commands such as these:

Single file XHTML:
xsltproc  \
     --output  myfile.xhtml  \
    xhtml/docbook.xsl  myfile.xml

Chunked XHTML (the two --stringparam options are needed only for versions 1.61 and earlier):
xsltproc  \
    --stringparam chunker.output.doctype-public \
               "-//W3C//DTD XHTML 1.0 Transitional//EN" \
    --stringparam chunker.output.doctype-system \
               "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" \
    xhtml/chunk.xsl  myfile.xml

Prior to version 1.62 of the stylesheets, the chunking stylesheet did not output a DOCTYPE declaration, so you had to specify the extra parameters as shown here. Since version 1.62, the Transitional DOCTYPE is output automatically so you do not need those parameters.

If you examine the output, you will notice some differences from the HTML version of the output:

  • The top of the output file has an XML declaration and a reference to the XHTML DTD in its DOCTYPE:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    
  • The output encoding is UTF-8, the default XML encoding.

  • The html element includes an XHTML namespace declaration:

    <html  xmlns="http://www.w3.org/1999/xhtml">

    Since this declaration appears in the document's root element, its scope is the whole document. The declaration does not include a namespace prefix, which means it is the default namespace, so all tags without a prefix are in that namespace. That permits content in other namespaces (such as MathML) to be mixed in as long as its elements use their own namespace prefix.

  • The body element has no formatting attributes (such as bgcolor), which are not permitted in XHTML Strict.

  • Anchor name attributes are replaced with id attributes.
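To illustrate the point about the default namespace, here is a hand-written sketch (not stylesheet output) in which unprefixed elements fall in the XHTML namespace while MathML elements carry their own prefix. The prefix m is an arbitrary choice, and the example shows only well-formed namespace mixing; it is not meant to validate against the XHTML DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <head>
    <title>Mixed namespaces</title>
  </head>
  <body>
    <!-- html, head, title, body, and p are in the default XHTML namespace -->
    <p>The area of the circle is
      <!-- Elements with the m: prefix are in the MathML namespace -->
      <m:math>
        <m:mi>π</m:mi>
        <m:msup><m:mi>r</m:mi><m:mn>2</m:mn></m:msup>
      </m:math>
    </p>
  </body>
</html>
```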

The nice thing about using xsltproc is that it detects that the DOCTYPE is XHTML and adjusts the serialization of the output so it follows most of the HTML compatibility guidelines. This enables more browsers to read the XHTML.

XHTML using Saxon

To generate XHTML using Saxon, you use commands similar to those of xsltproc. However, Saxon does not automatically detect that it is outputting XHTML. Fortunately, you can use a Saxon extension that causes Saxon to adjust its output to satisfy the HTML compatibility guidelines. You must set up the extension in a customization layer. Here are the steps.

  1. Create a customization layer for XHTML processing. It is like other DocBook customization files, except in the xsl:import statement you import one of the XHTML stylesheets, xhtml/docbook.xsl or xhtml/chunk.xsl.

  2. Add the saxon namespace declaration and the xsl:output element to the customization layer, as in this example:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:saxon="http://icl.com/saxon"
            version="1.0">
    <xsl:import href="../docbook-xsl/xhtml/chunk.xsl"/>
    <xsl:output method="saxon:xhtml" />
    

    These changes add the saxon namespace to the stylesheet's root element, and then use that namespace to declare a Saxon extension output method.

  3. Use a standard Saxon command to process your documents with your customization layer:

    Single file XHTML:
    java  com.icl.saxon.StyleSheet  \
        -o myfile.html  myfile.xml  \
        custom-xhtml-docbook.xsl
    
    Chunked XHTML:
    java  com.icl.saxon.StyleSheet  \
        myfile.xml  \
        custom-xhtml-chunk.xsl 
    

    Notice that you do not have to use the chunker.output.doctype parameters to get the XHTML DOCTYPE in chunked output. Saxon does that automatically. You can, however, add other stylesheet parameters as needed.

Generating Strict XHTML

By default, the DocBook XSL stylesheets generate Transitional XHTML. There is no option or parameter for turning on Strict XHTML processing. By following certain Strict XHTML guidelines, you can produce output that would validate with the Strict DTD. However, the DOCTYPE declaration in the output will still refer to the Transitional XHTML DTD. Here is how you insert a reference to the Strict XHTML DTD instead.

If you are using the docbook.xsl stylesheet, then you need a stylesheet customization layer to change the xsl:output element to specify a different DTD. This is how it should appear in your customization file:

<xsl:output method="xml" 
    encoding="UTF-8" 
    indent="no" 
    doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" 
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
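For context, a complete customization file built around that element might look like the following sketch. The import path is a placeholder, and the file name custom-xhtml-strict.xsl mentioned afterward is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

<!-- Placeholder path: point this at your own DocBook XSL installation -->
<xsl:import href="/path/to/docbook-xsl/xhtml/docbook.xsl"/>

<!-- Takes precedence over the xsl:output in the imported stylesheet -->
<xsl:output method="xml"
    encoding="UTF-8"
    indent="no"
    doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>

</xsl:stylesheet>
```

If this were saved as custom-xhtml-strict.xsl, you would name that file in place of xhtml/docbook.xsl on the processor command line.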

If you are using the chunk.xsl stylesheet, you can change it with two parameters. You can set the parameters in a customization layer, or on the command line as shown here:

xsltproc  \
     --stringparam chunker.output.doctype-public \
           "-//W3C//DTD XHTML 1.0 Strict//EN" \
     --stringparam chunker.output.doctype-system \
           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" \
     xhtml/chunk.xsl  myfile.xml

You can also use these parameters for single file output if you use the xhtml/onechunk.xsl stylesheet instead of xhtml/docbook.xsl. See the section “Single file options with onechunk” for more information on using onechunk.xsl.
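Setting the same two chunking parameters in a customization layer instead of on the command line could look like this sketch. The import path is a placeholder for your own installation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

<!-- Placeholder path: point this at your own DocBook XSL installation -->
<xsl:import href="/path/to/docbook-xsl/xhtml/chunk.xsl"/>

<!-- Same values as the --stringparam options on the command line -->
<xsl:param name="chunker.output.doctype-public">-//W3C//DTD XHTML 1.0 Strict//EN</xsl:param>
<xsl:param name="chunker.output.doctype-system">http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</xsl:param>

</xsl:stylesheet>
```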

Validating XHTML

The DocBook XSL stylesheets are capable of producing XHTML that can be validated with either the Transitional or Strict XHTML 1.0 DTDs (the Frameset DTD is not considered because DocBook does not output frameset elements). But it is also possible to produce XHTML that will not validate with either of them.

If your goal is to produce valid XHTML, you need to keep some guidelines in mind when creating your DocBook XML source files and selecting your processing options.

Transitional XHTML guidelines

By default, the DocBook XSL xhtml stylesheets output Transitional XHTML. However, there are still some things you need to do to satisfy the requirements of the DTD.

  • In XHTML, all img elements are required to have an alt attribute. See the section “Alt text” for more information on outputting alt attributes.
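As a sketch of DocBook source that can supply the alt text, the following mediaobject pairs an image with a short textobject. This assumes the stylesheets take the alt attribute from the phrase inside the textobject, as the “Alt text” section describes; the file name and wording are made up for illustration.

```xml
<mediaobject>
  <imageobject>
    <!-- Hypothetical image file -->
    <imagedata fileref="images/build-process.png"/>
  </imageobject>
  <textobject>
    <!-- Short description intended to become the alt attribute -->
    <phrase>Diagram of the build process</phrase>
  </textobject>
</mediaobject>
```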

Strict XHTML guidelines

To validate with the Strict XHTML DTD, you need to follow the Transitional XHTML guidelines as well as the following Strict guidelines:

  • Process your documents with the css.decoration parameter set to zero. That will avoid the use of style attributes in XHTML elements where they are not permitted.

  • Your images in mediaobject elements cannot make use of a viewport. To create a viewport in HTML, the stylesheets use a table and set the row height. Unfortunately, a height attribute is not permitted on an XHTML Strict tr element.

  • In your imagedata elements, avoid the use of alignment attributes align and valign.

  • If you have textobjects that will generate longdesc attributes in images, then you need to turn that feature off because it includes an alignment attribute in the div element. You can turn it off by setting the html.longdesc parameter to zero.

  • Set parameter ulink.target to an empty value because the target attribute is not permitted on anchor (a) elements.

  • Table data element td does not take a width attribute.

  • ol does not take a type attribute.

  • Set parameter use.viewport to zero so that img does not get a border="0" attribute.

  • Footnotes generate an hr element with align="left" and width="100%", neither of which is permitted.
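The parameter settings from these guidelines can be collected in one customization layer. The following is a minimal sketch covering only css.decoration, html.longdesc, and ulink.target; the import path is a placeholder, and the remaining guidelines (image attributes, table widths, list types, footnote rules) still have to be handled in your source files or with template customizations.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

<!-- Placeholder path: point this at your own DocBook XSL installation -->
<xsl:import href="/path/to/docbook-xsl/xhtml/docbook.xsl"/>

<!-- Avoid style attributes where Strict does not permit them -->
<xsl:param name="css.decoration" select="0"/>

<!-- Do not generate longdesc output for images -->
<xsl:param name="html.longdesc" select="0"/>

<!-- Empty value, so that no target attribute is written on links -->
<xsl:param name="ulink.target" select="''"/>

</xsl:stylesheet>
```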

Customizing XHTML

The XHTML stylesheets can be customized in a manner very similar to the HTML stylesheets. You create a customization layer that imports the XHTML stylesheet, and then sets parameters and customizes templates. The following example shows how to get started.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
        version='1.0'>

<xsl:import href="/path/to/docbook-xsl/xhtml/docbook.xsl"/>

<!-- Your customizations go here. -->

</xsl:stylesheet>

Keep these differences in mind:

  • The output method is XML, not HTML.

    <xsl:output  method="xml"/>

  • All literal output elements need to be in the XHTML namespace. The easiest way to do that is to make it the default namespace for your stylesheet customization layer. You can do that by including the namespace attribute in the root element of your stylesheet customization file:

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns="http://www.w3.org/1999/xhtml"
                    version="1.0">
    

    If you do not include the namespace attribute in the stylesheet, then you will get namespace attributes (such as xmlns="") in your XHTML output wherever an element switches out of the default XHTML namespace.
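Putting these points together, a customization layer that writes a literal element into the output might look like the following sketch. It uses the user.head.content template as the place to add head content; the import path is a placeholder, and the meta element's name and content are made up for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

<!-- Placeholder path: point this at your own DocBook XSL installation -->
<xsl:import href="/path/to/docbook-xsl/xhtml/docbook.xsl"/>

<!-- The default namespace declared above puts this literal meta
     element in the XHTML namespace -->
<xsl:template name="user.head.content">
  <meta name="example-note" content="illustrative value only"/>
</xsl:template>

</xsl:stylesheet>
```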