Formatted plain text

Sometimes you need output that is simply plain text, for a README file, for example. It would be fairly simple to delete all the markup tags from a DocBook document, but the result would not be very satisfactory. Paragraphs might be readable, but tables would not be. Also, there would be no generated text such as number labels and cross-reference (xref) text.

What you want is text that is processed by a DocBook stylesheet and formatted sufficiently to be meaningful. Unfortunately, there is no DocBook XSL stylesheet dedicated to generating formatted plain text. Most people who need formatted text use a two-step process:

  1. Process the DocBook into HTML output.

  2. Use a text-based web browser to convert the HTML to formatted plain text.
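
The two steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: it assumes xsltproc as the XSLT processor, the stylesheet location is a placeholder you should replace with the path or URI of your own html/docbook.xsl, and the function name docbook_to_text is hypothetical.

```shell
#!/bin/sh
# Assumption: location of the DocBook XSL HTML stylesheet.
# Override STYLESHEET to point at your local html/docbook.xsl.
STYLESHEET="${STYLESHEET:-http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl}"

# Hypothetical helper: docbook_to_text INPUT.xml OUTPUT.txt
docbook_to_text() {
    input=$1
    output=$2
    # Intermediate HTML file, named after the text output
    html="${output%.txt}.html"

    # Step 1: process the DocBook XML into HTML
    xsltproc --output "$html" "$STYLESHEET" "$input" || return 1

    # Step 2: let a text-based browser format the HTML as plain text
    lynx -dump "$html" > "$output"
}
```

For example, docbook_to_text myfile.xml myfile.txt would leave both myfile.html and myfile.txt in the current directory.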

There are at least three text-based web browsers that you can choose from to format HTML as plain text:

Lynx

The original text-based web browser, still used by many people. The latest version handles simple tables. Lynx is available from http://lynx.browser.org/ for most platforms. You can use its -dump option to save the formatted text to a file:

lynx -dump myfile.html > myfile.txt

ELinks

An enhanced version of the Links (no relation to Lynx) character browser. It handles tables better than Lynx. ELinks is available from http://www.elinks.or.cz/. It also has a -dump option:

elinks -dump myfile.html > myfile.txt

w3m

A text-based browser developed in Japan that can handle tables. It is available from http://w3m.sourceforge.net/. w3m also has a -dump option:

w3m -dump myfile.html > myfile.txt
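
Because all three browsers accept the same -dump option, a script can use whichever one happens to be installed. The following is a minimal sketch, not a tested recipe; the function names pick_text_browser and html_to_text are hypothetical, and the preference order (lynx, then elinks, then w3m) is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first text browser found on PATH.
pick_text_browser() {
    for browser in lynx elinks w3m; do
        if command -v "$browser" >/dev/null 2>&1; then
            echo "$browser"
            return 0
        fi
    done
    echo "no text browser found (tried lynx, elinks, w3m)" >&2
    return 1
}

# Hypothetical helper: html_to_text INPUT.html OUTPUT.txt
html_to_text() {
    browser=$(pick_text_browser) || return 1
    # All three browsers write the formatted text to standard output
    "$browser" -dump "$1" > "$2"
}
```

If tables matter in your document, you may prefer to list elinks or w3m first, since the formatting differs from one browser to another.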

Conversion of HTML generated by DocBook works quite well with these browsers because the HTML is clean. DocBook's HTML output does not use frames, layout tables, or JavaScript, all features that are hard for text-only browsers to handle. Any CSS styling you apply will be lost, of course.