Former Home of html2text

html2text is a command line utility, written in C++, that converts HTML documents into plain text.

Each HTML document is loaded from a location indicated by a URI or read from standard input, and formatted into a stream of plain text characters that is written to standard output or to an output file. The input URI may specify a remote site, from which the documents are loaded via the Hypertext Transfer Protocol (HTTP).

The program can preserve the original positions of table fields, lets you set the screen width (to a given number of output characters), and also accepts syntactically incorrect input (attempting to interpret it "reasonably"). Boldface and underlined text is rendered by default with backspace sequences (which is particularly useful when piping the program's output into "less" or another pager). Rendering properties can largely be customised through an RC file.
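
To illustrate what "backspace sequences" means, here is a minimal sketch in Python of the conventional overstrike encoding that pagers such as "less" understand: a bold character is written as the character, a backspace, and the character again; an underlined character is written as an underscore, a backspace, and the character. This is not html2text's own code. The function names are hypothetical, and the assumption that html2text emits exactly these byte sequences is not confirmed by the text above.

```python
BACKSPACE = "\b"


def bold(text: str) -> str:
    """Overstrike each character with itself: c, backspace, c."""
    return "".join(f"{ch}{BACKSPACE}{ch}" for ch in text)


def underline(text: str) -> str:
    """Overstrike an underscore with each character: _, backspace, c."""
    return "".join(f"_{BACKSPACE}{ch}" for ch in text)


def strip_overstrike(text: str) -> str:
    """Undo overstriking: each backspace erases the preceding character."""
    out = []
    for ch in text:
        if ch == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # repr() makes the otherwise invisible backspace characters visible.
    print(repr(bold("Hi")))
    print(repr(underline("Hi")))
    print(strip_overstrike(bold("Hi") + " " + underline("there")))
```

A pager that understands overstriking shows such text highlighted, while a plain terminal simply overwrites the first character of each pair. The strip_overstrike helper shows how the same stream can be reduced to unadorned text when the output goes to a file or to a tool that does not interpret backspaces.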

html2text was written up to version 1.2.2 by Arno Unkrig for GMRS. In 2016, he redeveloped the tool in Java. Please refer to Html2txt for the Java implementation.

html2text is now maintained by Fabian Groffen. Please refer to GitHub for the most recent sources of the program (starting with version 2.0), or for any contributions or inquiries.