HTML::SyntaxHighlighter - a module for converting raw HTML into html-escaped, highlighted code, suitable for inclusion within a web page.



 use HTML::SyntaxHighlighter;

 my $p = HTML::SyntaxHighlighter->new();
 $p->parse_file( $file ) or die "Cannot open '$file': $!";

From within HTML::Mason

 <& /lib/header.m, title => "Formatted source code for '$file'", stylesheet => [ 'html_highlight.css' ] &>

 <%perl>
  my $path = "/usr/data/www";
  my $p = HTML::SyntaxHighlighter->new(
                                       out_func => sub { $m->out( @_ ) },
                                       header   => 0,
                                      );

  $p->parse_file( "$path/$file" ) or die "Cannot open '$path/$file': $!";
 </%perl>

 <& /lib/footer.m &>



This module is designed to take raw HTML code, either from a variable or a file, html-escape it and highlight it (using stylesheets), rendering it suitable for inclusion in a web page. It is built on top of HTML::Parser.

It is intended primarily for people wanting to include 'example HTML code' in a dynamically generated web page (be it created with CGI, HTML::Mason, or whatever); if you find other uses, please let me know.
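
To make the idea of 'html-escape it and highlight it' concrete, here is a minimal sketch in plain Perl. It is not the module's own code: the helper names 'escape_html' and 'highlight' and the class name 'tag' are made up for illustration, and the real module decides what to wrap by parsing the document with HTML::Parser.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML.
sub escape_html {
    my ($text) = @_;
    my %entity = ( '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Wrap an escaped fragment in a span so a stylesheet can colour it.
# The class name passed in is hypothetical, not one the module defines.
sub highlight {
    my ( $class, $raw ) = @_;
    return sprintf '<span class="%s">%s</span>', $class, escape_html($raw);
}

print highlight( 'tag', '<p class="intro">' ), "\n";
# prints: <span class="tag">&lt;p class=&quot;intro&quot;&gt;</span>
```

The colours themselves live entirely in the stylesheet, which is why the generated markup only needs class names.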


Options can either be set from the constructor:

 my $p = HTML::SyntaxHighlighter->new(
                                      default_type => 'xhtml',
                                      force_type   => 1,
                                     );

Or by calling the method of the same name:

 $p->debug( 1 );

The output function, set with 'out_func'. It can be one of the following:

A coderef

The function is called whenever output is generated.

 $p->out_func( sub { $r->print( @_ ) } );

A filehandle globref

Output is redirected to the filehandle.

 $p->out_func( \*DATAFILE );

A scalar ref

Output is saved to the scalar variable.

 $p->out_func( \$data );

The default value is '\*STDOUT'.
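
As a rough illustration of how the three forms relate, the sketch below turns any of them into a single coderef. This is an assumption about how such dispatch could be written, not the module's actual implementation, and 'make_emitter' is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise a coderef, filehandle globref or
# scalar ref into one coderef that accepts output chunks.
sub make_emitter {
    my ($target) = @_;
    $target = \*STDOUT unless defined $target;    # mirrors the documented default

    my $type = ref $target;
    return $target                          if $type eq 'CODE';
    return sub { print {$target} @_ }       if $type eq 'GLOB';
    return sub { $$target .= join '', @_ }  if $type eq 'SCALAR';
    die "Unsupported output target type: '$type'\n";
}

# Scalar ref: output accumulates in $buffer.
my $buffer    = '';
my $to_scalar = make_emitter( \$buffer );
$to_scalar->( '&lt;p&gt;', 'text' );

# Coderef: the function is called once per chunk of output.
my @calls;
my $to_code = make_emitter( sub { push @calls, join '', @_ } );
$to_code->('chunk');

# Globref: output is printed straight to the filehandle.
make_emitter( \*STDOUT )->( "$buffer\n" );
```

A scalar ref is usually the most convenient choice when the highlighted code needs further processing before it is sent anywhere.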


If this option is turned on, inline tags and text are collapsed onto a single line; only block-level elements and table rows are indented as normal. It is probably best used only on small html snippets: it has not been extensively tested against large ones, and I'd be surprised if it stood up well to complex or less-than-perfect code.

If the 'header' option is turned off, only the tags between '<body>' and '</body>' are output.


'default_type' determines whether documents are expected to be html or xhtml, which affects parsing slightly. The default is 'html'.


Normally, the doctype declaration will override default_type. If 'force_type' is set, then default_type will be used in all cases.


'debug' turns on debugging mode, which marks out sections of erroneous code and attempts to correct some basic errors (e.g. not closing '<p>' tags).


The string to be used to generate line breaks in the output. Default value is '<br />'.


Pretty much all of the other methods you will use are inherited from HTML::Parser.

Included are slightly adapted docs for the two most commonly used methods.

parse_file( $file )

Take code to be highlighted directly from a file. The $file argument can be a filename, an open file handle, or a reference to an open file handle. If $file contains a filename and the file can't be opened, then the method returns an undefined value and $! tells why it failed. Otherwise the return value is a reference to the syntaxhighlighter object.

parse( $string )

Parse $string as the next chunk of the HTML document. The return value is normally a reference to the syntaxhighlighter object.


The module only generates the HTML. You will also require a stylesheet, which must either be included in or linked from your html file. One is included with this module ('examples/html_highlight.css'), which gives roughly the same colours as xemacs' html-mode does by default.

If you decide to make your own stylesheet, you will need definitions for the following:


The document type declaration.


Html, head and body tags.


Block-level elements; e.g. p, table, ol.


Inline elements; e.g. b, i, tt.


Tag attributes.


Plain text.


Text within 'script' and 'style' tags.


HTML comments.


Errors; only appear when 'debug' mode is on.


Alex Bowley