
NAME

Syntax::SourceHighlight - Perl Binding to GNU Source Highlight

SYNOPSIS

    use Syntax::SourceHighlight;

    my $hl = Syntax::SourceHighlight->new('esc.outlang');
    my $lm = Syntax::SourceHighlight::LangMap->new();

    print $hl->highlightString(
        "my \$_ = 42;\n",
        $lm->getMappedFileName('perl')
    );

DESCRIPTION

GNU Source Highlight is a library to format code written in many programming languages as text in several markup languages. This binding to the underlying C++ library is very basic, supporting only the essential functionality.

USAGE

The Perl library exports part of the libsource-highlight API as is, so details of any functionality can be looked up in the library's API manual. There are some deviations, though:

  • only the three-argument highlight() method is available; the stream-oriented variant is not yet implemented,

  • the additional "highlightString()" allows for operating on strings rather than files,

  • the constructor LangMap->new() may be invoked with no parameters at all; 'lang.map' will be used as the default language map file.

All symbols that are exported retain the original C++ camel caps naming convention. Methods are accessible from Perl blessed hashrefs. Any attributes are mapped to hash values. Any exceptions thrown by the library are passed back to Perl with the equivalent of the die statement. The srchilite namespace is mapped to Perl's Syntax::SourceHighlight:: except for the main class, srchilite::SourceHighlight, which can be used directly as Syntax::SourceHighlight->new(). Its fully qualified equivalent also exists for both completeness and compatibility with the older versions of the package.

The argument to the boolean set*() series of functions defaults to true, regardless of the initial value of the setting it addresses.

CLASSES

Syntax::SourceHighlight

This class is the counterpart of the srchilite::SourceHighlight library class. Most of the methods are exported. This class does not have any public attributes.

new()

    my $hl = Syntax::SourceHighlight->new($output_format)

Creates a new source highlighting control object that formats code using the specified output language. It accepts one optional argument, the name of the output definition file. The default is 'html.outlang'.

The output language is a file name resolved relative to the data directory of the control object. The default data directory depends on the compilation time setup of the underlying library.

highlight()

    $hl->highlight( $input_file_name, $output_file_name, $input_language )

Highlights the contents of the input file into the output file, using the specified input language definition. If the input or output file name is an empty string, standard input or standard output is used, respectively.

As with the output language, the input language definition file is resolved relative to the data directory.

The four-argument variant of this method, which uses IO streams, has not been implemented yet.

highlightString()

    my $str = $hl->highlightString( $input, $input_language, $input_file_name )

Highlights the contents of the input string using the specified input language definition. The output is returned as a string. The optional third argument sets the “filename” that can be used by output templates.

This method is an extension of the original library.

setHighlightEventListener()

    $hl->setHighlightEventListener(
        sub {
            my $evt = shift;
            ...
        }
    )

Sets a callback to be invoked on each highlighting event. It should accept one argument, an object of the class "Syntax::SourceHighlight::HighlightEvent".

The highlighting event objects passed to the callback are roots of object graphs valid only during the dynamic scope of the callback execution.
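Because the event and its token are valid only while the callback runs, a listener that wants to keep data should copy plain values out instead of storing the objects. The following is a minimal sketch, assuming $hl is a Syntax::SourceHighlight object; the helper name collect_events is hypothetical and not part of the module.

```perl
use strict;
use warnings;

# Install a listener on $hl (assumed to be a Syntax::SourceHighlight object)
# and return an array reference that fills up with plain copies of each event.
# "collect_events" is a hypothetical helper name, not part of the module.
sub collect_events {
    my ($hl) = @_;
    my @events;
    $hl->setHighlightEventListener(
        sub {
            my $evt   = shift;
            my $token = $evt->{token} || {};
            push @events, {
                type    => $evt->{type},
                matched => [ @{ $token->{matched} || [] } ],    # shallow copy
            };
        }
    );
    return \@events;
}
```

The returned array reference can be inspected after highlight() or highlightString() has finished, since it holds only copied scalars and arrays.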

checkLangDef()

    $hl->checkLangDef($input_language)

Checks the validity of the language definition file. An exception is thrown if the language definition is invalid. Otherwise, this method returns no result.

checkOutLangDef()

    $hl->checkOutLangDef($output_language)

Checks the validity of the output definition file. An exception is thrown if the definition is invalid. Otherwise, this method returns no result.
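Since both check methods report problems by throwing an exception, a caller that prefers a return value can wrap them in eval. This is a sketch under the assumption that $hl is a Syntax::SourceHighlight object; definition_error is a hypothetical helper name, and perl.lang in the comment is assumed to be present in the data directory.

```perl
use strict;
use warnings;

# Return undef if the definition file passes the check, or the error text
# otherwise. $hl is assumed to be a Syntax::SourceHighlight object and
# $method is either 'checkLangDef' or 'checkOutLangDef'.
# "definition_error" is a hypothetical helper name, not part of the module.
sub definition_error {
    my ( $hl, $method, $file ) = @_;
    return if eval { $hl->$method($file); 1 };
    return $@ || 'unknown error';
}

# Typical use (needs the module and its data files):
#   my $hl  = Syntax::SourceHighlight->new('esc.outlang');
#   my $err = definition_error( $hl, 'checkLangDef', 'perl.lang' );
#   warn "perl.lang is unusable: $err" if defined $err;
```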

createOutputFileName()

    $hl->createOutputFileName($input_file_name)

Creates an output file name from the given input file name.

setDataDir()

    $hl->setDataDir($data_directory_name)

Sets an alternative directory in which the definition files are looked up. The default is compiled into the library.

setStyleFile()

    $hl->setStyleFile($style_file_name)

The definition file containing format options. The default is default.style.

setStyleCssFile()

    $hl->setStyleCssFile($style_file_name)

The CSS style file.

setStyleDefaultFile()

    $hl->setStyleDefaultFile($style_file_name)

The style defaults file.

setTitle()

    $hl->setTitle($title)

The title of the output document. Defaults to the source file name.

setCss()

    $hl->setCss($css_file)

Path to an external CSS file.

setHeaderFileName()

    $hl->setHeaderFileName($header_file_name)

The file name of the header.

setFooterFileName()

    $hl->setFooterFileName($footer_file_name)

The file name of the footer.

setOutputDir()

    $hl->setOutputDir($output_directory_name)

The directory for output files.

setOptimize()

    $hl->setOptimize($flag)

Whether to optimize output. For example, adjacent text parts belonging to the same element will be buffered and generated as a single text part. The optional $flag parameter defaults to true.

setGenerateLineNumbers()

    $hl->setGenerateLineNumbers($flag)

Whether to generate line numbers. The optional $flag parameter defaults to true.

setGenerateLineNumberRefs()

    $hl->setGenerateLineNumberRefs($flag)

Whether to generate line numbers with references. The optional $flag parameter defaults to true.

setLineNumberPad()

    $hl->setLineNumberPad($character)

The line number padding char. Defaults to '0'.

setLineNumberAnchorPrefix()

    $hl->setLineNumberAnchorPrefix($prefix)

The prefix for the line number anchors.

setGenerateEntireDoc()

    $hl->setGenerateEntireDoc($flag)

Whether to generate an entire document. The initial state is no. The optional $flag parameter defaults to true.

setGenerateVersion()

    $hl->setGenerateVersion($flag)

Whether to generate the program version in the output file. Initially this is set to yes. The optional $flag parameter defaults to true.

setCanUseStdOut()

    $hl->setCanUseStdOut($flag)

Whether the standard output can be used for output. This is true by default. The optional $flag parameter defaults to true.

setBinaryOutput()

    $hl->setBinaryOutput($flag)

Whether to open output files in binary mode. Defaults to false. The optional $flag parameter defaults to true.

setRangeSeparator()

    $hl->setRangeSeparator($separator)

The optional separator to be printed between ranges such as “..”.

setTabSpaces()

    $hl->setTabSpaces($number)

Sets the tab width. The value 0, which is the initial setting, disables replacing tabs with spaces.
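The setters above can be combined freely. The sketch below shows one possible configuration for a complete HTML page and illustrates that a boolean setter called without an argument switches its flag on, while an explicit false value is needed to switch it off. It assumes $hl is a Syntax::SourceHighlight object created with an HTML output definition; configure_html_page is a hypothetical helper name, and the chosen values are only examples.

```perl
use strict;
use warnings;

# Configure $hl (assumed to be a Syntax::SourceHighlight object created with
# an HTML output definition) to emit a complete, line-numbered page.
# "configure_html_page" is a hypothetical helper name, not part of the module.
sub configure_html_page {
    my ( $hl, $title ) = @_;
    $hl->setGenerateEntireDoc();      # no argument: the flag becomes true
    $hl->setGenerateLineNumbers();    # likewise
    $hl->setGenerateVersion(0);       # pass a false value to switch a flag off
    $hl->setLineNumberPad(' ');       # pad with spaces instead of '0'
    $hl->setTabSpaces(4);             # expand tabs to 4 columns
    $hl->setTitle($title);
    return $hl;
}
```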

Syntax::SourceHighlight::LangMap

new()

    my $lm = Syntax::SourceHighlight::LangMap->new($language_map)

or

    my $lm = Syntax::SourceHighlight::LangMap->new(
        $data_directory, $language_map
    )

Creates a new language map using the given name and data directory. A language map can be used to determine the correct input language file name for a source file name or a language name.

The language map name is a file name resolved relative to the data directory. The default value is 'lang.map' if new() is invoked with no arguments at all. The default data directory is compiled into the C++ library.

The zero-argument variant of this constructor is an extension of the original library.

getMappedFileName()

    $lm->getMappedFileName($language)

Determines a suitable input language name by using the map file. The map file contains some of the lower case names of the languages or interpreters as well as common file suffixes. If no known input language definition is found, the method returns an empty string.

getMappedFileNameFromFileName()

    $lm->getMappedFileNameFromFileName($file_name)

Determines a suitable input language name for the given source file name. If no known input language definition is found, the method returns the empty string.

Note that the default language map shipped with recent versions of the Source Highlight library maps the file name suffix .pl to Prolog, not Perl.
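One way to work around the .pl mapping is to test for Perl suffixes before consulting the map. This is a minimal sketch, assuming $lm is a Syntax::SourceHighlight::LangMap object; the helper name lang_for_file is hypothetical, and the suffix list is an assumption that suits a project where such files are Perl rather than Prolog.

```perl
use strict;
use warnings;

# Pick a language definition for $file_name, treating common Perl suffixes as
# Perl before consulting the map. $lm is assumed to be a
# Syntax::SourceHighlight::LangMap object. The suffix list is an assumption
# about the project at hand; "lang_for_file" is a hypothetical helper name.
sub lang_for_file {
    my ( $lm, $file_name ) = @_;
    return $lm->getMappedFileName('perl')
        if $file_name =~ /\.(?:pl|pm|t)\z/i;
    return $lm->getMappedFileNameFromFileName($file_name);
}
```

As with the underlying methods, the result is an empty string when no definition is found, so callers can keep testing it for truth.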

getLangNames()

    $lm->getLangNames()

Returns an array reference containing all human-readable language names known to the language map.

getMappedFileNames()

    $lm->getMappedFileNames()

Returns an array reference containing the file names of all language definitions known to the language map.
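The list methods combine naturally with getMappedFileName() to give an overview of what a language map offers. The sketch below assumes $lm is a Syntax::SourceHighlight::LangMap object; language_table is a hypothetical helper name.

```perl
use strict;
use warnings;

# Build a hash of language name => language definition file from $lm, which
# is assumed to be a Syntax::SourceHighlight::LangMap object.
# "language_table" is a hypothetical helper name, not part of the module.
sub language_table {
    my ($lm) = @_;
    my %table =
        map { ( $_ => $lm->getMappedFileName($_) ) } @{ $lm->getLangNames() };
    return \%table;
}

# Typical use (needs the module and its data files):
#   my $table = language_table( Syntax::SourceHighlight::LangMap->new() );
#   printf "%-12s %s\n", $_, $table->{$_} for sort keys %$table;
```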

Syntax::SourceHighlight::HighlightEvent

There is no Perl constructor for this object as it is normally created by the library and passed to the callback set with "setHighlightEventListener()".

It has two attributes:

type

The type of the event. The value is equal to one of the following constants:

  • $Syntax::SourceHighlight::HighlightEvent::FORMAT

  • $Syntax::SourceHighlight::HighlightEvent::FORMATDEFAULT

  • $Syntax::SourceHighlight::HighlightEvent::ENTERSTATE

  • $Syntax::SourceHighlight::HighlightEvent::EXITSTATE

token

The token of source text corresponding to the event, represented by an object of the class "Syntax::SourceHighlight::HighlightToken".

Syntax::SourceHighlight::HighlightToken

There is no Perl constructor for this object as it is normally created by the library and passed to the callback set with "setHighlightEventListener()". This object class represents part of the text being formatted and its highlighting pattern definition.

The following attributes are defined in this class:

prefix

A possible part of source text before the matched string.

prefixOnlySpaces

True if the prefix is empty or consists only of whitespace characters.

suffix

A possible part of source text after the matched string.

matchedSize

The length of the whole matched data.

matched

An array reference containing strings of the form element name:source text. The element name depends on the source language definition and usually classifies the type of source text, for example, whether it is a variable name or a keyword.
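Each entry of matched can be taken apart with a two-field split. The sketch below assumes that element names never contain a colon, so only the first colon is treated as the separator; split_matched is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;

# Split one entry of the "matched" array into its element name and source
# text. Only the first colon separates the two parts, so colons inside the
# source text are preserved. "split_matched" is a hypothetical helper name.
sub split_matched {
    my ($entry) = @_;
    my ( $element, $text ) = split /:/, $entry, 2;
    return ( $element, $text );
}

my ( $element, $text ) = split_matched('string:"a:b"');
print "$element => $text\n";    # prints: string => "a:b"
```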

EXAMPLES

The following script takes file names from command line parameters and prints the output to the terminal with ANSI escape codes.

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use Syntax::SourceHighlight;
    
    my $hl = Syntax::SourceHighlight->new('esc.outlang');
    my $lm = Syntax::SourceHighlight::LangMap->new();
    
    foreach (@ARGV) {
        my $lang = $lm->getMappedFileNameFromFileName($_);
        unless ($lang) {
            warn "Cannot determine file format for '$_'.\n";
            next;
        }
        $hl->highlight( $_, '', $lang );
    }

The next example enhances the previous script with an event listener that counts how often each element type occurs in a file. It prints a summary after each file.

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use Syntax::SourceHighlight;
    
    my $hl = Syntax::SourceHighlight->new('esc.outlang');
    my $lm = Syntax::SourceHighlight::LangMap->new();
    
    my %tokens;
    $hl->setHighlightEventListener(
        sub {
            my $he = shift;
            foreach ( @{ $he->{token}->{matched} } ) {
                next unless m/^(.*?):/s;
                $tokens{$1}++;
            }
        }
    );
    
    foreach (@ARGV) {
        %tokens = ();
        my $lang = $lm->getMappedFileNameFromFileName($_);
        unless ($lang) {
            warn "Cannot determine file format for '$_'.\n";
            next;
        }
        $hl->highlight( $_, '', $lang );
        next unless keys %tokens;
        print(
            "\nFound: ",
            join( ', ', map { "$tokens{$_} ${_}s" } sort keys %tokens ),
            "\n\n"
        );
    }

SEE ALSO

The homepage of the original library is at https://www.gnu.org/software/src-highlite/.

AUTHORS

COPYRIGHT AND LICENSE

Copyright © 2010 by Thomas Chust

This binding is in the Public Domain.