NAME

Ecma48::Util - A selection of subroutines supporting ANSI escape sequence handling

SYNOPSIS

    use Ecma48::Util qw(remove_seqs move_seqs_before_lastnl ... quotectrl);

    my $nude=quotectrl remove_bs_bolding remove_seqs remove_fillchars $decorated;

DESCRIPTION

Ecma48::Util contains a selection of subroutines which allow the handling of Ecma-48 based markup sequences - better known as ANSI escape sequences.

It helps to separate string handling from decorating. If you can't change the order of processing and you are forced to do your string handling after the decoration is already in effect, then you can find some adequate utility functions here.

USE CASES

Do you like colors in your terminal? Has someone written a plugin that brings in the color, maybe with the help of Term::ANSIColor? And do things like chomp and testing whether a string is empty now start to fail? Then this module is worth a look.
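
The failure mode can be reproduced with core Perl alone. The following is only a sketch of the problem, with made-up sample strings, and does not call any function of this module:

```perl
use strict;
use warnings;

# A decorated line: the reset sequence "\e[m" comes after the newline.
my $line = "color\e[34;1mful\n\e[m";

# chomp only looks at the very end of the string, so it removes nothing.
my $copy    = $line;
my $removed = chomp $copy;
print "chomp removed $removed character(s)\n";

# A string that shows nothing on screen is still not empty for Perl.
my $blank = "\e[1m\e[m";
print "length: ", length($blank), "\n";
```

Both checks look at the raw bytes, not at what the terminal would display.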

FUNCTIONS

By default Ecma48::Util does not export any subroutines. The subroutines defined are:

remove_seqs STRING

remove_seqs returns a copy of STRING from which all well-formed Ecma-48 sequences have been deleted.

    $foo = remove_seqs "color\e[34;1mful\e[m example"; # colorful example

Keep in mind that this is not the right tool for safely neutralizing untrusted input. Not all terminal sequences are well-formed, and most terminals also accept sequences that contain some errors. See quote_ctrl.
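
To give an idea of what "well-formed" means, here is a rough sketch that strips only 7-bit CSI sequences (ESC [, parameter bytes, intermediate bytes, one final byte). The helper name strip_csi is hypothetical, and this is not the module's implementation; remove_seqs covers more kinds of sequences.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: removes 7-bit CSI sequences.
#   \e\[          introducer ESC [
#   [\x30-\x3F]*  parameter bytes    (digits, ';', ...)
#   [\x20-\x2F]*  intermediate bytes
#   [\x40-\x7E]   final byte         (e.g. 'm')
sub strip_csi {
    my ($s) = @_;
    $s =~ s/\e\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]//g;
    return $s;
}

print strip_csi("color\e[34;1mful\e[m example"), "\n";
```

A sequence that deviates from this pattern, for example one cut off before its final byte, is left in the string untouched.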

split_seqs STRING

split_seqs splits STRING and returns a list in which escape sequences are marked by being scalar references.

    @foo = split_seqs "color\e[34;1mful\e[m example";
    # ( 'color', \"\e[34;1m", 'ful', \"\e[m", ' example' )
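
Such a list can be processed with ref. The sketch below does not call split_seqs; it starts from the documented result, written out as a literal list, and the variable names are only illustrative:

```perl
use strict;
use warnings;

# The list shape documented above, written out literally.
my @parts = ('color', \"\e[34;1m", 'ful', \"\e[m", ' example');

# Plain strings are the visible text; scalar references are sequences.
my $visible   = join '', grep { !ref($_) } @parts;
my $sequences = grep { ref($_) eq 'SCALAR' } @parts;

print "$visible ($sequences sequences)\n";
```

Dereferencing an element with ${ $part } gives the raw sequence back.
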

ensure_terminating_nl STRING

Checks whether the visible part of STRING ends with a newline. If not, ensure_terminating_nl adds one.

    $foo = ensure_terminating_nl "color\e[34;1mful\e[m";   # add \n
    $foo = ensure_terminating_nl "color\e[34;1mful\n\e[m"; # as is
    $foo = ensure_terminating_nl "color\e[34;1mful\e[m\n"; # as is

remove_terminating_nl STRING

Similar to ensure_terminating_nl, but instead of making the string end with a newline, it leaves the string open-ended, without a newline at the end.

    $foo = remove_terminating_nl "color\e[34;1mful\e[m";   # as is
    $foo = remove_terminating_nl "color\e[34;1mful\n\e[m"; # as in previous example
    $foo = remove_terminating_nl "color\e[34;1mful\e[m\n"; # ditto

move_seqs_before_lastnl STRING

Makes your STRING chomp-friendly by moving escape sequences that follow the last newline in front of it.

    $foo = move_seqs_before_lastnl "color\e[34;1mful\n\e[m";
    # "color\e[34;1mful\e[m\n"
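
The idea can be sketched with a single substitution. The helper move_trailing_csi below is hypothetical, handles only 7-bit CSI sequences after one final newline, and is not the module's code:

```perl
use strict;
use warnings;

# 7-bit CSI sequence: ESC [ parameters intermediates final-byte
my $csi = qr/\e\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]/;

# Hypothetical helper: swap a final newline with the sequences after it.
sub move_trailing_csi {
    my ($s) = @_;
    $s =~ s/\n((?:$csi)+)\z/$1\n/;
    return $s;
}

my $line = move_trailing_csi("color\e[34;1mful\n\e[m");
chomp $line;    # now finds the newline at the very end
print length($line), " characters left after chomp\n";
```

After the swap the newline is the last character, so chomp behaves as it would on an undecorated string.
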

quote_ctrl STRING

Replaces control characters with a visible representation. Traditional line breaks (\n, \r\n) are reasonable exceptions. quotectrl is an alias of quote_ctrl. When local $Ecma48::Util::PREFER_UNICODE_SYMBOLS=1 is set, control characters from C0 (\x00..\x1F) and DEL (\x7F) are displayed with their Unicode symbol, e.g. \x{241B}= ␛.

    $foo = quotectrl "color\e[34;1mful\n\e[m";
    # "color\\e[34;1mful\n\\e[m"
    local $Ecma48::Util::PREFER_UNICODE_SYMBOLS=1;
    $foo = quotectrl "color\e[34;1mful\n\e[m";
    # "color\x{241B}[34;1mful\n\x{241B}[m"
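
The Unicode symbols come from the "Control Pictures" block, where the symbol for a C0 character sits at U+2400 plus its code and the symbol for DEL is U+2421. A minimal sketch of that mapping follows. The helper to_control_pictures is hypothetical and not part of the module, and unlike quote_ctrl it spares only \n, not \r\n:

```perl
use strict;
use warnings;

# Hypothetical helper: map C0 controls (except \n) and DEL to the
# Unicode "Control Pictures" block.
sub to_control_pictures {
    my ($s) = @_;
    $s =~ s/([\x00-\x09\x0B-\x1F])/chr(0x2400 + ord($1))/ge;
    $s =~ s/\x7F/\x{2421}/g;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print to_control_pictures("color\e[34;1mful\n\e[m"), "\n";
```

ESC has the code 0x1B, which is why it ends up as \x{241B}.
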

quote_nongraph STRING

Like quote_ctrl, but applied to all non-printable characters. The decision is based on the [[:graph:]] regex class, and so depends on the settings of the locale pragma and the unicode_strings feature.

ctrl_chars LIST

ctrl_chars returns the requested control characters or introducers. LIST can consist of names, character codes, or the actual control characters. Besides the coded character, its 7-bit equivalent is also returned if one exists. In scalar context it returns a regex matching all requested sequence introducers including their alternatives.

    @foo = ctrl_chars 'CSI'; # "\x9b", "\e\["
    $foo = ctrl_chars 'CSI'; # as qr/\x9b|\e\[/

Multiple control characters can be given to ctrl_chars as separate parameters.
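
The 7-bit equivalent of a C1 control character (0x80..0x9F) is ESC followed by the character whose code is 0x40 lower, which is how CSI (\x9b) becomes ESC [. A small sketch of that rule, using a hypothetical helper that is not part of the module:

```perl
use strict;
use warnings;

# Hypothetical helper: 7-bit form of a C1 control character.
sub seven_bit_equivalent {
    my ($char) = @_;
    my $code = ord $char;
    return unless $code >= 0x80 && $code <= 0x9F;    # not a C1 control
    return "\e" . chr($code - 0x40);
}

my $csi = seven_bit_equivalent("\x9b");
printf "CSI as 7-bit: ESC %s\n", substr($csi, 1);
```

Characters outside the C1 range have no such equivalent, so the helper returns nothing for them.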

seq_regex

seq_regex returns a regex that matches Ecma-48 sequences.

remove_bs_bolding STRING

In the old days you could simulate bold printing with BackSpace (\cH) and overstriking with the same character. Some terminals of the 7-bit era emulate this printer behavior. remove_bs_bolding removes such overstrikes and keeps a single copy of each character.

    $foo = remove_bs_bolding "A\cHA\cHAB\cHB\cHCD\cHD";        # "AB\cHCD"
    $foo = remove_bs_bolding "This was b\cHbo\cHol\cHld\cHd."; # "This was bold."

BS as a combiner is defined in Ecma-6, and Ecma-43 mentions that it should not be used in 8-bit environments. It is not part of Ecma-48. However, if you have to deal with terminal sequences, you may also have to handle such issues.
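
The overstrike pattern is a character followed by one or more repetitions of BS plus the same character. A sketch with a backreference, assuming only that pattern matters (the helper collapse_overstrike is hypothetical and not the module's implementation):

```perl
use strict;
use warnings;

# Hypothetical helper: collapse "X BS X [BS X ...]" into a single "X".
# \x08 is BackSpace (\cH); \1 refers back to the captured character.
sub collapse_overstrike {
    my ($s) = @_;
    $s =~ s/([^\x08])(?:\x08\1)+/$1/g;
    return $s;
}

print collapse_overstrike("This was b\cHbo\cHol\cHld\cHd."), "\n";
```

A BS between two different characters, as in "B\cHC", does not fit the pattern and stays in the string.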

replace_bs_bolding STRING, [PRE, [POST], [INTER]]

Like remove_bs_bolding, but allows you to mark the bold substrings in other ways. The default is bright/bold mode.

    $foo = replace_bs_bolding "This is b\cHbo\cHol\cHld\cHd.";
    # "This is \e[1mbold\e[22m."
    $foo = replace_bs_bolding "This is b\cHbo\cHol\cHld\cHd.",'*';
    # "This is *bold*."
    $foo = replace_bs_bolding "This is b\cHbo\cHol\cHld\cHd.",1,0;
    # "This is \e[1mbold\e[0m."
    $foo = replace_bs_bolding "This is b\cHbo\cHol\cHld\cHd.",'','','_';
    # "This is b_o_l_d."

If you specify PRE but not POST, this function tries to guess the closing sequence.

closing_seq STRING

Tries to find the sequence that undoes what STRING changed.

    $foo = closing_seq "\e[2m";    # "\e[22m"
    $foo = closing_seq "\e[3h";    # "\e[3l"

Of course this is only an approximation, because no strict 1:1 mapping exists. This function is also used internally by replace_bs_bolding.

As a bonus, it also finds counterparts for brackets and similar delimiters.

    $foo = closing_seq '{[(';      # ')]}'
    $foo = closing_seq '.oO ';     # ' Oo.'
    $foo = closing_seq '==>>';     # '<<=='
    $foo = closing_seq '_*/';      # '/*_'
    $foo = closing_seq "\x{25C4}"; # "\x{25BA}"
    $foo = closing_seq "\x{2767}"; # "\x{2619}"

\x{25C4}= ◄, \x{25BA}= ►, \x{2767}= ❧, \x{2619}= ☙
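
For plain ASCII delimiters the counterpart can be approximated by reversing the string and swapping each bracket for its partner. The helper mirror_delims below is a hypothetical sketch, not the module's code, and it does not know the Unicode pairs or the escape sequences shown above:

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the string, then swap paired brackets.
sub mirror_delims {
    my ($s) = @_;
    my $r = scalar reverse $s;
    $r =~ tr/()[]{}<>/)(][}{></;
    return $r;
}

print mirror_delims($_), "\n" for '{[(', '.oO ', '==>>', '_*/';
```

Characters without a partner, such as '*' or '_', are only reversed in order.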

remove_fillchars STRING

remove_fillchars removes NUL (\x00) and DEL (\x7F) characters. It also removes CRs (\r) that are placed directly before other CRs, because CR is idempotent.
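
A rough equivalent in core Perl, as a sketch only. The helper strip_fill is hypothetical, and the order of the two steps is an assumption, so CRs that become adjacent once NUL or DEL are gone collapse as well:

```perl
use strict;
use warnings;

# Hypothetical helper: drop fill characters and redundant CRs.
sub strip_fill {
    my ($s) = @_;
    $s =~ tr/\x00\x7F//d;    # delete NUL and DEL
    $s =~ s/\r+/\r/g;        # a run of CRs acts like a single CR
    return $s;
}

my $clean = strip_fill("a\x00b\x7F\r\r\r\n");
print length($clean), " characters remain\n";
```

The last CR of a run is kept, so a "\r\n" line ending survives.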

IMPORT TAGS

:all exports all functions, and :var exports $PREFER_UNICODE_SYMBOLS.

CAVEATS

Mixed 7-bit/8-bit work-flow

This module does not entirely honor the extension for handling Ecma-35 artefacts in 7-bit/8-bit transformation processes. If you have to work under such unusual circumstances, try to apply this module before such transformations take effect.

Escape sequences outside the Ecma48 universe

Some terminal commands violate the schema and are not matched by these routines.

Different handling compared to terminal (emulators)

Most terminals execute ill-formed codes after applying some error correction. This module, however, ignores such sequences and returns them as-is.

Fill-chars inside escape sequences

The standard is unclear in this respect. Nowadays it shouldn't be an issue anyway. However, a separate function, remove_fillchars, exists for preprocessing.

KNOWN BUGS

Returns wrong results under character sets such as EBCDIC.

SEE ALSO

Ecma-48, ISO 6429, ANSI X3.64, A List of many Escape Sequences

LOOSELY RELATED

Term::ANSIColor, Win32::Console::ANSI

COPYRIGHT

(c) 2012 Josef. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.