NAME

Unicode::Wrap - Unicode Line Breaking

SYNOPSIS

  use Unicode::Wrap;

  # Object-oriented interface
  my $wrapper = Unicode::Wrap->new( line_length => 75 );
  my @lines = $wrapper->break_lines($long_string);
  my $text  = $wrapper->wrap("  ", "", $long_string);
  $text     = $wrapper->rewrap("", "", $text);

  # Functional interface
  use Unicode::Wrap qw/ break_lines wrap /;
  $Unicode::Wrap::columns = 75;
  @lines = break_lines($long_string);
  $text  = wrap("  ", "", $long_string);

ABSTRACT

This module implements UAX#14: Line Breaking Properties. It walks through a text string, classifies each character, and computes a length for each one. When a line grows past the configured length, it is broken at a permitted break opportunity. Some Text::Wrap-style functions are also provided for simple text wrapping.

DESCRIPTION

The following methods are available:

new(parameters)

This constructs a new wrapping object. Parameters:

line_length

Specifies the maximum length of a line, in whatever units you want to use.

emergency_break

If set, and there are no breaking opportunities before the line_length is reached, an 'emergency' break will be inserted at this position. Generally this should be set to line_length (or 1, since it won't be used until line_length is reached anyway).

If emergency_break is not set, no emergency breaks will be inserted, which could result in some really long lines if no line-breaking opportunity exists.
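
The interplay between line_length and emergency_break can be illustrated with a deliberately simplified sketch. It is not the module's code: the function name sketch_break_lines is hypothetical, every character counts as length 1, and a space is the only break opportunity, whereas the module applies the full UAX#14 classification.

```perl
use strict;
use warnings;

# Simplified illustration only. Spaces are the sole break opportunity and
# every character has length 1; sketch_break_lines is a hypothetical name.
sub sketch_break_lines {
    my ($text, $line_length, $emergency_break) = @_;
    my @lines;
    while (length($text) > $line_length) {
        # Prefer the last space at or before the limit.
        my $pos = rindex($text, ' ', $line_length);
        if ($pos > 0) {
            # Cut after the space; the trailing space stays with the line.
            push @lines, substr($text, 0, $pos + 1, '');
        }
        elsif ($emergency_break) {
            # No opportunity in range: force a break at the given position.
            push @lines, substr($text, 0, $emergency_break, '');
        }
        else {
            # No emergency break: the line runs on to the next space, if any.
            my $next = index($text, ' ', $line_length);
            last if $next < 0;
            push @lines, substr($text, 0, $next + 1, '');
        }
    }
    push @lines, $text if length $text;
    return @lines;
}
```

With a limit of 4 and an emergency break of 4, the unbroken string "abcdefghij" comes back as three pieces. Without the emergency break it stays on one over-long line.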

length_lookup

This should contain a coderef to your own 'length' implementation. It will be passed the character in question and the classification of that character. It should return the length of the character in your chosen unit.

This may also contain a simple hashref, keyed on the character, with values consisting of the length of that character.

In theory, this could be used to estimate the number of pixels each character would consume, using a variable-width font. You could then wrap based on the number of pixels and not just the number of characters.
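
As a rough sketch of the pixel-based idea, a length_lookup coderef might look like the following. The width table, the fallback value of 7, and the variable names are all made up for illustration. The argument order (character, then classification) follows the description above.

```perl
use strict;
use warnings;

# Made-up pixel widths for a few glyphs of an imaginary proportional font.
my %pixel_width = (
    'i' => 3,
    'l' => 3,
    ' ' => 4,
    'm' => 11,
    'w' => 11,
);

# Hypothetical length_lookup callback: receives the character and its
# line-breaking class, and returns a length in pixels.
my $pixel_length = sub {
    my ($char, $class) = @_;
    return $pixel_width{$char} // 7;   # 7 is an assumed average width
};
```

Passing this as length_lookup => $pixel_length, together with a line_length expressed in pixels, would then wrap by estimated rendered width instead of character count.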

classify

If you wish to override the module's default classification method, you can either set this to be a hashref of direct mappings, or a coderef, which will be called (@_ = ($self, $code)) to determine the line breaking classification of that character. This function can return undef if you wish to defer to the default classification system for that lookup.
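
A coderef override could be sketched as below. This assumes that $code is the numeric code point of the character, which the text above does not spell out. The variable name and the choice of overriding the underscore are purely illustrative. BA is the UAX#14 "Break After" class.

```perl
use strict;
use warnings;

# Hypothetical classify override. Assumption: $code is a numeric code point.
my $classify_override = sub {
    my ($self, $code) = @_;

    # Allow a break after underscores, e.g. inside long_identifier_names.
    return 'BA' if $code == ord('_');

    # Anything else: defer to the default classification.
    return undef;
};
```

Returning undef for everything except the overridden character keeps the default classification in charge of the rest of the text.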

The following may be called either as object methods or as functions:

break_lines($text, ...)

This will break $text up into individual lines. Newlines are preserved but none will be added. Use this if you need an implementation of UAX#14 that just breaks lines up without re-assembling them into a text string.

wrap($initial_whitespace, $subsequent_whitespace, $text)

This will take a chunk of text, normalize the newlines (but preserve them) and attempt to wrap it per UAX#14 in the style of Text::Wrap. The difference here is that only one chunk of text can be wrapped at a time.

rewrap($initial, $subsequent, $text)

This does the same thing as wrap, except that newlines are normalized to spaces before wrapping. This might be used if you already have a paragraph of text that you want to re-wrap.
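
The difference in newline handling between wrap and rewrap can be pictured with two small helpers. Both function names are hypothetical. The sketch assumes that "normalizing" means converting CRLF and bare CR to LF, which is a guess and may not match what the module does internally.

```perl
use strict;
use warnings;

# Assumed behaviour for wrap: unify line endings but keep the line breaks.
sub normalize_for_wrap {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Assumed behaviour for rewrap: turn every line ending into a space,
# so the paragraph is wrapped again from scratch.
sub normalize_for_rewrap {
    my ($text) = @_;
    $text =~ s/\r\n?|\n/ /g;
    return $text;
}
```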

classify($character)

Returns the Line Breaking classification of the character passed.

  print classify("a");          # AL
  print $wrapper->classify("5");   # NU, unless the classify parameter overrides it

BUGS

This module can be slow. It's a pure-Perl implementation that goes through an expensive classification process for every character.

Combining marks should "inherit" the breaking properties of the character they are combined with: if that character normally allows a break after it, the opportunity needs to move to the combining mark, so that the break occurs after the combined result.

Tests are not very complete.

SEE ALSO

http://www.unicode.org/reports/tr14/

Unicode Standard Annex #14: Line Breaking Properties

Text::Wrap, unicode

AUTHOR

David NESTING <david@fastolfe.net>

Copyright (c) 2003 David Nesting. All Rights Reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.