Unicode::Wrap - Unicode Line Breaking
    use Unicode::Wrap;

    $wrapper = Unicode::Wrap->new( line_length => 75 );
    @lines = $wrapper->break_lines($long_string);

    use Unicode::Wrap qw/ text_properties lb_class class_properties /;

    @break_classes = map { lb_class $_ } split //, $long_string;
    @break_properties = class_properties(@break_classes);
    @break_properties = text_properties($long_string);

    # find_breaks is not in the import list above, so call it as a method
    @best_breaks = $wrapper->find_breaks($long_string);
This module implements UAX#14: Line Breaking Properties. It walks through a text string, classifies each character, and computes a length for each. When a line grows past the configured length, it is broken at a permitted break opportunity.
All of the functions described here can be called procedurally or as an object method.
The new constructor creates a wrapping object. Parameters:
line_length: Specifies the length of a line, in whatever units you want to use.
emergency_break: If set, and there are no breaking opportunities before line_length is reached, an 'emergency' break is inserted at this position. Generally this should be set to line_length (or to 1, since it is not used until line_length is reached anyway).
If emergency_break is not set, no emergency breaks are inserted, which can produce very long lines when no line-breaking opportunity exists.
break_lines($text) breaks $text up into individual lines. Existing newlines are preserved, but none are added. Use this if you need an implementation of UAX#14 that just breaks lines up without re-assembling them into a text string.
If you need finer control over your own line-breaking, there are a few other functions that can be used to obtain character classifications and breaking properties for a set of characters.
Feel free to override some of these functions in descendant classes to fine-tune the behavior of this module. Some classifications and breaking properties require language-specific input, and at present subclassing is the only way to provide it.
lb_class returns the Line Breaking classification of the character passed.

    print lb_class("a");         # AL
    print $self->lb_class("5");  # NU
class_properties accepts a list of character classes (e.g. 'AL' or 'NU') and returns an identically sized array of breaking properties. Each property describes the location immediately following the character at that index; no break is permitted at the start of a line. The value of each property is a number from 0 to 3, with constants defined in the Unicode::Wrap namespace:

    0  FORBIDDEN  No break is permitted after this position
    1  INDIRECT   A break is permitted after this position
    2  DIRECT     A break is permitted after this position
    3  REQUIRED   A break is required after this position
For most purposes INDIRECT and DIRECT can be treated the same. The subtle difference is that an indirect break is allowed only because there is a space in that position, whereas a direct break opportunity allows a break under any circumstances. By the time you are reading these values, you rarely need to care about the distinction.
Required breaks occur primarily after newlines.
text_properties behaves like class_properties, but instead of working with a list of pre-determined classes, it classifies your $text itself. It returns a list (one element for each character) representing where breaks can and cannot occur.
This might be the most useful function for someone wanting to build a more intelligent line-wrapping algorithm on top of this. You could scan through the returned list of break opportunities and figure out how you want to do your own wrapping.
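As a rough illustration of building your own wrapping on top of such a list, here is a minimal greedy sketch. It makes several assumptions: the helper name greedy_wrap is hypothetical and not part of Unicode::Wrap, every character is treated as one unit wide, and the property list is written by hand rather than produced by text_properties, so the sketch runs without the module installed. The constants are redefined locally with the values from the table above.

```perl
use strict;
use warnings;

# Property values as documented above, redefined locally for this sketch.
use constant { FORBIDDEN => 0, INDIRECT => 1, DIRECT => 2, REQUIRED => 3 };

# Hypothetical helper: greedy wrapping driven by a per-character property
# list. $props->[$i] describes the position right after character $i.
sub greedy_wrap {
    my ($text, $props, $line_length) = @_;
    my @lines;
    my $start   = 0;    # index where the current line begins
    my $last_ok = -1;   # last index on this line after which we may break

    for my $i (0 .. length($text) - 1) {
        # Line too long: fall back to the most recent break opportunity.
        if ($i - $start + 1 > $line_length && $last_ok >= $start) {
            push @lines, substr($text, $start, $last_ok - $start + 1);
            $start   = $last_ok + 1;
            $last_ok = -1;
        }
        my $p = $props->[$i];
        if ($p == REQUIRED) {
            # Mandatory break: end the line right after this character.
            push @lines, substr($text, $start, $i - $start + 1);
            $start   = $i + 1;
            $last_ok = -1;
        }
        elsif ($p != FORBIDDEN) {
            $last_ok = $i;   # INDIRECT or DIRECT: remember the opportunity
        }
    }
    # Whatever is left forms the final line.
    push @lines, substr($text, $start) if $start < length($text);
    return @lines;
}

# Hand-written properties: a break is allowed after each space only.
my @wrapped = greedy_wrap("aa bb cc dd", [0,0,1,0,0,1,0,0,1,0,0], 6);
print join("|", @wrapped), "\n";   # aa bb |cc dd
```

Like the module without emergency_break set, this sketch never forces a break, so a run with no opportunities simply yields an overlong line. Trailing spaces stay on the line they end and count toward its width, which a more careful implementation might handle differently.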
find_breaks is similar to text_properties, but it also applies line lengths to find the best breaks for each line. It returns a list of indexes to the start of each new line (excluding the first). Use break_lines to go the rest of the way and actually break the string up into lines.
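To show how a list of line-start indexes maps back onto text, the following sketch cuts a string at such indexes. It assumes the indexes are character offsets into the string. The helper name split_at is hypothetical, and the index list is hard-coded rather than taken from find_breaks, so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $text at each index in @$starts. Each index marks
# the first character of a new line; the first line's start (0) is implied.
sub split_at {
    my ($text, $starts) = @_;
    my @lines;
    my $prev = 0;
    for my $idx (@$starts, length($text)) {
        # Skip empty pieces, e.g. if an index equals the end of the text.
        push @lines, substr($text, $prev, $idx - $prev) if $idx > $prev;
        $prev = $idx;
    }
    return @lines;
}

# Hard-coded index list standing in for a find_breaks result.
my @pieces = split_at("aa bb cc dd", [6]);
print join("|", @pieces), "\n";   # aa bb |cc dd
```

In practice break_lines already does this for you; the sketch is only meant to make the shape of the index list concrete.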
break_lines
Unicode Standard Annex #14: Line Breaking Properties
David NESTING <david@fastolfe.net>
Copyright (c) 2003 David Nesting. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.