NAME

Text::KnuthPlass - Breaks paragraphs into lines using the TeX algorithm

SYNOPSIS

    use Text::KnuthPlass;
    my $typesetter = Text::KnuthPlass->new();
    my @lines = $typesetter->typeset($paragraph);
    ...

To use with plain text:

    for my $line (@lines) {
        for my $node (@{ $line->{nodes} }) {
            if    ($node->isa("Text::KnuthPlass::Box"))  { print $node->value }
            elsif ($node->isa("Text::KnuthPlass::Glue")) { print " " }
        }
        # A line that ends on a penalty was broken at a hyphenation point.
        if ($line->{nodes}[-1]->is_penalty) { print "-" }
        print "\n";
    }

To use with PDF::API2:

    my $text = $page->text;
    $text->font($font, 12);
    $text->lead(13.5);

    my $t = Text::KnuthPlass->new(
        measure => sub { $text->advancewidth(shift) }, 
        linelengths => [235]
    );
    my @lines = $t->typeset($paragraph);

    my $y = 500;
    for my $line (@lines) {
        my $x = 50;
        for my $node (@{$line->{nodes}}) {
            $text->translate($x,$y);
            if ($node->isa("Text::KnuthPlass::Box")) {
                $text->text($node->value);
                $x += $node->width;
            } elsif ($node->isa("Text::KnuthPlass::Glue")) {
                $x += $node->width + $line->{ratio} *
                    ($line->{ratio} < 0 ? $node->shrink : $node->stretch);
            }
        }
        if ($line->{nodes}[-1]->is_penalty) { $text->text("-") }
        $y -= $text->lead();
    }

METHODS

new

The constructor takes a number of options. The most important ones are:

measure

A subroutine reference to determine the width of a piece of text. This defaults to length(shift), which is what you want if you're typesetting plain monospaced text. You will need to change this to plug into your font metrics if you're doing something graphical.
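
As a rough illustration of a non-default measure, the sketch below sums per-character widths from a small lookup table. The %char_width table, its numbers, and the names $measure and $default_width are invented for this example; real code would normally ask the font library for the width, as the PDF::API2 synopsis does with advancewidth.

```perl
use strict;
use warnings;

# Hypothetical width table (in points) for a proportional font;
# the numbers are invented for illustration only.
my %char_width = (i => 2.8, l => 2.8, m => 8.3, w => 7.2, ' ' => 2.5);
my $default_width = 5.0;    # fallback for characters not in the table

# A measure callback receives a string and returns its width.
my $measure = sub {
    my ($string) = @_;
    my $total = 0;
    $total += $char_width{$_} // $default_width for split //, $string;
    return $total;
};

printf "%.1f\n", $measure->("mill");    # m + i + l + l

# It would then be handed to the constructor, for example:
#   my $t = Text::KnuthPlass->new(measure => $measure, linelengths => [235]);
```

Whatever unit the callback returns, linelengths should be given in the same unit.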

linelengths

This is an array of line lengths. For instance, [30,40,50] will typeset a triangle-shaped piece of text with three lines. If the text spills over to more than three lines, the final value in the array is used for all further lines. So to typeset an ordinary block-shaped column of text, you need only specify an array with one value; the default is [78].
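
The "last value repeats" rule can be pictured with a small helper. This is only a sketch of the behaviour described here, not the module's internal code, and line_length_for is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: length allowed for line number $n (0-based),
# reusing the last entry once the array runs out.
sub line_length_for {
    my ($linelengths, $n) = @_;
    return $n < @$linelengths ? $linelengths->[$n] : $linelengths->[-1];
}

my $lengths = [30, 40, 50];
print join(", ", map { line_length_for($lengths, $_) } 0 .. 4), "\n";
# prints "30, 40, 50, 50, 50"
```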

tolerance

How much leeway we have in leaving wider spaces than the algorithm would prefer.

hyphenator

An object which hyphenates words. If you have Text::Hyphen installed (highly recommended) then a Text::Hyphen object is instantiated by default; if not, an object of the class Text::KnuthPlass::DummyHyphenator is instantiated - this simply finds no hyphenation points at all. So to turn hyphenation off, set

    hyphenator => Text::KnuthPlass::DummyHyphenator->new()

To typeset non-English text, pass in an object which responds to the hyphenate method, returning a list of hyphen positions. (See Text::Hyphen for the interface.)

There are other options for fine-tuning the output. If you know your way around TeX, dig into the source to find out what they are.

typeset

This is the main interface to the algorithm, made up of the constituent parts below. It takes a paragraph of text and returns a list of lines if suitable breakpoints could be found.

The list has the following structure:

    (
        { nodes => \@nodes, ratio => $ratio },
        { nodes => \@nodes, ratio => $ratio },
        ...
    )

The node list in each element will be a list of objects. Each object will be either Text::KnuthPlass::Box, Text::KnuthPlass::Glue or Text::KnuthPlass::Penalty. See below for more on these.

The ratio is the amount of stretch or shrink which should be applied to each glue element in this line. The corrected width of each glue node should be:

    $node->width + $line->{ratio} *
        ($line->{ratio} < 0 ? $node->shrink : $node->stretch);

Each box, glue or penalty node has a width attribute. Boxes also have a value, which is the text that went into them; glue has stretch and shrink attributes that determine how much its width may vary. That should be all you need for basic typesetting; for more, see the source and the original Knuth-Plass paper in "Digital Typography".
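
To make the glue formula concrete, here is a small worked example using a plain hash in place of a real glue node. The numbers and the adjusted_glue_width helper are made up for illustration; real nodes expose width, stretch and shrink as methods rather than hash keys.

```perl
use strict;
use warnings;

# Stand-in for a glue node, with invented values.
my %glue = (width => 10, stretch => 5, shrink => 3);

# Same formula as above, applied to plain numbers.
sub adjusted_glue_width {
    my ($glue, $ratio) = @_;
    return $glue->{width}
         + $ratio * ($ratio < 0 ? $glue->{shrink} : $glue->{stretch});
}

printf "%.1f\n", adjusted_glue_width(\%glue,  0.5);    # 10 + 0.5 * 5 = 12.5
printf "%.1f\n", adjusted_glue_width(\%glue, -0.5);    # 10 - 0.5 * 3 = 8.5
```

A positive ratio widens every space on the line in proportion to its stretch; a negative ratio narrows it in proportion to its shrink.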

This method is a thin wrapper around the three methods below.

break_text_into_nodes

This turns a paragraph into a list of box/glue/penalty nodes. It's fairly basic, and designed to be overridden in a subclass. It should also support multiple justification styles (centering, ragged right, etc.), but this will come in a future release; right now, it only does full justification.

If you are doing clever typography or using non-Western languages you may find that you will want to break text into nodes yourself, and pass the list of nodes to the methods below, instead of using this method.

break

This implements the main body of the algorithm; it turns a list of nodes (produced from the above method) into a list of breakpoint objects.

breakpoints_to_lines

And this takes the breakpoints and the nodes, and assembles them into lines.
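
Put together, calling the three steps by hand might look like the sketch below. The argument shapes (a list of nodes returned by break_text_into_nodes, array references passed to break and breakpoints_to_lines) are assumptions inferred from the descriptions above rather than documented signatures, so check the source before relying on them. The require is guarded so the sketch still runs where the module is not installed.

```perl
use strict;
use warnings;

my $status;
if (eval { require Text::KnuthPlass; 1 }) {
    my $t = Text::KnuthPlass->new(linelengths => [40]);
    my $paragraph = "The quick brown fox jumps over the lazy dog. " x 4;

    # Assumed call shapes; see the lead-in.
    my @nodes       = $t->break_text_into_nodes($paragraph);
    my @breakpoints = $t->break(\@nodes);
    my @lines       = @breakpoints
        ? $t->breakpoints_to_lines(\@breakpoints, \@nodes)
        : ();

    $status = sprintf "%d nodes, %d lines", scalar @nodes, scalar @lines;
}
else {
    $status = "Text::KnuthPlass is not installed";
}
print "$status\n";
```

Splitting the steps out like this is mainly useful when you want to build or modify the node list yourself before breaking it.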

AUTHOR

Simon Cozens, <simon at cpan.org>

ACKNOWLEDGEMENTS

This module is a Perl translation of Bram Stein's JavaScript Knuth-Plass implementation. Any bugs, however, are probably my fault.

BUGS

Please report any bugs or feature requests to bug-text-knuthplass at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-KnuthPlass. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

Copyright 2011 Simon Cozens.

This program is released under the following license: Perl, GPL