
NAME

Text::KnuthPlass - Breaks paragraphs into lines using the TeX algorithm

SYNOPSIS

    use Text::KnuthPlass;
    my $typesetter = Text::KnuthPlass->new();
    my @lines = $typesetter->typeset($paragraph);
    ...

To use with plain text:

    for (@lines) {
        for (@{$_->{nodes}}) {
            if ($_->isa("Text::KnuthPlass::Box")) { print $_->value }
            elsif ($_->isa("Text::KnuthPlass::Glue")) { print " " }
        }
        if ($_->{nodes}[-1]->is_penalty) { print "-" }
        print "\n";
    }

To use with PDF::Builder (or PDF::API2):

    my $text = $page->text;
    $text->font($font, 12);
    $text->lead(13.5);

    my $t = Text::KnuthPlass->new(
        measure => sub { $text->advancewidth(shift) }, 
        linelengths => [235]
    );
    my @lines = $t->typeset($paragraph);

    my $y = 500;
    for my $line (@lines) {
        my $x = 50;
        for my $node (@{$line->{nodes}}) {
            $text->translate($x,$y);
            if ($node->isa("Text::KnuthPlass::Box")) {
                $text->text($node->value);
                $x += $node->width;
            } elsif ($node->isa("Text::KnuthPlass::Glue")) {
                $x += $node->width + $line->{ratio} *
                    ($line->{ratio} < 0 ? $node->shrink : $node->stretch);
            }
        }
        if ($line->{nodes}[-1]->is_penalty) { $text->text("-") }
        $y -= $text->lead();
    }

METHODS

new

The constructor takes a number of options. The most important ones are:

measure

A subroutine reference to determine the width of a piece of text. This defaults to length(shift), which is what you want if you're typesetting plain monospaced text. You will need to change this to plug into your font metrics if you're doing something graphical.
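Here is a minimal sketch of a measure callback for a proportional font. The character widths in %char_width and the fallback $default_width are invented for the example. A real program would take widths from its font instead, for instance with $text->advancewidth as in the PDF example above.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical per-character widths (in points) for an imaginary proportional font.
my %char_width = (i => 2.5, l => 2.5, m => 8.5, w => 8.0, ' ' => 3.0);
my $default_width = 5.0;   # assumed width for any character not in the table

# A measure callback: receives a string, returns its total width.
my $measure = sub {
    my ($string) = @_;
    return sum0 map { $char_width{$_} // $default_width } split //, $string;
};

printf "%.1f\n", $measure->("mil");   # 13.5 (8.5 + 2.5 + 2.5)
```

It would then be passed in as Text::KnuthPlass->new(measure => $measure).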

linelengths

This is an array of line lengths. For instance, [30,40,50] will typeset a triangle-shaped piece of text with three lines. What if the text spills over to more than three lines? In that case, the final value in the array is used for all further lines. So to typeset an ordinary block-shaped column of text, you only need to specify an array with one value; the default is [78].
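The rule for lines beyond the end of the array can be expressed as a small lookup. The helper line_length below is hypothetical (it is not part of Text::KnuthPlass) and uses 0-based line numbers.

```perl
use strict;
use warnings;

# Width for 0-based line $i: the array entry if present, else the last entry.
sub line_length {
    my ($linelengths, $i) = @_;
    return $i <= $#$linelengths ? $linelengths->[$i] : $linelengths->[-1];
}

my @triangle = (30, 40, 50);
print join(",", map { line_length(\@triangle, $_) } 0 .. 4), "\n";   # 30,40,50,50,50
```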

tolerance

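How much leeway the algorithm has in accepting lines whose spaces are wider than it would prefer. A higher tolerance admits looser lines, and so more candidate breakpoints; a lower one demands tighter spacing, but may leave no acceptable set of breaks.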

hyphenator

An object which hyphenates words. If you have Text::Hyphen installed (highly recommended), then a Text::Hyphen object is instantiated by default; if not, an object of the class Text::KnuthPlass::DummyHyphenator is instantiated, which simply finds no hyphenation points at all. So to turn hyphenation off, set

    hyphenator => Text::KnuthPlass::DummyHyphenator->new()

To typeset non-English text, pass in an object which responds to the hyphenate method by returning the word split into pieces at its possible hyphenation points. (See Text::Hyphen for the interface.)
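Below is a minimal sketch of a custom hyphenator. It assumes the interface is that of Text::Hyphen's hyphenate method, which returns the word broken into pieces at the points where a hyphen may go. The class name My::TableHyphenator and its lookup-table approach are hypothetical; a real hyphenator would use language-specific patterns.

```perl
use strict;
use warnings;

package My::TableHyphenator;   # hypothetical class name

# Build a hyphenator from a table of pre-split words.
sub new {
    my ($class, %table) = @_;
    return bless { table => \%table }, $class;
}

# Return the word split into pieces at its allowed hyphenation points;
# unknown words come back whole (one piece = no hyphenation points).
sub hyphenate {
    my ($self, $word) = @_;
    my $pieces = $self->{table}{$word};
    return $pieces ? @$pieces : ($word);
}

package main;

my $hyph = My::TableHyphenator->new(Silbentrennung => [qw(Sil ben tren nung)]);
print join('-', $hyph->hyphenate('Silbentrennung')), "\n";   # Sil-ben-tren-nung
```

Such an object would be supplied as hyphenator => $hyph when constructing the typesetter.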

There are other options for fine-tuning the output. If you know your way around TeX, dig into the source to find out what they are.

typeset

This is the main interface to the algorithm, made up of the constituent parts below. It takes a paragraph of text and returns a list of lines if suitable breakpoints could be found.

The list has the following structure:

    (
        { nodes => \@nodes, ratio => $ratio },
        { nodes => \@nodes, ratio => $ratio },
        ...
    )

The node list in each element will be a list of objects. Each object will be either Text::KnuthPlass::Box, Text::KnuthPlass::Glue or Text::KnuthPlass::Penalty. See below for more on these.

The ratio is the amount of stretch or shrink which should be applied to each glue element in this line. The corrected width of each glue node should be:

    $node->width + $line->{ratio} *
        ($line->{ratio} < 0 ? $node->shrink : $node->stretch);
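The same formula can be written as a standalone Perl function taking plain numbers rather than node objects. The function name adjusted_glue_width and the sample glue values are illustrative assumptions.

```perl
use strict;
use warnings;

# Adjusted glue width for a line's ratio: stretch when ratio >= 0, shrink when < 0.
sub adjusted_glue_width {
    my ($width, $stretch, $shrink, $ratio) = @_;
    return $width + $ratio * ($ratio < 0 ? $shrink : $stretch);
}

# Illustrative glue: natural width 3, stretch 6, shrink 9 (assumed values).
printf "%.2f\n", adjusted_glue_width(3, 6, 9, 0.5);     # 6.00 (3 + 0.5 * 6)
printf "%.2f\n", adjusted_glue_width(3, 6, 9, -0.25);   # 0.75 (3 - 0.25 * 9)
```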

Each box, glue or penalty node has a width attribute. Boxes have values, which are the text which went into them; glue has stretch and shrink to determine how much it should vary in width. That should be all you need for basic typesetting; for more, see the source, and see the original Knuth-Plass paper in "Digital Typography".

Why is the method called typeset rather than something like linesplit? As noted under "ACKNOWLEDGEMENTS", this code is ported from Bram Stein's JavaScript library Typeset.

This method is a thin wrapper around the three methods below.

break_text_into_nodes

This turns a paragraph into a list of box/glue/penalty nodes. It's fairly basic, and designed to be overridden in a subclass. It should eventually support multiple justification styles (centering, ragged right, etc.), but that is planned for a future release; right now, it only does full justification.
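To make the node structure concrete, here is a simplified, hypothetical version of the idea. It uses plain hashes rather than the module's Box, Glue and Penalty classes: one box per word, one glue between words, and a paragraph ending of infinitely stretchable glue followed by a forced break (a penalty of minus infinity). The space dimensions and the value 10000 for "infinity" are assumptions for the example. This is not the module's own code, and it does no hyphenation.

```perl
use strict;
use warnings;

my $INFINITY = 10000;   # assumed "infinite" penalty, following TeX's convention

# Hypothetical, simplified node builder using plain hashes instead of the
# module's node classes. Full justification only, no hyphenation.
sub text_to_nodes {
    my ($text, $measure, %space) = @_;   # %space: width, stretch, shrink (assumed)
    my @words = split ' ', $text;
    my @nodes;
    for my $i (0 .. $#words) {
        push @nodes, { type => 'box', width => $measure->($words[$i]), value => $words[$i] };
        push @nodes, { type => 'glue', %space } if $i < $#words;
    }
    # Finish the paragraph: infinitely stretchable glue, then a forced break.
    push @nodes, { type => 'glue', width => 0, stretch => $INFINITY, shrink => 0 };
    push @nodes, { type => 'penalty', width => 0, penalty => -$INFINITY };
    return @nodes;
}

my @nodes = text_to_nodes("a quick test", sub { length shift },
                          width => 1, stretch => 1, shrink => 0);
print join(" ", map { $_->{type} } @nodes), "\n";
# box glue box glue box glue penalty
```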

If you are doing clever typography or working with non-Western languages, you may want to break the text into nodes yourself and pass the list of nodes to the methods below, instead of using this method.

break

This implements the main body of the algorithm; it turns a list of nodes (such as those produced by the method above) into a list of breakpoint objects.
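For each candidate line, the key quantity is the adjustment ratio from the Knuth-Plass paper: the line's shortfall (or excess) relative to its target length, divided by the total stretch (or shrink) of its glue. The sketch below computes that ratio and the paper's badness (100 times the cube of the absolute ratio) for made-up totals. The function names are hypothetical, and a full implementation also tracks demerits, fitness classes and active breakpoints. In the paper, a line with a ratio below -1 cannot be shrunk enough and is never chosen.

```perl
use strict;
use warnings;

my $INF = 9**9**9;   # floating-point infinity

# Adjustment ratio r for a candidate line, given its natural width and the
# total stretch and shrink of its glue.
sub adjustment_ratio {
    my ($line_length, $width, $stretch, $shrink) = @_;
    return 0 if $width == $line_length;
    if ($width < $line_length) {   # line too short: stretch the glue
        return $stretch > 0 ? ($line_length - $width) / $stretch : $INF;
    }
    # line too long: shrink the glue (r is negative)
    return $shrink > 0 ? ($line_length - $width) / $shrink : -$INF;
}

# Badness of a line with ratio r: 100 * |r|^3.
sub badness {
    my ($r) = @_;
    return 100 * abs($r) ** 3;
}

my $r = adjustment_ratio(10, 8, 4, 2);                     # 2 units short, 4 units of stretch
printf "r = %.2f, badness = %.1f\n", $r, badness($r);      # r = 0.50, badness = 12.5
```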

breakpoints_to_lines

This takes the breakpoints and the nodes and assembles them into lines.
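Here is a hypothetical sketch of the assembly step, again with plain-hash nodes. Each breakpoint is taken to be an index into the node list. Each line runs from just after the previous break up to and including its breaking node, and glue or penalties left at the start of a line are dropped. Keeping the breaking node as the last element mirrors the synopsis, which inspects $line->{nodes}[-1] to decide whether to print a hyphen.

```perl
use strict;
use warnings;

# Hypothetical sketch: slice a node list into lines at the given breakpoint
# positions (indices into @$nodes). Each line keeps its breaking node last.
sub slice_lines {
    my ($nodes, @positions) = @_;
    my @lines;
    my $start = 0;
    for my $pos (@positions) {
        # Skip discardable nodes (anything that is not a box) at the line start.
        $start++ while $start < $pos && $nodes->[$start]{type} ne 'box';
        push @lines, [ @{$nodes}[$start .. $pos] ];
        $start = $pos + 1;
    }
    return @lines;
}

my @nodes = (
    { type => 'box', value => 'a' }, { type => 'glue' },
    { type => 'box', value => 'b' }, { type => 'glue' },
    { type => 'box', value => 'c' }, { type => 'glue' },
    { type => 'penalty' },
);
my @lines = slice_lines(\@nodes, 3, 6);   # break at the second glue and the final penalty
for my $line (@lines) {
    print join(' ', map { $_->{value} // "<$_->{type}>" } @$line), "\n";
}
# a <glue> b <glue>
# c <glue> <penalty>
```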

glueclass

penaltyclass

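For subclassers: these return the names of the classes used to create glue and penalty nodes, so a subclass can override them to supply its own node classes.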

AUTHOR

Originally written by Simon Cozens, <simon at cpan.org>

Since 2020, maintained by Phil Perry.

ACKNOWLEDGEMENTS

This module is a Perl translation of Bram Stein's "Typeset" Javascript Knuth-Plass implementation. Any bugs, however, are probably my fault.

BUGS

Please report any bugs or feature requests to the issues section of https://github.com/PhilterPaper/Text-KnuthPlass.

Do NOT under ANY circumstances open a PR (Pull Request) to report a bug. It is a waste of both your and our time and effort. Open a regular ticket (issue), and attach a Perl (.pl) program illustrating the problem, if possible. If you believe that you have a program patch, and offer to share it as a PR, we may give the go-ahead. Unsolicited PRs may be closed without further action.

COPYRIGHT & LICENSE

Copyright (c) 2011 Simon Cozens.

Copyright (c) 2020-2021 Phil M Perry.

This program is released under the following license: Perl, GPL