NAME

RTF::Tokenizer - Tokenize RTF

DESCRIPTION

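Tokenizes RTF (Rich Text Format) data. It breaks the data into a stream of control words, text, group delimiters and an end-of-file marker, which you read one token at a time with get_token.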

SYNOPSIS

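  use strict;
  use warnings;
  use RTF::Tokenizer;

  # Optional: render entities as HTML numeric character references
  sub entity_handler {
    return "&#" . hex($_[0]) . ";";
  }

  # Slurp the RTF document from standard input
  my $data = do { local $/; <STDIN> };

  my $rtf = RTF::Tokenizer->new($data);
  #my $rtf = RTF::Tokenizer->new($data, \&entity_handler);

  # Keep a copy of the untouched buffer so it can be restored later
  $rtf->bookmark('save', '_font_table_original');

  # Print every token up to the end of the data
  while (1) {
    my ($type, $value, $extra) = $rtf->get_token;
    last if $type eq 'eof';
    print "$type, $value, ", $extra // '', "\n";
  }

  # Restore the buffer and skip ahead to the font table
  $rtf->bookmark('retr', '_font_table_original');
  $rtf->jump_to_control_word('fonttbl');
  my ($type, $value, $extra) = $rtf->get_token; # 'control', 'fonttbl'

  # Restore again and read the opening \rtf control word
  $rtf->bookmark('retr', '_font_table_original');
  $rtf->jump_to_control_word('rtf');
  ($type, $value, $extra) = $rtf->get_token; # 'control', 'rtf', 1

  # Discard the saved copy once it is no longer needed
  $rtf->bookmark('delete', '_font_table_original');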

METHODS

new ( $data [, entity handling subroutine ] )

Creates an instance. The first argument is a string of RTF; the optional second argument is a reference to a subroutine that is called whenever an entity is found. By default an entity is converted into the character it represents, but the subroutine lets you produce something else instead, such as an HTML entity (as in the SYNOPSIS example). The subroutine receives the entity's value as a string of hexadecimal digits.
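A minimal sketch of the two behaviours described above, assuming (as the SYNOPSIS handler does) that the subroutine receives the entity's hex digits as a string, and that entities correspond to RTF's \'hh escapes. Both subroutine names are hypothetical; default_entity only approximates the documented default of turning the entity into the character it represents.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the default behaviour: hex digits -> character
sub default_entity {
    my ($hex) = @_;
    return chr hex $hex;
}

# HTML-style handler, as in the SYNOPSIS: hex digits -> numeric entity
sub html_entity {
    my ($hex) = @_;
    return '&#' . hex($hex) . ';';
}

printf "%d\n", ord default_entity('e9');   # 233
print html_entity('e9'), "\n";             # &#233;
```

Under that assumption, passing \&html_entity as the second argument to new would make the escape \'e9 come out as &#233; rather than as the character itself.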

get_token

Returns a list containing the token type (one of control, text, group or eof), the token data and, for a control word, the integer parameter attached to it (if there is one).
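To make the shape of get_token's return value concrete, here is a simplified, hypothetical tokenizer that produces the same kind of (type, value, extra) list. It is not RTF::Tokenizer's implementation. It assumes group tokens carry 1 for "{" and 0 for "}", decodes \'hh entities to characters, and silently skips other control symbols such as \~ and \*.

```perl
use strict;
use warnings;

# Hypothetical, simplified tokenizer: consumes one token from the front of
# the string referenced by $buf_ref and returns (type, value, extra).
sub next_token {
    my ($buf_ref) = @_;
    while (1) {
        $$buf_ref =~ s/\A[\r\n]+//;                # bare line breaks are not content in RTF
        return ('eof', '') if $$buf_ref eq '';
        return ('group', 1) if $$buf_ref =~ s/\A\{//;   # assumed: 1 = group opens
        return ('group', 0) if $$buf_ref =~ s/\A\}//;   # assumed: 0 = group closes
        # Control word: letters, optional signed integer, optional delimiting space
        return ('control', $1, $2) if $$buf_ref =~ s/\A\\([a-zA-Z]+)(-?\d+)? ?//;
        # Hex entity \'hh, decoded to its character (assumed default behaviour)
        return ('text', chr hex $1) if $$buf_ref =~ s/\A\\'([0-9a-fA-F]{2})//;
        # Escaped literal backslash or brace
        return ('text', $1) if $$buf_ref =~ s/\A\\([\\{}])//;
        # Run of plain text up to the next special character
        return ('text', $1) if $$buf_ref =~ s/\A([^\\{}\r\n]+)//;
        # Anything else (e.g. \~ or \*): skip it and try again
        $$buf_ref =~ s/\A\\.//s or substr($$buf_ref, 0, 1, '');
    }
}

my $rtf = '{\rtf1\ansi Hello\par}';
while (1) {
    my ($type, $value, $extra) = next_token(\$rtf);
    last if $type eq 'eof';
    print "$type, $value, ", $extra // '', "\n";
}
```

The third element is only defined for control words that carry a number, which is why the loop prints an empty string when it is undef.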

bookmark ( action, name )

Manages named copies of the tokenizer's buffer, which are stored in a hash in the object under the key name. The action is one of 'save' (store a copy of the current buffer), 'retr' (restore the buffer from that copy) or 'delete' (discard the copy). Because the hash holds a full copy of the data rather than a position in the buffer, it is a good idea to delete bookmarks when you are done with them if you are working with a large amount of text. Font.pm contains a good example.
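The following sketch illustrates why bookmarks cost memory: each one is a full copy of the remaining buffer, not an offset into it. The class BufferWithBookmarks and its consume method are hypothetical stand-ins for the tokenizer, and the choice to die on an unknown name or action is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical class sketching save/retr/delete on a string buffer
package BufferWithBookmarks;

sub new {
    my ($class, $data) = @_;
    return bless { buffer => $data, bookmarks => {} }, $class;
}

sub bookmark {
    my ($self, $action, $name) = @_;
    if ($action eq 'save') {
        $self->{bookmarks}{$name} = $self->{buffer};   # copies the whole string
    } elsif ($action eq 'retr') {
        die "No bookmark named '$name'\n" unless exists $self->{bookmarks}{$name};
        $self->{buffer} = $self->{bookmarks}{$name};
    } elsif ($action eq 'delete') {
        delete $self->{bookmarks}{$name};             # releases the copy
    } else {
        die "Unknown bookmark action '$action'\n";
    }
    return;
}

# Drop the first $n characters, standing in for reading tokens
sub consume {
    my ($self, $n) = @_;
    substr($self->{buffer}, 0, $n, '');
    return;
}

package main;

my $buf = BufferWithBookmarks->new('{\rtf1 Hello}');
$buf->bookmark('save', 'start');
$buf->consume(7);
print $buf->{buffer}, "\n";   # Hello}
$buf->bookmark('retr', 'start');
print $buf->{buffer}, "\n";   # {\rtf1 Hello}
$buf->bookmark('delete', 'start');
```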

jump_to_control_word ( list of control words )

Advances through the buffer until it reaches one of the given control words, so that the next call to get_token returns that control word. Everything in the buffer before that point is discarded (unless you have saved it with bookmark).
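A hypothetical helper showing the idea on a plain string: find the earliest of the given control words and drop everything before it. The helper name, its true/false return value and the choice to empty the buffer when nothing is found are assumptions for illustration. It also does not notice when a match is really preceded by an escaped backslash (\\).

```perl
use strict;
use warnings;

# Hypothetical helper: discard everything before the first of @words so that
# it becomes the next control word in the buffer. Returns 1 if one was found.
sub jump_to_control_word {
    my ($buf_ref, @words) = @_;
    my $alt = join '|', map { quotemeta } @words;
    # Backslash, one of the words, then anything but another letter,
    # so 'f' does not match the start of \fonttbl
    if ($$buf_ref =~ /\\(?:$alt)(?![a-zA-Z])/) {
        substr($$buf_ref, 0, $-[0], '');   # $-[0] is the offset where the match starts
        return 1;
    }
    $$buf_ref = '';                        # assumption: a failed search consumes the buffer
    return 0;
}

my $rtf = '{\rtf1{\fonttbl{\f0 Times;}}Hello}';
jump_to_control_word(\$rtf, 'fonttbl');
print "$rtf\n";   # \fonttbl{\f0 Times;}}Hello}
```

The negative lookahead mirrors RTF's rule that a control word ends at the first non-letter, which is also why searching for 'rtf' matches \rtf1.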

AUTHOR

Peter Sergeant <pete@clueball.com>

COPYRIGHT

Copyright 2002 Peter Sergeant.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.