NAME

EBook::Tools::EReader - Palm::PDB handler for manipulating the Fictionwise/PeanutPress eReader format.

SYNOPSIS

 use EBook::Tools::EReader;
 my $pdb = EBook::Tools::EReader->new();
 $pdb->Load('myfile-er.pdb');
 print "Loaded '",$pdb->{title},"' by ",$pdb->{author},"\n";
 my $html = $pdb->html;
 my $pml = $pdb->pml;
 $pdb->write_unknown_records;

DEPENDENCIES

  • Compress::Zlib

  • Image::Size

  • P5-Palm

CONSTRUCTOR

new()

Instantiates a new EBook::Tools::EReader object.

ACCESSOR METHODS

filebase

In scalar context, returns the basename of the object attribute filename. In list context, returns the basename, directory, and extension, as fileparse from File::Basename does.
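The context-dependent return value can be sketched with wantarray and File::Basename. This is a minimal illustration, not the module's own code: the helper name filebase_of and the suffix pattern are assumptions.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper that mimics the scalar/list behaviour described above.
sub filebase_of {
    my ($filename) = @_;
    # Assumed suffix pattern: a final dot followed by non-dot characters.
    my ($base, $dir, $ext) = fileparse($filename, qr/\.[^.]*/);
    return wantarray ? ($base, $dir, $ext) : $base;
}

my $base = filebase_of('/books/myfile-er.pdb');            # scalar context
my ($name, $dir, $ext) = filebase_of('/books/myfile-er.pdb');  # list context

print "$base\n";                  # prints "myfile-er"
print "$name | $dir | $ext\n";    # prints "myfile-er | /books/ | .pdb"
```

The paths shown assume a Unix-style filesystem; fileparse follows the conventions of the platform it runs on.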

footnotes()

Returns a hash containing all of the footnotes found in the file, where the keys are the footnote ids and the values contain the footnote text.

footnotes_pml()

Returns a string containing all of the footnotes in a form suitable to append to the end of PML text output. This is called as part of "pml()".

footnotes_html()

Returns a string containing all of the footnotes in a form suitable to append to the end of HTML text output. This is called as part of "html()".

pml()

Returns a string containing the entire original document text in its original encoding, including all sidebars and footnotes.

html()

Returns a string containing the entire document text (including all sidebars and footnotes) converted to HTML.

Note that the PML text is stored in the object (and thus retrieving it is very fast), but generating the HTML output requires that the text be converted every time this method is used, consuming extra processing time.
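Because every call repeats the conversion, callers that need the HTML more than once may want to keep the result. The sketch below shows one way to do that; My::StubBook is a stand-in class invented so that the example runs without a real file, and cached_html is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Stand-in for a loaded book object (invented for this demonstration):
# it counts how often the "expensive" html() conversion runs.
{
    package My::StubBook;
    sub new  { my $class = shift; return bless { conversions => 0 }, $class }
    sub html { my $self = shift; $self->{conversions}++; return '<p>text</p>' }
}

my %html_cache;

# Hypothetical helper: convert on first use, then reuse the stored string.
sub cached_html {
    my ($book) = @_;
    my $key = refaddr($book);    # one cache entry per object
    $html_cache{$key} = $book->html unless exists $html_cache{$key};
    return $html_cache{$key};
}

my $book = My::StubBook->new;
cached_html($book) for 1 .. 3;
print "conversions: $book->{conversions}\n";    # prints "conversions: 1"
```

In a short script, assigning the result of html() to a variable once and reusing that variable has the same effect.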

sidebars()

Returns a hash containing all of the sidebars found in the file, where the keys are the sidebar ids and the values contain the sidebar text.

sidebars_pml()

Returns a string containing all of the sidebars in a form suitable to append to the end of PML text output. This is called as part of "pml()".

sidebars_html()

Returns a string containing all of the sidebars in a form suitable to append to the end of HTML text output. This is called as part of "html()".

write_html($filename)

Writes the book text to disk in HTML form (including all sidebars and footnotes) with the given filename.

If $filename is not specified, writes to $self->filebase with a ".html" extension.

Returns the filename used on success, or undef if there was no text to write.

write_images()

Writes each image record to the disk.

Returns a list containing the filenames of all images written, or undef if none were found.

write_pml($filename)

Writes the raw book text to disk in PML form (including all sidebars and footnotes) with the given filename.

If $filename is not specified, writes to $self->filebase with a ".pml" extension.

Returns the filename used on success, or undef if there was no text to write.
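The contract shared by write_pml() and write_html() (an optional filename, a default built from the file basename plus an extension, and undef when there is nothing to write) can be sketched as follows. The function write_text_sketch and its argument order are assumptions made for illustration; the module's actual implementation may differ.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper mirroring the documented behaviour:
# returns the filename used, or undef if there was no text to write.
sub write_text_sketch {
    my ($text, $filebase, $ext, $filename) = @_;
    return unless defined $text && length $text;
    $filename = $filebase . $ext unless defined $filename;
    open(my $fh, '>', $filename) or die "Cannot write '$filename': $!";
    binmode($fh);    # write the bytes unchanged; encoding handling is not shown
    print {$fh} $text;
    close($fh) or die "Cannot close '$filename': $!";
    return $filename;
}

# Demonstration in a temporary directory that is removed on exit.
my $dir  = tempdir(CLEANUP => 1);
my $used = write_text_sketch('Some \iPML\i text', "$dir/myfile-er", '.pml');
print "wrote $used\n";
```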

write_unknown_records()

Writes each unidentified record to disk with a filename in the format of 'raw-record-####', where #### is the record number (not the record ID).

Returns the number of records written.
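As an illustration of the filename pattern, the sketch below assumes that '####' means a four-digit, zero-padded record number. That padding width is an assumption read from the pattern, and raw_record_name is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: '####' stands for a zero-padded, four-digit record number.
sub raw_record_name {
    my ($recnum) = @_;
    return sprintf('raw-record-%04d', $recnum);
}

print raw_record_name(7), "\n";    # prints "raw-record-0007"
```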

MODIFIER METHODS

Load($filename)

Sets $self->{filename} and then loads and parses the file specified by $filename, calling "ParseRecord(%record)" on every record found.

ParseRecord(%record)

Parses PDB records, updating the object attributes. This method is called automatically on every database record during Load().

ParseRecord0($data)

Parses the header record and places the parsed values into the hashref $self->{header}.

Returns the hash (not the hashref).
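The difference between the stored hashref and the returned hash matters to callers: the returned list is a copy, so changing it does not alter $self->{header}. A minimal sketch, in which parse_header_sketch and its single 'size' field are invented for illustration and do not reflect the real header layout:

```perl
use strict;
use warnings;

# Hypothetical stand-in for ParseRecord0().
sub parse_header_sketch {
    my ($self, $data) = @_;
    my %header = (size => length $data);
    $self->{header} = \%header;    # the object keeps a hashref
    return %header;                # the caller receives key/value pairs
}

my $obj  = {};
my %copy = parse_header_sketch($obj, 'abcd');
$copy{size} = 0;                   # changes only the caller's copy
print "$obj->{header}{size}\n";    # prints "4"
```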

PROCEDURES

cp1252_to_pml()

An unfinished and completely nonfunctional procedure to convert Windows-1252 characters to PML \a codes.

DO NOT USE.

pml_to_html($text,$filebase)

Takes as input a text string in Windows-1252 encoding containing PML markup codes and returns a string with those codes converted to UTF-8 HTML.

Requires a second argument $filebase to specify the basename of the file (or specifically, the basename of the file to which output text will be written) so that image links can be generated correctly.
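To make the encoding step concrete, the sketch below decodes Windows-1252 input, escapes HTML metacharacters, converts two toggle-style PML codes, and encodes the result as UTF-8. It is a deliberately tiny illustration and not this module's implementation: tiny_pml_to_html is a hypothetical name, only the \i (italic) and \B (bold) codes are handled, the escaped backslash (\\) is ignored, and image links (the reason pml_to_html needs $filebase) are left out entirely.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative only: handles just the \i and \B toggles.
sub tiny_pml_to_html {
    my ($bytes) = @_;
    my $text = decode('cp1252', $bytes);    # Windows-1252 bytes -> characters

    # Escape HTML metacharacters before inserting any tags.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    my %tag = (i => 'i', B => 'b');
    my %open;                               # which toggles are currently open
    $text =~ s{\\([iB])}{
        my $t = $tag{$1};
        $open{$1} = !$open{$1};
        $open{$1} ? "<$t>" : "</$t>";
    }gex;

    return encode('UTF-8', $text);          # characters -> UTF-8 bytes
}

# \x93 and \x94 are the Windows-1252 curly double quotes.
print tiny_pml_to_html("\x93Hi\x94 \\iword\\i"), "\n";
```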

BUGS AND LIMITATIONS

  • HTML conversion doesn't handle the \T command used to indent.

  • HTML conversion may be suboptimal in many ways.

    Most notably, most linebreaks are converted to <br />, regardless of whether they occur inside some other element. The output is extremely unlikely to validate.

AUTHOR

Zed Pobre <zed@debian.org>

LICENSE AND COPYRIGHT

Copyright 2008 Zed Pobre

Licensed to the public under the terms of the GNU GPL, version 2