The Perl and Raku Conference 2025: Greenville, South Carolina - June 27-29 Learn more

#!/usr/bin/perl
use 5.016;
use strict;
my $ishmael = EBook::Ishmael->init();
$ishmael->run();
=head1 NAME
ishmael - EBook dumper
=head1 SYNOPSIS
ishmael [options] file [output]
=head1 DESCRIPTION
B<ishmael> is a Perl program that can read and dump the contents of various
popular (and unpopular) ebook formats. It originally only dumped the formatted
text contents of an ebook, but has since grown to be able to dump metadata,
images, and more. B<ishmael> dumps formatted text by default, but it can dump
other kinds of content through the use of command-line options.
I<file> is the ebook file for B<ishmael> to dump. I<output> is the path to
write any output to. If not specified, it defaults to F<stdout> (except for the
B<-c>|B<--cover> and B<-g>|B<--image> options). F<stdout> can also be manually
specified via C<->.
B<ishmael> currently supports the following ebook formats:
=over 4
=item EPUB
=item MOBI
=item AZW3/KF8
=item AZW
=item HTML/XHTML
=item PDF
=item FictionBook2
=item PalmDoc
=item zTXT
=item Comic Book Archives (cbr, cbz, cb7)
=item Microsoft Compiled HTML Help (CHM)
=item Text
=back
=head1 OPTIONS
=over 4
=item B<-d>|B<--dumper>=I<dumper>
Specify the program to use for formatting ebook text. The following are valid
options, as long they're installed on your system:
=over 4
=item elinks
=item links
=item lynx
=item w3m
=item chawan
=item queequeg
=back
L<queequeg(1)> is a script distributed with B<ishmael> that acts as a fallback
dumper if no other dumper is installed on your system. If this program was
installed normally, L<queequeg(1)> should always be available to B<ishmael>.
By default, B<ishmael> will either use the dumper specified by the
C<ISHMAEL_DUMPER> environment variable if set, or the first one it finds
installed on your system otherwise.
=item B<-e>|B<--encoding>=I<enc>
Print outputted text in the I<enc> character encoding. This option only affects
the formatted text dump, raw text dump, and HTML dump modes.
For a list of supported encodings, please consult the Perl documentation for
L<Encode::Supported>.
Can also be set by the C<ISHMAEL_ENCODING> environment variable.
If not specified, the default encoding is C<utf8>.
=item B<-f>|B<--format>=I<format>
Instead of trying to determine the given ebook format via a series of
heuristics, manually specify the format. The following are valid options
(caes does not matter):
=over 4
=item epub
=item fictionbook2 (or fb2)
=item html
=item xhtml
=item mobi
=item kf8 (or azw3)
=item azw
=item palmdoc
=item pdf
=item cb7
=item cbr
=item cbz
=item ztxt
=item chm
=item text
=back
=item B<-w>|B<--width>=I<width>
Specify the formatted text line width. Defaults to C<80>.
=item B<-H>|B<--html>
Dump the HTML-ified contents of the ebook instead of the formatted plain text.
=item B<-c>|B<--cover>
Dump the ebook's cover image, if it has one. By default, output is written to
F<I<file-basename>.I<image-suffix>>. When specifying the output yourself, you
can put a C<.-> (dot hyphen) at the end of the path name, which B<ishmael>
will substitute for the image's format suffix.
# Could create foo.jpg, foo.png, foo.gif, etc.
ishmael -c ebook.epub 'foo.-'
=item B<-g>|B<--image>
Dump all images found in the ebook to a specified output directory. By default,
output is written to a directory named after the basename of the given ebook.
Images created will follow the F<I<ebook-name>-I<num>.I<img>> naming scheme.
=item B<-i>|B<--identify>
Instead of dumping the text contents of an ebook, try to identify its format
instead.
=item B<-m>|B<--metadata>[=I<form>]
Dump the ebook's metadata. I<form> is an optional argument specifying the format
to use for the dumped metadata. The following are valid I<form>s.
=over 4
=item ishmael
=item json
=item pjson (pretty JSON)
=item xml
=item pxml (pretty XML)
=back
The default I<form> is C<ishmael>.
=item B<-r>|B<--raw>
Dump the ebook's raw, unformatted text contents.
=item B<-h>|B<--help>
Print help message and exit.
=item B<-v>|B<--version>
Print version and copyright info, then exit.
=back
=head1 EXAMPLES
Pipe B<ishmael> into a pager for a basic terminal e-reader.
ishmael ebook.epub | less
L<grep(1)> for a specific pattern in an ebook.
ishmael slaughterhouse-five.epub | grep --color -C 5 'So it goes'
=head1 ENVIRONMENT
=over 4
=item ISHMAEL_DUMPER
Name of dumper program to use for formatting text by default.
=item ISHMAEL_ENCODING
Name of character encoding to use for outputted text by default.
=back
=head1 CAVEATS
When dumping ebook metadata, the format of a given field can differ between
different ebook formats. For example, the C<Created> field could potentially be
in the C<YYYY-MM-DD> format, C<DD.MM.YYYY> format, etc. Special care should be
taken if you intend to process these fields across different formats.
=head1 RESTRICTIONS
PDF processing is inefficient and the output is ugly.
=head1 AUTHOR
Written by Samuel Young, E<lt>samyoung12788@gmail.comE<gt>.
This project's source can be found on its
L<Codeberg Page|https://codeberg.org/1-1sam/ishmael>. Comments and pull
requests are welcome!
=head1 HISTORY
This is the fifth iteration of this program, and hopefully the last :-).
This program originally went by the name of B<ebread>. The first iteration was
written in C and only supported EPUBs, it was quite buggy. The second
iteration was written as a learning exercise for Perl, it too only supported
EPUBs, it was also where I got the idea to delegate the text formatting task to
another program. The third iteration was again in C, but this time supported
a bunch of other ebook formats. It wasn't nearly as buggy as the first, but the
code was quite sloppy and had gotten to the point where I couldn't extend it
much. The fourth iteration was written in Raku, it only supported EPUBs. This
iteration, I renamed the project to B<ishmael> because I got bored of the last
name. This iteration supports multiple different ebook formats, but is written
in Perl so it should (hopefully) be less buggy and more maintainable.
=head1 COPYRIGHT
Copyright (C) 2025 Samuel Young
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
=head1 SEE ALSO
L<queequeg(1)>, L<elinks(1)>, L<links(1)>, L<lynx(1)>, L<w3m(1)>, L<cha(1)>,
L<Encode::Supported>
=cut