NAME

ishmael - EBook dumper

SYNOPSIS

ishmael [options] file [output]

DESCRIPTION

ishmael is a Perl program that can read and dump the contents of various popular (and unpopular) ebook formats. It originally only dumped the formatted text contents of an ebook, but has since grown to be able to dump metadata, images, and more. ishmael dumps text by default, but it can dump other kinds of content through the use of command-line options.

file is the ebook file for ishmael to dump. output is the path to write output to. If output is not specified, it defaults to stdout (except for -c|--cover and -g|--image, which have their own defaults). stdout can also be specified explicitly with -.

ishmael currently supports the following ebook formats:

EPUB
MOBI
AZW
HTML/XHTML
PDF
FictionBook2
PalmDoc
zTXT
Comic Book Archives (cbr, cbz, cb7)
Microsoft Compiled HTML Help (CHM)
Text

OPTIONS

-d|--dumper=dumper

Specify the program to use for formatting ebook text. The following are valid options, as long as they're installed on your system:

lynx
w3m
queequeg

queequeg(1) is a script distributed with ishmael that acts as a fallback dumper if no other dumper is installed on your system. If this program was installed normally, queequeg(1) should always be available to ishmael.

By default, ishmael uses the dumper named by the ISHMAEL_DUMPER environment variable if it is set; otherwise it uses the first dumper it finds installed on your system.
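
The default selection described above can be sketched in shell. This is only an illustration, not ishmael's actual code: the function name pick_dumper is hypothetical, and the search order (lynx, w3m, queequeg) is an assumption taken from the order of the list above.

```shell
# Hypothetical sketch of the default dumper selection; not ishmael's own code.
pick_dumper() {
    # A non-empty ISHMAEL_DUMPER wins.
    if [ -n "${ISHMAEL_DUMPER:-}" ]; then
        printf '%s\n' "$ISHMAEL_DUMPER"
        return 0
    fi
    # Otherwise take the first candidate found on PATH.
    # The order of this list is an assumption.
    for d in lynx w3m queequeg; do
        if command -v "$d" >/dev/null 2>&1; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    # Nothing suitable was found.
    return 1
}

ISHMAEL_DUMPER=w3m pick_dumper   # prints w3m
```

This applies only when -d|--dumper is not given on the command line.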

-f|--format=format

Instead of trying to determine the ebook's format via a series of heuristics, manually specify the format. The following are valid options (case does not matter):

epub
fictionbook2 (or fb2)
html
xhtml
mobi
azw
palmdoc
pdf
cb7
cbr
cbz
ztxt
chm
text

-w|--width=width

Specify the line width of the output. Defaults to 80.

-H|--html

Dump the HTML-ified contents of the ebook instead of the formatted plain text.

-c|--cover

Dump the ebook's cover image, if it has one. By default, output is written to file-basename.image-suffix. When specifying the output yourself, you can put a .* (dot asterisk) at the end of the path name, which ishmael will replace with the image's format suffix.

# Could create foo.jpg, foo.png, foo.gif, etc.
ishmael -c ebook.epub foo.*
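
The .* substitution can be pictured with the small shell sketch below. It is not ishmael's implementation; cover_path is a hypothetical helper that only mirrors the behaviour described above. Note also that most shells treat an unquoted foo.* as a glob pattern and expand it if matching files already exist, so quoting the argument is the safer way to pass it through unchanged.

```shell
# Hypothetical helper: mirrors the described ".*" substitution.
# $1 is the output path as given, $2 is the image's format suffix.
cover_path() {
    case "$1" in
        # Path ends in a literal ".*": swap it for the real suffix.
        *'.*') printf '%s.%s\n' "${1%.\*}" "$2" ;;
        # Any other path is used as-is.
        *)     printf '%s\n' "$1" ;;
    esac
}

cover_path 'foo.*' png      # prints foo.png
cover_path 'cover.jpg' png  # prints cover.jpg

# Quoted so the shell hands the pattern to ishmael literally:
# ishmael -c ebook.epub 'foo.*'
```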

-g|--image

Dump all images found in the ebook to a specified output directory. By default, output is written to a directory named after the basename of the given ebook. Images created will follow the ebook-name-num.img naming scheme.

-i|--identify

Instead of dumping the text contents of an ebook, try to identify its format.

-m|--metadata[=form]

Dump the ebook's metadata. form is an optional argument specifying the format to use for the dumped metadata. The following are valid forms:

ishmael
json
pjson (pretty JSON)
xml
pxml (pretty XML)

The default form is ishmael.

Note that no metadata field is guaranteed to have a consistent format across different ebook formats.
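
As a usage sketch, the wrapper below dumps metadata as JSON using only the options documented here. The function name dump_metadata_json is hypothetical; the = form matches the option syntax shown above, and - selects stdout as described in the DESCRIPTION. Given the caveat above, a script consuming this output should probably not assume a particular shape for any given field.

```shell
# Hypothetical wrapper around the documented --metadata option.
# $1 is the ebook; $2 is an optional output path (defaults to "-", i.e. stdout).
dump_metadata_json() {
    ishmael --metadata=json "$1" "${2:--}"
}

# Example (requires ishmael and an ebook on disk):
# dump_metadata_json ebook.epub metadata.json
```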

-r|--raw

Dump the ebook's raw, unformatted text contents.

-h|--help

Print help message and exit.

-v|--version

Print version and copyright info, then exit.

EXAMPLES

Pipe ishmael into a pager for a basic terminal e-reader.

ishmael ebook.epub | less

grep(1) for a specific pattern in an ebook.

ishmael slaughterhouse-five.epub | grep --color -C 5 'So it goes'

ENVIRONMENT

ISHMAEL_DUMPER

Name of dumper program to use by default.

RESTRICTIONS

PDF processing is inefficient and the output is ugly.

AUTHOR

Written by Samuel Young, <samyoung12788@gmail.com>.

This project's source can be found on its Codeberg Page. Comments and pull requests are welcome!

HISTORY

This is the fifth iteration of this program, and hopefully the last :-).

This program originally went by the name of ebread. The first iteration was written in C and only supported EPUBs; it was quite buggy. The second iteration was written as a learning exercise for Perl. It too only supported EPUBs, and it was also where I got the idea to delegate the text formatting task to another program. The third iteration was again in C, but this time supported a bunch of other ebook formats. It wasn't nearly as buggy as the first, but the code was quite sloppy and had gotten to the point where I couldn't extend it much. The fourth iteration was written in Raku and only supported EPUBs. For this iteration, I renamed the project to ishmael because I got bored of the last name. This iteration supports multiple ebook formats and is written in Perl, so it should (hopefully) be less buggy and more maintainable.

COPYRIGHT

Copyright (C) 2025 Samuel Young

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

SEE ALSO

queequeg(1), elinks(1), links(1), lynx(1), w3m(1)