Dmitry Karasik


OCR::Naive - convert images into text in an extremely naive fashion


The module implements a very simple and unsophisticated OCR by finding all known images in a larger image. The known images are mapped to text using a preexisting dictionary, and the text lines are returned.

The interesting stuff here is the image finding itself - it is done by a regexp! For all practical purposes, images can be treated as byte strings, and regexps are no exception. For example, suppose one needs to locate a 2x2 image in a larger 7x7 image. The regexp constructed should be the first scanline of the smaller image, 2 bytes, verbatim, then 7 - 2 = 5 of any character, and finally the second scanline, 2 bytes again. Of course there are some quirks, but these are explained in the API section.
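The 2x2-in-7x7 example can be sketched in plain Perl, without Prima. This is only an illustration of the idea, not the module's code: subimage_regexp is a hypothetical helper, and the pixel data is assumed to be one byte per pixel with no scanline padding.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regexp that matches a sub-image of width $sw
# inside an image of width $w. Sub-image scanlines are passed as strings,
# one byte per pixel, with no scanline padding (an assumption of this sketch).
sub subimage_regexp {
    my ( $w, $sw, @sub_lines ) = @_;
    my $gap = $w - $sw;    # bytes to skip between two sub-image scanlines
    my $re  = join "(?s:.{$gap})", map { quotemeta($_) } @sub_lines;
    return qr/$re/;
}

# A 7x7 image filled with '.', with a 2x2 glyph "ab"/"cd" at column 3, row 2
my @image = ( '.' x 7 ) x 7;
substr( $image[2], 3, 2 ) = 'ab';
substr( $image[3], 3, 2 ) = 'cd';
my $data = join '', @image;

my $re = subimage_regexp( 7, 2, 'ab', 'cd' );    # roughly /ab.{5}cd/s
my ( $x, $y ) = ( -1, -1 );
if ( $data =~ $re ) {
    my $offset = $-[0];                           # byte offset of the match
    ( $x, $y ) = ( $offset % 7, int( $offset / 7 ) );
}
print "found at ($x,$y)\n";
```

The match offset is converted back to coordinates by dividing by the image width. A match that starts too close to the right edge would wrap onto the next scanline, and real image data usually pads scanlines; both are presumably among the quirks mentioned above, and this sketch ignores them.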

Dictionaries for different fonts can be created interactively by bin/makedict; non-interactive recognition is performed by bin/ocr, which is a mere wrapper around this module.


    use Prima::noX11; # Prima imaging required
    use OCR::Naive;

    # load a dictionary created by bin/makedict
    my $db = load_dictionary( 'my.dict') or die "cannot load my.dict: $!";

    # load image to recognize
    my $i = Prima::Image-> load( 'screenshot.png' );
    $i = enhance_image( $i );

    # ocr!
    print "$_\n" for recognize( $i, $db);


load_dictionary $FILE

Loads a glyph dictionary from $FILE and returns a dictionary hash table. If loading fails, returns undef and $! contains the error.

save_dictionary $FILE, $DB

Saves the glyph dictionary $DB into $FILE and returns a success flag. On failure, $! contains the error.

image2db_key $IMAGE

The dictionary is intended to be a simple hash, where the key is the image pixel data, and the value is a hash of image attributes - width, height, text, and possibly something more in the future. The key currently is the image data verbatim, and image2db_key returns the data of $IMAGE.
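As an illustration of that layout, a dictionary with a single entry might look like the following. The field names width, height and text are taken from the description above, but their exact spelling in a real dictionary is an assumption, and the pixel bytes are made up.

```perl
use strict;
use warnings;

# Hypothetical 2x2 glyph: four pixel bytes, stored scanline by scanline
my $pixels = "\x00\xff\xff\x00";

# Key: pixel data verbatim. Value: attributes of the glyph.
# The attribute names are assumed for this sketch.
my %db = (
    $pixels => { width => 2, height => 2, text => 'x' },
);

# Looking a glyph up is a plain hash access by its pixel data
my $entry = $db{$pixels};
print "$entry->{text}: $entry->{width}x$entry->{height}\n";
```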

find_images $IMAGE, $SUBIMAGE, $MULTIPLE


Locates a $SUBIMAGE in $IMAGE and returns one or many matches, depending on $MULTIPLE. If a single match is requested, stops at the first match and returns a pair of (X,Y) coordinates. If $MULTIPLE is 1, returns an array of (X,Y) pairs. In both modes, returns an empty list if nothing was found.
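The single and multiple modes can be sketched on plain byte strings as follows. find_all is a hypothetical stand-in written for this example, not the module's function; it assumes one byte per pixel, no scanline padding, and counts rows from the start of the string, which need not match Prima's coordinate system.

```perl
use strict;
use warnings;

# Hypothetical stand-in: $data holds $w bytes per scanline, @sub holds the
# scanlines of the sub-image. Returns (X,Y) in single mode, or a list of
# [X,Y] pairs when $multiple is true; an empty list if nothing is found.
sub find_all {
    my ( $data, $w, $multiple, @sub ) = @_;
    my $gap = $w - length $sub[0];
    my $re  = join "(?s:.{$gap})", map { quotemeta($_) } @sub;
    my @found;
    while ( $data =~ /$re/g ) {
        my $offset = $-[0];
        pos($data) = $offset + 1;      # resume right after the match start
        my $x = $offset % $w;
        next if $x > $gap;             # match wrapped over a scanline edge
        push @found, [ $x, int( $offset / $w ) ];
        last unless $multiple;         # single mode stops at the first hit
    }
    return $multiple ? @found : @{ $found[0] || [] };
}

# 6x3 image with the same 2x2 glyph at columns 0 and 4 of the first row
my $data = join '', 'ab..ab', 'cd..cd', '......';

my @one = find_all( $data, 6, 0, 'ab', 'cd' );
my @all = find_all( $data, 6, 1, 'ab', 'cd' );
print "first: (@one)\n";
print "all: ", join( ' ', map { "($_->[0],$_->[1])" } @all ), "\n";
```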

suggest_glyph_order $DB

When more than one subimage is to be found in a larger image, it is important that parts of larger glyphs are not mistakenly attributed to smaller ones. For example, the letter 'i' might be detected as a combination of 'dot' and 'dotlessi'. To avoid this, suggest_glyph_order sorts all dictionary entries by their occupied area, larger first, and returns the sorted set of keys.
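The ordering itself amounts to a sort by width times height. A minimal sketch, assuming the dictionary layout described above; by_area_desc is a hypothetical name, and the readable keys stand in for real pixel data:

```perl
use strict;
use warnings;

# Hypothetical dictionary; readable keys stand in for pixel data
my %db = (
    dot      => { width => 1, height => 1, text => '.' },
    dotlessi => { width => 1, height => 4, text => "\x{131}" },
    i        => { width => 1, height => 6, text => 'i' },
);

# Sort keys by occupied area, larger first
sub by_area_desc {
    my ($db) = @_;
    return sort {
        $db->{$b}{width} * $db->{$b}{height}
            <=> $db->{$a}{width} * $db->{$a}{height}
            or $a cmp $b    # deterministic tie-break, added for this sketch
    } keys %$db;
}

my @order = by_area_desc( \%db );
print "@order\n";
```

Searching for glyphs in this order means the full 'i' is tried before its dot and stem are tried separately.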

enhance_image $IMAGE, %OPTIONS

Glyphs in the dictionary are black-and-white images, and ideally detection should also happen on 2-color images. enhance_image tries to enhance the contrast of the image, find histogram peaks, detect what is foreground and what is background, and finally convert the image into black-and-white.

This procedure is of course nowhere near any decent pre-OCR image processing, so don't expect much. OTOH it might serve as a good-enough quick hack for screen dumps.

If $OPTIONS{verbose} is set, prints details as it goes.
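To give a rough feel for histogram-based binarization, here is a deliberately crude sketch. It is not what enhance_image does internally: binarize is a hypothetical function that takes 8-bit gray pixels as a byte string, treats the two most frequent gray levels as the peaks, assumes the most frequent one is the background, and thresholds halfway between them.

```perl
use strict;
use warnings;

# Hypothetical binarization: returns a string of 0 (background) and 1 (ink)
sub binarize {
    my ($gray) = @_;
    my @pixels = unpack 'C*', $gray;

    # Histogram of the 256 possible gray levels
    my @hist = (0) x 256;
    $hist[$_]++ for @pixels;

    # The two most frequent levels stand in for the histogram peaks;
    # the most frequent one is assumed to be the background
    my ( $bg, $fg ) = sort { $hist[$b] <=> $hist[$a] or $a <=> $b } 0 .. 255;
    my $threshold = ( $bg + $fg ) / 2;

    # A pixel is ink when it lies on the foreground side of the threshold
    return join '', map {
        ( $fg > $bg ? $_ > $threshold : $_ < $threshold ) ? 1 : 0
    } @pixels;
}

# Mostly white pixels, two black ones, and two antialiased shades
my $gray = pack 'C*', 255, 255, 255, 0, 0, 20, 240, 255;
my $bits = binarize($gray);
print "$bits\n";
```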

recognize $IMAGE, $DB, %OPTIONS

Given a dictionary $DB, recognizes all text it can find in $IMAGE. Returns an array of text lines.

Spaces are a problem with this approach, and even though recognize tries to deduce the minimal width in pixels that should be treated as a 'space' character, it will inevitably fail at times. Set $OPTIONS{minspace} to the space width if you happen to know what font you're detecting.

If $OPTIONS{verbose} is set, prints details as it goes.
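The role of a minimal space width can be illustrated with a small sketch. join_glyphs and the glyph records below are hypothetical and only show the idea: glyphs found on one text line are joined left to right, and a space is inserted wherever the horizontal gap reaches the given width.

```perl
use strict;
use warnings;

# Hypothetical helper: join the glyphs of one text line, inserting a space
# wherever the gap between two neighbours is at least $minspace pixels.
sub join_glyphs {
    my ( $minspace, @glyphs ) = @_;
    my ( $text, $right ) = ( '', undef );
    for my $g ( sort { $a->{x} <=> $b->{x} } @glyphs ) {
        $text .= ' ' if defined $right && $g->{x} - $right >= $minspace;
        $text .= $g->{text};
        $right = $g->{x} + $g->{width};    # right edge of the last glyph
    }
    return $text;
}

# Made-up glyph positions: a 4-pixel gap separates "hi" from "you"
my @line = (
    { x => 0,  width => 5, text => 'h' },
    { x => 6,  width => 2, text => 'i' },
    { x => 12, width => 5, text => 'y' },
    { x => 18, width => 5, text => 'o' },
    { x => 24, width => 5, text => 'u' },
);
print join_glyphs( 3, @line ), "\n";
print join_glyphs( 5, @line ), "\n";
```

The same glyphs produce different text depending on the threshold, which is why knowing the font's space width helps.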


Prima, IPA


OCR::PerfectCR, PDF::OCR


Copyright (c) 2007 capmon ApS. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Dmitry Karasik