NAME

Catmandu::ALTOXML - tools to work with ALTOXML documents

SYNOPSIS

    #From the command line

    #Extract OCR data, treating each line as a record

    $ catmandu convert ALTOXML --file input.xml to YAML

    #In a script

    use Catmandu::Sane;

    use Catmandu::Importer::ALTOXML;

    my $importer = Catmandu::Importer::ALTOXML->new( file => "/tmp/input.xml" );

    $importer->each(sub{

        my $record = $_[0];
        #..

    });

EXAMPLE OUTPUT IN YAML

    ---
    block: 5
    block_h: 63
    block_w: 114
    block_x: 2294
    block_y: 2713
    h: 38
    page: 1
    page_h: 3316
    page_w: 2904
    page_x: ~
    page_y: ~
    text: '1'
    w: 17
    x: 2349
    y: 2717
    ...
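
Each record is a flat hash, so post-processing needs only plain Perl. The following is a minimal sketch, not part of this distribution: text_by_block is a hypothetical helper that groups records by page and block and joins their text. It assumes that records arrive in reading order and that "page" and "block" identify the enclosing page and text block, as the example output suggests. Only the first sample record mirrors the output above; the other two are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Catmandu::ALTOXML):
# group records by page and block, and join their text with spaces.
# Assumes the records are passed in reading order.
sub text_by_block {
    my @records = @_;
    my ( %text, @order );
    for my $r (@records) {
        my $key = "$r->{page}/$r->{block}";
        # Remember the order in which blocks are first seen
        push @order, $key unless exists $text{$key};
        push @{ $text{$key} }, $r->{text};
    }
    return map { [ $_, join( ' ', @{ $text{$_} } ) ] } @order;
}

# Sample records shaped like the YAML above.
# The second and third records are made-up values.
my @records = (
    { page => 1, block => 5, text => '1',     x => 2349, y => 2717, w => 17,  h => 38 },
    { page => 1, block => 6, text => 'Hello', x => 100,  y => 200,  w => 90,  h => 40 },
    { page => 1, block => 6, text => 'world', x => 200,  y => 200,  w => 95,  h => 40 },
);

for my $pair ( text_by_block(@records) ) {
    printf "%s: %s\n", @$pair;
}
```

With the importer from the synopsis, the records could presumably be supplied via the to_array method that Catmandu importers inherit from Catmandu::Iterable, for example text_by_block( @{ $importer->to_array } ). For large documents, collecting inside the each callback avoids holding every record in memory at once.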

INSTALLATION

To install this package, the following system packages must be installed first:

CentOS

* perl-devel

* make

* gcc

* gcc-c++

* libyaml-devel

* libxml2, version 2.6.21 or higher. This is required because the module XML::LibXML::Reader uses the libxml2 pull parser to read XML documents incrementally.

AUTHORS

Nicolas Franck <nicolas.franck at ugent.be>

SEE ALSO

Catmandu::Importer::ALTOXML, XML::LibXML::Reader, Catmandu, Catmandu::Importer

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.