NAME

Text::Parser::Manual::ComparingWithNativePerl - A comparison of text parsing with native Perl and Text::Parser

VERSION

version 0.927

LIMITATIONS OF THE PERL ONE-LINER

When people compare Perl against AWK, the usual answer is this:

    $ perl -lane 'print;' file.txt

But this isn't useful for anything beyond one-liners. It cannot be embedded in a larger program, and even when a one-liner does the job, it is awkward to follow good programming practices such as use strict.

The Perl one-liner is surely not a useful solution for serious programs. But if you're not convinced, we'll go through some examples here.

A SIMPLE EXAMPLE

To understand how Text::Parser compares to the native Perl way of doing things, let's take a simple example and see how we would write code. Let's say we have a simple text file (info.txt) with lines of information like this:

    NAME: Brian
    EMAIL: brian@webhost.net
    ADDRESS: 401 Burnswick Ave, Cool City, UT 12345
    NAME: Darin Cruz
    ADDRESS: 209 Random St, Forest City, CA 92710
    EMAIL: darin123@yahoo.co.uk
    NAME: Elizabeth Andrews
    ADDRESS: 0 Muutama Lane, Inaccessible Forest area, AK 88170
    NAME: Audrey C. Miller
    ADDRESS: 9 New St, Smart City, PA 12933
    EMAIL: aud@audrey.io

You have to write code that parses this into a data structure holding each name with the corresponding email and postal address, one record per person:

    { name => "Brian", email => "brian@webhost.net", address => "401 Burnswick Ave, Cool City, UT 12345"}, 
    .
    .
    .

Perl one-liner

Could we do this using a Perl one-liner?

    perl -lane 'BEGIN {
            @data = ();
        }
        if ($F[0] eq "NAME:") {
            shift @F;
            push @data, {name => join(" ", @F)};
        } elsif ($F[0] eq "EMAIL:") {
            $data[-1]{email} = $F[1];
        } elsif ($F[0] eq "ADDRESS:") {
            shift @F;
            $data[-1]{address} = join " ", @F;
        }' info.txt

So much for a one-liner! But you can't do anything else with this, can you?

Native Perl script

Here's an implementation as a native Perl script:

    open my $in, '<', 'info.txt' or die "Cannot open info.txt: $!";
    my @data = ();
    while (<$in>) {
        chomp;
        my (@field) = split /\s+/;
        if ($field[0] eq 'NAME:') {
            shift @field;
            push @data, { name => join(' ', @field) };
        } elsif ($field[0] eq 'EMAIL:') {
            $data[-1]->{email} = $field[1];
        } elsif ($field[0] eq 'ADDRESS:') {
            shift @field;
            $data[-1]->{address} = join ' ', @field;
        }
    }
    close $in;

With Text::Parser

Here's how you'd write the same thing with Text::Parser.

    use Text::Parser;

    my $parser = Text::Parser->new();
    $parser->add_rule( if => '$1 eq "NAME:"', do => 'return { name => ${2+} }' );
    $parser->add_rule( if => '$1 eq "EMAIL:"',
        do => 'my $rec = $this->pop_record; $rec->{email} = $2; return $rec' );
    $parser->add_rule( if => '$1 eq "ADDRESS:"',
        do => 'my $rec = $this->pop_record; $rec->{address} = ${2+}; return $rec' );
    $parser->read('info.txt');

Quick observations

The programmer still has to specify how to extract the data, but:

  • she can focus on the content rather than the mechanics of file handling

  • another programmer can instantly understand what is going on

  • the results can be used in a more complex program - not just a one-liner

  • parsing files has never been this intuitive, especially with shortcuts like ${2+}

Besides, did you notice the bug in the while loop of the native Perl script above? Hint: What happens if we split a string with leading and trailing spaces?
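
To make the hint concrete, here is a minimal core-Perl sketch (the sample line is made up for illustration) that compares splitting on the pattern /\s+/ with splitting on the special string ' ':

```perl
use strict;
use warnings;

# A line with leading and trailing spaces, as might appear in info.txt.
my $line = '  NAME: Brian  ';

# Splitting on the pattern /\s+/ keeps an empty leading field.
my @by_pattern = split /\s+/, $line;

# Splitting on the special string ' ' skips leading whitespace (awk-style).
my @awk_style = split ' ', $line;

print scalar(@by_pattern), " fields, first is '$by_pattern[0]'\n";
print scalar(@awk_style), " fields, first is '$awk_style[0]'\n";
```

With the pattern, the first field is the empty string, so a test like $field[0] eq 'NAME:' silently fails for an indented line. The awk-style split returns 'NAME:' as the first field.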

ANOTHER SIMPLE EXAMPLE

Take another simple example. This time info.txt holds different content:

    State: California
    County: Santa Clara, 1304, San Jose, 2/18/1850
    County: Alameda, 821, Oakland, 3/25/1853
    County: San Mateo, 774, Redwood City, 4/19/1856
    .
    .
    .

    State: Arkansas
    .
    .
    .

Let's say you have to parse this and form a data structure like this:

    [
        {
            state           => 'California', 
            'Santa Clara'   => {area => 1304, county_seat => 'San Jose', date_inc => '2/18/1850'}, 
            'Alameda'       => {area => 821, county_seat => 'Oakland', date_inc => '3/25/1853'}, 
            'San Mateo'     => {area => 774, county_seat => 'Redwood City', date_inc => '4/19/1856'}, 
        }, 
        {
            state           => 'Arkansas', 
            ...
        }
    ]

Perl one-liner

It is clear that the one-liner is no longer really a one-liner, and working under use strict only makes it clumsier. But go ahead and give it a try if you want.

Native Perl code

    use String::Util 'trim';

    open IN, "<info.txt";
    my @data = ();
    while(<IN>) {
        chomp;
        $_ = trim($_);
        my (@field) = split /[:,]\s+/;
        if ($field[0] eq 'State') {
            push @data, { state => $field[1] };
        } elsif($field[0] eq 'County') {
            my $data = pop @data;
            $data->{$field[1]} = {area => $field[2], county_seat => $field[3], date_inc => $field[4]};
            push @data, $data;
        }
    }
    close IN;

With Text::Parser

    use Text::Parser;

    my $parser = Text::Parser->new(auto_split => 1, FS => qr/[:,]\s+/);
    $parser->add_rule(if => '$1 eq "State"', do => 'return {state => $2}');
    $parser->add_rule(if => '$1 eq "County"',
        do => 'my $data = $this->pop_record;
        $data->{$2} = { area => $3, county_seat => $4, date_inc => $5, };
        return $data;'
    );
    $parser->read('info.txt');
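
To see which pieces of a line the rules refer to as $1 through $5, here is a small core-Perl sketch. It assumes the FS pattern is applied much like the pattern argument of split, which is how the native version above uses the same regular expression. Any additional processing Text::Parser may do is not modelled here.

```perl
use strict;
use warnings;

# One County line from the sample input.
my $line = 'County: Santa Clara, 1304, San Jose, 2/18/1850';

# The same pattern that is passed as FS to the parser.
my @field = split /[:,]\s+/, $line;

# $field[0] plays the role of $1 in a rule, $field[1] of $2, and so on.
printf "\$%d = %s\n", $_ + 1, $field[$_] for 0 .. $#field;
```

The line splits into five fields: County, Santa Clara, 1304, San Jose and 2/18/1850. Note that the space inside 'Santa Clara' survives because the separator requires a colon or comma before the whitespace.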

SOMETHING MORE FUN

Let's take something more fun. A selection of students from Riverdale High and Hogwarts took part in a quiz. This is a record of their scores.

    School = Riverdale High
    Grade = 1
    Student number, Name
    0, Phoebe
    1, Rachel
    
    Student number, Score
    0, 3
    1, 7
    
    Grade = 2
    Student number, Name
    0, Angela
    1, Tristan
    2, Aurora
    
    Student number, Score
    0, 6
    1, 3
    2, 9
    
    School = Hogwarts
    Grade = 1
    Student number, Name
    0, Ginny
    1, Luna
    
    Student number, Score
    0, 8
    1, 7
    
    Grade = 2
    Student number, Name
    0, Harry
    1, Hermione
    
    Student number, Score
    0, 5
    1, 10
    
    Grade = 3
    Student number, Name
    0, Fred
    1, George
    
    Student number, Score
    0, 0
    1, 0 

You want to parse this into a data structure like this:

    # Entries data-structure hierarchy is:
    #   school/grade/student number/Name
    #   school/grade/student number/Score
    {
        "Riverdale High" => {
            "1" => {
                0 => {Name => "Phoebe", Score => 3}, 
                1 => {Name => "Rachel", Score => 7}
            }, 
            "2" => {
                0 => {Name => "Angela", Score => 6}, 
                1 => {Name => "Tristan", Score => 3}, 
                2 => {Name => "Aurora", Score => 9}, 
            }, 
        }, 
    }, 
    {
        "Hogwarts" => {
            "1" => {
                0 => {Name => "Ginny", Score => 8}, 
                1 => {Name => "Luna", Score => 7}, 
            }, 
            "2" => {
                0 => {Name => "Harry", Score => 5}, 
                1 => {Name => "Hermione", Score => 10}, 
            }, 
            "3" => {
                0 => {Name => "Fred", Score => 0}, 
                1 => {Name => "George", Score => 0 }, 
            },
        }, 
    }

This problem comes from a source where the solution was implemented in Python using a PEG parser.

Native Perl

Do I really have to do this? Why don't you try it yourself?
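
For comparison, one possible native Perl attempt might look like the sketch below. It is only an illustration, not a reference solution: parse_quiz is a hypothetical helper name, it takes the lines as a list instead of reading info.txt, and it assumes the input is well formed (a School line before any Grade line, and so on).

```perl
use strict;
use warnings;

# Hypothetical helper: turn quiz lines into one hash per school.
sub parse_quiz {
    my @lines = @_;
    my (@records, $school, $grade, $info);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;      # trim leading and trailing spaces
        next if $line eq '';          # skip blank separator lines
        my @field = split /\s+=\s+|,\s+/, $line;
        if ($field[0] eq 'School') {
            $school = $field[1];
            push @records, +{ $school => {} };
        } elsif ($field[0] eq 'Grade') {
            $grade = $field[1];
            $records[-1]{$school}{$grade} = {};
        } elsif ($field[0] eq 'Student number') {
            $info = $field[1];        # column title: 'Name' or 'Score'
        } else {
            # Data row: student number, then the value for the current column.
            $records[-1]{$school}{$grade}{ $field[0] }{$info} = $field[1];
        }
    }
    return @records;
}

# Usage on a shortened version of the sample input.
my @records = parse_quiz(
    'School = Riverdale High', 'Grade = 1',
    'Student number, Name',  '0, Phoebe', '1, Rachel', '',
    'Student number, Score', '0, 3',      '1, 7',
);
print $records[0]{'Riverdale High'}{1}{1}{Name}, "\n";
```

The state that Text::Parser keeps in stashed variables has to be tracked here by hand in $school, $grade and $info, alongside the trimming and blank-line handling.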

With Text::Parser

    use Text::Parser;

    my $parser = Text::Parser->new(FS => qr/\s+\=\s+|,\s+/);
    $parser->add_rule(if => '$1 eq "School"',
        do => '~school = $2; return {$2 => {}};');
    $parser->add_rule(if => '$1 eq "Grade"',
        do => 'my $p = $this->pop_record;
        $p->{~school}{$2} = {};
        ~grade = $2;
        return $p;');
    $parser->add_rule(if => '$1 eq "Student number"',
        do => '~info = $2;', dont_record => 1);
    $parser->add_rule(
        do => 'my $p = $this->pop_record;
        $p->{~school}{~grade}{$1}{~info} = $2;
        return $p;'
    );
    $parser->read('info.txt');

That's it!

By now, you should have concluded that the Text::Parser way is much better. If not, you must know a better solution and perhaps you should make a Perl module (or feel free to contact me and contribute if you like this project).

BUGS

Please report any bugs or feature requests on the bugtracker website http://github.com/balajirama/Text-Parser/issues

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

Balaji Ramasubramanian <balajiram@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018-2019 by Balaji Ramasubramanian.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.