Text::Parser::Manual::ComparingWithNativePerl - A comparison of text parsing with native Perl and Text::Parser
version 1.000
When people compare Perl against AWK, the usual answer is this:
$ > perl -lane 'print;' file.txt
The problem is that this is useful only for one-liners. It cannot be used inside a larger program, and even if you move the code into a separate file, you cannot follow good programming practices like use strict.
The Perl one-liner is surely not a useful solution for serious programs that have to parse the content of complex file formats. But if you're not convinced, we'll go through some examples here.
To understand how Text::Parser compares to the native Perl way of doing things, let's take a simple example and see how we would write code. Let's say we have a simple text file (info.txt) with lines of information like this:
    NAME: Brian
    EMAIL: brian@webhost.net
    ADDRESS: 401 Burnswick Ave, Cool City, UT 12345
    NAME: Darin Cruz
    ADDRESS: 209 Random St, Forest City, CA 92710
    EMAIL: darin123@yahoo.co.uk
    NAME: Elizabeth Andrews
    ADDRESS: 0 Muutama Lane, Inaccessible Forest area, AK 88170
    NAME: Audrey C. Miller
    ADDRESS: 9 New St, Smart City, PA 12933
    EMAIL: aud@audrey.io
You have to write code that parses this into a data structure holding each name with its corresponding email and postal address.
    { name => "Brian", email => "brian@webhost.net", address => "401 Burnswick Ave, Cool City, UT 12345"},
    . . .
The important thing to note is that the NAME and ADDRESS fields can be long strings containing several words.
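Because a name or an address may span several whitespace-separated words, everything after the leading keyword has to be joined back together. A minimal sketch of that step in plain Perl follows; the helper name split_record is hypothetical and not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into its leading keyword and the
# remainder of the line, re-joined with single spaces.
sub split_record {
    my ($line) = @_;
    my ($key, @rest) = split ' ', $line;
    return ($key, join(' ', @rest));
}

my ($key, $value) = split_record('NAME: Audrey C. Miller');
print "$key -> $value\n";
```

Note that re-joining with a single space normalizes any run of whitespace inside the value, which is usually acceptable for names and addresses.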
Could we do this using a Perl one-liner?
    perl -lane 'BEGIN { @data = (); }
        if ($F[0] eq "NAME:") {
            shift @F;
            push @data, { name => join(" ", @F) };
        } elsif ($F[0] eq "EMAIL:") {
            $data[-1]{email} = $F[1];
        } elsif ($F[0] eq "ADDRESS:") {
            shift @F;
            $data[-1]{address} = join " ", @F;
        }' info.txt
So much for a one-liner! But you can't make it shorter, can you?
Here's an implementation as a native Perl script:
    open IN, "<info.txt";
    my @data = ();
    while(<IN>) {
        chomp;
        my (@field) = split /\s+/;
        if ($field[0] eq 'NAME:') {
            shift @field;
            push @data, { name => join(' ', @field) };
        } elsif($field[0] eq 'EMAIL:') {
            $data[-1]->{email} = $field[1];
        } elsif($field[0] eq 'ADDRESS:') {
            shift @field;
            $data[-1]->{email} = join ' ', @field;
        }
    }
    close IN;
Here's how you'd write the same thing with Text::Parser.
    use Text::Parser;

    my $parser = Text::Parser->new();
    # A NAME line starts a new record; ${2+} is every field from the second onward
    $parser->add_rule(
        if => '$1 eq "NAME:"',
        do => 'return { name => ${2+} };'
    );
    # EMAIL and ADDRESS lines update the most recent record
    $parser->add_rule(
        if => '$1 eq "EMAIL:"',
        do => 'my $rec = $this->pop_record; $rec->{email} = $2; return $rec;'
    );
    $parser->add_rule(
        if => '$1 eq "ADDRESS:"',
        do => 'my $rec = $this->pop_record; $rec->{address} = ${2+}; return $rec;'
    );
    $parser->read('info.txt');
The programmer still has to specify how to extract data, but:
* she can focus on the content rather than the mechanics of file handling
* another programmer can instantly understand what is going on
* the results can be used in a more complex program, not just a one-liner
* parsing files has never been this intuitive, especially with shortcuts like ${2+}
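Besides, did you notice the bug in the while loop of the native Perl script above? It is hard to spot: the ADDRESS branch stores the address under the email key, silently overwriting any email already recorded for that person.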
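For comparison, here is a sketch of how the native loop might look with that key corrected and use strict enabled. It is only an illustration: the input is held in an in-memory string (an assumption made so the example is self-contained), and it assumes every EMAIL or ADDRESS line follows a NAME line.

```perl
use strict;
use warnings;

# Sample input held in memory so the sketch is self-contained.
my $text = <<'END';
NAME: Brian
EMAIL: brian@webhost.net
ADDRESS: 401 Burnswick Ave, Cool City, UT 12345
NAME: Darin Cruz
ADDRESS: 209 Random St, Forest City, CA 92710
EMAIL: darin123@yahoo.co.uk
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";
my @data;
while (my $line = <$in>) {
    chomp $line;
    my ($key, @rest) = split ' ', $line;
    next unless defined $key;                  # skip blank lines
    if ($key eq 'NAME:') {
        push @data, { name => join(' ', @rest) };
    }
    elsif ($key eq 'EMAIL:') {
        $data[-1]{email} = $rest[0];
    }
    elsif ($key eq 'ADDRESS:') {
        $data[-1]{address} = join ' ', @rest;  # 'address', not 'email'
    }
}
close $in;

print "$_->{name} <$_->{email}>\n" for @data;
```

Even with the fix, the file handling and field bookkeeping still take up most of the code, which is the point of the comparison.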
Take another simple example. Here we have new stuff in info.txt:
    State: California
    County: Santa Clara, 1304, San Jose, 2/18/1850
    County: Alameda, 821, Oakland, 3/25/1853
    County: San Mateo, 774, Redwood City, 4/19/1856
    . . .
    State: Arkansas
    . . .
Let's say you have to parse this and form a data structure like this:
    [
        {
            state         => 'California',
            'Santa Clara' => {area => 1304, county_seat => 'San Jose',     date_inc => '2/18/1850'},
            'Alameda'     => {area => 821,  county_seat => 'Oakland',      date_inc => '3/25/1853'},
            'San Mateo'   => {area => 774,  county_seat => 'Redwood City', date_inc => '4/19/1856'},
        },
        {
            state => 'Arkansas',
            ...
        }
    ]
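To make the shape of this structure concrete, here is a small sketch of how a program might walk it once it has been built. The two county entries are copied from the example above; the %seat_of lookup table is a hypothetical name introduced only for illustration.

```perl
use strict;
use warnings;

my @data = (
    {
        state         => 'California',
        'Santa Clara' => { area => 1304, county_seat => 'San Jose', date_inc => '2/18/1850' },
        'Alameda'     => { area => 821,  county_seat => 'Oakland',  date_inc => '3/25/1853' },
    },
);

# Every key other than 'state' is treated as a county name.
my %seat_of;
for my $rec (@data) {
    for my $county (grep { $_ ne 'state' } sort keys %$rec) {
        $seat_of{$county} = $rec->{$county}{county_seat};
        print "$rec->{state}: $county (seat: $seat_of{$county})\n";
    }
}
```

One consequence of this layout is that the state name and the county names share a single hash, so a county literally named "state" would collide with the state key.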
A one-liner for this would clearly no longer be a one-liner, and you still could not use strict. But go ahead and give it a try if you want. Here is a native Perl script:
    use String::Util 'trim';

    open IN, "<info.txt";
    my @data = ();
    while(<IN>) {
        chomp;
        $_ = trim($_);
        my (@field) = split /[:,]\s+/;
        if ($field[0] eq 'State') {
            # A State line starts a new record
            push @data, { state => $field[1] };
        } elsif($field[0] eq 'County') {
            # A County line adds an entry to the most recent state record
            my $data = pop @data;
            $data->{$field[1]} = {area => $field[2], county_seat => $field[3], date_inc => $field[4]};
            push @data, $data;
        }
    }
    close IN;
And here is the same thing with Text::Parser:

    use Text::Parser;

    my $parser = Text::Parser->new(auto_split => 1, FS => qr/[:,]\s+/);
    $parser->add_rule(if => '$1 eq "State"', do => 'return {state => $2}');
    $parser->add_rule(
        if => '$1 eq "County"',
        do => 'my $data = $this->pop_record;
               $data->{$2} = { area => $3, county_seat => $4, date_inc => $5, };
               return $data;'
    );
    $parser->read('info.txt');
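Both versions rely on the same separator pattern, /[:,]\s+/, to break a line into fields. As a quick illustration in plain Perl (independent of Text::Parser), this is how one County line from the sample file splits:

```perl
use strict;
use warnings;

my $line   = 'County: Santa Clara, 1304, San Jose, 2/18/1850';
my @fields = split /[:,]\s+/, $line;

# Print the number of fields, then each field on its own line.
print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

Because the pattern requires whitespace after the colon or comma, the slashes in the date are left alone and multi-word values such as "Santa Clara" stay in one field.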
Let's take something more fun. A selection of students from Riverdale High and Hogwarts took part in a quiz. This is a record of their scores.
    School = Riverdale High
    Grade = 1
    Student number, Name
    0, Phoebe
    1, Rachel
    Student number, Score
    0, 3
    1, 7
    Grade = 2
    Student number, Name
    0, Angela
    1, Tristan
    2, Aurora
    Student number, Score
    0, 6
    1, 3
    2, 9
    School = Hogwarts
    Grade = 1
    Student number, Name
    0, Ginny
    1, Luna
    Student number, Score
    0, 8
    1, 7
    Grade = 2
    Student number, Name
    0, Harry
    1, Hermione
    Student number, Score
    0, 5
    1, 10
    Grade = 3
    Student number, Name
    0, Fred
    1, George
    Student number, Score
    0, 0
    1, 0
You want to parse this into a data structure like this:
    # Entries data-structure hierarchy is:
    #   school/grade/student number/Name
    #   school/grade/student number/Score
    {
        "Riverdale High" => {
            "1" => {
                0 => {Name => "Phoebe", Score => 3},
                1 => {Name => "Rachel", Score => 7}
            },
            "2" => {
                0 => {Name => "Angela",  Score => 6},
                1 => {Name => "Tristan", Score => 3},
                2 => {Name => "Aurora",  Score => 9},
            },
        },
    },
    {
        "Hogwarts" => {
            "1" => {
                0 => {Name => "Ginny", Score => 8},
                1 => {Name => "Luna",  Score => 7},
            },
            "2" => {
                0 => {Name => "Harry",    Score => 5},
                1 => {Name => "Hermione", Score => 10},
            },
            "3" => {
                0 => {Name => "Fred",   Score => 0},
                1 => {Name => "George", Score => 0 },
            },
        },
    }
This problem comes from a source where the solution was implemented in Python using a PEG parser.
Do I really have to write this out in native Perl? Why don't you try it yourself?
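If you would like something to compare your attempt against, here is one possible native Perl sketch. It is not taken from the original comparison: it uses a shortened copy of the input held in memory, and it assumes all schools are collected into a single hash keyed by school name, which is only one reading of the target structure.

```perl
use strict;
use warnings;

# Shortened sample input, held in memory to keep the sketch self-contained.
my $text = <<'END';
School = Riverdale High
Grade = 1
Student number, Name
0, Phoebe
1, Rachel
Student number, Score
0, 3
1, 7
School = Hogwarts
Grade = 1
Student number, Name
0, Ginny
1, Luna
Student number, Score
0, 8
1, 7
END

open my $in, '<', \$text or die "Cannot open in-memory file: $!";
my (%result, $school, $grade, $info);
while (my $line = <$in>) {
    chomp $line;
    $line =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    next if $line eq '';
    my @field = split /\s+=\s+|,\s+/, $line;
    # Header lines update the current context; any other line is data.
    if    ($field[0] eq 'School')         { $school = $field[1] }
    elsif ($field[0] eq 'Grade')          { $grade  = $field[1] }
    elsif ($field[0] eq 'Student number') { $info   = $field[1] }
    else  { $result{$school}{$grade}{ $field[0] }{$info} = $field[1] }
}
close $in;

print "$result{Hogwarts}{1}{0}{Name}: $result{Hogwarts}{1}{0}{Score}\n";
```

The three context variables ($school, $grade, $info) have to be declared and tracked by hand here, and a data line that appears before any header would be stored under undefined keys.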
Here is the Text::Parser version:

    use Text::Parser;

    my $parser = Text::Parser->new(FS => qr/\s+\=\s+|,\s+/);
    # Header lines only update stashed variables (~school, ~grade, ~info)
    $parser->add_rule(
        if          => '$1 eq "School"',
        do          => '~school = $2;',
        dont_record => 1,
    );
    $parser->add_rule(
        if          => '$1 eq "Grade"',
        do          => '~grade = $2;',
        dont_record => 1,
    );
    $parser->add_rule(
        if          => '$1 eq "Student number"',
        do          => '~info = $2;',
        dont_record => 1
    );
    # Any other line is data for the current school, grade and column
    $parser->add_rule(
        do => 'my $p = $this->pop_record;
               $p->{~school}{~grade}{$1}{~info} = $2;
               return $p;'
    );
    $parser->read('info.txt');
That's it! Just notice how elegant it looks.
By now, you should have concluded that the Text::Parser way is much better. If not, you must know a better solution and perhaps you should make a Perl module (or feel free to contact me and contribute if you like this project).
There will be a compile-time penalty in using Text::Parser. If compile-time performance is important for you, this package is not for you.
Run-time performance, on the other hand, is something I am committed to improving. I have to admit that Text::Parser is currently slower at run time than native Perl, but I know where there is scope for improvement and will come up with some statistics on that.
For now, you should assume that Text::Parser takes roughly twice the run time of native Perl. Earlier versions were about 5x slower than native Perl.
Text::Parser
Please report any bugs or feature requests on the bugtracker website http://github.com/balajirama/Text-Parser/issues
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
Balaji Ramasubramanian <balajiram@cpan.org>
This software is copyright (c) 2018-2019 by Balaji Ramasubramanian.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.