Text::Parser::RuleSpec - Syntax sugar for rule specification while subclassing Text::Parser or derivatives
version 1.000
    package MyFavorite::Parser;

    use Text::Parser::RuleSpec;
    extends 'Text::Parser';

    has '+multiline_type' => (default => 'join_next');

    unwraps_lines_using (
        is_wrapped => sub {
            my $self = shift;
            $_ = shift;
            chomp;
            m/\s+[~]\s*$/;
        },
        unwrap_routine => sub {
            my ($self, $last, $current) = @_;
            chomp $last;
            $last =~ s/\s+[~]\s*$//g;
            "$last $current";
        },
    );

    applies_rule get_emails => (
        if => '$1 eq "EMAIL:"',
        do => '$2;'
    );

    package main;

    my $parser = MyFavorite::Parser->new();
    $parser->read('/path/to/email_lists.txt');
    my (@emails) = $parser->get_records();
    print "Here are all the emails from the file: @emails\n";
This class enables users to create their own parser classes for a known text file format, and facilitates code-sharing across multiple variants of the same basic text format. The basic steps are as follows:
    package MyFavorite::Parser;

    use Text::Parser::RuleSpec;
    extends 'Text::Parser';
That's it! This is the bare minimum required to make your own text parser. But without any rules of its own, it is not particularly useful yet.
    applies_rule comment_char => (
        if          => '$1 =~ /^#/;',
        dont_record => 1,
    );
The rule above ignores all comment lines and is added to the MyFavorite::Parser class. Now, whenever you create an instance of MyFavorite::Parser, it automatically runs this rule when you call read.
We can preset any attributes for this parser class using the familiar Moose functions. Here is an example:
    has '+line_wrap_style' => (
        default => 'trailing_backslash',
        is      => 'ro',
    );

    has '+auto_trim' => (
        default => 'b',
        is      => 'ro',
    );
Sometimes you may want to store the parsed information in attributes instead of records. For example:
    has current_section => (
        is      => 'rw',
        isa     => 'Str|Undef',
        default => undef,
        lazy    => 1,
    );

    has _num_lines_by_section => (
        is      => 'rw',
        isa     => 'HashRef[Int]',
        default => sub { {}; },
        lazy    => 1,
        handles => {
            num_lines      => 'get',
            _set_num_lines => 'set',
        }
    );

    applies_rule inc_section_num_lines => (
        if          => '$1 ne "SECTION"',
        do          => 'my $sec = $this->current_section;
                        my $n = $this->num_lines($sec);
                        $this->_set_num_lines($sec => $n+1);',
        dont_record => 1,
    );

    applies_rule get_section_name => (
        if          => '$1 eq "SECTION"',
        do          => '$this->current_section($2); $this->_set_num_lines($2 => 0);',
        dont_record => 1,
    );
In the above example, you can see how the section name we get from one rule is used in a different rule.
We can further subclass a class that extends Text::Parser. Inheriting the rules of the superclass is automatic:
    package MyParser1;
    use Text::Parser::RuleSpec;

    extends 'Text::Parser';

    applies_rule rule1 => (
        do => '# something',
    );

    package MyParser2;
    use Text::Parser::RuleSpec;

    extends 'MyParser1';

    applies_rule rule1 => (
        do => '# something else',
    );
Now MyParser2 contains two rules: MyParser1/rule1 and MyParser2/rule1. Note that the rules in both classes are called rule1, and both will be executed. By default, rules of superclasses run before rules of the subclass. The subclass can change this order by explicitly stating that its own rule1 runs before the rule1 of MyParser1:
    package MyParser2;
    use Text::Parser::RuleSpec;

    extends 'MyParser1';

    applies_rule rule1 => (
        do     => '# something else',
        before => 'MyParser1/rule1',
    );
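The effect of a before clause on the rule order can be modelled in plain Perl. This is only a sketch of the ordering described above, not the module's implementation: insert_rule is a hypothetical helper, and placing the new rule immediately next to the anchor rule is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Parser::RuleSpec): return a new
# rule order with $new placed immediately before or after $anchor.
sub insert_rule {
    my ($order, $new, $where, $anchor) = @_;
    my ($idx) = grep { $order->[$_] eq $anchor } 0 .. $#{$order};
    die "Unknown rule $anchor\n" unless defined $idx;
    my @result = @{$order};
    splice @result, ($where eq 'before' ? $idx : $idx + 1), 0, $new;
    return @result;
}

# Default: the superclass rule runs first, the subclass rule is appended.
my @default_order = ('MyParser1/rule1', 'MyParser2/rule1');

# With before => 'MyParser1/rule1', the subclass rule moves ahead of it.
my @before_order = insert_rule(['MyParser1/rule1'], 'MyParser2/rule1',
    before => 'MyParser1/rule1');

print "default: @default_order\n";
print "before:  @before_order\n";
```

In a real parser class, the actual order can be inspected with Text::Parser::RuleSpec->class_rule_order.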
A subclass may choose to disable any superclass rules:
    package MyParser3;
    use Text::Parser::RuleSpec;

    extends 'MyParser2';

    disables_superclass_rules qr/^MyParser1/;  # disables all rules from MyParser1 class
A subclass may also clone a rule from the same class, from a superclass, or even from an unrelated class:
    package ClonerParser;
    use Text::Parser::RuleSpec;

    use Some::Parser;    # contains rules: "heading", "section"
    extends 'MyParser2';

    applies_rule my_own_rule => (
        if    => '# check something',
        do    => '# collect some data',
        after => 'MyParser2/rule1',
    );

    applies_cloned_rule 'MyParser2/rule1' => (
        add_precondition => '# Additional condition',
        do               => '# Optionally change the action',
        # prepend_action => '# Or just prepend something',
        # append_action  => '# Or append something',
        after            => 'MyParser1/rule1',
    );
Imagine this situation: Programmer A writes a text parser for a text format syntax SYNT1, and programmer B notices that the text format he wishes to parse (SYNT2) is similar, except for a few differences. Instead of rewriting the code from scratch, he can reuse the code from programmer A and modify it exactly as needed. This is especially useful when the syntaxes of many different text formats are very similar.
There is no constructor for this module, so you cannot create an instance of Text::Parser::RuleSpec. All methods here are therefore called on the Text::Parser::RuleSpec class directly.
class_has_rules takes a parser class name and returns a boolean: true if that class has any rules, and false otherwise.
    print "There are no class rules for MyFavorite::Parser.\n"
        if not Text::Parser::RuleSpec->class_has_rules('MyFavorite::Parser');
class_rule_order takes a single string argument, the parser class name, and returns the ordered list of rule names for that class.
my (@order) = Text::Parser::RuleSpec->class_rule_order('MyFavorite::Parser');
class_rule_object takes a single string argument, the fully qualified rule name, and returns the actual rule object identified by that name.
my $rule = Text::Parser::RuleSpec->class_rule_object('MyFavorite::Parser/rule1');
class_rules takes a single string argument, the parser class name, and returns the actual rule objects of that class. It is a shortcut for first running class_rule_order and then running class_rule_object on each name returned.
my (@rules) = Text::Parser::RuleSpec->class_rules('MyFavorite::Parser');
is_known_rule takes a string argument, expected to be the fully qualified name of a rule, and returns a boolean indicating whether such a rule was ever compiled. A fully qualified rule name has the form Some::Class/rule_name. To check for the existence of a cloned rule, include its suffix, such as @2 or @3.
    print "Some::Parser::Class/some_rule is a rule\n"
        if Text::Parser::RuleSpec->is_known_rule('Some::Parser::Class/some_rule');
populate_class_rules takes a parser class name as a string argument and populates the class rules according to the latest order of rules.
Text::Parser::RuleSpec->populate_class_rules('MyFavorite::Parser');
The following methods are exported into the namespace of your class by default, and may only be called outside the main namespace.
applies_rule takes one mandatory string argument, a rule name, followed by the options to create a rule. These are the same as the arguments to the add_rule method of the Text::Parser class. It returns nothing. Exceptions are thrown if any of the required arguments are not provided.
    applies_rule print_emails => (
        if               => '$1 eq "EMAIL:"',
        do               => 'print $2;',
        dont_record      => 1,
        continue_to_next => 1,
    );
The above call, which creates a rule print_emails in your class MyFavorite::Parser, saves the rule as MyFavorite::Parser/print_emails. This is the name to use if you want to clone the rule in a subclass, or to insert another rule before or after it in a subclass.
Optionally, you may provide either a before or an after clause to specify when this rule is to be executed.
    applies_rule check_line_syntax => (
        if     => '$1 ne "SECTION"',
        do     => '$this->check_syntax($this->current_section, $_);',
        before => 'Parent::Parser/add_line_to_data_struct',
    );
The above rule will be applied before the inherited rule Parent::Parser/add_line_to_data_struct.
Exceptions will be thrown if the before or after rule does not have a class name in it, or if it is the same as the current class, or if the rule is not among the inherited rules so far. Only one of before or after clauses may be provided.
applies_cloned_rule clones an existing rule to make a replica, and accepts options that change any parameters of the clone.
    applies_cloned_rule 'Some::SuperClass::Parser/some_rule' => (
        add_precondition => '1; # add some tests returning boolean',
        before           => 'MayBe::Another::Superclass::Parser/some_other_rule',
            ## Or even 'Some::SuperClass::Parser/another_rule'
        do               => '## Change the do clause of original rule',
    );
The first argument must be a string containing the rule name to be cloned. You may clone a superclass rule, or even a rule from another class that you have only used in your code, but are not actually inheriting (using extends). You may even clone a rule from the present class if the rule has been defined already. If the rule name specified contains a class name, then the exact rule is cloned, modified according to other clauses, and inserted into the rule order. But if the rule name specified does not have a classname, then the function looks for a rule with that name in the current class, and clones that one.
You may use one of the before or after clauses just like in applies_rule. You may use any of the other rule creation options like if, do, continue_to_next, or dont_record. And you may optionally also use the add_precondition clause. In many cases, you may not need any of the rule-creation options at all and may use only add_precondition or any one of before or after clauses. If you do use any of the rule-creating options like do or if, then it will change those fields of the cloned copy of the original rule.
Note that when you clone a rule, you do not change the original rule itself. You actually make a second copy and modify that. So you retain the original rule along with the clone.
The newly cloned rule is automatically renamed by applies_cloned_rule. If a rule Some::Other::Class/my_rule_1 is cloned into your parser class MyFavorite::Parser, then the clone is named MyFavorite::Parser/my_rule_1. This way, the original rule is left unaffected. If that name already exists, the clone gets an @2 suffix, viz., MyFavorite::Parser/my_rule_1@2. If that also exists, it is called MyFavorite::Parser/my_rule_1@3, and so on, with the suffix incrementing each time.
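The renaming scheme for clones can be illustrated in plain Perl. This is a sketch of the naming convention described above, not the code used by applies_cloned_rule: clone_name is a hypothetical helper, and %known is a stand-in for whatever registry of rule names the module keeps.

```perl
use strict;
use warnings;

# Hypothetical helper: compute the name a clone of $orig would get in
# $class, given a hash of rule names that already exist.
sub clone_name {
    my ($class, $orig, $known) = @_;
    (my $short = $orig) =~ s{^.*/}{};    # drop the original class name
    my $name = "$class/$short";
    my $n    = 1;
    # Try the @2, @3, ... suffixes until an unused name is found.
    $name = "$class/$short\@" . ++$n while $known->{$name};
    $known->{$name} = 1;
    return $name;
}

my %known;
my @names = map {
    clone_name('MyFavorite::Parser', 'Some::Other::Class/my_rule_1', \%known)
} 1 .. 3;

print "$_\n" for @names;
```

Cloning the same rule three times yields the unsuffixed name first, followed by the @2 and @3 variants.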
disables_superclass_rules takes a list of rule names, regular expression patterns, or subroutine references that identify the rules to be disabled. You cannot disable rules of the same class.
A string argument is expected to contain the full rule name (including the class name) in the format My::Parser::Class/my_rule. The / (slash) separating the class name and the rule name is mandatory.
A regexp argument is tested against the full rule-name.
If a subroutine reference is provided, the subroutine is called for each rule in the class, and the rule is disabled if the subroutine returns a true value.
    disables_superclass_rules qw(Parent::Parser::Class/parent_rule Another::Class/another_rule);
    disables_superclass_rules qr/Parent::Parser::Class\/comm.*/;
    disables_superclass_rules sub {
        my $rulename = shift;
        $rulename =~ /[@]/;
    };
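The three kinds of argument can be modelled as a filter over a list of rule names. The following is a plain-Perl sketch under that assumption, not the module's own code: matches_spec and remaining_rules are hypothetical helpers, the rule names in @inherited are made up for illustration, and the subroutine is assumed to receive the rule name as its first argument, as in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: does $rulename match one disabling spec?
sub matches_spec {
    my ($spec, $rulename) = @_;
    return $spec->($rulename) ? 1 : 0 if ref $spec eq 'CODE';
    return $rulename =~ $spec ? 1 : 0 if ref $spec eq 'Regexp';
    return $rulename eq $spec ? 1 : 0;    # plain string: exact full name
}

# Hypothetical helper: return the rule names not disabled by any spec.
sub remaining_rules {
    my ($rules, @specs) = @_;
    my @kept;
    for my $name (@{$rules}) {
        my $disabled = grep { matches_spec($_, $name) } @specs;
        push @kept, $name unless $disabled;
    }
    return @kept;
}

# Made-up inherited rule names, for illustration only.
my @inherited = (
    'Parent::Parser::Class/parent_rule',
    'Parent::Parser::Class/comment_char',
    'Parent::Parser::Class/get_data',
    'Parent::Parser::Class/get_data@2',
);

my @kept = remaining_rules(
    \@inherited,
    'Parent::Parser::Class/parent_rule',                 # exact rule name
    qr/Parent::Parser::Class\/comm.*/,                   # regexp on full name
    sub { my $rulename = shift; $rulename =~ /[@]/ },    # any cloned rule
);

print "$_\n" for @kept;
```

Here only Parent::Parser::Class/get_data survives: the other three names are matched by the string, the regexp, and the subroutine respectively.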
unwraps_lines_using may be used to specify a custom line-unwrapping routine. It takes a hash argument with two mandatory keys, is_wrapped and unwrap_routine:
    unwraps_lines_using(
        is_wrapped => sub {    # Should return a boolean for each $line
            1;
        },
        unwrap_routine => sub {    # Should return a string for each $last and $line
            my ($self, $last, $line) = @_;
            $last.$line;
        },
    );
To avoid unexpected undef results, both routines should always return defined values. To unwrap lines effectively, the is_wrapped routine should return a boolean 1 when it encounters the continuation character, and unwrap_routine should return a string that appropriately joins the last and current lines together.
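How the two routines cooperate can be sketched with a small stand-alone driver. This is only an illustration under assumptions: unwrap_lines is a hypothetical loop that mimics joining a wrapped line with the line that follows it, not the code Text::Parser runs internally, and the sample input lines are made up. The two routines follow the ones in the synopsis, with undef passed in place of the parser object.

```perl
use strict;
use warnings;

# The two routines, modelled on the synopsis; $self is unused here.
my $is_wrapped = sub {
    my ($self, $line) = @_;
    chomp $line;
    return $line =~ m/\s+[~]\s*$/ ? 1 : 0;
};
my $unwrap_routine = sub {
    my ($self, $last, $current) = @_;
    chomp $last;
    $last =~ s/\s+[~]\s*$//g;
    return "$last $current";
};

# Hypothetical driver: hold a line while it is wrapped and join it
# with the line that follows.
sub unwrap_lines {
    my @lines = @_;
    my (@out, $held);
    for my $line (@lines) {
        my $joined = defined $held
            ? $unwrap_routine->(undef, $held, $line)
            : $line;
        if ($is_wrapped->(undef, $line)) {
            $held = $joined;       # still wrapped: keep accumulating
        }
        else {
            push @out, $joined;    # complete logical line
            undef $held;
        }
    }
    push @out, $held if defined $held;
    return @out;
}

# Made-up input: the first line continues onto the second.
my @records = unwrap_lines(
    "EMAIL: a\@example.com ~\n",
    "b\@example.com\n",
    "NAME: Brian\n",
);
print @records;
```

Both routines return defined values for every input, which is what keeps the driver from producing undef records.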
Text::Parser::Manual::ExtendedAWKSyntax - Read this manual to learn how to do cool things with this class
Text::Parser::Error - there is a change in how exceptions are thrown by this class. Read this page for more information.
Please report any bugs or feature requests on the bugtracker website http://github.com/balajirama/Text-Parser/issues
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
Balaji Ramasubramanian <balajiram@cpan.org>
This software is copyright (c) 2018-2019 by Balaji Ramasubramanian.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.