NAME

Fabnewsru::Utils - Some useful methods for operating with Mojo::DOM objects

VERSION

version 0.01

SYNOPSIS

    use Mojo::DOM;
    use Data::Dumper;
    use Fabnewsru::Utils qw(table2hash table2array_of_hashes merge_hashes);

    my $dom = Mojo::DOM->new('<div class="company-profile-table"><tr><td>key1</td><td>val1</td></tr></div>');
    warn Dumper table2hash($dom, ".company-profile-table");   # { key1 => 'val1' }

    # General form: table2hash($url, $table_container)
    # .company-profile-table is the container holding the <table> to be parsed
    my $h   = table2hash("http://fabnews.ru/fablabs/item/ufo/", ".company-profile-table");
    my $arr = table2array_of_hashes("http://fabnews.ru/fablabs/", "table", ["name", "fabnews_subscribers", "fabnews_rating"]);

METHODS

table2hash

Accepts a Mojo::DOM object and a selector for the container that holds the table.

Converts the table to a hash. Each row becomes one key-value pair: the key is the text of the first <td> element and the value is the text of the second <td>.

Example

Table

    header1 | header2
    -----------------
    key1    | value1
    key2    | value2

will be processed into a hash

    { key1 => value1, key2 => value2 }

Assumes that the strings in $dom are already in Perl's internal format with the UTF8 flag set.
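The row-to-pair mapping described above can be sketched roughly as follows. This is not the module's actual source: the helper names rows_to_hash and _trim are hypothetical, and the sketch assumes that rows with fewer than two <td> cells (such as a header row made of <th> cells) are simply skipped.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: strip leading and trailing whitespace.
sub _trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical stand-in for table2hash: first <td> is the key, second <td> is the value.
sub rows_to_hash {
    my ($dom, $container) = @_;
    my %hash;
    for my $row ($dom->find("$container tr")->each) {
        my @cells = $row->find('td')->each;
        next if @cells < 2;    # assumption: skip header or malformed rows
        my ($key, $value) = map { _trim($_->all_text) } @cells[0, 1];
        $hash{$key} = $value;
    }
    return \%hash;
}

my $dom = Mojo::DOM->new(
      '<div class="company-profile-table"><table>'
    . '<tr><th>header1</th><th>header2</th></tr>'
    . '<tr><td>key1</td><td>value1</td></tr>'
    . '<tr><td>key2</td><td>value2</td></tr>'
    . '</table></div>'
);
my $h = rows_to_hash($dom, '.company-profile-table');
```

With the sample markup, $h holds the two pairs from the example table; the header row contributes nothing because it has no <td> cells.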

table2array_of_hashes

    my $arr = table2array_of_hashes($dom, $container, $fields_arr);

    $res = table2array_of_hashes($dom, ".company-profile-table", ["name", "fabnews_subscribers", "fabnews_rating"]);
    $res = table2array_of_hashes($dom, ".company-profile-table");

Converts the table to a list of hashes.

Use $fields_arr to specify the names of the hash keys.

If no array is provided, the hash keys are taken from the <th> tags of <thead>.

Example

Table

    header1 | header2
    -----------------
    key1    | value1
    key2    | value2

will be processed into an array of hashes

    [
      { header1 => key1, header2 => value1 },
      { header1 => key2, header2 => value2 }
    ]

If any table cell contains URLs, an extra hash key, urls, is added with an array value.

E.g.

    header1    | header2
    --------------------
    key1       | value1
    key2 + url | value2

The result will look like

    [
      { header1 => key1, header2 => value1 },
      { header1 => key2, header2 => value2, urls => [] }
    ]
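A minimal sketch of the behaviour described above, not the module's real code. The names rows_to_records and _trim_text are hypothetical, and two details are assumptions: URLs are read from the href attribute of <a> elements in the row, and the urls key is added only when a row has at least one link.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: strip leading and trailing whitespace.
sub _trim_text {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical stand-in for table2array_of_hashes.
sub rows_to_records {
    my ($dom, $container, $fields) = @_;
    my $root = $dom->at($container) or return [];

    # Use the caller's key names, or fall back to the <th> cells of <thead>.
    my @keys = $fields
        ? @$fields
        : map { _trim_text($_->all_text) } $root->find('thead th')->each;

    my @records;
    for my $row ($root->find('tr')->each) {
        my @cells = $row->find('td')->each;
        next unless @cells;    # header rows have no <td> cells
        my %record;
        for my $i (0 .. $#keys) {
            next unless defined $cells[$i];
            $record{ $keys[$i] } = _trim_text($cells[$i]->all_text);
        }
        # Assumption: collect link targets from <a href="..."> in this row.
        my @urls = map { $_->attr('href') } $row->find('a[href]')->each;
        $record{urls} = \@urls if @urls;
        push @records, \%record;
    }
    return \@records;
}

my $table_dom = Mojo::DOM->new(
      '<div class="company-profile-table"><table>'
    . '<thead><tr><th>header1</th><th>header2</th></tr></thead>'
    . '<tbody>'
    . '<tr><td>key1</td><td>value1</td></tr>'
    . '<tr><td><a href="http://example.com/">key2</a></td><td>value2</td></tr>'
    . '</tbody></table></div>'
);
my $records = rows_to_records($table_dom, '.company-profile-table');
```

Passing an array reference as the third argument, for example ["name", "rating"], would replace the header-derived keys in this sketch.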

merge_hashes

Smart merge of two hashes.

Returns a new hash with keys from the first hash ($fields) and values from the second hash ($values).

All input hashes must be in Perl's internal encoding.

Useful for replacing hash keys that contain non-ASCII characters with ASCII-only Latin keys, which are more portable.

See the unit tests for more examples.
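The exact matching rules are not spelled out here, so the following is only one possible reading, written as a sketch. It assumes that $fields maps an ASCII key to the non-ASCII label used in $values, and that labels missing from $values are skipped. The name merge_by_label and the sample data are hypothetical.

```perl
use strict;
use warnings;
use utf8;    # string literals below are decoded into Perl's internal format

# Hypothetical stand-in for merge_hashes.
# Assumption: $fields is { ascii_key => label }, $values is { label => value }.
sub merge_by_label {
    my ($fields, $values) = @_;
    my %merged;
    for my $key (keys %$fields) {
        my $label = $fields->{$key};
        next unless exists $values->{$label};    # assumption: skip unknown labels
        $merged{$key} = $values->{$label};
    }
    return \%merged;
}

my $fields = { name => 'Название', city => 'Город' };
my $values = { 'Название' => 'UFO', 'Город' => 'Москва' };
my $merged = merge_by_label($fields, $values);
```

Under these assumptions, $merged has the ASCII keys name and city, carrying the values that were stored under the Cyrillic labels.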

rm_spec_symbols_from_string

Applies a set of regular expressions that remove typical unwanted symbols from a string:

* removes the characters [\$#@~!&;:]
* removes any whitespace at the beginning of the string
* removes any whitespace at the end of the string
* collapses runs of whitespace into a single space

This function is useful for post-processing HTML parsing results, since not all results look good without it.
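The four clean-up steps listed above could look roughly like this. It is a sketch rather than the module's source: the name strip_noise is hypothetical, and the order of the substitutions is assumed to follow the order of the list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rm_spec_symbols_from_string.
sub strip_noise {
    my ($string) = @_;
    $string =~ s/[\$\#\@~!&;:]//g;    # drop the listed special characters
    $string =~ s/^\s+//;              # trim leading whitespace
    $string =~ s/\s+$//;              # trim trailing whitespace
    $string =~ s/\s+/ /g;             # collapse whitespace runs to one space
    return $string;
}

my $clean = strip_noise('  $Price:   100 ~ 200!  ');
print "$clean\n";    # prints "Price 100 200"
```

Removing the special characters first matters in this sketch: the gaps they leave behind are then folded by the final whitespace-collapsing step.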

AUTHOR

Pavel Serikov <pavelsr@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2016 by Pavel Serikov.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.