File::Canonicalizer - ASCII file canonicalizer
use File::Canonicalizer;

$aref = [ 'replaced_pattern1', 'replacement1',
          'replaced_pattern2', 'replacement2', ... ];

file_canonicalizer ('input_file', 'canonical_output_file', '', 4,5,6,7,8,9,10, $aref);
Sometimes files must be compared semantically: their contents, not their forms, are what matter. The following two files have different forms but contain identical information:
file_A
First name - Barack
Last name - Obama
Birth Date - 1961/8/4
Profession - President
file_B
last name : Obama
first name: Barack
profession: president # not sure
Birth Date: 1961/08/04
Some differences between the forms of these files are:
arbitrary line order
arbitrary letter case
arbitrary leading zeroes in numbers
arbitrary amounts of whitespace
arbitrary comments
arbitrary empty lines
different field separators ('-' versus ':')
file_canonicalizer reduces both of these files to a common canonical form, so that they can be compared with each other directly.
file_canonicalizer (
    <input_file>                                    # 1  default is STDIN
  , <output_file>                                   # 2  default is STDOUT
  , remove_comments_started_with_<regular_express>  # 3  if empty, ignore comments
  , 'replace_adjacent_tabs_and_spaces_with_1_space' # 4
  , 'replace_adjacent_slashes_with_single_slash'    # 5
  , 'remove_white_characters_from_line_edges'       # 6
  , 'remove_empty_lines'                            # 7
  , 'convert_to_lower_cased'                        # 8
  , 'remove_leading_zeroes_in_numbers'              # 9
  , 'sort_lines_lexically'                          #10
  , array_reference_to_pairs_replaced_replacement   #11
);
All parameters from the 3rd onward are interpreted as Boolean values: the corresponding action is executed only if its parameter is true. (For the 3rd parameter, a true value is also the regular expression that starts a comment.) Consequently, each of the literals in apostrophes can be shortened to any single character other than '0', for example a digit 1-9; note that in Perl both '0' and '' are false.
The parameter list can be shortened: any number of trailing parameters may be omitted, in which case the actions corresponding to the omitted parameters are not executed.
Read from STDIN, write to STDOUT, and remove all substrings beginning with '#':
file_canonicalizer ('','','#');
Create a canonicalized cron table (on UNIX/Linux); the following three calls are equivalent:
file_canonicalizer('path/cron_table','/tmp/cron_table.canonic','#',4,5,'e','empty_lin','',9,'sort');
file_canonicalizer('path/cron_table','/tmp/cron_table.canonic','#',4,5, 6, 7, '',9, 10);
file_canonicalizer('path/cron_table','/tmp/cron_table.canonic','#',1,1, 1, 1, '',1, 1);
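Why these three calls are equivalent follows from Perl's truthiness rules. The sketch below uses a hypothetical helper, active_actions (it is not part of File::Canonicalizer), that takes a call's argument list and returns the numbers of the actions 3 to 11 whose parameters are true. Omitted trailing arguments are undef and therefore false, so their actions are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of File::Canonicalizer: return the numbers
# of the optional actions (3..11) whose parameter is true in Perl's sense.
sub active_actions {
    my @args = @_;
    # Positions are 1-based; a missing trailing argument reads as undef,
    # which is false, so the corresponding action is not selected.
    return grep { $args[$_ - 1] } 3 .. 11;
}

my @calls = (
    ['path/cron_table', '/tmp/cron_table.canonic', '#', 4, 5, 'e', 'empty_lin', '', 9, 'sort'],
    ['path/cron_table', '/tmp/cron_table.canonic', '#', 4, 5, 6, 7, '', 9, 10],
    ['path/cron_table', '/tmp/cron_table.canonic', '#', 1, 1, 1, 1, '', 1, 1],
);

for my $call (@calls) {
    # prints 3,4,5,6,7,9,10 for each of the three calls
    print join(',', active_actions(@$call)), "\n";
}
```

Action 8 (lower-casing) is off in all three calls because its parameter is '', and action 11 is off because no array reference is passed.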
Canonicalize the files 'file_A' and 'file_B' shown in the section "DESCRIPTION":
file_canonicalizer('file_A','file_A.canonic','#',1,5,1,1,1,1,10, ['\s*-\s*',' : ', '^','<', '$','>']);
file_canonicalizer('file_B','file_B.canonic','#',1,5,1,1,1,1,10, ['\s*:\s*',' : ', '^','<', '$','>']);
creates two identical files 'file_A.canonic' and 'file_B.canonic':
<birth date : 1961/8/4>
<first name : barack>
<last name : obama>
<profession : president>
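To illustrate how such output can arise, here is a hypothetical pure-Perl sketch of actions 3 to 11 applied to lines held in memory. The sub canonicalize_lines, its argument order, and the order in which it applies the actions (3, 4, ..., 11) are assumptions made for illustration; this is not the module's implementation, and it does not read or write files.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the File::Canonicalizer source. Arguments mirror
# parameters 3..11 of file_canonicalizer; the action order is an assumption.
sub canonicalize_lines {
    my ($lines, $comment_re, $squeeze_ws, $squeeze_slashes, $trim,
        $drop_empty, $lower, $strip_zeroes, $sort, $pairs) = @_;
    my @out;
    for my $line (@$lines) {
        my $l = $line;
        chomp $l;
        # 3: cut from the comment start to the end of the line
        $l =~ s/(?:$comment_re).*//s if defined $comment_re && length $comment_re;
        $l =~ s/[ \t]+/ /g         if $squeeze_ws;        # 4: tabs/spaces -> one space
        $l =~ s{/+}{/}g            if $squeeze_slashes;   # 5: '//' -> '/'
        $l =~ s/^\s+|\s+$//g       if $trim;              # 6: trim line edges
        next if $drop_empty && $l eq '';                  # 7: drop empty lines
        $l = lc $l                 if $lower;             # 8: lower-case
        $l =~ s/(?<!\d)0+(?=\d)//g if $strip_zeroes;      # 9: "08" -> "8", "100" kept
        push @out, $l;
    }
    @out = sort @out if $sort;                            # 10: lexical sort
    my @p = @{ $pairs || [] };                            # 11: (pattern, replacement) pairs
    while (my ($pat, $rep) = splice @p, 0, 2) {
        s/$pat/$rep/g for @out;
    }
    return @out;
}

my @file_A = ("First name - Barack\n", "Last name - Obama\n",
              "Birth Date - 1961/8/4\n", "Profession - President\n");
my @file_B = ("last name : Obama\n", "first name: Barack\n",
              "profession: president # not sure\n", "Birth Date: 1961/08/04\n");

my @a = canonicalize_lines(\@file_A, '#', 1, 5, 1, 1, 1, 1, 10,
                           ['\s*-\s*', ' : ', '^', '<', '$', '>']);
my @b = canonicalize_lines(\@file_B, '#', 1, 5, 1, 1, 1, 1, 10,
                           ['\s*:\s*', ' : ', '^', '<', '$', '>']);

print "$_\n" for @a;
print join("\n", @a) eq join("\n", @b) ? "identical\n" : "different\n";
```

Lines are chomped before the replacement pairs are applied: with a trailing newline still present, the pattern '$' combined with /g would match twice and insert '>' on both sides of the newline.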
Mart E. Rivilis, rivilism@cpan.org
Please report any bugs or feature requests to bug-file-canonicalizer@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=File-Canonicalizer. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc File::Canonicalizer
You can also look for information at:
RT: CPAN's request tracker (report bugs here) http://rt.cpan.org/NoAuth/Bugs.html?Dist=File-Canonicalizer
AnnoCPAN: Annotated CPAN documentation http://annocpan.org/dist/File-Canonicalizer
CPAN Ratings http://cpanratings.perl.org/d/File-Canonicalizer
Search CPAN http://search.cpan.org/dist/File-Canonicalizer/
Copyright 2013 Mart E. Rivilis.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0).
To install File::Canonicalizer, copy and paste the appropriate command into your terminal.
cpanm
cpanm File::Canonicalizer
CPAN shell
perl -MCPAN -e shell
install File::Canonicalizer
For more information on module installation, please visit the detailed CPAN module installation guide.