NAME

File::ByLine - Line-by-line file access loops

VERSION

version 1.003

SYNOPSIS

  use File::ByLine;

  #
  # Execute a routine for each line of a file
  #
  dolines { say "Line: $_" } "file.txt";
  forlines "file.txt", sub { say "Line: $_" };

  #
  # Grep (match) lines of a file
  #
  my (@result) = greplines { m/foo/ } "file.txt";

  #
  # Apply a function to each line and return result
  #
  my (@result) = maplines { lc($_) } "file.txt";

  #
  # Parallelized forlines/dolines routines
  #
  parallel_dolines { foo($_) } "file.txt", 10;
  parallel_forlines "file.txt", 10, sub { foo($_); };

  #
  # Parallelized maplines and greplines
  #
  my (@result) = parallel_greplines { m/foo/ } "file.txt", 10;
  my (@result) = parallel_maplines  { lc($_) } "file.txt", 10;

  #
  # Read an entire file, split into lines
  #
  my (@result) = readlines "file.txt";

DESCRIPTION

I kept finding myself writing the same trivial loops to read files, or relying on modules like Perl6::Slurp that did not quite do what I needed (abstracting the loop). It was clear that something easy, simple, and sufficiently Perl-ish was needed.

FUNCTIONS

dolines

  dolines { say "Line: $_" } "file.txt";
  dolines \&func, "file.txt";

This function calls a coderef once for each line in the file. The file is read line by line; the newline character(s) are removed from each line before the coderef is executed.

Each line (without newline) is passed to the coderef as its first and only parameter. It is also placed into $_.

This function returns the number of lines in the file.

This is similar to forlines(), except for the order of arguments. The author recommends this form for short code blocks, i.e. a coderef that fits on one line. For longer, multi-line code blocks, the author recommends the forlines() syntax.
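For comparison, the kind of hand-written loop that dolines() replaces might look like the sketch below. The helper name each_line is hypothetical and this is not the module's actual implementation; it only mirrors the behavior described above (newline removed, line passed as the sole argument and placed in $_, line count returned).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for dolines(); not the module's real code.
sub each_line {
    my ( $code, $file ) = @_;

    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip the trailing newline character(s)
        local $_ = $line;        # expose the line as $_ ...
        $code->($line);          # ... and as the only argument
        $count++;
    }
    close $fh;
    return $count;               # number of lines read
}

# Build a small sample file so the sketch runs on its own.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "alpha\nbeta\ngamma\n";
close $tmp;

my @seen;
my $lines = each_line( sub { push @seen, "$_[0]/$_" }, $path );
print "$lines lines: @seen\n";
```

Each entry in @seen pairs the argument with $_, so the two are visibly the same value for every line.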

forlines

  forlines "file.txt", sub { say "Line: $_" };
  forlines "file.txt", \&func;

This function calls a coderef once for each line in the file. The file is read line by line; the newline character(s) are removed from each line before the coderef is executed.

Each line (without newline) is passed to the coderef as its first and only parameter. It is also placed into $_.

This function returns the number of lines in the file.

This is similar to dolines(), except for order of arguments. The author recommends this when using longer, multi-line code blocks, even though it is not orthogonal with the maplines()/greplines() routines.
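As an illustration of the multi-line case, the sketch below parses a small file with forlines(). The key=value file format, the %config hash, and the temporary file are assumptions made up for this example. The anonymous sub is written with an explicit sub keyword because it is not the first argument.

```perl
use strict;
use warnings;
use File::ByLine;
use File::Temp qw(tempfile);

# Sample input: hypothetical "key=value" lines, plus a comment and a blank line.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "# settings\nname=demo\n\nsize=42\n";
close $tmp;

my %config;
my $lines = forlines $path, sub {
    my ($line) = @_;    # same value as $_

    # Skip blank lines and comments.
    return if $line eq '' or $line =~ m/^#/;

    my ( $key, $value ) = split /=/, $line, 2;
    $config{$key} = $value;
};

print "read $lines lines, kept ", scalar( keys %config ), " settings\n";
```

Writing to %config from inside the coderef works here because forlines() runs in the calling process; the same pattern would not carry results back from the parallel variants.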

parallel_dolines

  my (@result) = parallel_dolines { foo($_) } "file.txt", 10;

Three parameters are required: a coderef, a filename, and the number of simultaneous child processes to use.

This function performs similarly to dolines(), except that it does its operations in parallel using fork() and Parallel::WorkUnit. Because the code in the coderef is executed in a child process, any changes it makes to variables in outer scopes will not be visible outside that single child. In general, it is safest not to modify anything that belongs outside the coderef's own scope.
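The isolation described above is a property of fork() itself and can be observed without the module. A minimal sketch using only core Perl, with an illustrative variable named $counter:

```perl
use strict;
use warnings;

my $counter = 0;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: this modifies the child's private copy only.
    $counter += 100;
    exit 0;
}

waitpid( $pid, 0 );    # Parent: wait for the child to finish.
print "counter in parent: $counter\n";
```

The parent still sees its original value after the child exits, which is why side effects on outer variables are lost when a coderef runs in a child process.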

Note that the file will be read in several chunks, with each chunk being processed in a different child process. This means that the children may be operating on very different sections of the file simultaneously, and no specific order of execution of the coderef should be expected!

Because of the mechanism used to split the file into chunks for processing, each child may process a somewhat different number of lines. This is particularly true if there is a mix of very long and very short lines. The splitting routine splits the file into roughly equal-sized chunks by byte count, not line count.
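To see why byte-based splitting gives uneven line counts, the sketch below assigns each line of a sample file to the chunk containing its first byte. This assignment rule, the sample data, and the variable names are assumptions chosen for illustration; the module's own splitting routine may differ in detail.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample file: two very long lines followed by eight short ones.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} 'x' x 200, "\n" for 1 .. 2;
print {$tmp} "short\n" for 1 .. 8;
close $tmp;

my $chunks = 2;
my $size   = -s $path;    # total size in bytes

# Assign each line to the chunk that contains its first byte.
my @lines_in_chunk = (0) x $chunks;
open my $fh, '<', $path or die "Cannot open $path: $!";
while (1) {
    my $start = tell $fh;    # byte offset where this line begins
    my $line  = <$fh>;
    last unless defined $line;
    my $chunk = int( $start * $chunks / $size );
    $lines_in_chunk[$chunk]++;
}
close $fh;

print "chunk $_: $lines_in_chunk[$_] lines\n" for 0 .. $#lines_in_chunk;
```

The file is 450 bytes, so the nominal boundary falls at byte 225. Both long lines start before it and all eight short lines start after it, giving a 2/8 split by line count.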

Otherwise, this function is identical to dolines(). See the documentation for dolines() or forlines() for information about how this might differ from parallel_forlines().

parallel_forlines

  my (@result) = parallel_forlines "file.txt", 10, sub { foo($_) };

Three parameters are required: a filename, the number of simultaneous child processes to use, and a coderef.

This function performs similarly to forlines(), except that it does its operations in parallel using fork() and Parallel::WorkUnit. Because the code in the coderef is executed in a child process, any changes it makes to variables in outer scopes will not be visible outside that single child. In general, it is safest not to modify anything that belongs outside the coderef's own scope.

Note that the file will be read in several chunks, with each chunk being processed in a different child process. This means that the children may be operating on very different sections of the file simultaneously, and no specific order of execution of the coderef should be expected!

Because of the mechanism used to split the file into chunks for processing, each child may process a somewhat different number of lines. This is particularly true if there is a mix of very long and very short lines. The splitting routine splits the file into roughly equal-sized chunks by byte count, not line count.

Otherwise, this function is identical to forlines(). See the documentation for forlines() or dolines() for information about how this might differ from parallel_dolines().

greplines

  my (@result) = greplines { m/foo/ } "file.txt";

This function calls a coderef once for each line in the file, and, based on the return value of that coderef, returns only the lines where the coderef evaluates to true. This is similar to the grep built-in function, except operating on file input rather than array input.

Each line (without newline) is passed to the coderef as its first and only parameter. It is also placed into $_.

This function returns the lines for which the coderef evaluates as true.

parallel_greplines

  my (@result) = parallel_greplines { m/foo/ } "file.txt", 10;

Three parameters are required: a coderef, a filename, and the number of simultaneous child processes to use.

This function performs similarly to greplines(), except that it does its operations in parallel using fork() and Parallel::WorkUnit. Because the code in the coderef is executed in a child process, any changes it makes to variables in outer scopes will not be visible outside that single child. In general, it is safest not to modify anything that belongs outside the coderef's own scope.

If a large amount of data is returned, the overhead of passing the data from the children to the parent may exceed the benefit of parallelization. If there is substantial line-by-line processing, there will likely be a speedup, but trivial loops will not speed up.

Note that the file will be read in several chunks, with each chunk being processed in a different child process. This means that the children may be operating on very different sections of the file simultaneously, and no specific order of execution of the coderef should be expected! However, the results will be returned in the same order as greplines() would return them.

Because of the mechanism used to split the file into chunks for processing, each child may process a somewhat different number of lines. This is particularly true if there is a mix of very long and very short lines. The splitting routine splits the file into roughly equal-sized chunks by byte count, not line count.

Otherwise, this function is identical to greplines().
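The ordering guarantee can be checked by running the same filter serially and in parallel and comparing the results. This is a sketch only: the sample file, the choice of 4 child processes, and the pattern are assumptions for the example, and it needs File::ByLine and Parallel::WorkUnit installed.

```perl
use strict;
use warnings;
use File::ByLine;
use File::Temp qw(tempfile);

# Sample file: 1000 numbered lines.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "line $_\n" for 1 .. 1000;
close $tmp;

# Same filter, run serially and then across 4 child processes.
my @serial   = greplines { m/7$/ } $path;
my @parallel = parallel_greplines { m/7$/ } $path, 4;

print "serial:   ", scalar(@serial),   " matches\n";
print "parallel: ", scalar(@parallel), " matches\n";
print "same order: ", ( "@serial" eq "@parallel" ? "yes" : "no" ), "\n";
```

A filter this cheap is unlikely to run faster in parallel; the point here is only that the returned lines come back in file order.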

maplines

  my (@result) = maplines { lc($_) } "file.txt";

This function calls a coderef once for each line in the file and returns an array of return values from those calls. This follows normal Perl rules: if the coderef returns a list, all elements of that list are added as distinct elements to the return value array. If the coderef returns an empty list, no elements are added.

Each line (without newline) is passed to the coderef as its first and only parameter. It is also placed into $_.

This is meant to be similar to the built-in map function.
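The list-flattening rule is easiest to see with a coderef that returns a different number of elements per line. A small sketch, with a made-up three-line sample file:

```perl
use strict;
use warnings;
use File::ByLine;
use File::Temp qw(tempfile);

# Sample file: a two-word line, a blank line, and a one-word line.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "one two\n\nthree\n";
close $tmp;

# split returns a list: two words, no words (blank line), one word.
my @words = maplines { split ' ', $_ } $path;

print scalar(@words), " words: @words\n";
```

The blank line contributes nothing to @words, so the result has one element per word rather than one element per line.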

This function returns the list of values returned by the coderef across all lines.

parallel_maplines

  my (@result) = parallel_maplines { lc($_) } "file.txt", 10;

Three parameters are required: a coderef, a filename, and the number of simultaneous child processes to use.

This function performs similarly to maplines(), except that it does its operations in parallel using fork() and Parallel::WorkUnit. Because the code in the coderef is executed in a child process, any changes it makes to variables in outer scopes will not be visible outside that single child. In general, it is safest not to modify anything that belongs outside the coderef's own scope.

If a large amount of data is returned, the overhead of passing the data from the children to the parent may exceed the benefit of parallelization. If there is substantial line-by-line processing, there will likely be a speedup, but trivial loops will not speed up.

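Note that the file will be read in several chunks, with each chunk being processed in a different child process. This means that the children may be operating on very different sections of the file simultaneously, and no specific order of execution of the coderef should be expected! However, the results will be returned in the same order as maplines() would return them.

Because of the mechanism used to split the file into chunks for processing, each child may process a somewhat different number of lines. This is particularly true if there is a mix of very long and very short lines. The splitting routine splits the file into roughly equal-sized chunks by byte count, not line count.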

Otherwise, this function is identical to maplines().

readlines

  my (@result) = readlines "file.txt";

This function simply returns an array of lines (without newlines) read from a file.
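For comparison, the usual core-Perl idiom that readlines() stands in for is sketched below. It is not the module's implementation, and the sample file is made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file so the sketch runs on its own.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "first\nsecond\nthird\n";
close $tmp;

open my $fh, '<', $path or die "Cannot open $path: $!";
chomp( my @lines = <$fh> );    # read every line, then strip newlines
close $fh;

print scalar(@lines), " lines, last is '$lines[-1]'\n";
```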

AUTHOR

Joelle Maslak <jmaslak@antelope.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Joelle Maslak.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.