NAME

freqtable - Print frequency table of lines/words/characters/bytes/numbers

VERSION

This document describes version 0.008 of freqtable (from Perl distribution App-freqtable), released on 2023-12-28.

SYNOPSIS

 % freqtable [OPTIONS] < INPUT

Sample input:

 % cat input-lines.txt
 one
 one
 two
 three
 four
 five
 five
 five
 six
 seven
 eight
 eight
 nine

 % cat input-words.txt
 one one two three four five five five six seven eight eight nine

 % cat input-nums.txt
 9.99 cents
 9.99 dollars
 9 cents

Modes

Display frequency table (by default: lines):

 % freqtable input-lines.txt
 3       five
 2       eight
 2       one
 1       four
 1       nine
 1       seven
 1       six
 1       three
 1       two

Display frequency table (words):

 % freqtable -w input-words.txt
 3       five
 2       eight
 2       one
 1       four
 1       nine
 1       seven
 1       six
 1       three
 1       two

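Display frequency table (characters; note that -c counts bytes and -m counts characters, which give the same result for this ASCII input):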

 % freqtable -c input-words.txt
 12
 12      e
  7      i
  5      n
  4      f
  4      o
  4      t
  4      v
  3      h
  2      g
  2      r
  2      s
  1

  1      u
  1      w
  1      x
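
For comparison, the character count can be roughly approximated with standard tools. This is only a sketch: char_freq is a hypothetical helper name, it assumes a grep that supports -o (GNU and BSD grep do, but the option is not in POSIX), and unlike freqtable it does not count the newline, because grep -o never emits it.

```shell
# Hypothetical helper: per-character frequency table of stdin.
# Newlines are NOT counted, since grep -o prints each match on its own line.
char_freq() {
    grep -o . | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr
}

# Demo using the same text as input-words.txt
printf 'one one two three four five five five six seven eight eight nine\n' | char_freq
```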

Display frequency table (nums):

 % freqtable -n input-nums.txt
 2      9.99
 1      9

Display frequency table (integers):

 % freqtable -i input-nums.txt
 3      9
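
The number and integer modes can be imitated with awk, which converts the leading numeric part of a string when it is used in arithmetic. This is a sketch under that assumption, not freqtable's actual implementation: num_freq and int_freq are hypothetical helper names, and a line with no leading number becomes 0 here, which may differ from what freqtable does.

```shell
# Hypothetical helper: treat each line as a number ("9.99 cents" -> 9.99).
num_freq() {
    awk '{ print $0 + 0 }' | sort | uniq -c | sort -k1,1nr -k2
}

# Hypothetical helper: treat each line as an integer ("9.99 cents" -> 9).
int_freq() {
    awk '{ print int($0) }' | sort | uniq -c | sort -k1,1nr -k2
}

# Demo using the same lines as input-nums.txt
printf '9.99 cents\n9.99 dollars\n9 cents\n' | num_freq
printf '9.99 cents\n9.99 dollars\n9 cents\n' | int_freq
```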

-F option

Don't display the frequencies:

 % freqtable -F input-lines.txt
 five
 eight
 one
 four
 nine
 seven
 six
 three
 two

Filter by frequencies

Only display lines that appear three times:

 % freqtable input-lines.txt --freq 3
 3       five

Only display lines that appear more than once:

 % freqtable input-lines.txt --freq 2-
 3       five
 2       eight
 2       one

Only display lines that appear fewer than three times:

 % freqtable input-lines.txt --freq -2
 2       eight
 2       one
 1       four
 1       nine
 1       seven
 1       six
 1       three
 1       two
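
The --freq argument takes four forms: N, M-N, M- and -N (described under OPTIONS). The matching rule can be sketched in portable shell as follows. The names freq_matches and filter_freq are hypothetical and this is not how freqtable itself is implemented; the sketch assumes well-formed specs made of non-negative integers.

```shell
# Hypothetical helper: succeed if COUNT ($2) satisfies SPEC ($1).
freq_matches() {
    case $1 in
        -*)  [ "$2" -le "${1#-}" ] ;;                           # -N : at most N
        *-)  [ "$2" -ge "${1%-}" ] ;;                           # M- : at least M
        *-*) [ "$2" -ge "${1%-*}" ] && [ "$2" -le "${1#*-}" ] ;; # M-N: between M and N
        *)   [ "$2" -eq "$1" ] ;;                               # N  : exactly N
    esac
}

# Hypothetical helper: filter "COUNT ITEM" lines (e.g. from uniq -c) by SPEC.
filter_freq() {
    while read -r count item; do
        if freq_matches "$1" "$count"; then
            printf '%s\t%s\n' "$count" "$item"
        fi
    done
}

# Demo: keep items that occur at least twice
printf '3 five\n2 eight\n1 four\n' | filter_freq 2-
```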

Sorting

By default, output is sorted by frequency in descending order. If you specify --sort-sub (and optionally one or more --sort-arg), output is sorted by key instead, using one of the Sort::Sub::* subroutines. Examples:

 # sort by keys, asciibetically
 % freqtable input-lines.txt --sort-sub asciibetically
 2       eight
 3       five
 1       four
 1       nine
 2       one
 1       seven
 1       six
 1       three
 1       two

 # sort by keys, asciibetically (descending order)
 % freqtable input-lines.txt --sort-sub 'asciibetically<r>'
 1       two
 1       three
 1       six
 1       seven
 2       one
 1       nine
 1       four
 3       five
 2       eight

 # sort by keys, randomly using perl code (essentially, shuffling)
 % freqtable input-lines.txt --sort-sub 'by_perl_code' --sort-arg 'code=int(rand()*3)-1'
 3       five
 1       three
 2       eight
 1       seven
 2       one
 1       six
 1       nine
 1       two
 1       four
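
For the two asciibetical cases, much the same ordering falls out of plain sort and uniq, since uniq -c keeps the order it is given. This is a sketch with hypothetical helper names; LC_ALL=C is assumed to give byte-wise (ASCII) ordering comparable to asciibetically.

```shell
# Hypothetical helper: frequency table sorted by key, ascending byte order.
key_sorted_freq() {
    LC_ALL=C sort | uniq -c
}

# Hypothetical helper: frequency table sorted by key, descending byte order.
key_sorted_freq_desc() {
    LC_ALL=C sort -r | uniq -c
}

# Demo
printf 'one\none\ntwo\nthree\n' | key_sorted_freq
```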

DESCRIPTION

This utility counts the occurrences of lines (or words/characters) in the input, then displays each unique line along with its number of occurrences. You can also instruct it to show only lines that have a specified number of occurrences.

You can use the following Unix command to count occurrences of lines:

 % sort input-lines.txt | uniq -c | sort -nr

and with a bit more work you can also use a combination of existing Unix commands to count occurrences of words/characters, as well as filter items that have a specified number of occurrences; freqtable basically offers convenience.
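
As an illustration of that extra work, here is one possible combination for counting words and then keeping only those that occur at least twice. It is a sketch: word_freq is a hypothetical helper name, words are assumed to be separated by whitespace, and ties are broken by key, which may not match freqtable's ordering in every case.

```shell
# Hypothetical helper: word frequency table of stdin, most frequent first.
word_freq() {
    # Put each word on its own line, drop empty lines, then count.
    tr -s '[:space:]' '\n' | sed '/^$/d' | sort | uniq -c | sort -k1,1nr -k2
}

# Demo: words occurring at least twice (roughly "freqtable -w --freq 2-")
printf 'one one two three four five five five six seven eight eight nine\n' |
    word_freq | awk '$1 >= 2'
```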

EXIT CODES

0 on success.

255 on I/O error.

99 on command-line options error.

OPTIONS

  • --bytes, -c

  • --chars, -m

  • --words, -w

  • --lines, -l

  • --number, -n

    Treat each line as a number. A line like this:

     9.99 cents

    will be regarded as:

     9.99

  • --integer, -i

    Treat each line as an integer. A line like this:

     9.99 cents

    will be regarded as:

     9

  • --ignore-case, -f

  • --no-print-freq, -F

    Will not print the frequencies.

  • --freq=s

    Filter by frequencies. N (e.g. --freq 5) means only display items that occur N times. M-N (e.g. --freq 5-10) means only display items that occur between M and N times. M- (e.g. --freq 5-) means only display items that occur at least M times. -N (e.g. --freq -10) means only display items that occur at most N times.

  • --sort-sub=s

    This causes freqtable to sort by key name instead of by frequency. The value is the name of a Sort::Sub routine, i.e. the name of a Sort::Sub::* module without the Sort::Sub:: prefix, e.g. asciibetically. The name can optionally be followed by <i>, <r>, or <ir> to mean case-insensitive sorting, reverse order, and reverse-order case-insensitive sorting, respectively. When you use one of these suffixes on the command line, remember to quote it, since < and > can be interpreted by the shell.

    Examples:

     asciibetically
     asciibetically<i>
     by_length<r>

  • --sort-arg=ARGNAME=ARGVALUE

    Pass argument(s) to the sort subroutine. Can be specified multiple times, once for every argument.

  • -a

    Shortcut for --sort-sub=asciibetically.

  • --percent, -p

    Show frequencies as percentages.
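
The idea behind --percent can be sketched with awk by dividing each count by the total number of items. The exact output format of --percent is not shown in this document, so the two-decimal format below is an assumption, and percent_freq is a hypothetical helper name.

```shell
# Hypothetical helper: line frequency table of stdin with counts shown as
# percentages of the total. The "%.2f%%" output format is an assumption.
percent_freq() {
    sort | uniq -c | sort -k1,1nr -k2 |
        awk '{
                 c = $1
                 sub(/^ *[0-9]+ /, "")      # strip the count, keep the whole key
                 n[NR] = c; k[NR] = $0; total += c
             }
             END {
                 for (i = 1; i <= NR; i++)
                     printf "%.2f%%\t%s\n", 100 * n[i] / total, k[i]
             }'
}

# Demo
printf 'a\na\nb\nc\n' | percent_freq
```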

FAQ

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/App-freqtable.

SOURCE

Source repository is at https://github.com/perlancar/perl-App-freqtable.

SEE ALSO

Unix commands wc, sort, uniq

wordstat from App::wordstat

csv-freqtable from App::CSVUtils

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

 % prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2023, 2022, 2018 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-freqtable

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.