NAME

nauniq - Non-adjacent uniq

VERSION

This document describes version 0.09 of nauniq (from Perl distribution App-nauniq), released on 2015-07-30.

SYNOPSIS

 nauniq [OPTION]... [INPUT [OUTPUT]]

DESCRIPTION

nauniq is similar to the Unix command uniq, but it detects repeated lines even when they are not adjacent. To do this, nauniq must remember the lines fed to it. It is essentially a more featureful version of one-liners such as:

 % awk '!mem[$0]++' INPUT
 % perl -ne'print unless $mem{$_}++' INPUT

Several options control memory usage: one to remember only a certain number of unique lines (--num-entries), one to remember only a certain number of characters of each line (--check-chars), and one to remember the MD5 hash of each line instead of its content (--md5).
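
To illustrate the trade-off of remembering only a limited number of unique lines, here is a minimal sketch in plain sh and awk. It is not nauniq's implementation: the helper name dedup_bounded is hypothetical, and evicting the oldest remembered line first is an assumption about the policy.

```shell
# Hypothetical helper, not part of nauniq: print first occurrences while
# remembering at most $1 distinct lines. Oldest-first eviction is an
# assumed policy. A negative $1 means "remember everything".
dedup_bounded() {
  awk -v max="$1" '
    !($0 in mem) {
      print
      mem[$0] = 1
      queue[++tail] = $0                    # record insertion order
      if (max >= 0 && tail - head > max) {  # over capacity
        delete mem[queue[++head]]           # forget the oldest line
        delete queue[head]
      }
    }'
}
```

With a limit of 2, the input lines a, b, a, c, a come out as a, b, c, a: the final "a" is printed again because it was evicted when "c" arrived. With a limit of -1 the same input yields a, b, c.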

OPTIONS

  • --repeated, -d

    Print only duplicate lines. The opposite of --unique.

  • --ignore-case, -i

    Ignore case.

  • --num-entries=N

    Number of unique entries to remember. The default is -1 (unlimited). This option limits memory usage, at the cost that duplicates which are too far apart will no longer be detected.

  • --skip-chars=N, -s

    Number of characters to skip from the beginning of each line when checking uniqueness.

  • --unique, -u

    Print only unique lines. This is the default. The opposite of --repeated.

  • --check-chars=N, -w

    The number of characters to check for uniqueness. The default is -1 (check all characters in a line).

  • --append

    Open output file in append mode. See also -a.

  • -a

    Equivalent to --append --read-output.

  • --forget-pattern=S

    This is an alternative to --num-entries. Instead of instructing nauniq to remember only a fixed number of entries, you can specify a regex pattern that triggers forgetting the remembered lines. An example use case is a file like this:

     * entries for 2014-03-13
     foo
     bar
     baz
     * entries for 2014-03-14
     foo
     baz

    and you want unique lines for each day (in which case you would specify --forget-pattern '^\*').

  • --md5

    Remember the MD5 hash instead of the actual characters of the line. Might be useful to reduce memory usage if the lines are long.

  • --read-output

    Whether to read the output file first. This option works only with --append and is usually used via -a, to append lines to a file only if they do not already exist in it.
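
The idea behind --forget-pattern can be sketched with plain awk, in the spirit of the one-liners in DESCRIPTION. This is only an illustration, not nauniq's code: the helper name dedup_per_section is hypothetical, and printing the marker line itself after clearing the memory is an assumption.

```shell
# Hypothetical helper, not part of nauniq: forget all remembered lines
# whenever a line matches the regex in $1, then dedupe as usual.
# The pattern is passed through the environment so that awk does not
# process backslash escapes in it (as it would with -v).
dedup_per_section() {
  FORGET_PAT="$1" awk '
    $0 ~ ENVIRON["FORGET_PAT"] { split("", mem) }  # portable way to empty an array
    !mem[$0]++'
}
```

For example, "dedup_per_section '^\*'" applied to a file with per-day headers such as "* entries for 2014-03-13" removes repeats within each day but lets the same line appear again on the next day.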

EXIT CODES

0 on success.

255 on I/O error.

99 on command-line options error.

FAQ

How do I append lines to a file only if they do not exist in the file?

You cannot do this with uniq:

 % ( cat FILE ; produce-lines ) | uniq - FILE
 % ( cat FILE ; produce-lines ) | uniq >> FILE

as it will clobber the file first. But you can do this with nauniq:

 % produce-lines | nauniq -a - FILE
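
For comparison, roughly the same effect can be sketched without nauniq by having awk read the existing file before it processes its input. The helper name append_missing is hypothetical, and this only approximates what -a does; it is not taken from nauniq's source.

```shell
# Hypothetical helper, not part of nauniq: append lines from stdin to the
# file $1, skipping lines already present in the file or seen earlier on
# stdin. The file is read completely in BEGIN, before any output is written.
append_missing() {
  SEEN_FILE="$1" awk '
    BEGIN {
      f = ENVIRON["SEEN_FILE"]
      while ((getline line < f) > 0) mem[line] = 1  # remember existing lines
      close(f)
    }
    !mem[$0]++' >> "$1"
}
```

Usage would look like "produce-lines | append_missing FILE". Because the shell opens FILE in append mode, the existing content is never truncated.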

SEE ALSO

uniq

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/App-nauniq.

SOURCE

Source repository is at https://github.com/perlancar/perl-App-nauniq.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=App-nauniq

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

perlancar <perlancar@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by perlancar@cpan.org.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.