NAME

cdif - word context diff

SYNOPSIS

cdif [option] file1 file2

cdif [option] [diff-data]

Options:

        -c, -Cn         context diff
        -u, -Un         unified diff
        -i              ignore case
        -b              ignore space change
        -w              ignore whitespace
        -t              expand tabs
        -T              initial tabs
        --rcs           use rcsdiff
        -r<rev>, -q     rcs options

        -B, --char          char-by-char comparison
        --diff=command      specify diff command
        --subdiff=command   specify backend diff command
        --stat              show statistical information
        --colormap=s        specify color map
        --[no]color         color or not            (default true)
        --[no]256           ANSI 256 color mode     (default true)
        --[no]commandcolor  color for command line  (default true)
        --[no]markcolor     color for diff mark     (default true)
        --[no]textcolor     color for normal text   (default true)
        --[no]unknowncolor  color for unknown text  (default true)
        --[no]old           print old text          (default true)
        --[no]new           print new text          (default true)
        --[no]command       print diff command line (default true)
        --[no]unknown       print unknown line      (default true)
        --[no]mark          print mark or not       (default true)
        --[no]graph         read git --graph output (default true)
        --[no]mecab         use mecab tokenizer     (default false)

DESCRIPTION

cdif is a post-processor of the Unix diff command. It highlights deleted, changed and added words based on word context.

You may want to compare character-by-character rather than word-by-word. The -B option can be used for that purpose.

If only one file is specified, cdif reads that file (or stdin if no file is given) as output from the diff command.

Lines that do not look like diff output are not processed and are printed as they are.
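
Both invocation styles can be tried on a pair of throwaway files. The following is a minimal sketch, not taken from the cdif distribution: the file names and contents are made up for illustration, and the cdif calls are skipped when the command is not installed.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
printf 'The quick brown fox\njumps over the lazy dog\n' > "$workdir/old.txt"
printf 'The quick red fox\njumps over the lazy dog\n'   > "$workdir/new.txt"

# Plain unified diff; diff exits with status 1 when the files differ,
# so the status is neutralized with "|| true".
patch=$(diff -u "$workdir/old.txt" "$workdir/new.txt" || true)

if command -v cdif >/dev/null 2>&1; then
    # Style 1: cdif runs diff itself on the two files.
    cdif -u "$workdir/old.txt" "$workdir/new.txt" || true
    # Style 2: cdif post-processes diff output read from stdin.
    printf '%s\n' "$patch" | cdif || true
else
    echo "cdif not found; skipping" >&2
fi
```

In both cases the changed word (brown versus red) is the part cdif is expected to highlight, while the rest of the changed line keeps the ordinary old/new text color.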

OPTIONS

-[cCuUibwtT]

Almost the same as for the diff command.

--rcs, -rrev, -q

Use rcsdiff instead of normal diff. The --rcs option is not required when -rrev is supplied.

-B, --char

Compare the data character-by-character instead of word-by-word.

--diff=command

Specify the diff command to use.

--subdiff=command

Specify the backend diff command used to get word differences. Normal and unified diff formats are accepted.

If you want to use the git diff command, do not forget to set the -U0 option.

    --subdiff="git diff -U0 --no-index --histogram"

--[no]color

Use ANSI color escape sequences for output.

--colormap=colormap, --cm=colormap

The basic colormap format is:

    FIELD=COLOR

where FIELD is one of these:

    COMMAND  Command line
    OMARK    Old mark
    NMARK    New mark
    OTEXT    Old text
    NTEXT    New text
    OCHANGE  Old change part
    NCHANGE  New change part
    APPEND   Appended part
    DELETE   Deleted part

There are additional Common and Merged FIELDs for the git-diff combined format:

    CMARK    Common mark
    CTEXT    Common text
    MMARK    Merged mark
    MTEXT    Merged text

You can give multiple fields the same color by joining them with =:

    FIELD1=FIELD2=...=COLOR

A wildcard can also be used in a field name:

    *CHANGE=BDw

Multiple fields can be specified by repeating the option:

    --cm FIELD1=COLOR1 --cm FIELD2=COLOR2 ...

or by combining them with commas (,):

    --cm FIELD1=COLOR1,FIELD2=COLOR2, ...
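
The field forms above can be mixed freely. As a sketch, the four cdif calls below are meant to set the same color for both change fields; the color spec K/445 is borrowed from the defaults listed in this document, and the sample files are made up for illustration. The calls are skipped when cdif is not installed.

```shell
# Scratch input files (contents are arbitrary).
workdir=$(mktemp -d)
printf 'one two three\n'  > "$workdir/old.txt"
printf 'one 2 three\n'    > "$workdir/new.txt"

# Different spellings of the same colormap.
spec_comma='OCHANGE=K/445,NCHANGE=K/445'   # comma-separated list
spec_joined='OCHANGE=NCHANGE=K/445'        # fields joined by =
spec_wild='*CHANGE=K/445'                  # wildcard field name

if command -v cdif >/dev/null 2>&1; then
    # Repeated option, one field per --cm.
    cdif --cm 'OCHANGE=K/445' --cm 'NCHANGE=K/445' \
         "$workdir/old.txt" "$workdir/new.txt" || true
    # The three single-option spellings.
    for spec in "$spec_comma" "$spec_joined" "$spec_wild"; do
        cdif --cm "$spec" "$workdir/old.txt" "$workdir/new.txt" || true
    done
else
    echo "cdif not found; skipping" >&2
fi
```

Quoting each spec keeps the shell from expanding the * in the wildcard form.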

A color specification is a combination of single uppercase characters representing 8 colors:

    R  Red
    G  Green
    B  Blue
    C  Cyan
    M  Magenta
    Y  Yellow
    K  Black
    W  White

and alternative (usually brighter) colors in lowercase :

    r, g, b, c, m, y, k, w

or RGB values and 24 grey levels if using an ANSI 256 or full color terminal:

    (255,255,255)      : 24bit decimal RGB colors
    #000000 .. #FFFFFF : 24bit hex RGB colors
    #000    .. #FFF    : 12bit hex RGB 4096 colors
    000 .. 555         : 6x6x6 RGB 216 colors
    L00 .. L25         : Black (L00), 24 grey levels, White (L25)

    The beginning # can be omitted in 24bit RGB notation.

    When the values are all the same in 24bit or 12bit RGB, the color is converted to one of the 24 grey levels; otherwise it is converted to a 6x6x6 216 color.

or color names enclosed in angle brackets:

    <red> <blue> <green> <cyan> <magenta> <yellow>
    <aliceblue> <honeydew> <hotpink> <moccasin>
    <medium_aqua_marine>

with other special effects :

    Z  0 Zero (reset)
    D  1 Double-struck (boldface)
    P  2 Pale (dark)
    I  3 Italic
    U  4 Underline
    F  5 Flash (blink: slow)
    Q  6 Quick (blink: rapid)
    S  7 Stand-out (reverse video)
    V  8 Vanish (concealed)
    J  9 Junk (crossed out)

    E    Erase Line

    ;    No effect
    X    No effect
    /    Toggle foreground/background
    ^    Reset to foreground

A color is initially taken as foreground, and a slash (/) switches between foreground and background. If multiple colors are given in the same spec, all indicators are produced in the order in which they appear, so the last one takes effect.

If the spec starts with a plus (+) or minus (-) character, the following characters are appended to or deleted from the previous value. A reset mark (^) is inserted before the appended string.

Effect characters are case insensitive and can appear anywhere, in any order, in a color spec string. Because X and ; have no effect, you can use them to improve readability, as in SxD;K/544.
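
For orientation, the three-digit 6x6x6 notation lines up with the color cube of xterm-style 256-color terminals, where the palette index is 16 + 36*r + 6*g + b. The sketch below assumes cdif uses that standard index for a spec such as 544 (this document does not spell out the escape sequences actually emitted), and prints a sample of black text on that background, which is what K/544 describes.

```shell
spec=544                      # red=5, green=4, blue=4, each digit in 0..5
r=${spec%??}                  # first digit
g=${spec#?}; g=${g%?}         # middle digit
b=${spec#??}                  # last digit

# Standard 256-color cube layout: indexes 16..231.
index=$((16 + 36 * r + 6 * g + b))
echo "6x6x6 color $spec is palette index $index"

# 30 = black foreground, 48;5;N = 256-color background, 0 = reset.
printf '\033[30;48;5;%dm K/%s \033[0m\n' "$index" "$spec"
```

Running it in a 256-color terminal shows roughly how a DELETE or APPEND part would look with the default K/544 spec.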

Defaults are :

    COMMAND => "555/222E"
    OMARK   => "CS"
    NMARK   => "MS"
    OTEXT   => "C"
    NTEXT   => "M"
    OCHANGE => "K/445"
    NCHANGE => "K/445"
    DELETE  => "K/544"
    APPEND  => "K/544"

    CMARK   => "GS"
    MMARK   => "YS"
    CTEXT   => "G"
    MTEXT   => "Y"

This is equivalent to :

    cdif --cm 'COMMAND=555/222E,OMARK=CS,NMARK=MS' \
         --cm 'OTEXT=C,NTEXT=M,*CHANGE=K/445,DELETE=APPEND=K/544' \
         --cm 'CMARK=GS,MMARK=YS,CTEXT=G,MTEXT=Y'

--[no]commandcolor, --cc
--[no]markcolor, --mc
--[no]textcolor, --tc
--[no]unknowncolor, --uc

Enable or disable color for the corresponding field.

--[no]old, --[no]new

Print or suppress old/new text in diff output.

--[no]command

Print or suppress the command lines preceding diff output.

--[no]unknown

Print or suppress lines that do not look like diff output.

--[no]mark

Print or suppress the marks at the beginning of diff output lines. At present, this option is effective only for unified diff.

The next example produces output that is exactly the same as new except for visual effects.

    cdif -U100 --no-mark --no-old --no-command --no-unknown old new

These options are provided for the watchdiff(1) command.
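
One way to check that behavior is to switch the visual effects off with --no-color and compare the result with the new file. This is only a sketch under that assumption: the file names and contents are made up, and the comparison is skipped when cdif is not installed.

```shell
# Scratch input files; small enough that -U100 covers every line.
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new"

if command -v cdif >/dev/null 2>&1; then
    # With color disabled, only the text of "new" is expected to remain.
    cdif --no-color -U100 --no-mark --no-old --no-command --no-unknown \
         "$workdir/old" "$workdir/new" > "$workdir/out" || true
    if cmp -s "$workdir/out" "$workdir/new"; then
        echo "output matches new"
    else
        echo "output differs from new"
    fi
else
    echo "cdif not found; skipping" >&2
fi
```

The context size matters here: any part of the file outside the context window would be missing from the output.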

--[no]graph

Process diff output produced with the git --graph option.

--[no]mecab

Use the mecab command as a tokenizer. The external command mecab is required.

--stat

Print statistical information at the end of the output. It shows the total number of appended, deleted and changed words as counted by cdif. Insertions and deletions of newlines are common because of text filling, so the normal figures are followed by modified figures that ignore inserted and deleted newlines.

ENVIRONMENT

Environment variable CDIFOPTS is used to set default options.
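
For example, default options could be set from a shell startup file. This sketch assumes the variable holds options exactly as they would be typed on the command line, separated by spaces; the format is not spelled out here, and the options chosen are only an illustration.

```shell
# Assumed format: a space-separated option string, as on the command line.
CDIFOPTS='--no-command --stat'
export CDIFOPTS

# Later invocations would then be expected to pick these options up, e.g.:
#   cdif old.txt new.txt
```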

AUTHOR

Kazumasa Utashiro
https://github.com/kaz-utashiro/sdif-tools

LICENSE

Copyright 1992-2019 Kazumasa Utashiro

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

sdif(1), watchdiff(1)

Getopt::EX::Colormap

BUGS

cdif is naturally not very fast because it uses the normal diff command as a back-end processor to compare words.