
NAME

Emacs::Rep - find & replace backend for rep.pl and in turn rep.el

SYNOPSIS

  use Emacs::Rep qw( do_finds_and_reps  parse_perl_substitutions );

  my $substitutions = <<'END_S';
      s/jerk/iconoclast/
      s/conniving/shrewd/
      s/(t)asteless/$1alented/i
  END_S

  my $find_replaces_aref =
    parse_perl_substitutions( \$substitutions );

  my $locations_aref =
        do_finds_and_reps( \$text, $find_replaces_aref );

DESCRIPTION

Emacs::Rep is a module that acts as a back-end for the rep.pl script, which in turn is used by the emacs library rep.el.

Its purpose is to perform multiple perl substitution commands (e.g. s///g) on a given file, using emacs to interactively display and control the changes.

The end user isn't expected to need to use these routines (or even the rep.pl script) directly.

An application programmer might use these to add support for some other interactive front-end.

EXPORT

None by default. Any of the following may be requested (or all with the ':all' tag).

do_finds_and_reps

Does a series of finds and replaces on some text and returns the beginning and end points of each of the modified regions, along with some other information about the matches.

Takes two arguments:

(1) A *reference* to the text to be modified.

(2) A series of find and replace pairs in the form of an aref of arefs, e.g.

  $find_replaces_aref =
   [ ['jerk',            'iconoclast'],
     ['conniving',       'shrewd'],
     ['(?i)(t)asteless', '$1alented'],
   ];

Example usage:

  $locations_aref = do_finds_and_reps( \$text, $find_replaces_aref );

The returned history is an aref of aref of arefs, e.g.

 [
  [ [ 3,       9,   -4,  'alpha'],
    [ 39,     47,   10,  'ralpha'],
    [ 111,   130,    0,  'XXX'],
    [ 320,   332,  -33,  'blvd'],
  ],
  [ [ 12,     23,   6,  'widget'],
    [ 33,     80,   6,  'wadget'],
    [ 453,   532,   6,  'wandat'],
  ],
 ]

Each sub-array here contains the locations of changes made by each substitution, where each change is recorded as start and end points in the form of integers specifying the number of the character counting from the start of the file, where the first character is 1.

The third integer is the "delta", the change in length of the string after modification.

The fourth field is the string that was matched, before it was modified.

These changed locations are recorded *during* each pass, which means that later passes can mess up the numbering. We then compensate for this internally, using the recorded deltas. See revise_locations.
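The bookkeeping for a single pass can be illustrated with a minimal sketch. This is not the module's actual code: record_one_pass is a hypothetical helper, it handles only a literal replacement string (no $1 interpolation), and it assumes that "end" means the position just past the replacement text.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Emacs::Rep. Applies one find/replace
# pair to the referenced text and returns an aref of
# [ beg, end, delta, orig ] rows, one per change.
# Assumptions: the replacement is a literal string, positions are 1-based,
# and "end" is the position just past the replacement text.
sub record_one_pass {
    my ( $text_ref, $find, $rep ) = @_;
    my @rows;
    my $shift = 0;    # net length change from earlier matches in this pass
    ${$text_ref} =~ s{$find}{
        my $orig  = $&;
        my $delta = length($rep) - length($orig);
        my $beg   = $-[0] + 1 + $shift;
        push @rows, [ $beg, $beg + length($rep), $delta, $orig ];
        $shift += $delta;
        $rep;
    }ge;
    return \@rows;
}

my $text = 'a jerk and a jerk';
my $rows = record_one_pass( \$text, 'jerk', 'iconoclast' );
# $text is now 'a iconoclast and a iconoclast'
# $rows is [ [ 3, 13, 6, 'jerk' ], [ 20, 30, 6, 'jerk' ] ]
```

Within one pass the offsets reported by @- refer to the text as it was before the substitution, so the sketch adds the running $shift to express each start point in terms of the modified text.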

revise_locations

Example usage (note, revises structure in-place):

  revise_locations( $locs );

Compensates for a problem in the change history recorded by do_finds_and_reps.

Later passes with another substitution command can move around the modified strings from previous passes.

This routine does some numerical magic, re-interpreting previous passes in the light of later ones.

An example of a change history:

 [
  [ [ 3,       9,   -4,  'alpha'],
    [ 39,     47,   10,  'ralpha'],
    [ 111,   130,    0,  'XXX'],
    [ 320,   332,  -33,  'blvd'],
  ],
  [ [ 12,     23,   6,  'widget'],
    [ 33,     80,   6,  'wadget'],
    [ 453,   532,   6,  'wandat'],
  ],
 ]

Given this data, we can see that the first pass needs to be shifted forward by a delta of 6, acting at the end-point of each changed region.

So any locations after 23 need to have 6 added to them (and locations after 80 need another 6 and ones after 532 -- if there were any -- would need another 6).
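One literal reading of that rule can be sketched as follows. The helper name shift_earlier_passes is hypothetical, and the sketch assumes that each later change is applied in order and compared against the already-shifted positions of the earlier passes; the module's own revise_locations may differ in such details.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's revise_locations.
# For every change in a later pass, add its delta to any beg/end value
# from an earlier pass that lies after the later change's end point.
sub shift_earlier_passes {
    my ($locs) = @_;
    for my $later ( 1 .. $#{$locs} ) {
        for my $change ( @{ $locs->[$later] } ) {
            my ( undef, $end, $delta ) = @{$change};
            for my $earlier ( 0 .. $later - 1 ) {
                for my $row ( @{ $locs->[$earlier] } ) {
                    $row->[0] += $delta if $row->[0] > $end;
                    $row->[1] += $delta if $row->[1] > $end;
                }
            }
        }
    }
    return;    # the structure is revised in place
}

my $locs = [
    [ [ 3, 9, -4, 'alpha' ], [ 39, 47, 10, 'ralpha' ],
      [ 111, 130, 0, 'XXX' ], [ 320, 332, -33, 'blvd' ] ],
    [ [ 12, 23, 6, 'widget' ], [ 33, 80, 6, 'wadget' ],
      [ 453, 532, 6, 'wandat' ] ],
];
shift_earlier_passes($locs);
# Under these assumptions the first pass now reads:
#   [ 3, 9 ], [ 45, 53 ], [ 123, 142 ], [ 332, 344 ]
```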

flatten_locs

Serialize the locations data structure into a text form to be passed to emacs.

The result is a block of text, where each record has five fields separated by colons (four integers followed by the original string), in this order:

  <pass>:<beg>:<end>:<delta>:<orig>;

The fields:

  pass  -- line number of the substitution command that made the change
  beg   -- beginning of the modified string, integer count starting at 1
  end   -- ending of the modified string, integer count starting at 1
  delta -- the change in character length due to the substitution
  orig  -- the original string that was replaced.

The trailing semi-colon in this format allows it to work easily on strings with embedded newlines, and embedded semi-colons as well. However, an embedded semi-colon with an immediately following embedded newline *must* be backslash escaped. This routine just escapes all semi-colons in this field.
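A minimal sketch of producing the record format above is shown here. The function name serialize_locs is hypothetical, and the sketch assumes the pass field is the 0-based index of the pass; the module itself may number passes differently.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's flatten_locs.
# Emits one "<pass>:<beg>:<end>:<delta>:<orig>;" record per change,
# backslash-escaping every semi-colon in the orig field.
sub serialize_locs {
    my ($locs) = @_;
    my $out = '';
    for my $pass ( 0 .. $#{$locs} ) {    # assumption: 0-based pass number
        for my $row ( @{ $locs->[$pass] } ) {
            my ( $beg, $end, $delta, $orig ) = @{$row};
            ( my $escaped = $orig ) =~ s/;/\\;/g;
            $out .= join( ':', $pass, $beg, $end, $delta, $escaped ) . ";\n";
        }
    }
    return $out;
}

my $flat = serialize_locs(
    [ [ [ 3, 9, -4, 'al;pha' ] ], [ [ 12, 23, 6, 'widget' ] ] ] );
# $flat holds two records:
#   0:3:9:-4:al\;pha;
#   1:12:23:6:widget;
```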

TODO move this documentation to some place that talks about it as a data interchange format.

split_perl_substitutions

Split the text from the perl substitutions buffer up into an aref of individual strings, one for each substitution command.

Example usage:

  my $substitutions = split_perl_substitutions( \$substitutions_text );

This routine could *almost* just be replaced with a split on newlines:

   my $s_ref = [ split '\n', $substitutions ];

Except that we'd like to allow for multi-line substitutions.
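A short illustration of why the plain split falls short, using made-up input in which the second command spans two lines:

```perl
use strict;
use warnings;

# Two substitution commands; the second one is written across two lines.
my $substitutions = "s/jerk/iconoclast/\ns{conniving\n  |scheming}{shrewd}x\n";

my @naive = split /\n/, $substitutions;
# @naive has three elements, although the text holds only two commands:
# the multi-line command has been cut in half.
```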

define_nonbracketed_s_scraper_pat

This routine returns a scraper pattern that can parse substitutions of the form s///, and the usual variants, e.g. s###ims. It works only on the non-bracketed style (i.e. something like the s{}{} form must be handled some other way).

Captures:

  $1 -- separator character (e.g. '/')
  $2 -- find pattern
  $3 -- replace string
  $4 -- modifiers
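A pattern with this capture layout might look like the sketch below. It is an illustration only, not the pattern the module defines: nonbracketed_s_pat is a hypothetical name, and the handling of backslash-escaped separators is an assumption.

```perl
use strict;
use warnings;

# Hypothetical pattern builder, not the module's own definition.
# \x7B and \x5B are the opening curly and square brackets.
sub nonbracketed_s_pat {
    return qr{
        ^ \s* s
        ( [^\w\s\x7B\x5B(<] )        # capture 1: separator, no opening brackets
        ( (?: \\. | (?!\1) . )* )    # capture 2: find pattern
        \1
        ( (?: \\. | (?!\1) . )* )    # capture 3: replace string
        \1
        ( [a-z]* )                   # capture 4: modifiers
    }xms;
}

my $pat = nonbracketed_s_pat();
if ( 's#a/b#c#ims' =~ $pat ) {
    my ( $sep, $find, $rep, $mods ) = ( $1, $2, $3, $4 );
    # $sep is '#', $find is 'a/b', $rep is 'c', $mods is 'ims'
}
```

The bracketed form such as s{a}{b} deliberately fails to match here, since the separator class excludes opening brackets.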

parse_perl_substitutions

Scrape various forms of perl s///, and return the find_replaces data structure used by do_finds_and_reps.

Takes one argument, an aref of "s///" strings. The bracketed form (e.g. "s{}{}") is also supported; however, the (somewhat obscure) mixed form is not (i.e. "s//{}" won't work).

TODO check if still true: End of line comments beginning with a "#" are allowed. (At present, everything after the close of the substitution is just ignored).

Example usage:

  my $substitutions = <<'END_S';
      s/pointy-haired boss/esteemed leader/
      s/death spiral/minor adjustment/
  END_S

  my $find_replaces_aref = parse_perl_substitutions( \$substitutions );

Where the returned data should look like:

   [ ['pointy-haired boss', 'esteemed leader'],
     ['death spiral',       'minor adjustment'],
   ]

accumulate_find_reps

Example usage:

 accumulate_find_reps( \@find_reps, $find, $rep, $raw_mods );

strip_brackets

Removes any balanced pair of surrounding bracket characters from the referenced string. Returns 1 for success, 0 for failure.

Text::Balanced's extract_bracketed, in its infinite wisdom, does not extract what's inside the brackets, but instead includes the brackets in the output. This is a utility to deal with this oddity.
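The behaviour in question, and one way of compensating for it, can be sketched as follows. The helper strip_outer_brackets is a hypothetical stand-in, not the module's strip_brackets; it only compares the first and last characters and makes no attempt to verify that the brackets in between are balanced.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed );

my $text = '{jerk}{iconoclast}';
my ( $extracted, $remainder ) = extract_bracketed( $text, '{}' );
# $extracted is '{jerk}' (brackets included), $remainder is '{iconoclast}'

# Hypothetical stand-in for strip_brackets: removes one surrounding
# bracket pair in place, returning 1 on success and 0 otherwise.
sub strip_outer_brackets {
    my ($string_ref) = @_;
    my %close_for = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );
    return 0 if length( ${$string_ref} ) < 2;
    my $close = $close_for{ substr( ${$string_ref}, 0, 1 ) };
    return 0 unless defined $close && substr( ${$string_ref}, -1 ) eq $close;
    ${$string_ref} = substr( ${$string_ref}, 1, -1 );
    return 1;
}

strip_outer_brackets( \$extracted );    # $extracted is now 'jerk'
```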

Example usage:

 if( strip_brackets( \$string ) ) {
    print "brackets removed, see: $string\n";
 }

dequote

Removes backwhack quoting, but only from the single character supplied as a second argument.

Operates on a string reference, modifying it in place.

Example usage:

  $find = '\/home\/doom';
  dequote( \$find, '/' );

  # $find now '/home/doom';
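An operation of this kind can be sketched in a few lines. The name remove_backslash_before is hypothetical and this is not the module's own code; the sketch makes no special provision for a doubled backslash in front of the character.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's dequote.
# Removes a backslash wherever it directly precedes the given character,
# modifying the referenced string in place.
sub remove_backslash_before {
    my ( $string_ref, $char ) = @_;
    ${$string_ref} =~ s/\\(\Q$char\E)/$1/g;
    return;
}

my $path = '\/home\/doom';
remove_backslash_before( \$path, '/' );
# $path is now '/home/doom'
```

The \Q...\E quoting keeps the supplied character literal even when it is a regexp metacharacter such as '|' or '.'.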

(Sometimes it's easier to roll your own than to find someone else's.) (Sometimes.)

SEE ALSO

This is the back-end for the script rep.pl which in turn is the back-end for the emacs lisp code rep.el.

If rep.el is not installed, look in the "elisp" sub-directory of this CPAN package.

A good discussion forum for projects such as this is:

  http://groups.google.com/group/emacs-perl-intersection

Web pages related to this can be found at:

  http://obsidianrook.com/rep

The code is available on github:

  http://github.com/doomvox/rep

AUTHOR

Joseph Brenner, <doom@kzsu.stanford.edu>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Joseph Brenner

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.

BUGS

None reported... yet.