
NAME

Regexp::Common::comment -- provide regexes for comments.

SYNOPSIS

    use Regexp::Common qw /comment/;

    while (<>) {
        /$RE{comment}{C}/       and  print "Contains a C comment\n";
        /$RE{comment}{'C++'}/   and  print "Contains a C++ comment\n";
        /$RE{comment}{PHP}/     and  print "Contains a PHP comment\n";
        /$RE{comment}{Java}/    and  print "Contains a Java comment\n";
        /$RE{comment}{Perl}/    and  print "Contains a Perl comment\n";
        /$RE{comment}{awk}/     and  print "Contains an awk comment\n";
        /$RE{comment}{HTML}/    and  print "Contains an HTML comment\n";
    }

    use Regexp::Common qw /comment RE_comment_HTML/;

    while (<>) {
        $_ =~ RE_comment_HTML() and  print "Contains an HTML comment\n";
    }

DESCRIPTION

Please consult the manual of Regexp::Common for a general description of how this interface works.

Do not use this module directly, but load it via Regexp::Common.

This module provides regular expressions for comments in various languages.

THE LANGUAGES

Below, the comments of each of the languages are described. The patterns are available as $RE{comment}{LANG}, for each language LANG. Some languages have variants; the entry for each such language describes how to get the patterns for its variants. Unless mentioned otherwise, {-keep} sets $1, $2, $3 and $4 to the entire comment, the opening marker, the content of the comment, and the closing marker (for many languages, the latter is a newline), respectively.
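The capture layout that {-keep} produces can be illustrated without loading the module. The sketch below uses a hand-written pattern for C comments, written only for this example and not the pattern Regexp::Common actually generates, whose groups follow the same order: entire comment, opening marker, content, closing marker.

```perl
use strict;
use warnings;

# Hand-written illustration only: NOT the pattern Regexp::Common builds.
# The groups mirror the {-keep} convention: 1 = whole comment,
# 2 = opening marker, 3 = content, 4 = closing marker.
my $c_comment_keep = qr{((/\*)(.*?)(\*/))}s;

my $source = "int x = 1; /* set x */\n";
my ($comment, $open, $body, $close) = $source =~ $c_comment_keep;

print "comment: $comment\nopen: $open\nbody: $body\nclose: $close\n";
```

With the module itself, the corresponding groups come from matching against $RE{comment}{C}{-keep}.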

Ada

Comments in Ada start with --, and last till the end of the line.

Advisor

Advisor is a language used by the HP product glance. Comments for this language start with either # or //, and last till the end of the line.

Advsys

Comments for the Advsys language start with ; and last till the end of the line. See also http://www.wurb.com/if/devsys/12.

Alan

Alan comments start with --, and last till the end of the line. See also http://w1.132.telia.com/~u13207378/alan/manual/alanTOC.html.

Algol 60

Comments in the Algol 60 language start with the keyword comment, and end with a ;. See http://www.masswerk.at/algol60/report.htm.

Algol 68

In Algol 68, comments are either delimited by #, or by one of the keywords co or comment. The keywords should not be part of another word. See http://westein.arb-phys.uni-dortmund.de/~wb/a68s.txt. With {-keep}, only $1 will be set, returning the entire comment.
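The requirement that co and comment not be part of another word maps naturally onto \b word boundaries. A rough sketch, assuming each comment is closed by the same delimiter that opened it and ignoring string literals (both simplifications made for illustration); the helper name first_algol68_comment is hypothetical:

```perl
use strict;
use warnings;

# Illustrative approximation only, not the pattern Regexp::Common builds.
# \b stops the keywords matching inside longer words such as "cobalt".
my $algol68_comment = qr/
    \# [^\#]* \#                       # delimited by hash signs
  | \b co \b .*? \b co \b              # co ... co
  | \b comment \b .*? \b comment \b    # comment ... comment
/xs;

# Return the first comment in $text, or undef if there is none.
sub first_algol68_comment {
    my ($text) = @_;
    return $text =~ /($algol68_comment)/ ? $1 : undef;
}

print first_algol68_comment("x := 1 co set x co;"), "\n";   # co set x co
```

Because of the word boundaries, a line such as "cobalt := colour;" yields no comment at all.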

ALPACA

The ALPACA language has comments starting with /* and ending with */.

awk

The awk programming language uses comments that start with # and end at the end of the line.

B

The B language has comments starting with /* and ending with */.

BASIC

There are various forms of BASIC around. Currently, we only support the variant supported by mvEnterprise, whose pattern is available as $RE{comment}{BASIC}{mvEnterprise}. Comments in this language start with a !, a * or the keyword REM, and last till the end of the line. See http://www.rainingdata.com/products/beta/docs/mve/50/ReferenceManual/Basic.pdf.

Beatnik

The esoteric language Beatnik only uses words consisting of letters. Words are scored according to the rules of Scrabble. Words scoring less than 5 points, or 18 points or more, are considered comments (although the compiler might mock you if you score less than 5 points). Regardless of whether {-keep} is used, $1 will be set to the entire comment. This pattern requires perl 5.8.0 or newer.
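The Beatnik rule is arithmetic rather than lexical, so it is clearest as code: score a word with the standard English Scrabble letter values and apply the thresholds above. This is a sketch that assumes each word contains only the letters A to Z; the helper names are hypothetical, and it is not how the module implements its pattern.

```perl
use strict;
use warnings;

# Standard English Scrabble letter values.
my %letter_value;
for my $group ([1, 'AEILNORSTU'], [2, 'DG'], [3, 'BCMP'], [4, 'FHVWY'],
               [5, 'K'], [8, 'JX'], [10, 'QZ']) {
    my ($points, $letters) = @$group;
    $letter_value{$_} = $points for split //, $letters;
}

# Score a word; assumes it consists of letters only.
sub scrabble_score {
    my ($word) = @_;
    my $total = 0;
    $total += $letter_value{$_} for split //, uc $word;
    return $total;
}

# A word scoring less than 5, or 18 or more, is a Beatnik comment.
sub is_beatnik_comment {
    my $score = scrabble_score(shift);
    return $score < 5 || $score >= 18;
}

for my $word (qw(a cat hello quiz)) {
    printf "%-5s %2d %s\n", $word, scrabble_score($word),
        is_beatnik_comment($word) ? 'comment' : 'code';
}
```

Here "cat" (5 points) and "hello" (8 points) are code, while "a" (1 point) and "quiz" (22 points) are comments.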

beta-Juliet

The beta-Juliet programming language has comments that start with // and that continue till the end of the line. See also http://www.catseye.mb.ca/esoteric/b-juliet/index.html.

Befunge-98

The esoteric language Befunge-98 uses comments that start and end with a ;. See http://www.catseye.mb.ca/esoteric/befunge/98/spec98.html.

Brainfuck

The minimal language Brainfuck uses only eight characters, <, >, [, ], +, -, . and ,. Any other characters are considered comments. With {-keep}, $1 is set to the entire comment.
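Since every character outside the eight commands is a comment, a negated character class captures the idea. A minimal sketch, not the module's own pattern; note that whitespace counts as comment text too, exactly as the rule says:

```perl
use strict;
use warnings;

# Any run of characters outside the eight commands < > [ ] + - . , is a comment.
my $bf_comment = qr/([^<>\[\]+\-.,]+)/;

my $program  = "++ add two >. print";
my @comments = $program =~ /$bf_comment/g;
print "[$_]\n" for @comments;
```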

C

The C language has comments starting with /* and ending with */.

C++

The C++ language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.
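The two forms can be combined with a single alternation. This is a rough sketch written for illustration, not the module's pattern; in particular it does not know about string literals, so a // inside a quoted string would be misreported as a comment.

```perl
use strict;
use warnings;

# A // comment runs to the end of the line; a /* */ comment may span lines
# (/s lets . match newlines). String literals are not handled.
my $cpp_comment = qr{(//[^\n]*|/\*.*?\*/)}s;

my $code  = "int a = 1; // one\nint b = 2; /* two\n   lines */\n";
my @found = $code =~ /$cpp_comment/g;
print "<$_>\n" for @found;
```

The lazy .*? makes a block comment end at the first */, which is how C++ block comments behave: they do not nest.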

Crystal Report

The formula editor in Crystal Reports uses comments that start with //, and end with the end of the line.

Dylan

There are two types of comments in Dylan. They either start with //, or are nested comments, delimited with /* and */. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.

Eiffel

Eiffel comments start with --, and last till the end of the line.

False

In False, comments start with { and end with }. See http://wouter.fov120.com/false/false.txt

FPL

The FPL language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.

Forth

Comments in Forth start with \, and end with the end of the line. See also http://docs.sun.com/sb/doc/806-1377-10.

Fortran

There are two forms of Fortran. Free form Fortran has comments that start with ! and end at the end of the line. The pattern for this is given by $RE{comment}{Fortran}. Fixed form Fortran, which is now obsolete, has comments that start with C, c or * in the first column, or with ! in any column except the sixth. The pattern for this is given by $RE{comment}{Fortran}{fixed}.

See also http://www.cray.com/craydoc/manuals/007-3692-005/html-007-3692-005/.
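The column rules for fixed form Fortran are easier to express per line than as a single pattern. The two helpers below, free_form_comment and fixed_form_comment (hypothetical names made up for this sketch), apply the rules exactly as stated above. They ignore string literals and are not how the module builds its Fortran patterns.

```perl
use strict;
use warnings;

# Free form: '!' to the end of the line (string literals ignored for brevity).
sub free_form_comment {
    my ($line) = @_;
    return $line =~ /(![^\n]*)/ ? $1 : undef;
}

# Fixed form: C, c or * in column 1 makes the whole line a comment;
# otherwise a '!' in any column except column 6 starts a comment.
sub fixed_form_comment {
    my ($line) = @_;
    return $line if $line =~ /^[Cc*]/;
    while ($line =~ /!/g) {
        my $column = pos $line;   # just past the '!', i.e. its 1-based column
        return substr($line, $column - 1) if $column != 6;
    }
    return;                       # no comment on this line
}

print fixed_form_comment("      X = 1 ! set x"), "\n";   # ! set x
```

A line such as "     !X = 1" has its ! in column 6, where fixed form reserves the position for continuation, so fixed_form_comment returns undef for it.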

Funge-98

The esoteric language Funge-98 uses comments that start and end with a ;.

fvwm2

Configuration files for fvwm2 have comments starting with a # and lasting the rest of the line.

Haifu

Haifu, an esoteric language using haikus, has comments starting and ending with a ,. See http://www.dangermouse.net/esoteric/haifu.html.

Haskell

There are two types of comments in Haskell. They either start with at least two dashes, or are nested comments, delimited with {- and -}. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.
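Nested comments need a recursive pattern. The module itself only needs perl 5.6.0; the sketch below instead uses named recursion, (?<name>...) with (?&name), which requires perl 5.10 or newer. It treats any run of two or more dashes as the start of a line comment, exactly as described above, so it does not exclude operators such as -->. It is an illustration, not the module's pattern.

```perl
use strict;
use warnings;

# A nested {- ... -} block comment, or two or more dashes to end of line.
# Uses named recursion, so perl 5.10+ is required. Illustration only.
my $haskell_comment = qr/
    (?<block>
        \{-
        (?: (?! \{- | -\} ) . | (?&block) )*   # plain text or a nested comment
        -\}
    )
  | --+ [^\n]*
/xs;

my ($nested) = "x = 1 {- a {- b -} c -} + 2" =~ /($haskell_comment)/;
my ($line)   = "main = pure () -- done"      =~ /($haskell_comment)/;
print "$nested\n$line\n";
```

A named group is used rather than a numbered one so that the recursion still points at the right group when the pattern is interpolated into a larger regex with extra capturing parentheses.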

HTML

In HTML, comments only appear inside a comment declaration. A comment declaration starts with a <!, and ends with a >. Inside this declaration, we have zero or more comments. Comments start with -- and end with --, and are optionally followed by whitespace. The pattern $RE{comment}{HTML} recognizes those comment declarations (and hence more than a comment). Note that this is not the same as something that starts with <!-- and ends with -->, because the following will be matched completely:

    <!--  First  Comment   --
      --> Second Comment <!--
      --  Third  Comment   -->

Do not be fooled by what your favourite browser thinks is an HTML comment.

If {-keep} is used, the following are returned:

$1

captures the entire comment declaration.

$2

captures the MDO (markup declaration open), <!.

$3

captures the content between the MDO and the MDC.

$4

captures the (last) comment, without the surrounding dashes.

$5

captures the MDC (markup declaration close), >.
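Following the description above, a comment declaration can be sketched as MDO, then any number of --...-- comments (each possibly followed by whitespace), then MDC. The sketch below mirrors the five {-keep} groups but is written for this example only and is not the pattern the module generates. It also forbids -- inside a comment, which is why the three-comment example above matches as a whole while <!-- a -- b --> does not.

```perl
use strict;
use warnings;

# Sketch of an SGML comment declaration, following the description above.
# Groups mirror the {-keep} list: 1 whole declaration, 2 MDO, 3 content,
# 4 last comment without dashes, 5 MDC. Not the module's actual pattern.
my $html_comment_decl = qr/
    (                                    # group 1: entire declaration
      (<!)                               # group 2: MDO
      (                                  # group 3: content between MDO and MDC
        (?: -- ((?:(?!--).)*) -- \s* )*  # group 4: last comment; no "--" inside
      )
      (>)                                # group 5: MDC
    )
/xs;

my $text = "<!--  First  Comment   --\n  --> Second Comment <!--\n  --  Third  Comment   -->";
my @keep = $text =~ $html_comment_decl;
print "entire match: ", ($keep[0] eq $text ? "yes" : "no"), "\n";
print "last comment: [$keep[3]]\n";

my $bad = "<!-- a -- b -->";
print "bad: ", ($bad =~ $html_comment_decl ? "matched" : "no match"), "\n";
```

Because group 4 sits inside a repeated group, it holds only the comment from the final repetition, which matches the "(last) comment" wording above.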

Hugo

There are two types of comments in Hugo. They either start with ! (which cannot be followed by a \), or are nested comments, delimited with !\ and \!. Under {-keep}, only $1 will be set, returning the entire comment. This pattern requires perl 5.6.0 or newer.

ILLGOL

The esoteric language ILLGOL uses comments starting with NB and lasting till the end of the line. See http://www.catseye.mb.ca/esoteric/illgol/index.html.

Java

The Java language has two forms of comments: comments that start with // and last till the end of the line, and comments that start with /* and end with */. If {-keep} is used, only $1 will be set, and set to the entire comment.

LaTeX

The documentation language LaTeX uses comments starting with % and ending at the end of the line.

LPC

The LPC language has comments starting with /* and ending with */.

LOGO

Comments for the language LOGO start with ;, and last till the end of the line.

lua

Comments for the lua language start with --, and last till the end of the line. See also http://www.lua.org/manual/manual.html.

mutt

Configuration files for mutt have comments starting with a # and lasting the rest of the line.

Oberon

Comments in Oberon start with (* and end with *). See http://www.oberon.ethz.ch/oreport.html.

Pascal

There are many implementations of Pascal. Several of them are supported by this module.

$RE{comment}{Pascal}

This is the pattern that recognizes comments according to the Pascal ISO standard. This standard says that comments start with either { or (*, and end with } or *). This means that {*) and (*} are considered to be comments. Many Pascal implementations don't allow this. See http://www.pascal-central.com/docs/iso10206.txt
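The mixing of delimiters allowed by the ISO rule shows up directly in a pattern whose opener and closer are independent alternations. A minimal sketch for illustration, not the module's pattern:

```perl
use strict;
use warnings;

# ISO rule as described above: open with { or (*, close with } or *),
# and the two styles may be mixed. Illustration, not the module's pattern.
my $iso_pascal_comment = qr/((?:\{|\(\*).*?(?:\}|\*\)))/s;

for my $text ('x := 1 { set x }', '{*)', '(*}') {
    my ($comment) = $text =~ $iso_pascal_comment;
    print "$text => $comment\n";
}
```

An implementation that rejects mixed delimiters would instead need the closer to depend on the opener, for example two separate alternatives \{.*?\} and \(\*.*?\*\).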

$RE{comment}{Pascal}{Alice}

The Alice Pascal compiler accepts comments that start with { and end with }. Comments are not allowed to contain newlines. See http://www.templetons.com/brad/alice/language/.

$RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC}

The Delphi Pascal, Free Pascal and GNU Pascal Compiler implementations of Pascal all have comments that either start with // and last till the end of the line, are delimited with { and }, or are delimited with (* and *). Patterns for those comments are given by $RE{comment}{Pascal}{Delphi}, $RE{comment}{Pascal}{Free} and $RE{comment}{Pascal}{GPC} respectively. These patterns only set $1 when {-keep} is used, which will then include the entire comment.

See http://info.borland.com/techpubs/delphi5/oplg/, http://www.freepascal.org/docs-html/ref/ref.html and http://www.gnu-pascal.de/gpc/.
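The three forms shared by these compilers translate into a three-way alternation. This is a sketch written for illustration, not the module's pattern; it does not handle nesting or the compiler directives that some of these compilers write inside braces.

```perl
use strict;
use warnings;

# The three comment forms shared by Delphi, Free Pascal and GPC, as described
# above. Illustration only; nesting and compiler directives are not handled.
my $delphi_comment = qr/(\/\/[^\n]*|\{.*?\}|\(\*.*?\*\))/s;

my $code  = "x := 1; // set x\ny := 2; { set y } z := 3; (* set z *)\n";
my @found = $code =~ /$delphi_comment/g;
print "<$_>\n" for @found;
```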

$RE{comment}{Pascal}{Workshop}

The Workshop Pascal compiler, from Sun Microsystems, allows comments that are delimited with either { and }, delimited with (* and *), delimited with /* and */, or starting and ending with a double quote ("). When {-keep} is used, only $1 is set, and returns the entire comment.

See http://docs.sun.com/db/doc/802-5762.

PEARL

Comments in PEARL start with a ! and last till the end of the line, or start with /* and end with */. With {-keep}, $1 will be set to the entire comment.

PHP

Comments in PHP start with either # or // and last till the end of the line, or are delimited by /* and */. With {-keep}, $1 will be set to the entire comment.

PL/B

In PL/B, comments start with either . or ;, and end with the next newline. See http://www.mmcctech.com/pl-b/plb-0010.htm.

PL/I

The PL/I language has comments starting with /* and ending with */.

Perl

Perl uses comments that start with a #, and continue till the end of the line.

Portia

The Portia programming language has comments that start with //, and last till the end of the line.

Python

Python uses comments that start with a #, and continue till the end of the line.

Q-BAL

Comments in the Q-BAL language start with ` (a backtick), and continue till the end of the line.

REBOL

Comments for the REBOL language start with ; and last till the end of the line.

Ruby

Comments in Ruby start with # and last till the end of the line.

Scheme

Scheme comments start with ;, and last till the end of the line. See http://schemers.org/.

shell

Comments in various shells start with a # and end at the end of the line.

Shelta

The esoteric language Shelta uses comments that start and end with a ;. See http://www.catseye.mb.ca/esoteric/shelta/index.html.

slrn

Configuration files for slrn have comments starting with a % and lasting the rest of the line.

Smalltalk

Smalltalk uses comments that start and end with a double quote, ".

SMITH

Comments in the SMITH language start with ;, and last till the end of the line.

Squeak

In the Smalltalk variant Squeak, comments start and end with ". Double quotes can appear inside comments by doubling them.
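The doubled-quote rule is the classic "unrolled loop" shape: a run of ordinary characters, then any number of "" escapes each followed by another run. A minimal sketch for illustration, not the module's pattern:

```perl
use strict;
use warnings;

# A Squeak comment: opening ", any run of non-quote characters or doubled
# quotes (""), then a closing ". Illustration only.
my $squeak_comment = qr/("[^"]*(?:""[^"]*)*")/;

my $code = 'x := 3. "say ""hi"" twice" y := 4.';
my ($comment) = $code =~ $squeak_comment;
print "$comment\n";   # "say ""hi"" twice"
```

Since the ordinary-character class and the "" escape can never start at the same character, the pattern does not backtrack heavily on long comments.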

SQL

Standard SQL uses comments starting with two or more dashes, and ending at the end of the line.

MySQL does not follow the standard. Instead, it allows comments that start with # or with -- (two dashes followed by a space) and end at the following newline, as well as comments that start with /* and end with the next ; or */ that is not inside single or double quotes. A pattern for this is returned by $RE{comment}{SQL}{MySQL}. With {-keep}, only $1 will be set, and it returns the entire comment.
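The quoting rule means a ; or */ inside a string must not close a block comment, so quoted strings are consumed as whole units. A sketch of the rules exactly as described above, written for illustration and not the pattern the module builds:

```perl
use strict;
use warnings;

# MySQL comments as described above: '#' or '-- ' to end of line, or '/*'
# up to the next ';' or '*/' outside single or double quotes.
# Illustration only; not the pattern Regexp::Common builds.
my $mysql_comment = qr{
    (?: \# | --[ ] ) [^\n]*     # line comment
  | /\*
    (?: [^;*'"]                 # ordinary character
      | \* (?! / )              # a star that does not close the comment
      | ' [^']* '               # single-quoted string
      | " [^"]* "               # double-quoted string
    )*
    (?: ; | \*/ )               # closed by a semicolon or star-slash
}x;

my @samples = ("SELECT 1; -- trailing note\n",
               "SELECT /* see ';' here */ 1",
               "SELECT /* unterminated; 1");
my @comments = map { /($mysql_comment)/ ? $1 : '' } @samples;
print "$_\n" for @comments;
```

The second sample keeps going past the quoted ';', while the third stops at the first bare ;, as the rule above requires.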

Tcl

In Tcl, comments start with # and continue till the end of the line.

TeX

The documentation language TeX uses comments starting with % and ending at the end of the line.

troff

The document formatting language troff uses comments starting with \", and continuing till the end of the line.

vi

In configuration files for the editor vi, one can use comments starting with ", and ending at the end of the line.

*W

In the language *W, comments start with ||, and end with !!.

zonefile

Comments in DNS zonefiles start with ;, and continue till the end of the line.

REFERENCES

[Go 90]

Charles F. Goldfarb: The SGML Handbook. Oxford: Oxford University Press. 1990. ISBN 0-19-853737-9. Ch. 10.3, pp 390-391.

HISTORY

 $Log: comment.pm,v $
 Revision 2.106  2003/03/12 22:25:42  abigail
 - More generic setup to define comments for various languages.
 - Expanded and redid the documentation for comment.pm.
 - Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B,
   BASIC (mvEnterprise), Forth, Fortran (both fixed and free form),
   fvwm2, mutt, Oberon, 6 versions of Pascal,
   PEARL (one of the at least four...), PL/B, PL/I, slrn, Squeak.

 Revision 2.105  2003/03/09 19:04:42  abigail
 - More generic setup to define comments for various languages.
 - Expanded and redid the documentation for comment.pm.
   Now every language has its own paragraph, describing its comment,
   and pointers to webpages.
 - Comments for Advisor, Advsys, Alan, Algol 60, Algol 68, B, BASIC
   (mvEnterprise), Forth, Fortran (both fixed and free form), fvwm2, mutt,
   Oberon, 6 versions of Pascal, PEARL (one of the at least four...), PL/B,
   PL/I, slrn, Squeak.

 Revision 2.104  2003/02/21 14:48:06  abigail
 Crystal Reports

 Revision 2.103  2003/02/11 09:39:08  abigail
 Added

 Revision 2.102  2003/02/07 15:23:54  abigail
 Lua and FPL

 Revision 2.101  2003/02/01 22:55:31  abigail
 Changed Copyright years

 Revision 2.100  2003/01/21 23:19:40  abigail
 The whole world understands RCS/CVS version numbers, that 1.9 is an
 older version than 1.10. Except CPAN. Curse the idiot(s) who think
 that version numbers are floats (in which universe do floats have
 more than one decimal dot?).
 Everything is bumped to version 2.100 because CPAN couldn't deal
 with the fact one file had version 1.10.

 Revision 1.19  2002/11/06 13:51:34  abigail
 Minor POD changes.

 Revision 1.18  2002/09/18 18:13:01  abigail
 Fixes for 5.005

 Revision 1.17  2002/09/04 17:04:24  abigail
 Q-BAL

 Revision 1.16  2002/08/27 16:50:50  abigail
 Patterns for Beatnik, Befunge-98, Funge-98 and W*.

 Revision 1.15  2002/08/22 17:04:03  abigail
 SMITH added

 Revision 1.14  2002/08/22 16:41:25  abigail
 + Added function 'id' and 'from_to' with associated data.
 + Added function 'combine' for languages having multiple syntaxes.
 + Added 'Shelta'

 Revision 1.13  2002/08/21 16:00:32  abigail
 beta-Juliet, Portia, ILLGOL and Brainfuck.

 Revision 1.12  2002/08/20 17:40:37  abigail
 - Created a 'nested' function (simplified version from
   Regexp::Common::balanced).
 - Comments that use 'from' to eol or balanced (nested) delimiters
   are now generated from a data array.
 - Added Hugo and Haifu.

 Revision 1.11  2002/08/05 12:16:58  abigail
 Fixed 'Regex::' and 'Rexexp::' typos to 'Regexp::'
 (Found my Mike Castle).

 Revision 1.10  2002/07/31 23:33:16  abigail
 Documented that Haskell and Dylan comments need at least 5.6.0.

 Revision 1.9  2002/07/31 23:12:29  abigail
 Dylan and Haskell comments can be nested, hence version 5.6.0 of Perl
 is needed to be able to make a regex matching them.

 Revision 1.8  2002/07/31 14:48:16  abigail
 Added LOGO (to please petdance)

 Revision 1.7  2002/07/31 13:06:41  abigail
 Dealt with -keep for Haskell and Dylan.

 Revision 1.6  2002/07/31 00:54:00  abigail
 Added comments for Haskell, Dylan, Smalltalk and MySQL.

 Revision 1.5  2002/07/30 16:38:23  abigail
 Added support for the languages: LaTeX, Tcl, TeX and troff.

 Revision 1.4  2002/07/26 16:48:12  abigail
 Simplied datastructure for the languages that use single line comments.

 Revision 1.3  2002/07/26 16:37:20  abigail
 Added new languages: Ada, awk, Eiffel, Java, LPC, PHP, Python,
 REBOL, Ruby, vi and zonefile.

 Revision 1.2  2002/07/25 22:37:44  abigail
 Added 'use strict'.
 Added 'no_defaults' to 'use Regex::Common' to prevent loaded of all
 defaults.

 Revision 1.1  2002/07/25 19:56:07  abigail
 Modularizing Regexp::Common.

SEE ALSO

Regexp::Common for a general description of how to use this interface.

AUTHOR

Damian Conway (damian@conway.org)

MAINTENANCE

This package is maintained by Abigail (regexp-common@abigail.nl).

BUGS AND IRRITATIONS

Bound to be plenty.

For a start, there are many common regexes missing. Send them in to regexp-common@abigail.nl.

COPYRIGHT

     Copyright (c) 2001 - 2003, Damian Conway. All Rights Reserved.
       This module is free software. It may be used, redistributed
      and/or modified under the terms of the Perl Artistic License
            (see http://www.perl.com/perl/misc/Artistic.html)