NAME

Regexp::Common::debian - regexps for Debian specific strings

SYNOPSIS

    use Regexp::Common qw(debian);
    #TODO:

DESCRIPTION

#TODO:

$RE{debian}{package}
    'the-very.strange.package+name' =~ $RE{debian}{package}{-keep};
    print "package is $1";

This is a Debian package name. The rules are described in Section 5.6.7 of Debian policy.
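As I read the policy rules referenced above, a package name consists of lowercase letters, digits, plus and minus signs, and periods, is at least two characters long, and starts with an alphanumeric character. The sketch below restates that in core Perl. The pattern and the helper name is_package_name are assumptions made for illustration; this is not the pattern behind $RE{debian}{package}.

```perl
use strict;
use warnings;

# Assumption: this paraphrases the policy wording; it is NOT the pattern
# that $RE{debian}{package} uses internally.
my $package_re = qr/[a-z0-9][a-z0-9+.-]+/;

# Hypothetical helper: true when the whole string looks like a package name.
sub is_package_name {
    my ($name) = @_;
    return $name =~ /\A$package_re\z/ ? 1 : 0;
}

for my $name ('the-very.strange.package+name', 'a', 'Uppercase', '-leading') {
    printf "%-32s %s\n", $name, is_package_name($name) ? 'ok' : 'rejected';
}
```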

$1 is a package

$RE{debian}{version}
    '10:1+abc~rc.2-ALPHA-rc25+w~t.f' =~ $RE{debian}{version}{-keep};
    $2 eq '10'               &&
    $3 eq '1+abc~rc.2-ALPHA' &&
    $4 eq 'rc25+w~t.f'       or die;

This is a Debian version. The rules are described in Section 5.6.12 of Debian policy.

$1 is a debian_version
$2 is an epoch

If any; otherwise undef.

$3 is an upstream_version

(caveat) A string like 0--1 will end up with $3 set to the weird 0- (hopefully, Debian won't degrade to such versions; though YMMV).

$4 is a debian_revision

(bug) 0-1- will end up with $3 set to 0 and $4 set to 1 (such trailing hyphens will be missing from $1). 0- will end up with $4 undefined.

(bug) Either I don't understand perlre or I didn't try hard enough. Anyway, I didn't find a way to parse a Debian version the way R::C requires under perl5.8.8 (the perl in stable, soon to be oldstable). qr/(?|)/ in perl5.10.0 saves the day (but see "R_C_d_version").

(caveat) The debian_revision is allowed to start with a non-digit. This is solely my reading of Debian Policy.
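For comparison, here is a minimal sketch of splitting a version without a regexp, assuming the usual reading that the epoch ends at the first colon and the debian_revision starts after the last hyphen. The helper split_debian_version is hypothetical and not part of this module; it does no validation of the characters involved.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common::debian).
# Returns (epoch, upstream_version, debian_revision); missing parts are undef.
sub split_debian_version {
    my ($version) = @_;
    my ($epoch, $revision);

    # Epoch: everything before the first colon, if there is one.
    my $colon = index $version, ':';
    if ($colon >= 0) {
        $epoch   = substr $version, 0, $colon;
        $version = substr $version, $colon + 1;
    }

    # Revision: everything after the last hyphen, if there is one.
    my $hyphen = rindex $version, '-';
    if ($hyphen >= 0) {
        $revision = substr $version, $hyphen + 1;
        $version  = substr $version, 0, $hyphen;
    }

    return ($epoch, $version, $revision);
}

my ($e, $u, $r) = split_debian_version('10:1+abc~rc.2-ALPHA-rc25+w~t.f');
print "epoch=$e upstream=$u revision=$r\n";
```

With this splitting rule 0--1 also yields an upstream_version of 0-, the same oddity as in the caveat above.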

R_C_d_version
    use Regexp::Common qw(debian);
    # though that works too
    # use Regexp::Common::debian;
    my $re = Regexp::Common::debian::R_C_d_version;
    $version =~ /^$re$/;
    $2                   and print "has epoch\n";
    $3 || $5 || $6 || $8 and print "has upstream_version\n";
    $4 || $7             and print "has debian_revision\n";
    $3 && !$4 || !$3 && $4 or die;
    $6 && !$7 || !$6 && $7 or die;
    $3 && !$5 && !$6 && !$8 or die;
           $5 && !$6 && !$8 or die;
                  $6 && !$8 or die;

That's a workaround for perl5.8.8 (read "$RE{debian}{version}" and look for (bug)). The (caveat) entries in "$RE{debian}{version}" apply here too.

$1 is debian_version again
$2 is epoch always
Either $3, or $5, or $6, or $8 is upstream_version
Either $4 or $7 is debian_revision

That's the best that can be done with an RE (in the real world it's done the functional way). Sorry.

(bug) It always captures (that should be configurable with a setting like -keep). OTOH, within 2 years or so (as soon as perl5.10.0 becomes oldstable) that dirty piece will be dropped anyway.
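To show why the same field ends up in different capture groups here, the sketch below compares a plain alternation with a branch-reset one on a toy pattern. Both patterns and the helper first_defined are made up for illustration; they are not what R_C_d_version or $RE{debian}{version} actually contain.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs perl 5.10 or later

# Toy patterns (assumptions, not the module's): "N-M" or a bare "N".
my $numbered = qr/\A(?:(\d+)-(\d+)|(\d+))\z/;    # groups 1,2 | 3
my $reset    = qr/\A(?|(\d+)-(\d+)|(\d+))\z/;    # groups 1,2 | 1

# Hypothetical helper: return the first defined capture, the way a caller
# of R_C_d_version has to pick among $3, $5, $6 and $8.
sub first_defined {
    for my $capture (@_) {
        return $capture if defined $capture;
    }
    return;
}

my @plain = '7' =~ $numbered;    # the bare number lands in group 3
my @br    = '7' =~ $reset;       # the bare number lands in group 1

print "numbered: ", first_defined(@plain[0, 2]), "\n";
print "reset:    ", $br[0], "\n";
```

Testing with defined rather than plain truth also keeps a field such as 0 from being mistaken for a missing one.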

$RE{debian}{architecture}
    $arch =~ $RE{debian}{architecture}{-keep};
    $2 && ($3 ||  $4)           and die;
           $3 && !$4            and die;
           $3 &&  $4 eq 'armel' and die;
    $2 and print "that's special: $2";
    $3 and print "OS is: $3";
    $4 and print "arch is: $4";

This is a Debian architecture. The rules are described in Section 5.6.8 of Debian policy.

$1 is some of Debian's architectures
$2 is any special

Distinguishing special architectures (all, any, and source) from os-arch pairs is arguable. But I've decided it would be good to separate all from e.g. i386 (which in turn is actually linux-i386).

$3 is os

When !$3 && $4 is true, the undefined $3 actually means linux. Since the $digits variables are read-only, yielding anything but undef here is impossible. More on that in Section 11.1 of Debian policy.

$4 is arch

Please note that there are architectures which are present only for the linux os (namely armel and lpia, at the time of writing).

(caveat) Debian policy by itself doesn't specify which os-arch pairs are valid (only the specials are mentioned). Instead it relies on qx/dpkg-architecture -L/. As a result R::C::d can desynchronize; hopefully, that wouldn't stay unnoticed for too long.
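The special / os / arch distinction described in this section can be sketched in core Perl as follows. The helper split_architecture is hypothetical; it assumes an os-arch pair splits at the first hyphen, fills in linux where the regexp would leave $3 undefined, and does no validation against dpkg-architecture -L.

```perl
use strict;
use warnings;

# The three specials named in this section.
my %special = map { $_ => 1 } qw(all any source);

# Hypothetical helper (not part of Regexp::Common::debian).
sub split_architecture {
    my ($string) = @_;
    return { special => $string } if $special{$string};

    # Assumption: "os-arch" splits at the first hyphen;
    # a bare "arch" implies the linux os.
    my ($os, $arch) = $string =~ /\A([^-]+)-(.+)\z/
        ? ($1, $2)
        : ('linux', $string);
    return { os => $os, arch => $arch };
}

for my $string (qw(all i386 hurd-i386)) {
    my $parsed = split_architecture($string);
    print "$string: ",
        join(', ', map { "$_=$parsed->{$_}" } sort keys %$parsed), "\n";
}
```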

$RE{debian}{archive}{binary}
    'abc_1.2.3-512_all.deb' =~ $RE{debian}{archive}{binary}{-keep};
    print "     package is -> $2";
    print "     version is -> $3";
    print "architecture is -> $4";

This is a Debian binary archive (even if there's no binary file (in the -B sense) inside, it's called "binary" anyway). The naming convention isn't described in Debian policy; instead it refers to the format understood by dpkg (Preface of Chapter 3). (Hopefully, someday there will be references here to the code inside the dpkg and dpkg-deb codebase that does those nasty things with composing package, version, and arch into filenames and decomposing them back.)

$1 is deb-filename

That's the whole archive filename with .deb suffix included

$2 is package
$3 is version

There's a great deal of WTF here. The Filename: fields in *_Packages omit the epoch entirely. Archives in pool/ omit it too. Archives in /var/cache/apt/archives ... That seems to be apt-get specific (I don't have a reference to the code though). As a feature, $RE{d}{a}{binary} provides an epoch hack in filenames.

$4 is architecture

That would, surprisingly, match source or any. Sorry. That'll improve in the future. Actually it's even worse: an OS can be prepended to any arch or special.

For the sake of symmetry $RE{d}{a}{binary} has a trailing anchor: a negative look-ahead for any character that can be found in a version string.
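A rough sketch of decomposing such a filename without the regexp is shown below. It assumes that neither a package name nor a version can contain an underscore, so the stem splits into exactly three fields. The helper split_deb_filename is hypothetical and knows nothing about the epoch hack mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common::debian).
# Returns (package, version, architecture) or an empty list.
sub split_deb_filename {
    my ($filename) = @_;

    # Strip the .deb suffix; give up if it is missing.
    (my $stem = $filename) =~ s/\.deb\z// or return;

    # Assumption: underscores only ever separate the three fields.
    my @fields = split /_/, $stem, -1;
    return unless @fields == 3;
    return @fields;
}

my ($package, $version, $arch) = split_deb_filename('abc_1.2.3-512_all.deb');
print "package=$package version=$version architecture=$arch\n";
```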

$RE{debian}{archive}{source}
    'xyz_1-ab.25~6.orig.tar.gz' =~ $RE{debian}{archive}{source}{-keep};
    print "package is $2";
    # a Debian-native tarball must not have a hyphen in its version
    -1 != index($3, '-') && $4 eq 'tar' and die;
    $4 eq 'orig.tar'                    and print "there should be patch";

This is a Debian upstream (or Debian-native) source tarball. Naming source archives is outside Debian policy; although

  • Section 5.6.21 mentions that "the exact forms of the filenames are described in" Section C.3.

  • Section C.3 points out that a source archive must be in the form package_upstream-version.orig.tar.gz.

  • Naming of Debian-native packages is left out completely.

  • dpkg-source(1) (1.14.23) in Section SOURCE PACKAGE FORMATS mentions some bits of naming (Debian-native packages are left out too).

Welcome to the real life. $RE{d}{a}{source} knows only Format: 1.0 naming.

$1 is tarball-filename

Since there's no suffix other than .gz, it's present only in $1.

$2 is package
$3 is version
$4 is type

This can hold one of 2 strings: orig.tar (regular package) or tar (Debian-native package).

Since dot (.) is used as a separator and can also appear in a version, the whole thing is implicitly anchored (a negative look-ahead for any version-forming character; the idea is that 0.orig.tar.gz can be a very strange version) and the version itself is forced to be as short as possible.
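The effect of keeping the version as short as possible can be sketched with a non-greedy quantifier. The pattern below is a simplified stand-in written for this illustration: it anchors with \A and \z instead of a look-ahead and accepts any characters in the version. The helper parse_source is hypothetical.

```perl
use strict;
use warnings;

# Simplified stand-in for illustration; NOT the module's pattern.
# The version (.+?) is non-greedy, so the suffix claims as much as it can.
my $source_re = qr/\A([a-z0-9][a-z0-9+.-]+)_(.+?)\.(orig\.tar|tar)\.gz\z/;

# Hypothetical helper: returns (package, version, type) or an empty list.
sub parse_source {
    my ($file) = @_;
    return $file =~ $source_re;
}

for my $file ('xyz_1-ab.25~6.orig.tar.gz', 'xyz_1.0.tar.gz', 'xyz_0.orig.tar.gz') {
    my ($package, $version, $type) = parse_source($file) or next;
    print "$file: package=$package version=$version type=$type\n";
}
```

With a greedy version the last filename could just as well be read as version 0.orig with type tar; the non-greedy form settles on version 0 with type orig.tar.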

$RE{debian}{archive}{patch}
    'abc_0cba-12.diff.gz' =~ $RE{debian}{archive}{patch}{-keep};
    print "package is $2";
    -1 == index $3, '-' and die;
    print "debian revision is ", (split /-/, $3)[-1];

This is a "debianization diff" (Section C.3 of Debian policy). Naming patches is outside Debian policy, so we're back to guessing. There are rumors (or maybe trends) that Format 1.0 will be deprecated (or maybe made obsolete).

$1 is patch-filename

Since there's no suffix other than .diff.gz, it's present only in $1.

$2 is package
$3 is version

(caveat) Consider this. A Debian-native package has neither a patch nor a hyphen in its version. A regular package has a patch and must have a hyphen in its version. $RE{d}{a}{patch} is absolutely ignorant of that (we are about matching, not verifying, after all).

The very same considerations covered in the discussion at the end of the $RE{d}{a}{source} entry apply to $RE{d}{a}{patch} as well (consider: 0.diff.gz can be a version).
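Since the regexp only matches, a caller who cares about the patch and hyphen rule from the caveat above has to verify it separately. A minimal sketch, assuming that rule holds as stated; the helpers has_revision and is_consistent are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the version carries a debian_revision,
# i.e. contains at least one hyphen.
sub has_revision {
    my ($version) = @_;
    return index($version, '-') >= 0 ? 1 : 0;
}

# Hypothetical helper: a regular package has both a hyphen and a patch,
# a Debian-native package has neither.
sub is_consistent {
    my ($version, $has_patch) = @_;
    return has_revision($version) == ($has_patch ? 1 : 0);
}

print "0cba-12 with patch: ",    (is_consistent('0cba-12', 1) ? 'ok' : 'odd'), "\n";
print "1.0 without patch: ",     (is_consistent('1.0', 0)     ? 'ok' : 'odd'), "\n";
print "1.0 with patch: ",        (is_consistent('1.0', 1)     ? 'ok' : 'odd'), "\n";
```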

$RE{debian}{archive}{dsc}
    'abc_0cba-12.dsc' =~ $RE{debian}{archive}{dsc}{-keep};
    print "package is $2";
    print "version is $3";

This is a "Debian source control" file (Section 5.4 describes its contents but not its naming). Statistically based guessing, you know (one day I'll elaborate and point to the exact lines in the dpkg-dev bundle where it's used (creating and parsing)).

$1 is dsc-filename

As usual, since the only possible suffix is .dsc, it's present in $1 only.

$2 is package
$3 is version

The same considerations as for $RE{d}{a}{source} apply here (consider: 0.dsc can be a version).

$RE{debian}{archive}{changes}
    'abc_0cba-12.changes' =~ $RE{debian}{archive}{changes}{-keep};
    print "package is $2";
    print "version is $3";

This is a "Debian changes file" (Section 5.5 describes its contents but not its naming). Statistically based guessing, you know (one day I'll elaborate and point to the exact lines in the dpkg-dev bundle where it's used (creating and parsing)) (should be a template).

$1 is changes-filename

As usual, since the only possible suffix is .changes, it's present in $1 only.

$2 is package
$3 is version

The same considerations as for $RE{d}{a}{source} apply here (consider: 0.changes can be a version).

BUGS AND CAVEATS

Grep this pod for (bug) and/or (caveat). They are all placed in the appropriate sections.

AUTHOR

Eric Pozharski, <whynot@cpan.org>

COPYRIGHT AND LICENSE

Copyright 2008 by Eric Pozharski

This library is free in the sense: AS-IS, NO-WARRANTY, HOPE-TO-BE-USEFUL. This library is released under LGPLv3.

SEE ALSO

Regexp::Common, http://www.debian.org/doc/debian-policy,