NAME

URL::RegexMatching - A library of utility methods for matching URLs with regex patterns.

SYNOPSIS

        #!/usr/bin/perl
        
        use strict;
        use warnings;
        
        use URL::RegexMatching qw(url_match_regex http_url_match_regex);
        
        my $text = <<SAMPLE;
        This is some sample text with links like
        <http://foo.com/blah_blah/> and others like WWW.EXAMPLE.COM
        and bit.ly/foo. And what about something like a
        mailto:name\@example.com pattern?
        SAMPLE
        
        my $url_regex = url_match_regex; 
        my $http_regex = http_url_match_regex;
        
        print "Using this sample text:\n";
        print "$text\n";
        
        print "These strings are probably links:\n";
        while ($text =~m{$url_regex}g) {
                print "\t$1\n";
        }
        
        print "\nWeb URLs:\n";
        while ($text =~m{$http_regex}g) {
                print "\t$1\n";
        }
        
        $text =~s{$http_regex}{<a href="$1">$1</a>}g;
        
        print "\n\n";
        print "Convert only HTTP links to HTML links using http_url_match_regex:\n";
        print "$text\n";
        

DESCRIPTION

This package is based on regular expression patterns originally developed by John Gruber of Daring Fireball. The module simply packages his work so that it is easier for the Perl community to use.

METHODS

url_match_regex

This method takes no arguments and returns a compiled regular expression. The pattern liberally matches strings that appear to be HTTP, HTTPS, or mailto URLs, and makes a best attempt to identify relative URLs.

This method can be exported by request.

http_url_match_regex

This method takes no arguments and returns a compiled regular expression. The pattern liberally matches only web URLs -- http, https, and relative forms such as www.example.com.

This method can be exported by request.

KNOWN ISSUES

Both regular expression patterns are known to fail against URL strings such as:

http://example.com/quotes-are-“part”
✪df.ws/1234
example.com
example.com/

The http_url_match_regex pattern is also likely to match strings whose domain and file path look like a web URL but that use a different protocol. For example, given 'ftp://www.example.com/foo.txt', the match captures everything except the 'ftp://' part.
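
One possible way to work around the non-HTTP protocol case is to inspect the text immediately before each match and skip matches that directly follow a "scheme://" prefix. The sketch below is illustrative only: web_matches_only is a hypothetical helper that is not part of this module, and $stand_in is a deliberately simplified pattern used so the example runs on its own. In real use you would pass the value returned by http_url_match_regex instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URL::RegexMatching): returns the first
# capture group of every match of $regex in $text, skipping any match that
# sits directly after a "scheme://" prefix such as "ftp://".
sub web_matches_only {
    my ($text, $regex) = @_;
    my @found;
    while ($text =~ m{$regex}g) {
        # Save the capture and its start offset before running another regex.
        my ($match, $start) = ($1, $-[1]);
        my $before = substr($text, 0, $start);
        next if $before =~ m{[a-z][a-z0-9+.\-]*://\z}i;
        push @found, $match;
    }
    return @found;
}

# Simplified stand-in pattern, assumed for illustration only.
my $stand_in = qr{((?:https?://|www\.)[^\s<>]+)}i;

my $sample = 'Get ftp://www.example.com/foo.txt or see http://example.org/page';
print "$_\n" for web_matches_only($sample, $stand_in);
```

The helper only relies on the pattern exposing the matched URL as its first capture group, which is how the SYNOPSIS uses both patterns via $1.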

SUPPORT

Bugs should be reported via the GitHub project issue tracker: http://github.com/tima/perl-url-regexmatching/issues

AUTHOR

Timothy Appnel <tima@cpan.org>

SEE ALSO

http://daringfireball.net/2010/07/improved_regex_for_matching_urls

COPYRIGHT AND LICENCE

This module is based on the work of John Gruber of Daring Fireball. John writes "this pattern is free for anyone to use, no strings attached. Consider it public domain."

The software is released under the Artistic License. The terms of the Artistic License are described at http://www.perl.com/language/misc/Artistic.html.

Except where otherwise noted, URL::RegexMatching is Copyright 2010, Timothy Appnel, tima@cpan.org. All rights reserved.
