NAME

Cookie::Domain - Domain Name Public Suffix Query Interface

SYNOPSIS

        use v5.10;
        use Cookie::Domain;
        my $dom = Cookie::Domain->new( min_suffix => 1, debug => 3 ) ||
            die( Cookie::Domain->error, "\n" );
        my $res = $dom->stat( 'www.example.co.uk' );
        # Check for potential errors
        die( $dom->error, "\n" ) if( !defined( $res ) );
        # stat() returns an empty string if nothing was found
        print( "Nothing found\n" ), exit(0) if( !$res );
        print( $res->domain, "\n" ); # example.co.uk
        print( $res->name, "\n" ); # example
        print( $res->sub, "\n" ); # www
        print( $res->suffix, "\n" ); # co.uk

        # Load the public suffix. This is done automatically, so no need to do it
        $dom->load_public_suffix( '/some/path/on/the/filesystem/data.txt' ) || 
            die( $dom->error );
        # Then, save it as json data for next time
        $dom->save_as_json( '/var/domain/public_suffix.json' ) || 
            die( $dom->error, "\n" );
        say $dom->suffixes->length, " suffixes data loaded.";

VERSION

    v0.1.2

DESCRIPTION

This is an interface to query the Public Suffix list, courtesy of the Mozilla project.

This list contains all the top level domains, a.k.a. zones, and is used to determine which part of a domain name constitutes the top level domain, which part is the domain, a.k.a. label, and which part (the rest) constitutes the subdomain.

Consider www.example.org. In this example, org is the top level domain, example is the name, example.org is the domain, and www is the subdomain.

This is easy enough, but there are cases where it is tricky to know which label (or part) is the domain part and which is the top level domain part. For example, in www.example.com.sg, com.sg is the top level domain, example is the name, example.com.sg is the domain, and www is the subdomain.
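To illustrate the splitting described above, here is a minimal, self-contained sketch. It does not use this module: the %SUFFIXES table and the split_host function are hypothetical, the table is a tiny hard-coded subset, and the wildcard and exception rules of the real Public Suffix list are ignored.

```perl
use strict;
use warnings;

# Hypothetical, hard-coded subset of suffixes. The real list is far larger
# and also contains wildcard and exception rules, which this sketch ignores.
my %SUFFIXES = map { $_ => 1 } qw( org sg com.sg uk co.uk );

# Split a host name into sub, name and suffix, using the longest known suffix.
# Returns a hash reference, or an empty string if no suffix matches.
sub split_host
{
    my $host   = lc( shift );
    my @labels = split( /\./, $host );
    # Try the longest candidate suffix first, always leaving one label for the name.
    for my $i ( 1 .. $#labels )
    {
        my $suffix = join( '.', @labels[ $i .. $#labels ] );
        next unless( $SUFFIXES{ $suffix } );
        return({
            sub    => join( '.', @labels[ 0 .. $i - 2 ] ),
            name   => $labels[ $i - 1 ],
            suffix => $suffix,
        });
    }
    return( '' );
}

my $r = split_host( 'www.example.com.sg' );
print( join( ' | ', @$r{qw( sub name suffix )} ), "\n" ); # www | example | com.sg
```

Trying the longest candidate first is what makes com.sg win over sg in this sketch.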

This module uses a json cache data file to speed up the loading of the suffixes (a.k.a. top level domains) data.

By default the location of this json file will be public_suffix.json under your system temporary directory, but you can override this by specifying your own location upon object instantiation:

    my $dom = Cookie::Domain->new( json_file => '/home/joe/var/public_suffix.json' );

METHODS

new

This instantiates the object and takes the following parameters, either as a hash or as a hash reference:

debug

Optional. If set to a positive integer, this activates verbose debugging messages.

file

Specify the location of the Public Suffix data file. The default one is in the same directory as this module, with the file name public_suffix_list.txt

You can download a different (newer) version and use this parameter to specify where it is located.

json_file

Specify the location of the json cache data file. The default location is set using Module::Generic::File to get the system temporary directory and the file name public_suffix.json.

This json file is created once, upon instantiating an object, if it does not already exist. See the "json_file" method for more information.

min_suffix

Sets the minimum suffix length required. Defaults to 0.

no_load

If this is set to true, the public suffix file is not loaded upon object instantiation. Normally you would not want to do that, unless you want to control when the file is loaded before you call "stat". This is primarily used by "cron_fetch".

cron_fetch

You need to have installed the package LWP::UserAgent to use this method.

This method can also be called as a package subroutine, such as Cookie::Domain::cron_fetch

Its purpose is to perform a remote connection to https://publicsuffix.org/list/effective_tld_names.dat and check for an updated copy of the public suffix data file.

It checks whether the remote file has changed by using the HTTP header field Last-Modified in the server response or, if there is already an etag stored in the cache, by performing a conditional HTTP query using If-None-Match. See the Mozilla documentation for more information on those types of query.

This is important to save bandwidth and avoid useless processing.

If the file has indeed changed, "save_as_json" is invoked to refresh the cache.

It returns the object it was called with for chaining.
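The conditional query described above can be sketched as follows. This is not the implementation used by "cron_fetch": it uses the core module HTTP::Tiny instead of LWP::UserAgent, and the function names conditional_headers and fetch_if_changed as well as the etag and last_modified keys are hypothetical. Fetching an https URL with HTTP::Tiny additionally requires IO::Socket::SSL to be installed.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build the conditional request headers from previously cached meta data.
# The 'etag' key is a hypothetical name chosen for this sketch.
sub conditional_headers
{
    my $meta = shift || {};
    my %headers;
    if( defined( $meta->{etag} ) && length( $meta->{etag} ) )
    {
        $headers{'If-None-Match'} = $meta->{etag};
    }
    return( \%headers );
}

# Fetch the remote file only if it changed. Returns an empty string when the
# server answers 304 Not Modified, or a hash reference with the new content.
sub fetch_if_changed
{
    my( $url, $meta ) = @_;
    my $resp = HTTP::Tiny->new( timeout => 15 )->get( $url, {
        headers => conditional_headers( $meta ),
    });
    return( '' ) if( $resp->{status} == 304 );
    die( "HTTP error $resp->{status} $resp->{reason}\n" ) unless( $resp->{success} );
    return({
        content       => $resp->{content},
        etag          => $resp->{headers}->{etag},
        last_modified => $resp->{headers}->{'last-modified'},
    });
}
```

A caller would store the returned etag and pass it back on the next run, for example fetch_if_changed( 'https://publicsuffix.org/list/effective_tld_names.dat', { etag => $saved_etag } ).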

decode

Takes a domain name, more properly called a host name, such as www.東京.jp or 今何年.jp, and returns its punycode ASCII representation, prefixed with the so-called ASCII Compatible Encoding (a.k.a. ACE) prefix xn--. Thus, using the previous examples, this would produce www.xn--1lqs71d.jp and xn--wmq0m700b.jp respectively.

Even if the host name contains non-ascii dots, they will be recognised. For example www。東京。jp would still be successfully decoded to www.xn--1lqs71d.jp

If the host name provided is not an international domain name (a.k.a. IDN), it is simply returned as is. Thus, if www.example.org is provided, it would return www.example.org

If an error occurred, it sets an error object and returns "undef" in perlfunc. The error can then be retrieved using "error" in Module::Generic inherited by this module.

It uses "domain_to_ascii" in Net::IDN::Encode to perform the actual decoding.
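As an illustration of the non-ASCII dot handling mentioned above, here is a minimal sketch that maps the ideographic and fullwidth label separators to a plain ASCII dot. The function normalize_dots is hypothetical and is not necessarily how this module or Net::IDN::Encode proceeds internally.

```perl
use strict;
use warnings;

# Replace the non-ASCII label separators recognised for international
# domain names (U+3002, U+FF0E and U+FF61) with a plain ASCII dot.
sub normalize_dots
{
    my $host = shift;
    $host =~ s/[\x{3002}\x{FF0E}\x{FF61}]/./g;
    return( $host );
}

print( normalize_dots( "www\x{3002}example\x{FF0E}jp" ), "\n" ); # www.example.jp
```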

encode

This does the reverse operation from "decode".

It takes a domain name, more properly called a host name, already decoded and carrying its so-called ASCII Compatible Encoding (a.k.a. ACE) prefix xn--, such as xn--wmq0m700b.jp, and returns its encoded version in perl's internal utf8 encoding. Using the previous example, this would return 今何年.jp. The ACE prefix is required to tell international domain names (a.k.a. IDN) apart from pure ASCII domain names.

Just like in "decode", if a non-international domain name is provided, it is returned as is. Thus, if www.example.org is provided, it would return www.example.org

Note that this returns the name in perl's internal utf8 encoding, so if you need to save it to a utf8 file or print it out as a utf8 string, you still need to encode it in utf8 first. For example:

    use v5.10; # enables say
    use Cookie::Domain;
    use open ':std' => ':utf8';
    my $d = Cookie::Domain->new;
    say $d->encode( "xn--wmq0m700b.jp" );

Or

    use v5.10; # enables say
    use Cookie::Domain;
    use Encode;
    my $d = Cookie::Domain->new;
    my $encoded = $d->encode( "xn--wmq0m700b.jp" );
    say Encode::encode_utf8( $encoded );

If an error occurred, it sets an error object and returns "undef" in perlfunc. The error can then be retrieved using "error" in Module::Generic inherited by this module.

It uses "domain_to_unicode" in Net::IDN::Encode to perform the actual encoding.

file

Sets the file path to the Public Suffix file. This is a public domain file maintained at the initiative of the Mozilla Foundation, and its latest version can be accessed here: https://publicsuffix.org/list/

json_file

Sets the file path of the json cache data file. The purpose of this file is to contain a json representation of the parsed data from the Public Suffix data file. This avoids re-parsing it each time; the json file is loaded instead, using the XS module JSON.

load

This method takes no parameter and relies on the properties set with "file" and "json_file".

If the hash data is already accessible in a module-wide variable, the data is taken from it.

Otherwise, if json_file is set and accessible, the data is loaded from it. Failing that, the data is loaded from the file specified with "file" and saved as json.

If the meta data enclosed in the json file, specifically the property db_last_modified, has a unix timestamp value lower than the last modification timestamp of the public suffix data file, then "load" will reload that data file and save it as json again.

That way, all you need to do is set up a crontab to fetch the latest version of that public suffix data file.

For example, to fetch it every day at 1:00 in the morning:

    0 1 * * * perl -MCookie::Domain -e 'Cookie::Domain::cron_fetch' >/dev/null 2>&1

But if you want to store the public suffix data file somewhere other than the default location:

    0 1 * * * perl -MCookie::Domain -e 'my $d=Cookie::Domain->new(file=>"/some/system/file.txt"); $d->cron_fetch' >/dev/null 2>&1

See your system's crontab manpage for more details.

The data read are loaded into "suffixes".

It returns the current object for chaining.
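The freshness check described above, comparing db_last_modified with the modification time of the public suffix data file, can be sketched as follows. The function cache_is_stale is hypothetical and only illustrates the comparison. It assumes the meta data is available as a hash reference, and it treats a missing db_last_modified as stale and a missing data file as not stale, which are choices made for this sketch.

```perl
use strict;
use warnings;

# Return 1 if the json cache should be rebuilt from the data file, 0 otherwise.
# $meta is assumed to be a hash reference holding 'db_last_modified'
# as a unix timestamp.
sub cache_is_stale
{
    my( $meta, $data_file ) = @_;
    # Element 9 of the list returned by stat() is the last modification time.
    my $mtime = ( stat( $data_file ) )[9];
    # Without a data file there is nothing to reload from.
    return( 0 ) unless( defined( $mtime ) );
    # No recorded timestamp: assume the cache needs rebuilding.
    return( 1 ) unless( defined( $meta->{db_last_modified} ) );
    return( $meta->{db_last_modified} < $mtime ? 1 : 0 );
}
```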

load_json

This takes a file path to the json cache data as its only argument, attempts to read its content, and sets it as the data accessible with "suffixes".

If an error occurs, it sets an error object using "error" in Module::Generic and returns "undef" in perlfunc

It returns its current object for chaining.

load_public_suffix

This is similar to the method "load_json" above.

This takes a file path to the Public Suffix data as its only argument, reads its content, parses it using the algorithm described at https://publicsuffix.org/list/ and sets it as the data accessible with "suffixes", and also in the package-wide global variable, to make the data available across object instantiations.

If an error occurs, it sets an error object using "error" in Module::Generic and returns "undef" in perlfunc

It returns its current object for chaining.

meta

Returns a hash object of meta information pertaining to the public suffix file. This is used primarily by "cron_fetch".

min_suffix

Sets or gets the minimum suffix required as an integer value.

It returns the current value as a Module::Generic::Number object.

no_load

If this is set to true, the public suffix file is not loaded upon object instantiation. Normally you would not want to do that, unless you want to control when the file is loaded before you call "stat". This is primarily used by "cron_fetch".

save_as_json

This takes as its sole argument the file path where the json cache data should be saved, and saves the data accessible with "suffixes" to it.

It returns the current object for chaining.

If an error occurs, it sets an error object using "error" in Module::Generic and returns "undef" in perlfunc

stat

This takes a domain name, such as www.example.org, and optionally a hash reference of options, and returns one of the following:

undef()

If an error occurred.

    my $rv = $d->stat( 'www.example.org' );
    die( "Error: ", $d->error ) if( !defined( $rv ) );

empty string

If there is no data available, such as when querying a non-existent top level domain.

A Cookie::Domain::Result object

An object with the following properties and methods, although not all are necessarily defined, depending on the results.

When accessed as hash properties, they return regular strings; when accessed as methods, they return Module::Generic::Scalar objects.

name

The label immediately to the left of the suffix.

For example, in www.example.org, the name would be example

    my $res = $dom->stat( 'www.example.org' ) || die( $dom->error );
    say $res->{name}; # example
    # or alternatively
    say $res->name; # example

sub

The subdomain or subdomains, i.e. everything to the left of the domain.

For example, in www.paris.example.fr, www.paris is the sub and example the name

    my $res = $dom->stat( 'www.paris.example.fr' ) || die( $dom->error );
    say $res->{sub}; # www.paris
    # or alternatively
    say $res->sub; # www.paris

suffix

The top level domain or suffix. For example, in example.com.sg, com.sg is the suffix and example the name

    my $res = $dom->stat( 'example.com.sg' ) || die( $dom->error );
    say $res->{suffix}; # com.sg
    # or alternatively
    say $res->suffix; # com.sg

What constitutes a suffix varies from zone to zone or country to country, hence the necessity of this public domain suffix data file.

Cookie::Domain::Result objects inherit from Module::Generic::Hash, thus you can do:

    my $res = $dom->stat( 'www.example.org' ) || die( $dom->error );
    say $res->length, " properties set.";
    # which should say 3 since we always return suffix, name and sub

The following additional method is also available as a convenience:

domain

This is a read-only method which returns an empty Module::Generic::Scalar object if the name property is empty, or otherwise the name and suffix properties joined by a dot '.' and returned as a new Module::Generic::Scalar object.

    my $res = $dom->stat( 'www.example.com.sg' ) || die( $dom->error );
    say $res->domain; # example.com.sg
    say $res->domain->length; # 14

The options accepted are:

add

This is an integer, and represents the additional length to be added for the domain name.

min_suffix

This is an integer, and if provided, will override the default value set with "min_suffix"

suffixes

This method is used to access the hash repository of all the public suffix data.

It is actually a Module::Generic::Hash object. So you could do:

    say "There are ", $dom->suffixes->length, " rules.";

AUTHOR

Jacques Deguest <jack@deguest.jp>

SEE ALSO

Cookie, Cookie::Jar, Mozilla::PublicSuffix, Domain::PublicSuffix, Net::PublicSuffixList

https://publicsuffix.org/list/

COPYRIGHT & LICENSE

Copyright (c) 2021 DEGUEST Pte. Ltd.

You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.