
NAME

WWW::Gazetteer::HeavensAbove - Find location of world towns and cities

VERSION

version 0.22

SYNOPSIS

 use WWW::Gazetteer::HeavensAbove;

 my $atlas = WWW::Gazetteer::HeavensAbove->new;

 # simple query using ISO 3166 codes
 my @towns = $atlas->find( 'Bacton', 'GB' );
 print $_->{name}, ", ", $_->{elevation}, $/ for @towns;

 # simple query using heavens-above.com codes
 @towns = $atlas->query( 'Bacton', 'UK' );
 print $_->{name}, ", ", $_->{elevation}, $/ for @towns;

 # big queries can use a callback (and return nothing)
 $atlas->find(
     'Bacton', 'GB',
     sub { print $_->{name}, ", ", $_->{elevation}, $/ for @_ }
 );

 # find() returns an arrayref in scalar context
 my $cities = $atlas->find( 'Paris', 'FR' );
 print $cities->[1]{name};

 # the heavens-above.com site supports complicated queries
 my @az = $atlas->find( 'a*z', 'FR' );

 # and you can naturally use callbacks for those!
 my ($c, $n);
 $atlas->find( 'N*', 'US', sub { $c++; $n += @_ }  );
 print "$c web requests needed for finding $n cities";

 # or use your own UserAgent
 use LWP::UserAgent;
 my $ua = LWP::UserAgent->new;
 $atlas = WWW::Gazetteer::HeavensAbove->new( ua => $ua );

 # another way to create a new object
 use WWW::Gazetteer;
 my $g = WWW::Gazetteer->new('HeavensAbove');

DESCRIPTION

A gazetteer is a geographical dictionary (as at the back of an atlas). The WWW::Gazetteer::HeavensAbove module uses the information at http://www.heavens-above.com/countries.asp to return the geographical location (longitude, latitude, elevation) of towns and cities in countries around the world.

Once a WWW::Gazetteer::HeavensAbove object is created, use the find() method to return lists of hashrefs holding all the information for the matching cities.

A city structure looks like this:

 $lesparis = {
     iso        => 'FR',
     latitude   => '45.633',
     regionname => 'Region',
     region     => 'Rhône-Alpes',
     elevation  => '508',            # meters
     longitude  => '5.733',
     name       => 'Paris',
 };
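A city structure is a plain hashref, so ordinary Perl list operations apply to the results of find(). As the examples show, the fields are strings, so numeric sorting needs <=> rather than cmp. The sketch below reuses the two example structures from this section (trimmed to the fields it needs); it does not call the module.

```perl
use strict;
use warnings;

# Two city structures shaped like the examples in this section.
my @cities = (
    { name => 'Paris',    iso => 'FR', latitude => '45.633',
      longitude => '5.733',   elevation => '508' },
    { name => 'New York', iso => 'US', latitude => '39.685',
      longitude => '-93.927', elevation => '244' },
);

# Fields are strings, so compare them numerically with <=>.
my @by_elevation = sort { $b->{elevation} <=> $a->{elevation} } @cities;

for my $city (@by_elevation) {
    printf "%-10s %8.3f %9.3f %5d m\n",
        @$city{qw(name latitude longitude elevation)};
}
```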

Note: the 'regionname' attribute is the local name of a region (this can change from country to country).

Due to the way heavens-above.com's database was created, cities from the U.S.A. are handled as a special case. The region field is the state, and a special field named county holds the county name.

Here is an example of an American city:

 $newyork = {
     iso        => 'US',
     latitude   => '39.685',
     regionname => 'State',
     region     => 'Missouri',
     county     => 'Caldwell',    # this is only for US cities
     elevation  => '244',
     longitude  => '-93.927',
     name       => 'New York'
 };
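Because the county field only exists for US cities, code that handles cities from several countries should test for it with exists. The helper below, city_label(), is hypothetical (not part of the module) and simply joins the fields of the two example structures above into a readable label.

```perl
use strict;
use warnings;
use utf8;

# The two example structures from this section, abridged.
my $lesparis = { iso => 'FR', region => 'Rhône-Alpes', name => 'Paris' };
my $newyork  = { iso => 'US', region => 'Missouri', county => 'Caldwell',
                 name => 'New York' };

# Hypothetical helper: include the county only when the field is present.
sub city_label {
    my ($city) = @_;
    my @parts = ( $city->{name} );
    push @parts, $city->{county} if exists $city->{county};
    push @parts, @$city{qw(region iso)};
    return join ', ', @parts;
}

binmode STDOUT, ':encoding(UTF-8)';
print city_label($_), "\n" for $lesparis, $newyork;
```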

Methods

new()

Return a new WWW::Gazetteer::HeavensAbove user-agent, ready to find() cities for you.

The constructor can be given a list of parameters. Currently supported parameters are:

ua - the LWP::UserAgent used for the web requests

retry - the number of times a failed connection will be retried

You can also use the generic WWW::Gazetteer module to create a new WWW::Gazetteer::HeavensAbove object:

 use WWW::Gazetteer;
 my $g = WWW::Gazetteer->new('HeavensAbove');

You can also pass it initialisation parameters:

 use WWW::Gazetteer;
 my $g = WWW::Gazetteer->new('HeavensAbove', retry => 3);

find( $city, $country [, $callback ] )

Return a list of cities matching $city, within the country with ISO 3166 code $country (not all codes are supported by heavens-above.com).

In list context, this method returns a list of city structures; in scalar context, it returns a reference to that list. If the request returns a lot of cities, you can pass a callback routine to find(). This routine receives each batch of city structures as @_. If a callback is given to find(), find() returns an empty list.

A single call to find() can lead to several web requests. If a query matches more than 200 cities, heavens-above.com only returns the first 200. WWW::Gazetteer::HeavensAbove keeps as much data as possible from this first answer and then refines the query again and again (see ALGORITHM).

Here's an excerpt from heavens-above.com documentation:

    You can use "wildcard" characters to match several towns if you're not sure of the exact name. These characters are '*' which means "match any sequence of characters", and '?' which means "match any single character". The search is not case-sensitive.

    Diacritic characters, such as ü and Ä can either be entered directly from the keyboard (assuming you have the appropriate keyboard), or simply enter the letter without diacritic (e.g. you can enter 'a' for 'ä', 'à', 'á', 'â', 'ã' and 'å'). If you need a special character which is not on your keyboard, and is not a diacritic (e.g. the German 'ß', and Scandinavian 'æ'), simply enter a "?" instead, and all characters will be matched.
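To pre-filter or post-check names locally, the matching rules quoted above can be approximated in Perl. This is a sketch under assumptions, not the site's implementation: the hypothetical fold() strips diacritics by decomposing with Unicode::Normalize (a core module) and dropping combining marks, and wildcard_to_regex() maps '*' to ".*" and '?' to "." on the folded, lowercased text. Characters such as 'ß' and 'æ' do not decompose, so, as the site suggests, they are matched with '?'.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Remove diacritics (ä -> a, ô -> o) and lowercase, since the search
# is not case-sensitive.
sub fold {
    my ($s) = @_;
    ( my $f = NFD($s) ) =~ s/\p{Mn}//g;
    return lc $f;
}

# Turn a '*' / '?' pattern into an anchored regex over folded text.
sub wildcard_to_regex {
    my ($pattern) = @_;
    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta $_
    } split //, fold($pattern);
    return qr/\A$re\z/s;
}

sub matches {
    my ( $pattern, $name ) = @_;
    return fold($name) =~ wildcard_to_regex($pattern);
}

binmode STDOUT, ':encoding(UTF-8)';
for my $name ( 'Zürich', 'Zug', 'Zermatt' ) {
    printf "%-8s %s\n", $name, matches( 'z?r*', $name ) ? 'matches' : 'no match';
}
```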

Note: heavens-above.com doesn't use ISO 3166 codes, but its own country codes. If you want to use those directly, please see the query() method. (And read the source for the full list of HA codes.)

fetch( $searchstring, $code [, $callback ] )

fetch() is a synonym for find(). It is kept for backward compatibility.

query( $searchstring, $code [, $callback ] )

This method is the actual method called by find().

The only difference is that $code is the heavens-above.com specific country code, instead of the ISO 3166 code.

Callbacks

The find() and query() methods both accept an optional coderef as their third argument. This coderef is called back each time a batch of cities is returned by a web query to heavens-above.com.

This can be very useful if a query with a wildcard returns more than 200 answers. WWW::Gazetteer::HeavensAbove breaks it into new requests that each return fewer answers. The callback is called with the results of the subquery after each web request.

The callback is called in void context, and is passed a list of hashrefs (the cities found by the last query).

An example callback is (from eg/city.pl):

 # print a tab separated list of cities
 my $cb = sub {
     local $, = "\t";
     local $\ = $/;
     print @$_{qw(name region latitude longitude elevation)} for @_;
 };

Please note that, due to the nature of the queries, your callback can (and will most probably) be called with an empty @_.
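A callback that aggregates results must therefore tolerate empty batches. The sketch below tallies cities per region; the batches are hypothetical hashrefs fed to the callback by hand to stand in for successive web requests. With the module, you would pass $cb as the third argument of find() or query() instead.

```perl
use strict;
use warnings;

# Tally cities per region across all the batches of a big query.
my %per_region;
my $batches = 0;

my $cb = sub {
    $batches++;
    return unless @_;    # empty batches are expected: nothing to count
    $per_region{ $_->{region} }++ for @_;
};

# Hypothetical batches standing in for three web requests.
$cb->( { name => 'Town A', region => 'Region 1' },
       { name => 'Town B', region => 'Region 2' } );
$cb->();
$cb->( { name => 'Town C', region => 'Region 2' } );

printf "%s: %d\n", $_, $per_region{$_} for sort keys %per_region;
print "$batches batches\n";
```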

ALGORITHM

The web site returns only the first 200 answers to any query. To handle huge requests like '*' (the biggest possible), WWW::Gazetteer::HeavensAbove splits the request into several parts.

Example, looking for pa* in France:

  • pa* returns more than 200 answers, the last ones being:

        195 Paques, Rhône-Alpes
        196 Paquier, Rhône-Alpes
        197 Paradiso, Corse
        198 Paradou, Provence-Alpes-Côte d'Azur
        199 Paraise, Bourgogne
        200 Paraize (Paraise), Bourgogne

    The algorithm keeps the first 196, because they match pa* but not par* (r is the first character matched by * in the last city returned).

  • The next sub-query is computed as par* (176 cities)

  • It is followed by pas* (64), pat* (12), pau* (44), pav* (11), paw* (0), pay* (18) and paz* (5).

There is at least one query that cannot be completely fulfilled: there are more than 200 cities named Buenavista in Mexico. The web site's limit of 200 cities per query prevents us from getting the other Buenavistas in Mexico. Since version 0.11, WWW::Gazetteer::HeavensAbove includes a workaround: it fetches only the first 200 Buenavistas and continues with the global query. (This also works with other similarly broken answers.)
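The splitting strategy above can be sketched in a self-contained way. This is not the module's code, only an illustration under stated assumptions: the hypothetical site_query() simulates a web request against an in-memory list, names contain only ASCII letters (so the following characters can be enumerated with $c .. 'z'), and the page limit is a parameter so the behaviour shows up on a tiny list. The branch where the last result equals the prefix itself mimics the Buenavista workaround: it keeps the truncated page and moves on to longer names.

```perl
use strict;
use warnings;

# Hypothetical in-memory "site" (ASCII letters only).
my @db = qw(Pa Paa Pab Pac Pad Para Parb Parc Pas Pat Pau Pav Paz Pb);
my $requests = 0;

# Simulated web request for "$prefix*": case-insensitive, sorted, truncated.
sub site_query {
    my ( $db, $prefix, $limit ) = @_;
    $requests++;
    my @hits = sort { lc $a cmp lc $b }
               grep { index( lc $_, $prefix ) == 0 } @$db;
    splice @hits, $limit if @hits > $limit;
    return @hits;
}

# Deliver everything matching "$prefix*" to $cb, splitting full pages.
sub split_query {
    my ( $db, $prefix, $limit, $cb ) = @_;
    my @hits = site_query( $db, $prefix, $limit );

    # A page that is not full is a complete answer.
    if ( @hits < $limit ) {
        $cb->(@hits);
        return;
    }

    # First character matched by '*' in the last city returned.
    my $c = substr( lc $hits[-1], length $prefix, 1 );

    if ( $c eq '' ) {
        # Every hit equals the prefix ("Buenavista" case): keep the
        # truncated page, then look for longer names.
        $cb->(@hits);
        split_query( $db, "$prefix$_", $limit, $cb ) for 'a' .. 'z';
        return;
    }

    # Keep the hits that the "$prefix$c*" sub-query will not return again,
    # then query "$prefix$c*", and the following letters up to "${prefix}z*".
    $cb->( grep { index( lc $_, "$prefix$c" ) != 0 } @hits );
    split_query( $db, "$prefix$_", $limit, $cb ) for $c .. 'z';
    return;
}

my @found;
split_query( \@db, 'pa', 3, sub { push @found, @_ } );
print scalar(@found), " cities in $requests requests\n";
```

As in the pa* walk-through, most sub-queries return nothing, which is also why a callback can receive an empty list.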

TODO

Handle the case where a query with more than one wildcard (* or ?) returns more than 200 answers. For now, it stops at 200.

BUGS

Network errors croak after the maximum retry count has been reached. This can be a problem for big queries (that return more than 200 answers) whose results are passed to a callback, because part of the data has already been processed by the callback when the script dies. Even if you catch the exception, you cannot easily tell where to start again.
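One partial mitigation is to record progress inside the callback and wrap the call in eval, so that a crash at least tells you how far the query got. In the sketch below, flaky_find() is a hypothetical stand-in for find() that delivers two batches and then dies, as find() does once its retry count is exhausted; the city names come from the ALGORITHM example. It does not solve the restart problem, it only records what was already handled.

```perl
use strict;
use warnings;

# Record every city name the callback has processed, in order.
my @seen;
my $cb = sub { push @seen, map { $_->{name} } @_ };

# Hypothetical stand-in for find(): two batches, then a fatal error.
sub flaky_find {
    my ( $pattern, $iso, $callback ) = @_;
    $callback->( { name => 'Paques' }, { name => 'Paquier' } );
    $callback->( { name => 'Paradiso' } );
    die "network error\n";
}

my $ok = eval { flaky_find( 'pa*', 'FR', $cb ); 1 };
if ( !$ok ) {
    my $err = $@;
    chomp $err;
    printf "find() died (%s) after %d cities; last seen: %s\n",
        $err, scalar @seen, $seen[-1] // 'none';
}
```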

Errors in the database do not originate at heavens-above.com, since they "put together and enhanced" data from the following two sources: US Geological Survey (http://geonames.usgs.gov/index.html) for the USA and dependencies, and The National Imaging and Mapping Agency (http://www.nima.mil/gns/html/index.html) for all other countries.

See also: http://www.heavens-above.com/ShowFAQ.aspx?FAQID=100

ACKNOWLEDGEMENTS

This module was a script, before I found out about Leon Brocard's WWW::Gazetteer module. Thanks! And, erm, bits of the documentation were stolen from WWW::Gazetteer.

Thanks to Alain Zalmanski (of http://www.fatrazie.com/ fame) for asking me for all that geographical data in the first place.

SEE ALSO

"How I captured thousands of Afghan cities in a few hours", one of my lightning talks at YAPC::Europe 2002 (Munich). You had to be there.

WWW::Gazetteer and WWW::Gazetteer::Calle, by Leon Brocard.

The use Perl discussion that had me write this module from the original script: http://use.perl.org/~acme/journal/8079

The module master repository is held at: http://git.bruhat.net/r/WWW-Gazetteer-HeavensAbove.git and http://github.com/book/WWW-Gazetteer-HeavensAbove.

BUGS

Please report any bugs or feature requests on the bugtracker website http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Gazetteer-HeavensAbove or by email to bug-git-repository@rt.cpan.org.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

AUTHOR

Philippe Bruhat (BooK) <book@cpan.org>

COPYRIGHT

Copyright 2002-2013 Philippe Bruhat (BooK).

LICENSE

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.
