
NAME

Sport::Analytics::NHL::Scraper - Scrape and crawl the NHL website for data

SYNOPSIS

Scrape and crawl the NHL website for data

  use Sport::Analytics::NHL::Scraper;
  my $schedules = crawl_schedule({
    start_season => 2016,
    stop_season  => 2017
  });
  ...
  my $contents = crawl_game(
    { season => 2011, stage => 2, season_id => 0001 }, # game 2011020001 in NHL accounting
    { game_files => [qw(BS PL)], retries => 2 },
  );

IMPORTANT VARIABLE

The @GAME_FILES variable contains the definitions for the report types. Currently only the boxscore JavaScript report has meaningful non-default definitions; the PB feed seems to have become unavailable.

FUNCTIONS

scrape
 A wrapper around LWP::Simple::get() that adds retries and validation of the download.
 Arguments: hash reference containing
   * url => URL to access
   * retries => Number of retries
   * validate => sub reference to validate the download
 Returns: the content if both download and validation are successful
          undef otherwise.
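
The retry-and-validate behaviour described above can be sketched roughly as follows. This is an illustration, not the module's code: scrape_sketch and its fetch argument are hypothetical names introduced here so that the example runs without network access, and the reading of retries as "one initial attempt plus N further attempts" is an assumption. Without a fetch argument it falls back to LWP::Simple::get().

```perl
use strict;
use warnings;

# Hypothetical sketch of a retrying wrapper; not the module's actual code.
# Arguments mirror the documented ones (url, retries, validate), plus an
# assumed 'fetch' coderef so the getter can be swapped out for testing.
sub scrape_sketch {
    my ($opts) = @_;
    my $fetch = $opts->{fetch} || do {
        require LWP::Simple;          # loaded only when no stub is supplied
        \&LWP::Simple::get;
    };
    my $validate = $opts->{validate} || sub { 1 };
    my $retries  = $opts->{retries}  || 0;

    # One initial attempt plus up to $retries further attempts (assumed).
    for my $attempt (0 .. $retries) {
        my $content = $fetch->($opts->{url});
        return $content if defined $content && $validate->($content);
    }
    return;    # undef in scalar context when every attempt failed
}

# Usage with a stub getter that fails twice and then succeeds.
my $calls   = 0;
my $content = scrape_sketch({
    url      => 'http://example.com/',
    retries  => 2,
    fetch    => sub { ++$calls < 3 ? undef : '{"ok":1}' },
    validate => sub { $_[0] =~ /ok/ },
});
```

Injecting the getter keeps the retry logic testable offline; the real function only needs the url, retries and validate keys listed above.
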
crawl_schedule

Crawls the NHL schedule. The schedule is accessed through a minimal live API first (this only works for post-2010 seasons), then through the general /api/

 Arguments: hash reference containing
  * start_season => the first season to crawl
  * stop_season  => the last season to crawl
 Returns: hash reference of seasonal schedules where seasons are the keys, and decoded JSONs are the values.
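
As a rough illustration of the return value described above, the sketch below builds a hash reference keyed by season, with decoded JSON as the values. The fetch_schedule_json stub, the crawl_schedule_sketch name and the JSON payload are all invented for the example; the structure of the real feed is not shown here.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical stand-in for the network call; returns a made-up JSON string.
sub fetch_schedule_json {
    my ($season) = @_;
    return qq({"season":$season,"games":[]});
}

# Sketch: collect one decoded schedule per season, keyed by season.
sub crawl_schedule_sketch {
    my ($opts) = @_;
    my %schedules;
    for my $season ($opts->{start_season} .. $opts->{stop_season}) {
        $schedules{$season} = decode_json(fetch_schedule_json($season));
    }
    return \%schedules;
}

my $schedules = crawl_schedule_sketch({ start_season => 2016, stop_season => 2017 });
for my $season (sort keys %$schedules) {
    printf "%s: %d games in stub data\n",
        $season, scalar @{ $schedules->{$season}{games} };
}
```
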
get_game_url_args
  Sets the arguments to populate the game URL for a given report type and game
  Arguments: document name, currently one of qw(BS PB RO ES GS PL)
             game hashref containing
             * season    => YYYY
             * stage     => 2|3
             * season_id => NNNN
  Returns: a configured list of arguments for the URL.
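
The SYNOPSIS comment refers to game 2011020001, which appears to be the season, a two-digit stage and a four-digit season ID concatenated. The helper below is a hypothetical illustration of that composition; game_id_sketch is not part of the module. Note that Perl parses a literal such as 0001 as octal, so it equals 1, while a literal such as 0009 does not compile; passing plain integers avoids the problem.

```perl
use strict;
use warnings;

# Hypothetical helper: compose the 10-digit game number from the game hashref
# keys used in the SYNOPSIS (season, stage, season_id).
sub game_id_sketch {
    my ($game) = @_;
    return sprintf '%04d%02d%04d', @{$game}{qw(season stage season_id)};
}

my $id = game_id_sketch({ season => 2011, stage => 2, season_id => 1 });
print "$id\n";    # prints 2011020001
```
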
crawl_game
  Crawls the data for the given game
  Arguments: game data as hashref:
             * season    => YYYY
             * stage     => 2|3
             * season_id => NNNN
             options hashref:
             * game_files => hashref of types of reports that are requested
             * force      => 0|1 force overwrite of files already present in the system
             * retries    => N number of retries for every get call
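
The force option can be read as "download again even if the file is already stored". A minimal sketch of that decision follows, assuming a hypothetical helper name (needs_download) and a file path supplied by the caller; the module's actual storage layout is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a report needs to be downloaded.
# $path is wherever the caller stores the report; $opts->{force} follows
# the documented 0|1 meaning.
sub needs_download {
    my ($path, $opts) = @_;
    return 1 if $opts->{force};    # overwrite regardless of what is on disk
    return -e $path ? 0 : 1;       # otherwise fetch only missing files
}
```
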

AUTHOR

More Hockey Stats, <contact at morehockeystats.com>

BUGS

Please report any bugs or feature requests to contact at morehockeystats.com, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sport::Analytics::NHL::Scraper. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Sport::Analytics::NHL::Scraper

You can also look for information at: