
NAME

Sport::Analytics::SimpleRanking - This module provides a method that calculates Doug Drinen's simple ranking system.

VERSION

Version 0.21

SYNOPSIS

This module provides a method that calculates Doug Drinen's simple ranking system. It also provides access to some other useful team and season stats.

    use Sport::Analytics::SimpleRanking;
    my $stats = Sport::Analytics::SimpleRanking->new();
    my $games = [
        "Boston,13,Atlanta,27",
        "Dallas,17,Chicago,21",
        "Eugene,30,Fairbanks,41",
        "Atlanta,15,Chicago,3",
        "Eugene,21,Boston,24",
        "Fairbanks,17,Dallas,7",
        "Dallas,19,Atlanta,7",
        "Boston,9,Fairbanks,31",
        "Chicago,10,Eugene,30",
    ];
    $stats->load_data( $games );
    my $srs = $stats->simpleranking( verbose => 1 );
    my $mov = $stats->mov;
    my $sos = $stats->sos;
    for ( sort keys %$srs ) {
        print "Team $_ has an SRS of ", $srs->{$_};
        print ", a MOV of ", $mov->{$_};
        print " and a SOS of ", $sos->{$_}, "\n";
    }

DESCRIPTION

The simple ranking system is a rating system based on scoring rates. It starts from each team's margin of victory (MOV, i.e. average point spread). It is perhaps the simplest model of the form

 Team Strength = a x (MOV) + b x (Sum of Opponent Strengths)

In the simple ranking system, a = 1 and b = 1/(number of opponents played). Matrix solutions of this system of linear equations tend to be very unstable, whereas an iterative solution converges rapidly to a stable answer. This object implements the iterative solution. Because that work produces many intermediate values, the object also exposes a number of other useful statistics for the data set.

One more note: although the problem is commonly described as N equations in N unknowns, an additional constraint is required to obtain a single unique answer, namely that the simple rankings of all teams must sum to 0.0. This also guarantees that the average team in a season has a ranking of zero.
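The iterative solution described above can be sketched in a few lines of Perl. This is a minimal illustration, not the module's own code. The following are all assumptions: the sub name simple_ranking_sketch, the [visitor, visitor_score, home, home_score] game layout, re-centering the rankings to a zero sum after every pass, and the default epsilon and maxiter values.

```perl
use strict;
use warnings;
use List::Util qw(sum max);

# Hypothetical sketch (not the module's API). $games is an arrayref of
# [ visitor, visitor_score, home, home_score ] entries.
sub simple_ranking_sketch {
    my ( $games, %opt ) = @_;
    my $epsilon = $opt{epsilon} // 0.001;    # assumed default
    my $maxiter = $opt{maxiter} // 1000;     # assumed default

    # Accumulate total point spread and the opponent list per team.
    my ( %spread, %opponents );
    for my $g (@$games) {
        my ( $v, $vs, $h, $hs ) = @$g;
        $spread{$v} += $vs - $hs;
        $spread{$h} += $hs - $vs;
        push @{ $opponents{$v} }, $h;
        push @{ $opponents{$h} }, $v;
    }

    # MOV: average point spread per game played.
    my %mov = map { $_ => $spread{$_} / @{ $opponents{$_} } } keys %spread;

    # Start from MOV, then repeat SRS = MOV + average opponent SRS.
    my %srs = %mov;
    for ( 1 .. $maxiter ) {
        my %next;
        for my $t ( keys %mov ) {
            my @opp = @{ $opponents{$t} };
            $next{$t} = $mov{$t} + sum( map { $srs{$_} } @opp ) / @opp;
        }

        # Extra constraint: shift all rankings so they sum to zero.
        my $mean = sum( values %next ) / scalar( keys %next );
        $next{$_} -= $mean for keys %next;

        # Stop once no ranking moved by more than epsilon.
        my $delta = max( map { abs( $next{$_} - $srs{$_} ) } keys %next );
        %srs = %next;
        last if $delta < $epsilon;
    }
    return ( \%srs, \%mov );
}
```

With the nine sample games from the SYNOPSIS, Boston's MOV is -33/3 = -11 and Fairbanks' is 43/3. The returned rankings sum to zero, and each one equals the team's MOV plus the average ranking of its opponents, to within the tolerance.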

METHODS

CREATION

new()

 my $stats = Sport::Analytics::SimpleRanking->new();

Output: a working SimpleRanking object.

ACCESSORS

Unless otherwise specified, on success an accessor returns the value (or values) requested. On failure it carps a warning and returns a reference to an empty hash. Accessors fail when data have not been successfully loaded.

total_games()

 my $total_games = $stats->total_games();

Input: none required

Output: The total number of games in the loaded data set.

total_teams()

 my $total_teams = $stats->total_teams();

Input: none required

Output: The total number of teams in the loaded data set.

total_wins()

 my $total_wins = $stats->total_wins();

Input: none required

Output: The total number of wins in the loaded data set.

home_wins()

 my $home_wins = $stats->home_wins();

Input: none required

Output: The number of wins by home teams in the loaded data set.

home_win_pct()

 my $home_win_percent = $stats->home_win_pct();

Input: none required

Output: The winning percentage of home teams in the loaded data set.

win_margin()

 my $win_margin = $stats->win_margin();

Input: none required

Output: The average margin of victory in games that had a winner.

win_score()

 my $average_winning_score = $stats->win_score();

Input: none required

Output: The average score of the winning team in games that had a winner.

loss_score()

 my $average_losing_score = $stats->loss_score();

Input: none required

Output: The average score of the losing team in games that had a loser.

avg_score()

 my $average_score = $stats->avg_score();

Input: none required

Output: The average score of all teams in all games, regardless of outcome.
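The season-level accessors above are simple aggregates over the game list. The sketch below shows one plausible way to compute them. The sub season_summary is hypothetical, ties are assumed to count toward neither winning nor losing scores, and home_win_pct is assumed to be a fraction of all games rather than a 0 to 100 percentage. The module may define any of these differently.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper, not part of the module's API.
# $games is an arrayref of [ visitor, visitor_score, home, home_score ].
sub season_summary {
    my ($games) = @_;
    my ( @win, @loss, @all );
    my $home_wins = 0;
    for my $g (@$games) {
        my ( undef, $vs, undef, $hs ) = @$g;
        push @all, $vs, $hs;
        next if $vs == $hs;    # assumption: ties feed no win/loss stats
        my ( $w, $l ) = $hs > $vs ? ( $hs, $vs ) : ( $vs, $hs );
        push @win,  $w;
        push @loss, $l;
        $home_wins++ if $hs > $vs;
    }
    my $n = @$games;
    return {
        total_games  => $n,
        total_wins   => scalar @win,
        home_wins    => $home_wins,
        home_win_pct => $n    ? $home_wins / $n                     : 0,
        win_score    => @win  ? sum(@win) / @win                    : 0,
        loss_score   => @loss ? sum(@loss) / @loss                  : 0,
        win_margin   => @win  ? ( sum(@win) - sum(@loss) ) / @win   : 0,
        avg_score    => @all  ? sum(@all) / @all                    : 0,
    };
}
```

For the nine sample games in the SYNOPSIS, home teams win 6 of 9, the average winning score is 25, the average losing score is 13, the average margin of victory is 12, and the average score overall is 19.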

team_stats()

 my $teams = $stats->team_stats();
 for (sort keys %$teams) {
     printf "%s:  %3d-%3d-%3d\n", $_, $teams->{$_}{wins}, $teams->{$_}{losses}, $teams->{$_}{ties};
 }

Input: none required

Output: A reference to a hash of statistics per team. The keys for each team are wins, losses, ties, games_played, points_for, points_against, point_spread, win_pct, and mov (margin of victory, also known as average point spread).

This function will return an empty hash reference if data have not yet been loaded.
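team_stats() returns a hash of hashes keyed by team name. A minimal sketch of how such a table could be built from the game list follows. The sub name team_stats_sketch is hypothetical, and counting a tie as half a win in win_pct is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: build per-team statistics from
# [ visitor, visitor_score, home, home_score ] game records.
sub team_stats_sketch {
    my ($games) = @_;
    my %stats;
    for my $g (@$games) {
        my ( $v, $vs, $h, $hs ) = @$g;
        # Record the game once from each team's point of view.
        for my $side ( [ $v, $vs, $hs ], [ $h, $hs, $vs ] ) {
            my ( $team, $for, $against ) = @$side;
            my $s = $stats{$team} //= +{ map { $_ => 0 }
                  qw(wins losses ties games_played points_for points_against) };
            $s->{games_played}++;
            $s->{points_for}     += $for;
            $s->{points_against} += $against;
            $s->{ $for > $against ? 'wins' : $for < $against ? 'losses' : 'ties' }++;
        }
    }
    for my $s ( values %stats ) {
        $s->{point_spread} = $s->{points_for} - $s->{points_against};
        $s->{mov}          = $s->{point_spread} / $s->{games_played};
        # Assumption: a tie counts as half a win.
        $s->{win_pct} = ( $s->{wins} + 0.5 * $s->{ties} ) / $s->{games_played};
    }
    return \%stats;
}
```

In the sample data, for instance, Atlanta goes 2-1-0 and Fairbanks scores 89 points while allowing 46.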

pythag()

The Pythagorean formula is a rule of thumb that estimates winning percentage from points scored and points allowed.

 Estimated Winning Percentage = (Pts Scored)**N/( (Pts Scored)**N + (Pts Allowed)**N )

In the original Bill James formulation, the power of the Pythagorean formula, N, is 2. This implementation can also fit N to the game data set itself.
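The Pythagorean formula translates directly into a small function. pythag_expectation is a hypothetical helper, not part of the module. Returning undef when a team neither scored nor allowed any points is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: Pythagorean expected winning percentage.
sub pythag_expectation {
    my ( $points_for, $points_against, $n ) = @_;
    $n //= 2;    # Bill James's original exponent
    my $pf_n = $points_for**$n;
    my $pa_n = $points_against**$n;
    return undef if $pf_n + $pa_n == 0;    # assumption: undefined when both are zero
    return $pf_n / ( $pf_n + $pa_n );
}
```

For example, Fairbanks in the sample data scored 89 points and allowed 46. With N = 2 its estimate is 89**2/(89**2 + 46**2) = 7921/10037, about 0.789, compared with an actual winning percentage of 1.000.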

 my $teams = $stats->team_stats();
 my $predicted = $stats->pythag();
 for (sort keys %$teams) {
     printf "%s:  %6.2f %6.2f\n", $_, $teams->{$_}{win_pct}, $predicted->{$_};
 }

Input: If no argument is given, N = 2 is assumed.

 my $predicted = $stats->pythag();

If the input is a number, that number is used as the power N of the Pythagorean prediction.

 my $predicted = $stats->pythag(2.5);

If the input is a reference to a scalar and the option 'best => 1' is given, a golden-section (golden mean) search is used to find the best-fit value of N, which is stored in the referenced scalar.

 my $exp;    # receives the best-fit value of N
 my $predicted = $stats->pythag( \$exp, best => 1 );

Output:

A hash reference with team names as keys and predicted winning percentage as values.

This function will return an empty hash reference if data have not yet been loaded.
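The best-fit option relies on a golden-section search for N. The sketch below illustrates the technique. The sub names, the least-squares fit criterion (actual win_pct against the Pythagorean estimate), and the search interval [1, 20] are assumptions, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical sketch: golden-section search for the minimum of a
# unimodal function $f on the interval [$lo, $hi].
sub golden_section_min {
    my ( $f, $lo, $hi, $tol ) = @_;
    $tol //= 1e-6;
    my $r  = ( sqrt(5) - 1 ) / 2;    # inverse golden ratio, about 0.618
    my $x1 = $hi - $r * ( $hi - $lo );
    my $x2 = $lo + $r * ( $hi - $lo );
    my ( $f1, $f2 ) = ( $f->($x1), $f->($x2) );
    while ( $hi - $lo > $tol ) {
        if ( $f1 < $f2 ) {    # minimum lies in [lo, x2]
            ( $hi, $x2, $f2 ) = ( $x2, $x1, $f1 );
            $x1 = $hi - $r * ( $hi - $lo );
            $f1 = $f->($x1);
        }
        else {                # minimum lies in [x1, hi]
            ( $lo, $x1, $f1 ) = ( $x1, $x2, $f2 );
            $x2 = $lo + $r * ( $hi - $lo );
            $f2 = $f->($x2);
        }
    }
    return ( $lo + $hi ) / 2;
}

# Assumed fit criterion: sum of squared differences between each team's
# actual win_pct and its Pythagorean estimate, searched over N in [1, 20].
sub best_pythag_exponent {
    my ($teams) = @_;    # team => { points_for, points_against, win_pct }
    my $sse = sub {
        my ($n) = @_;
        my $err = 0;
        for my $s ( values %$teams ) {
            my $pf = $s->{points_for}**$n;
            my $pa = $s->{points_against}**$n;
            $err += ( $s->{win_pct} - $pf / ( $pf + $pa ) )**2;
        }
        return $err;
    };
    return golden_section_min( $sse, 1, 20 );
}
```

Each step reuses one of the two interior points and shrinks the bracket by a factor of about 0.618. As a result, only one new function evaluation is needed per iteration.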

ALGORITHM COMPONENTS

mov()

 my $mov = $stats->mov();
 for (sort keys %$mov) {
     printf "team %s: margin of victory: %6.2f\n", $_, $mov->{$_};
 }

Input: none required

Output: a reference to a hash of mov values (margin of victory, or average point spread) per team. This function will return an empty hash reference if data have not yet been loaded.

sos()

Strength of schedule is the average simple ranking of a team's opponents: the sum of the simple rankings of all teams that played a given team, divided by the number of teams that played it.
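Given a set of simple rankings and each team's opponent list, strength of schedule is a per-team average. The sketch below is a hypothetical helper, not the module's code. It assumes that an opponent met twice is counted twice, which keeps the identity SRS = MOV + SOS used in the simpleranking() example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: strength of schedule as the mean SRS of the
# opponents a team faced (an opponent met twice counts twice).
sub strength_of_schedule {
    my ( $srs, $opponents ) = @_;    # team => srs; team => [ opponent, ... ]
    my %sos;
    for my $team ( keys %$opponents ) {
        my @opp = @{ $opponents->{$team} };
        $sos{$team} = sum( map { $srs->{$_} } @opp ) / @opp;
    }
    return \%sos;
}
```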

 my $sos = $stats->sos();
 for (sort keys %$sos) {
     printf "team %s: strength of schedule: %6.2f\n", $_, $sos->{$_};
 }

Input: none required

Output: a reference to a hash of sos values (strength of schedule) per team. This function will return an empty hash reference if data have not yet been calculated.

simpleranking()

Input: none required; the options listed below may be given.

Example:

 my $stats = Sport::Analytics::SimpleRanking->new();
 $stats->load_data( \@games );
 my $srs = $stats->simpleranking( verbose => 1 );
 my $mov = $stats->mov();
 my $sos = $stats->sos();
 for (sort keys %$srs) {
     printf "team %s: simple ranking: %6.2f = margin of victory: %6.2f", $_, $srs->{$_}, $mov->{$_};
     printf " + strength of schedule: %6.2f\n", $sos->{$_};
 }

Options:

    epsilon => value

    This is a convergence criterion. Usually you won't need to set this.

    maxiter => value

    The maximum number of iterations, a safeguard against runaway loops. Usually unnecessary, as this algorithm converges rapidly.

    verbose => value

    Set this to a true value to print intermediate values as they converge.

Output: The simple rankings of the data, as a reference to a hash of values per team name. This function will return an empty hash reference if data have not yet been calculated.

DATA LOADING

There are two methods provided, load_data() and add_data(). The method load_data() can be called only once per object; use add_data() for any data after that.

load_data()

Input: a reference to an array of comma separated strings of the form:

"visiting team,score,home team,score"
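Each string in the input array must split into four fields, with numeric scores in the second and fourth positions. The sketch below shows one way to parse and validate a single game string. The error texts follow the DIAGNOSTICS section of this document. The sub name parse_game, the whitespace trimming, and the use of Scalar::Util::looks_like_number are assumptions, not the module's documented behavior. The s///r syntax requires Perl 5.14 or later.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(looks_like_number);

# Hypothetical parser for one "visiting team,score,home team,score"
# string; $i is the array index used in error messages.
sub parse_game {
    my ( $line, $i ) = @_;
    # Split on commas (keeping empty trailing fields) and trim whitespace.
    my ( $visitor, $vscore, $home, $hscore ) =
        map { s/^\s+|\s+$//gr } split /,/, $line, -1;
    croak "The home score is undefined in array element $i. Perhaps you have missed a comma?"
        unless defined $hscore;
    croak "The visitor score field in array element $i needs to be a number."
        unless looks_like_number($vscore);
    croak "The home score field in array element $i needs to be a number."
        unless looks_like_number($hscore);
    return [ $visitor, $vscore, $home, $hscore ];
}
```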

Example:

    use Sport::Analytics::SimpleRanking;
    my $stats = Sport::Analytics::SimpleRanking->new();
    my $games = [
        "Boston,13,Atlanta,27",
        "Dallas,17,Chicago,21",
        "Eugene,30,Fairbanks,41",
        "Atlanta,15,Chicago,3",
        "Eugene,21,Boston,24",
        "Fairbanks,17,Dallas,7",
        "Dallas,19,Atlanta,7",
        "Boston,9,Fairbanks,31",
        "Chicago,10,Eugene,30",
    ];
    $stats->load_data( $games );

The calculation requires at least two teams, and at least two games per team.

Output: returns 1 on success, croaks on failure.

add_data()

Input: a reference to an array of comma separated strings of the form:

"visiting team,score,home team,score"

Example:

    use Sport::Analytics::SimpleRanking;
    my $stats = Sport::Analytics::SimpleRanking->new();
    # first two weeks games.
    my $games = [
        "Boston,13,Atlanta,27",
        "Dallas,17,Chicago,21",
        "Eugene,30,Fairbanks,41",
        "Atlanta,15,Chicago,3",
        "Eugene,21,Boston,24",
        "Fairbanks,17,Dallas,7",
    ];
    $stats->load_data( $games );
    # add another week of games.
    my $newgames = [
        "Dallas,19,Atlanta,7",
        "Boston,9,Fairbanks,31",
        "Chicago,10,Eugene,30",
    ];
    $stats->add_data( $newgames ); 

The calculation requires at least two teams, and at least two games per team.

Output: returns 1 on success, croaks on failure.

DIAGNOSTICS

accessors and calculations

 No data are loaded presently.

Data need to be loaded before this value can be returned.

 No data are calculated presently.

Data need to be loaded and simpleranking needs to be run first.

load_data()

 You can only load data once into this object. Use add_data to add more data.

Code attempts to use load_data more than once. Use add_data instead.

 Method load_data requires a reference to a games array.

Either no data were passed to load_data, or the wrong kind of data was passed. Pass an array reference, e.g. \@array.

 The home score is undefined in array element X. Perhaps you have missed a comma?

This happens when a data string passed to the method contains fewer than three commas.

 The visitor score field in array element X needs to be a number.

The second field in a game string needs to be a number.

 The home score field in array element X needs to be a number.

The fourth field in a game string needs to be a number.

 Method load_data requires at least two games to analyze data.
 Method load_data requires at least two teams.
 Method load_data requires at least as many games as teams.
 Method load_data requires team T to have played at least two games.

The data set must meet these minimum requirements before a simple ranking can be calculated.
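The minimum requirements listed above can be checked before any ranking is attempted. The sketch below is a hypothetical helper (data_problems is not part of the module) that returns the list of unmet requirements. Because each game counts toward two teams, requiring every team to have played at least two games already implies at least as many games as teams. The sketch still checks both, to mirror the messages.

```perl
use strict;
use warnings;

# Hypothetical check of the minimum data requirements; returns a list
# of unmet requirements (empty when the data set is usable).
sub data_problems {
    my ($games) = @_;    # arrayref of [ visitor, vscore, home, hscore ]
    my %played;
    for my $g (@$games) {
        $played{ $g->[0] }++;
        $played{ $g->[2] }++;
    }
    my $n_games = @$games;
    my $n_teams = keys %played;
    my @problems;
    push @problems, 'at least two games' if $n_games < 2;
    push @problems, 'at least two teams' if $n_teams < 2;
    push @problems, 'at least as many games as teams' if $n_games < $n_teams;
    for my $team ( sort keys %played ) {
        push @problems, "team $team to have played at least two games"
            if $played{$team} < 2;
    }
    return @problems;
}
```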

 The number of teams in this data set is exceptionally large.

Happens if you pass more than 1000 teams to this method.

 The number of games in this data set is exceptionally large.
 

Happens if you pass more than 1,000,000 games to this method.

add_data()

 Method add_data requires a reference to a games array.

Either no data were passed to add_data, or the wrong kind of data was passed. Pass an array reference, e.g. \@array.

 The home score is undefined in array element X. Perhaps you have missed a comma?
 

This happens when a data string passed to the method contains fewer than three commas.

 The visitor score field in array element X needs to be a number.

The second field in a game string needs to be a number.

 The home score field in array element X needs to be a number.

The fourth field in a game string needs to be a number.

 The number of teams in this data set is exceptionally large.

Happens if you pass more than 1000 teams to this method.

 The number of games in this data set is exceptionally large.
 

Happens if you pass more than 1,000,000 games to this method.

CONFIGURATION AND ENVIRONMENT

No specific issues to note.

DEPENDENCIES

Building requires Test::More. The modules List::Util and Carp are required both to build and to run this code.

INCOMPATIBILITIES

None known at this time.

AUTHOR

David Myers, <dwm042 at email.com>

REFERENCES

  algorithm: http://www.pro-football-reference.com/blog/?p=37
  original Perl implementation: http://wp.me/p1m41i-8p
  Pythagorean formula: http://en.wikipedia.org/wiki/Pythagorean_expectation

BUGS AND LIMITATIONS

No known bugs at this time.

Please report any bugs or feature requests to bug-sport-analytics-simpleranking at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Sport-Analytics-SimpleRanking. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

The algorithm requires at least two teams, and at least two games per team, to calculate a simple ranking. With N teams, a minimum of N games is required for the calculation. More may be needed, depending on who has played whom.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Sport::Analytics::SimpleRanking

You can also look for information at:

ACKNOWLEDGEMENTS

To Doug Drinen, who manages the Pro Football Reference site, and who has published and promoted the use of the simple rankings system. To GrandFather at Perl Monks, who suggested many improvements in the design of the first versions of this module.

LICENSE AND COPYRIGHT

Copyright (c) 2011 David Myers, <dwm042 at email.com>. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.