Statistics::Covid::Analysis::Model::Simple - Fits the data to various models
Version 0.23
This package contains routine(s) for modelling 2D data. It can be used to model how markers in Statistics::Covid::Datum, such as confirmed, vary with time, by fitting the series of (time, value) pairs to a polynomial (c0+c1*x+c2*x^2+...cn*x^n) or an exponential (c0 * c1^x) model.
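For reference, the two model families above can be evaluated with a few lines of Perl. This is a minimal sketch; the function names eval_polynomial and eval_exponential are hypothetical and not part of this package. Note that the ^ in the formulas denotes a power, which is written ** in Perl (where ^ is bitwise XOR).

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate c0 + c1*x + ... + cn*x^n with Horner's rule.
# $coeffs is an arrayref [c0, c1, ..., cn].
sub eval_polynomial {
    my ($coeffs, $x) = @_;
    my $y = 0;
    for my $c (reverse @$coeffs) {
        $y = $y * $x + $c;
    }
    return $y;
}

# Hypothetical helper: evaluate the exponential model c0 * c1^x.
sub eval_exponential {
    my ($c0, $c1, $x) = @_;
    return $c0 * $c1 ** $x;
}

printf "%g\n", eval_polynomial([1, 2, 3], 2);   # 1 + 2*2 + 3*2^2 = 17
printf "%g\n", eval_exponential(2, 3, 2);       # 2 * 3^2 = 18
```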
    use strict;
    use warnings;

    use Statistics::Covid;
    use Statistics::Covid::Datum;
    use Statistics::Covid::Utils;
    use Statistics::Covid::Analysis::Model::Simple;

    # read data from db
    my $covid = Statistics::Covid->new({
        'config-file' => 't/config-for-t.json',
        'debug' => 2,
    }) or die "Statistics::Covid->new() failed";

    # retrieve data from DB for selected locations (in the UK)
    # data will come out as an array of Datum objects sorted wrt time
    # (the 'datetimeUnixEpoch' field)
    my $objs = $covid->select_datums_from_db_for_specific_location_time_ascending(
        #{'like' => 'Ha%'}, # the location (wildcard)
        ['Halton', 'Havering'],
        #{'like' => 'Halton'}, # the location (wildcard)
        #{'like' => 'Havering'}, # the location (wildcard)
        'UK', # the belongsto (could have been wildcarded)
    );

    # create a dataframe
    my $df = Statistics::Covid::Utils::datums2dataframe({
        'datum-objs' => $objs,
        'groupby' => ['name'],
        'content' => ['confirmed', 'datetimeUnixEpoch'],
    });

    # convert all 'datetimeUnixEpoch' data to hours, the oldest will be hour 0
    for(sort keys %$df){
        Statistics::Covid::Utils::discretise_increasing_sequence_of_seconds(
            $df->{$_}->{'datetimeUnixEpoch'}, # in-place modification
            3600 # seconds->hours
        )
    }

    # do an exponential fit
    my $ret = Statistics::Covid::Analysis::Model::Simple::fit({
        'dataframe' => $df,
        'X' => 'datetimeUnixEpoch', # our X is this field from the dataframe
        'Y' => 'confirmed', # our Y is this field
        'initial-guess' => {'c1'=>1, 'c2'=>1}, # initial values guess
        'exponential-fit' => 1,
        'fit-params' => {
            'maximum_iterations' => 100000
        }
    });

    # fit to a polynomial of degree 10 (max power of x is 10)
    $ret = Statistics::Covid::Analysis::Model::Simple::fit({
        'dataframe' => $df,
        'X' => 'datetimeUnixEpoch', # our X is this field from the dataframe
        'Y' => 'confirmed', # our Y is this field
        # initial values guess (here ONLY for some coefficients)
        'initial-guess' => {'c1'=>1, 'c2'=>1},
        'polynomial-fit' => 10, # max power of x is 10
        'fit-params' => {
            'maximum_iterations' => 100000
        }
    });

    # fit to an ad-hoc formula in 'x'
    # (see Math::Symbolic::Operator for supported operators)
    $ret = Statistics::Covid::Analysis::Model::Simple::fit({
        'dataframe' => $df,
        'X' => 'datetimeUnixEpoch', # our X is this field from the dataframe
        'Y' => 'confirmed', # our Y is this field
        # initial values guess (here ONLY for some coefficients)
        'initial-guess' => {'c1'=>1, 'c2'=>1},
        'formula' => 'c1*sin(x) + c2*cos(x)',
        'fit-params' => {
            'maximum_iterations' => 100000
        }
    });

    # this is what fit() returns
    # $ret is a hashref where key=group-name, and
    # value=[ 3.4,  # <<<< mean squared error of the fit
    #   [
    #     ['c1', 0.123, 0.0005],  # <<< coefficient c1=0.123, accuracy 0.0005 (ignore that)
    #     ['c2', 1.444, 0.0005],  # <<< coefficient c2=1.444
    #   ]
    # ]
    # and group-name in our example refers to each of the locations selected from DB
    # in this case data from 'Halton' in 'UK' was fitted on 0.123*1.444^time with an m.s.e=3.4

    # This is what the dataframe looks like:
    # {
    #   Halton => {
    #     confirmed => [0, 0, 3, 4, 4, 5, 7, 7, 7, 8, 8, 8],
    #     datetimeUnixEpoch => [
    #       1584262800, 1584349200, 1584435600, 1584522000,
    #       1584637200, 1584694800, 1584781200, 1584867600,
    #       1584954000, 1585040400, 1585126800, 1585213200,
    #     ],
    #   },
    #   Havering => {
    #     confirmed => [5, 5, 7, 7, 14, 19, 30, 35, 39, 44, 47, 70],
    #     datetimeUnixEpoch => [
    #       1584262800, 1584349200, 1584435600, 1584522000,
    #       1584637200, 1584694800, 1584781200, 1584867600,
    #       1584954000, 1585040400, 1585126800, 1585213200,
    #     ],
    #   },
    # }

    # and after converting the datetimeUnixEpoch values to hours and setting the oldest to t=0
    # {
    #   Halton => {
    #     confirmed => [0, 0, 3, 4, 4, 5, 7, 7, 7, 8, 8, 8],
    #     datetimeUnixEpoch => [0, 24, 48, 72, 104, 120, 144, 168, 192, 216, 240, 264],
    #   },
    #   Havering => {
    #     confirmed => [5, 5, 7, 7, 14, 19, 30, 35, 39, 44, 47, 70],
    #     datetimeUnixEpoch => [0, 24, 48, 72, 104, 120, 144, 168, 192, 216, 240, 264],
    #   },
    # }
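The two data-preparation steps in the SYNOPSIS (grouping records into a dataframe, then turning epoch seconds into hours counted from the oldest entry) can be pictured with plain Perl. This is a minimal sketch under stated assumptions: records_to_dataframe and seconds_to_units_since_first are hypothetical stand-ins, not Statistics::Covid::Utils::datums2dataframe or discretise_increasing_sequence_of_seconds; the records are plain hashrefs rather than Datum objects, and truncating to whole units with int() is an assumption about the real routine.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT Statistics::Covid::Utils::datums2dataframe):
# groups flat records by one key and collects the requested columns into
# arrayrefs, giving { group => { column => [values...] } }.
sub records_to_dataframe {
    my ($records, $groupby, @columns) = @_;
    my %df;
    for my $rec (@$records) {
        push @{ $df{ $rec->{$groupby} }{$_} }, $rec->{$_} for @columns;
    }
    return \%df;
}

# Hypothetical stand-in for discretise_increasing_sequence_of_seconds():
# converts, in place, increasing Unix-epoch seconds into whole units
# (3600 = hours) counted from the first, oldest, entry.
# Truncating with int() is an assumption about how the real routine rounds.
sub seconds_to_units_since_first {
    my ($secs, $unit) = @_;
    return unless @$secs;
    my $t0 = $secs->[0];
    $_ = int(($_ - $t0) / $unit) for @$secs;
    return;
}

# The first five Halton entries from the dataframe shown above.
my $df = records_to_dataframe(
    [
        { name => 'Halton', confirmed => 0, datetimeUnixEpoch => 1584262800 },
        { name => 'Halton', confirmed => 0, datetimeUnixEpoch => 1584349200 },
        { name => 'Halton', confirmed => 3, datetimeUnixEpoch => 1584435600 },
        { name => 'Halton', confirmed => 4, datetimeUnixEpoch => 1584522000 },
        { name => 'Halton', confirmed => 4, datetimeUnixEpoch => 1584637200 },
    ],
    'name', 'confirmed', 'datetimeUnixEpoch'
);
seconds_to_units_since_first($df->{$_}{'datetimeUnixEpoch'}, 3600) for sort keys %$df;
print "@{ $df->{Halton}{datetimeUnixEpoch} }\n";    # 0 24 48 72 104
```

Records must already be in time order, as the SYNOPSIS query guarantees, because the first entry of each group is taken as time 0.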
fit() tries to fit a model to 2D data using Algorithm::CurveFit. It can do an exponential fit (c0 * c1^x), a polynomial fit (c0+c1*x+c2*x^2+...cn*x^n), or a fit to any other formula that Math::Symbolic supports.
It takes a hashref of parameters:
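dataframe
The data to fit, as a hashref keyed on group-name (e.g. a location). Each value is a hashref mapping a column name (e.g. confirmed or datetimeUnixEpoch) to an arrayref of values, for example:

    my $df = {
        'China' => {
            'confirmed'         => [1, 2, 3],
            'datetimeUnixEpoch' => [1584262800, 1584264800, 1584266800],
        },
        'Italy' => {
            'confirmed'         => [5, 6, 7],
            'datetimeUnixEpoch' => [1584265800, 1584267800, 1584269800],
        },
    };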
'China' and 'Italy' are completely independent: their datetimeUnixEpoch values need not be the same. Such a dataframe can hold any type of data; in this example it holds confirmed cases over time. It can have one or more 1st-level keys (groups) and one or more 2nd-level keys (columns), not just two of each as above. A dataframe can be created from an array of Statistics::Covid::Datum objects using Statistics::Covid::Utils::datums2dataframe; an example is in the SYNOPSIS, above.
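exponential-fit
Set to 1 to fit the exponential model c0 * c1^x.
polynomial-fit
Set to the degree n of the polynomial c0+c1*x+c2*x^2+...cn*x^n to fit, i.e. the maximum power of x (e.g. 10).
formula
An ad-hoc formula in x, written in the notation of Math::Symbolic, e.g. c1*x + c2*x^2 or a*sin(x) + b*cos(x). Powers are written with ^, not **.
X
The name of the dataframe column to use as the independent variable x, e.g. datetimeUnixEpoch.
Y
The name of the dataframe column to use as the dependent variable, e.g. confirmed.
initial-guess
A hashref of initial values for some or all of the coefficients, e.g. {'c1'=>1, 'c2'=>1}.
fit-params
A hashref of parameters for the fitting routine, e.g. {'maximum_iterations' => 100000}.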
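How polynomial-fit, formula and initial-guess fit together can be sketched as follows: a degree n is written out as a formula string in the notation above, and the coefficients found in a formula are turned into the [name, initial guess, accuracy] triples that Algorithm::CurveFit takes as its parameter list. The helper names, the c0, c1, ... naming rule, the default guess and the accuracy value are assumptions for illustration; this is not this package's code.

```perl
use strict;
use warnings;

# Hypothetical: write c0 + c1*x + ... + cn*x^n as a formula string,
# using ^ for powers as in the Math::Symbolic notation.
sub polynomial_formula {
    my ($degree) = @_;
    my @terms = ('c0');
    push @terms, $_ == 1 ? 'c1*x' : "c$_*x^$_" for 1 .. $degree;
    return join ' + ', @terms;
}

# Hypothetical: build [name, guess, accuracy] triples for every coefficient
# named c<digits> in the formula. Coefficients missing from $initial_guess
# get $default_guess (an assumption, not this package's documented default).
sub build_params {
    my ($formula, $initial_guess, $default_guess, $accuracy) = @_;
    my %seen;
    my @names = grep { !$seen{$_}++ } $formula =~ /\b(c\d+)\b/g;
    return [ map { [ $_, $initial_guess->{$_} // $default_guess, $accuracy ] }
             sort { substr($a, 1) <=> substr($b, 1) } @names ];
}

my $formula = polynomial_formula(2);
print "$formula\n";    # c0 + c1*x + c2*x^2
my $params = build_params($formula, { 'c1' => 1, 'c2' => 1 }, 0.5, 0.0005);
print join(' ', map { "[@$_]" } @$params), "\n";
```

Coefficient names such as a and b, as in the a*sin(x) + b*cos(x) example, would need a different naming rule than the c<digits> pattern assumed here.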
On failure it returns undef. On success it returns a hashref where each key is a group-name and each value is an arrayref holding the mean squared error of the fit, followed by an arrayref of [name, value, accuracy] triples, one for every coefficient in the input formula (or polynomial). For example:

    {
      'Halton' => [
        3.4,                        # <<< mean squared error of the fit
        [
          ['c1', 0.123, 0.0005],    # <<< coefficient c1=0.123, accuracy 0.0005 (ignore that)
          ['c2', 1.444, 0.0005],    # <<< coefficient c2=1.444
          # ... for all the coefficients in the input formula (or polynomial)
        ]
      ],
      # ... one entry per group
    }

Here the group-names are the locations selected from the DB. In this case the data from 'Halton' in 'UK' was fitted on 0.123*1.444^time with a mean squared error of 3.4.
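A minimal sketch of consuming this return value: it unpacks each group's error and coefficients, evaluates the fitted model, and includes a helper for the mean squared error of any model over paired data. The result hashref is hand-written with the illustrative numbers above, the helper name mse is hypothetical, and the model is read as c1 * c2^x only because that is how the example above is stated. Whether fit() reports the mean or the sum of squared residuals is not checked here; the helper uses the mean.

```perl
use strict;
use warnings;

# Hand-written result with the illustrative numbers documented above.
my $ret = {
    Halton => [ 3.4, [ [ 'c1', 0.123, 0.0005 ], [ 'c2', 1.444, 0.0005 ] ] ],
};

# Hypothetical helper: mean squared error of model $f (a code ref) over data.
sub mse {
    my ($f, $xs, $ys) = @_;
    my $sum = 0;
    $sum += ($ys->[$_] - $f->($xs->[$_])) ** 2 for 0 .. $#$xs;
    return $sum / @$xs;
}

for my $group (sort keys %$ret) {
    my ($err, $coeffs) = @{ $ret->{$group} };
    my %c = map { ($_->[0] => $_->[1]) } @$coeffs;    # name => fitted value
    # Read as c1 * c2^x only because that is how the example above is stated.
    my $model = sub { $c{c1} * $c{c2} ** $_[0] };
    printf "%s: reported error %.2f, model at x=2 gives %.4f\n",
        $group, $err, $model->(2);
}
```

With real output, mse() can be run on the group's own X and Y columns to check the reported error against the data.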
Nothing is exported by default; call Statistics::Covid::Analysis::Model::Simple::fit() by its fully qualified name. For more verbose output, set $DEBUG to 1 or more, e.g. $Statistics::Covid::Analysis::Model::Simple::DEBUG=1;
This package relies heavily on Algorithm::CurveFit. The formula notation is exactly the one used by Math::Symbolic.
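Algorithm::CurveFit refines the coefficients iteratively, starting from the initial guess. For contrast, a polynomial model is linear in its coefficients, so it can also be fitted in one step by ordinary least squares on the normal equations. The sketch below (a hypothetical polyfit, not part of this package or of Algorithm::CurveFit) shows that swapped-in technique. For high degrees on large x, such as degree 10 on hour values in the hundreds, the normal equations become badly conditioned, so treat it as illustrative only.

```perl
use strict;
use warnings;

# Hypothetical: fit c0 + c1*x + ... + cn*x^n by ordinary least squares,
# solving the normal equations (A^T A) c = A^T y with Gauss-Jordan
# elimination and partial pivoting. Returns an arrayref [c0, ..., cn].
sub polyfit {
    my ($xs, $ys, $n) = @_;
    my $m = $n + 1;
    # Augmented matrix [A^T A | A^T y], accumulated point by point.
    my @M = map { [ (0) x ($m + 1) ] } 1 .. $m;
    for my $k (0 .. $#$xs) {
        my @pow = map { $xs->[$k] ** $_ } 0 .. $n;
        for my $i (0 .. $n) {
            $M[$i][$_] += $pow[$i] * $pow[$_] for 0 .. $n;
            $M[$i][$m] += $pow[$i] * $ys->[$k];
        }
    }
    for my $col (0 .. $n) {
        # Pick the row with the largest pivot in this column and swap it up.
        my ($piv) = sort { abs($M[$b][$col]) <=> abs($M[$a][$col]) } $col .. $n;
        @M[$col, $piv] = @M[$piv, $col];
        die "singular system\n" if abs($M[$col][$col]) < 1e-12;
        # Eliminate this column from every other row.
        for my $r (0 .. $n) {
            next if $r == $col;
            my $f = $M[$r][$col] / $M[$col][$col];
            $M[$r][$_] -= $f * $M[$col][$_] for $col .. $m;
        }
    }
    return [ map { $M[$_][$m] / $M[$_][$_] } 0 .. $n ];
}

my @x = (0 .. 4);
my @y = map { 1 + 2 * $_ + 3 * $_ ** 2 } @x;    # exact quadratic
printf "%.6f %.6f %.6f\n", @{ polyfit(\@x, \@y, 2) };
```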
Statistics::Regression and Statistics::LineFit can be used for linear regression, which is a far simpler method than the symbolic approach taken in this package. The benefit of our approach is that it can try to fit data to any formula, i.e. any model. The cost is that it is slower (for complex cases) and may be less robust.
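Linear regression can still handle the exponential model: taking logarithms of y = c0 * c1^x gives log y = log c0 + x log c1, a straight line in x. Below is a minimal sketch of this log-linear shortcut; the function name exp_fit_loglinear is hypothetical. It is not what this package does, it requires every y to be positive, and because errors are minimised in log space its coefficients generally differ from a direct least-squares fit on y.

```perl
use strict;
use warnings;

# Hypothetical: fit y = c0 * c1^x by least-squares regression of log(y) on x.
# Returns (c0, c1). Every y must be > 0.
sub exp_fit_loglinear {
    my ($xs, $ys) = @_;
    die "all y must be > 0\n" if grep { $_ <= 0 } @$ys;
    my $n  = @$xs;
    my @ly = map { log($_) } @$ys;
    my ($mx, $mly) = (0, 0);
    $mx  += $_ / $n for @$xs;
    $mly += $_ / $n for @ly;
    my ($sxy, $sxx) = (0, 0);
    for my $i (0 .. $n - 1) {
        $sxy += ($xs->[$i] - $mx) * ($ly[$i] - $mly);
        $sxx += ($xs->[$i] - $mx) ** 2;
    }
    my $slope = $sxy / $sxx;                            # estimate of log c1
    return (exp($mly - $slope * $mx), exp($slope));     # (c0, c1)
}

# Havering, hours since the oldest entry (from the SYNOPSIS dataframe).
my @hours     = (0, 24, 48, 72, 104, 120, 144, 168, 192, 216, 240, 264);
my @confirmed = (5, 5, 7, 7, 14, 19, 30, 35, 39, 44, 47, 70);
my ($c0, $c1) = exp_fit_loglinear(\@hours, \@confirmed);
printf "confirmed ~ %.4f * %.6f^x\n", $c0, $c1;
```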
Andreas Hadjiprocopis, <bliako at cpan.org>, <andreashad2 at gmail.com>
This module was put together very quickly and under pressure, so there must be quite a few bugs.
Please report any bugs or feature requests to bug-statistics-Covid at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Covid. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Statistics::Covid::Analysis::Model::Simple
You can also look for information at:
github repository which will host data and alpha releases
RT: CPAN's request tracker (report bugs here)
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Covid
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Statistics-Covid
CPAN Ratings
http://cpanratings.perl.org/d/Statistics-Covid
Search CPAN
http://search.cpan.org/dist/Statistics-Covid/
Information about the basis module DBIx::Class
http://search.cpan.org/dist/DBIx-Class/
Almaz
Copyright 2020 Andreas Hadjiprocopis.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:
http://www.perlfoundation.org/artistic_license_2_0
Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.
If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.
This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.
This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.
Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS' AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.