Statistics::Autocorrelation - Coefficients for any lag, as correlogram, with significance tests
Version 0.06
    use Statistics::Autocorrelation 0.06;
    $acorr = Statistics::Autocorrelation->new();
    $coeff = $acorr->coefficient(data => \@data, lag => integer (from 1 to N-1), exact => 0, unbias => 1);
    # or load one or more data, optionally update, and test each discretely:
    $acorr->load(\@data1, \@data2);
    $coeff = $acorr->coeff(index => 0, lag => 1); # default lag => 0
Calculates autocorrelation coefficients for a single series of numerical data, for any valid length of lag.
$acorr = Statistics::Autocorrelation->new();
Returns a new class object for accessing its methods. This ISA Statistics::Data object, so all the methods in that package for loading, adding, saving, dumping, etc., data are available here.
    $coeff = $autocorr->coefficient(data => \@data, lag => integer (from 1 to N-1), exact => 0|1, unbias => 1|0, circular => 1|0);
    $coeff = $autocorr->coefficient(lag => 1); # using loaded data, and default args (exact = 0, unbias = 1, circular = 0)
Alias: coeff, acf
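Returns the autocorrelation coefficient, the ratio of the autocovariance to variance of a sequence at any particular lag, ranging from -1 to +1, as in Chatfield (1975) and Kendall (1973). Specifically,
$$ r_k = \frac{c_k}{c_0} $$
where k is the lag (see below), c_k is the autocovariance at lag k, and c_0 is the variance of the sequence (the autocovariance at lag 0).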
Data can be previously loaded or sent directly here (see Statistics::Data). There must be at least two elements in the data array. A croak will be heard if no data have been loaded or given here.
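As a rough illustration of the default calculation (the ratio of the lagged sum of cross-products to the sum of squares, both taken about the mean of the whole series), here is a minimal pure-Perl sketch. The function name acf_approx is hypothetical and is not part of Statistics::Autocorrelation; the module's own implementation may differ in detail.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not the module's code): approximate coefficient
# r_k = c_k / c_0, with residuals taken about the whole-series mean.
sub acf_approx {
    my ($data, $k) = @_;
    my $n = scalar @{$data};
    $k = abs $k;                        # a lag of -k gives the same value as +k
    return q{} if $k >= $n;             # mirrors the documented empty-string return
    my $mean = sum(@{$data}) / $n;
    my ($ck, $c0) = (0, 0);
    $c0 += ($_ - $mean)**2 for @{$data};
    $ck += ($data->[$_] - $mean) * ($data->[$_ + $k] - $mean) for 0 .. $n - $k - 1;
    return $ck / $c0;                   # assumes the series is not constant (c0 > 0)
}

my @data = (1, 2, 3, 4, 5);
printf "lag %d: %.2f\n", $_, acf_approx(\@data, $_) for 0 .. 4;
```

With the module itself, the corresponding call would be along the lines of $acorr->coefficient(data => \@data, lag => 1), as given in the synopsis.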
Options are:
lag: An integer to define how many indices ahead or behind to start correlating the data to itself, that is, how many time-intervals separate one value from another. If lag is greater than or equal to the number of observations, an empty string is returned. If the value of lag is less than zero, the calculation is made with its absolute value, given that
$$ r_k = r_{-k} $$
for all k (so that a coefficient for a lag of -k is equal in magnitude and sign to that for +k). If a value is not given for lag, it is set to the default value of 0.
exact: Boolean value, default = 0. In calculating the autocorrelation coefficient, the convention (as in commercial statistics programs such as SPSS/PASW, published examples of autocorrelation such as those at nist.gov, and texts such as Chatfield (1975) and Box and Jenkins (1976)) is to calculate (1) the sum-of-squares for the autocovariance (the numerator term in the autocorrelation coefficient) from the residuals for each observation x from trial t = 1 (index = 0) to N - k (where k is the lag) relative to the mean of the whole sequence:
$$ \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) $$
rather than the means for each sub-sequence as lagged, and (2) the sum-of-squares for the variance in the denominator as that of the whole sequence:
$$ \sum_{t=1}^{N} (x_t - \bar{x})^2 $$
instead of using completely pairwise products. This convention assumes that the series is stationary (has no linear or curvilinear trend, no periodicity), and that the number of observations, N, in the sample is "reasonably large". You get the autocorrelation coefficient with these assumptions, with the above formulations, by default; but if you specify exact => 1, then you get the coefficient as calculated by Kendall (1973) Eq. 3.35, where the sums use not the overall sample mean, but the mean of the first N - k elements, and the mean of the last N - k elements:
$$ \bar{x}_1 = \frac{1}{N-k} \sum_{t=1}^{N-k} x_t, \qquad \bar{x}_2 = \frac{1}{N-k} \sum_{t=k+1}^{N} x_t $$
Taking each observation relative to these means, the autocovariance in the numerator, and variance in the denominator, are calculated as follows to give the autocorrelation coefficient:
$$ r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x}_1)(x_{t+k} - \bar{x}_2)}{\sqrt{\sum_{t=1}^{N-k} (x_t - \bar{x}_1)^2 \, \sum_{t=1}^{N-k} (x_{t+k} - \bar{x}_2)^2}} $$
unbias: Boolean, default = 1. In calculating the approximate autocovariance, it is conventional to divide the sum-product of residuals (as given above) by N, but some sources divide by N - lag for less biased estimation, so that
$$ c_k = \frac{1}{N-k} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}) $$
For the latter, set unbias => 0. This option only has an effect where circular => 0 and exact => 0.
circular: Boolean value, default = 0. For circularized lagging, set circular => 1.
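To make the exact and circular options more concrete, the following pure-Perl sketch contrasts two variants. Both function names (acf_exact, acf_circular) are hypothetical and not part of the module. The circular variant in particular rests on an assumption not spelled out here: that circular lagging pairs each x[t] with x[(t + k) mod N], wrapping around the end of the series.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: coefficient using the means of the two lagged
# sub-sequences rather than the whole-series mean.
sub acf_exact {
    my ($data, $k) = @_;
    my $n = scalar @{$data};
    $k = abs $k;
    return q{} if $k >= $n;
    my @head = @{$data}[0 .. $n - $k - 1];    # first N - k elements
    my @tail = @{$data}[$k .. $n - 1];        # last N - k elements
    my $m1 = sum(@head) / scalar @head;
    my $m2 = sum(@tail) / scalar @tail;
    my ($num, $ss1, $ss2) = (0, 0, 0);
    for my $i (0 .. $#head) {
        my ($d1, $d2) = ($head[$i] - $m1, $tail[$i] - $m2);
        $num += $d1 * $d2;
        $ss1 += $d1**2;
        $ss2 += $d2**2;
    }
    return q{} if $ss1 == 0 || $ss2 == 0;     # undefined for a constant sub-sequence
    return $num / sqrt($ss1 * $ss2);
}

# Hypothetical helper: ASSUMES circular lagging means wrapping the index,
# so that x[t] is paired with x[(t + k) mod N] for every t.
sub acf_circular {
    my ($data, $k) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my ($ck, $c0) = (0, 0);
    for my $t (0 .. $n - 1) {
        $c0 += ($data->[$t] - $mean)**2;
        $ck += ($data->[$t] - $mean) * ($data->[($t + $k) % $n] - $mean);
    }
    return $ck / $c0;
}

my @trend = (1, 2, 3, 4, 5);
printf "exact lag 1:    %.2f\n", acf_exact(\@trend, 1);
printf "circular lag 1: %.2f\n", acf_circular(\@trend, 1);
```

For a strictly linear series such as this one, the sub-sequence-mean form treats the two lagged halves as perfectly correlated, which illustrates why the default form assumes a stationary series.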
    $covar = $autocorr->autocovariance(data => \@data, lag => integer (from 1 to N-1), exact => 0|1, unbias => 1|0, circular => 1|0);
    $covar = $autocorr->autocovariance(lag => 1); # using loaded data, and default args (exact = 0, unbias = 1, circular = 0)
Alias: autocov, acvf
Returns the autocovariance; see coefficient for definition and options.
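The effect of the choice of divisor on the autocovariance can be sketched as follows. The function autocov_sketch and its third argument are hypothetical names chosen for illustration; the sketch deliberately does not map that argument onto the module's unbias option.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: autocovariance about the whole-series mean,
# dividing the sum of cross-products by either N or N - k.
sub autocov_sketch {
    my ($data, $k, $divide_by_n_minus_k) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my $sum  = 0;
    $sum += ($data->[$_] - $mean) * ($data->[$_ + $k] - $mean) for 0 .. $n - $k - 1;
    my $divisor = $divide_by_n_minus_k ? $n - $k : $n;
    return $sum / $divisor;
}

my @data = (1, 2, 3, 4, 5);
printf "divide by N:     %.2f\n", autocov_sketch(\@data, 1, 0);
printf "divide by N - k: %.2f\n", autocov_sketch(\@data, 1, 1);
```

The two results differ by the factor N / (N - k), so the difference matters most for short series and large lags.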
    $href = $autocorr->correlogram(nlags => integer, exact => 1|0, unbias => 1|0, circular => 1|0); # assuming data are loaded
    $href = $autocorr->correlogram(); # use defaults, with loaded data
    $href = $autocorr->correlogram(data => \@data); # same as either of above, but give data here
    ($lags, $coeffs) = $autocorr->correlogram(); # with args as for either of the above
Alias: coeff_list
Returns the autocorrelation coefficients for lags from 0 to a limit, or (by default) over all possible lags, from 0 to N - 1. If called in array context, returns two references: to an array of the lags, and to an array of their respective coefficients. Otherwise, returns a hash-reference of the coefficients keyed by their respective lags. The limit is given by the argument nlags, the number of lags to return, including the zero lag, as permitted by the data to be referenced. Options are exact, unbias and circular, as defined above for coefficient. The autocorrelation function being symmetric about lag zero, the correlogram is based only on positive lags.
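The context-dependent return described above can be sketched with Perl's wantarray. The names correlogram_sketch, acf_approx and the $max_lag argument are hypothetical; in particular, $max_lag (the largest lag to include) is a simplification used here in place of the module's nlags argument.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: approximate coefficient about the whole-series mean.
sub acf_approx {
    my ($data, $k) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my ($ck, $c0) = (0, 0);
    $c0 += ($_ - $mean)**2 for @{$data};
    $ck += ($data->[$_] - $mean) * ($data->[$_ + $k] - $mean) for 0 .. $n - $k - 1;
    return $ck / $c0;
}

# Hypothetical helper: coefficients for lags 0 .. $max_lag (default N - 1).
# List context: (\@lags, \@coeffs); scalar context: hash-ref keyed by lag.
sub correlogram_sketch {
    my ($data, $max_lag) = @_;
    $max_lag = scalar(@{$data}) - 1 if !defined $max_lag;
    my @lags   = (0 .. $max_lag);
    my @coeffs = map { acf_approx($data, $_) } @lags;
    return (\@lags, \@coeffs) if wantarray;
    my %by_lag;
    @by_lag{@lags} = @coeffs;
    return \%by_lag;
}

my $href = correlogram_sketch([1, 2, 3, 4, 5]);
printf "lag %d => %.2f\n", $_, $href->{$_} for sort { $a <=> $b } keys %{$href};
```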
Experimental method (correlogram_chart) to print a .png file of the correlogram.
    $bool = $acorr->ctest_bartlett(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
    ($crit, $coeff, $bool) = $acorr->ctest_bartlett(lag => integer, tails => 1|2);
Performs a 95% confidence test of the null hypothesis of no autocorrelation, assuming that the series was generated by a Gaussian white noise process. Following Bartlett (1946), it compares the value of a single correlation coefficient for a given lag with the critical values given tails => 2 (default) or 1:
$$ \pm \frac{s}{\sqrt{N}} $$
where N is the number of observations and s is a constant equalling 1.96 for a two-tailed, or 1.645 for a one-tailed, test. If the absolute value of the sample correlation coefficient falls beyond this critical value, the null hypothesis is rejected at the 95% level.
Returns, if called in array context, a list comprising the critical value, the sample coefficient, and a boolean as to whether the null hypothesis is rejected; otherwise, just the latter boolean.
Accepts all the options as given for coefficient. Note that the critical value is not calculated with respect to the particular value of lag - see ctest_anderson for this.
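The decision rule can be sketched in a few lines of Perl. The function ctest_bartlett_sketch is a hypothetical stand-in that takes an already computed coefficient and the series length directly; it assumes the critical value s / sqrt(N), which does not depend on the lag.

```perl
use strict;
use warnings;

# Hypothetical helper: compare a coefficient against the critical value
# s / sqrt(N), with s = 1.96 (two-tailed, default) or 1.645 (one-tailed).
sub ctest_bartlett_sketch {
    my ($coeff, $n, $tails) = @_;
    my $s      = (defined $tails && $tails == 1) ? 1.645 : 1.96;
    my $crit   = $s / sqrt($n);
    my $reject = abs($coeff) > $crit ? 1 : 0;
    return wantarray ? ($crit, $coeff, $reject) : $reject;
}

my ($crit, $coeff, $reject) = ctest_bartlett_sketch(0.25, 100);
printf "critical value %.3f, coefficient %.2f, reject null: %d\n", $crit, $coeff, $reject;
```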
    $bool = $acorr->ctest_anderson(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
    ($crit, $coeff, $bool) = $acorr->ctest_anderson(lag => integer, tails => 1|2);
Performs a 95% confidence test of the null hypothesis of no autocorrelation, assuming that the series was generated by a Gaussian white noise process. Following Anderson (1941), it compares the value of a single correlation coefficient for a given lag with the critical values given tails => 2 (default) or 1:
$$ \frac{-1 \pm s\sqrt{N-k-1}}{N-k} $$
where N is the number of observations, k is the lag, and s is 1.96 for a two-tailed, or 1.645 for a one-tailed, test. If the sample correlation coefficient falls outside these bounds, the null hypothesis is rejected at the 95% level.
Accepts all the options as given for coefficient. Note that the critical value is calculated with respect to the particular value of lag - unlike ztest_bartlett.
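A minimal sketch of lag-dependent bounds follows, assuming the commonly cited form (-1 ± s * sqrt(N - k - 1)) / (N - k) for Anderson's limits. The function ctest_anderson_sketch, its argument order, and its return list are hypothetical, and how the module treats the one-tailed case is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: lower and upper bounds that depend on the lag k,
# ASSUMING the form (-1 +/- s * sqrt(N - k - 1)) / (N - k).
sub ctest_anderson_sketch {
    my ($coeff, $n, $k, $tails) = @_;
    my $s      = (defined $tails && $tails == 1) ? 1.645 : 1.96;
    my $root   = sqrt($n - $k - 1);
    my $lower  = (-1 - $s * $root) / ($n - $k);
    my $upper  = (-1 + $s * $root) / ($n - $k);
    my $reject = ($coeff < $lower || $coeff > $upper) ? 1 : 0;
    return ($lower, $upper, $reject);
}

my ($lower, $upper, $reject) = ctest_anderson_sketch(0.19, 101, 1);
printf "bounds [%.3f, %.3f], reject null: %d\n", $lower, $upper, $reject;
```

Unlike the lag-independent critical value s / sqrt(N), these bounds are not symmetric about zero and widen as the lag grows relative to N.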
    $p_value = $acorr->ztest_bartlett(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
    ($z_value, $p_value) = $acorr->ztest_bartlett(lag => integer, tails => 1|2);
Returns the 2- or 1-tailed probability, given tails => 2 (default) or 1, respectively, for the deviation of the observed autocorrelation coefficient at the given lag from the expected value of zero, relative to the variance 1 / N, assuming that the series was generated by a Gaussian white noise process. If called in array context, returns both the actual Z-value and then the p-value. Other options, and methods of assigning the data to test, are as for coefficient.
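The z-value and p-value described above can be approximated without extra modules, as in this sketch. It assumes Perl 5.22 or later, where POSIX exports erfc, and it computes the normal tail probability directly rather than through Statistics::Zed as the module does. The function name ztest_bartlett_sketch and its arguments are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # erfc is available from POSIX in Perl 5.22 and later

# Hypothetical helper: z = (r - 0) / sqrt(1 / N), then the normal-curve
# probability for |z| (two-tailed by default, halved for one-tailed).
sub ztest_bartlett_sketch {
    my ($coeff, $n, $tails) = @_;
    my $z  = $coeff / sqrt(1 / $n);
    my $p2 = erfc(abs($z) / sqrt(2));          # two-tailed probability
    my $p  = (defined $tails && $tails == 1) ? $p2 / 2 : $p2;
    return wantarray ? ($z, $p) : $p;
}

my ($z, $p) = ztest_bartlett_sketch(0.196, 100);
printf "z = %.2f, p = %.3f\n", $z, $p;
```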
    $p_value = $acorr->qtest(nlags => integer); # assuming data are loaded, or see above for alternative and extra options
    ($q_value, $df, $p_value) = $acorr->qtest(nlags => integer);
Returns the Q statistic for testing whether a range of autocorrelation coefficients differs from zero, and so whether the series was produced by a random process (Box & Pierce, 1970). If called in array context, returns a list giving the value of Q, and, assuming a chi-square distribution, its degrees of freedom (= nlags) and p-value; returns the p-value only if called in scalar context. Other options, and methods of assigning the data to test, are as for coefficient. The range is (by default) over all possible lags from 1 to N - 1. The statistic is defined as follows:
$$ Q = N \sum_{k=1}^{M} r_k^2 $$
where N is the number of observations, r_k is the autocorrelation coefficient at lag k, and M is the largest lag-value to test (= nlags).
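As a worked illustration of the statistic, the sketch below sums the squared coefficients over lags 1 to M and multiplies by N. The names q_stat_sketch and acf_approx are hypothetical, the coefficients use the default (whole-series mean) form, and the chi-square p-value, which the module obtains via Math::Cephes, is left out.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: approximate coefficient about the whole-series mean.
sub acf_approx {
    my ($data, $k) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my ($ck, $c0) = (0, 0);
    $c0 += ($_ - $mean)**2 for @{$data};
    $ck += ($data->[$_] - $mean) * ($data->[$_ + $k] - $mean) for 0 .. $n - $k - 1;
    return $ck / $c0;
}

# Hypothetical helper: Q = N * sum of squared coefficients for lags 1 .. M,
# returned with its degrees of freedom (M). Default M is N - 1.
sub q_stat_sketch {
    my ($data, $max_lag) = @_;
    my $n = scalar @{$data};
    $max_lag = $n - 1 if !defined $max_lag;
    my $sum_sq = 0;
    $sum_sq += acf_approx($data, $_)**2 for 1 .. $max_lag;
    return ($n * $sum_sq, $max_lag);
}

my ($q, $df) = q_stat_sketch([1, 2, 3, 4, 5]);
printf "Q = %.2f on %d degrees of freedom\n", $q, $df;
```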
Anderson, R. L. (1941). Distribution of the serial correlation coefficients. Annals of Mathematical Statistics, 8, 1-13.
Bartlett, M. S. (1946). On the theoretical specification of sampling properties of autocorrelated time series. Journal of the Royal Statistical Society, 27.
Box, G. E., & Jenkins, G. (1976). Time series analysis: Forecasting and control. San Francisco, US: Holden-Day.
Box, G. E., & Pierce, D. (1970). Distribution of residual autocorrelations in ARIMA time series models. Journal of the American Statistical Association, 65, 1509-1526.
Chatfield, C. (1975). The analysis of time series: Theory and practice. London, UK: Chapman and Hall.
Kendall, M. G. (1973). Time-series. London, UK: Griffin.
Statistics::SerialCorrelation (at cpan). Returns a single autocorrelation coefficient which, with the present module, would be given by coefficient given lag => 1, circular => 1 (and the defaults exact => 0, unbias => 0).
Roderick Garton, <rgarton at cpan.org>
Croaked by most methods if they do not receive data as given in the call by an array ref, or as pre-loaded as per Statistics::Data.
Croaked by correlogram when the nlags is not valid: should be no more than the number of data elements less 1.
Croaked by correlogram_chart when it tries to print the chart.
Statistics::Data - used: base
Statistics::Lite - used: mean
List::AllUtils - used: mesh
Statistics::Zed - required if calling ztest_bartlett
Math::Cephes - required for its igamc method if calling qtest
Report to bug-statistics-autocorrelation-0.06 at rt.cpan.org or http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Autocorrelation-0.06.
To do: rho_ctest, rho_ztest
Find documentation for this module with the perldoc command:
perldoc Statistics::Autocorrelation
Also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Autocorrelation-0.06
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/Statistics-Autocorrelation-0.06
CPAN Ratings
http://cpanratings.perl.org/d/Statistics-Autocorrelation-0.06
Search CPAN
http://search.cpan.org/dist/Statistics-Autocorrelation-0.06/
Copyright 2011-2014 Roderick Garton.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.