NAME

Statistics::Autocorrelation - Coefficients for any lag, as correlogram, with significance tests

VERSION

Version 0.06

SYNOPSIS

    use Statistics::Autocorrelation 0.06;
    $acorr = Statistics::Autocorrelation->new();
    # lag is an integer from 1 to N-1:
    $coeff = $acorr->coefficient(data => \@data, lag => 1, exact => 0, unbias => 1);
    # or load one or more data, optionally update, and test each discretely:
    $acorr->load(\@data1, \@data2);
    $coeff = $acorr->coeff(index => 0, lag => 1); # default lag => 0

DESCRIPTION

Calculates autocorrelation coefficients for a single series of numerical data, for any valid length of lag.

SUBROUTINES/METHODS

new

    $acorr = Statistics::Autocorrelation->new();

Returns a new class object for accessing its methods. The object ISA Statistics::Data object, so all the methods in that package for loading, adding, saving, dumping, etc., data are available here.

coefficient

    $coeff = $autocorr->coefficient(data => \@data, lag => integer (from 1 to N-1), exact => 0|1, unbias => 1|0, circular => 1|0);
    $coeff = $autocorr->coefficient(lag => 1); # using loaded data, and default args (exact = 0, unbias = 1, circular = 0)

Alias: coeff, acf

Returns the autocorrelation coefficient, the ratio of the autocovariance to variance of a sequence at any particular lag, ranging from -1 to +1, as in Chatfield (1975) and Kendall (1973). Specifically,

$$\rho_k = \frac{\gamma_k}{\sigma^2_k}$$

where k is the lag (see below).

Data can be previously loaded or sent directly here (see Statistics::Data). There must be at least two elements in the data array. A croak will be heard if no data have been loaded or given here.
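As a rough illustration of the ratio defined above, the following sketch computes the textbook form of the coefficient in plain Perl. The helper acf_sketch is hypothetical and not part of this module; it assumes the whole-series mean in the numerator and a denominator summed over the whole series, so its values need not match the module's output under every combination of options.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Statistics::Autocorrelation):
# textbook autocorrelation at lag k about the whole-series mean.
# Assumes the series is not constant (non-zero variance).
sub acf_sketch {
    my ($data, $k) = @_;
    my $n = scalar @{$data};
    die "Need at least two observations\n" if $n < 2;
    $k = abs $k;                # rho(-k) equals rho(k)
    return q{} if $k >= $n;     # no lagged pairs exist at this lag
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    # Numerator: lagged cross-products for t = 1 .. N - k
    my $num = 0;
    $num += $dev[$_] * $dev[$_ + $k] for 0 .. $n - $k - 1;
    # Denominator: squared deviations summed over the whole series
    my $den = sum(map { $_**2 } @dev);
    return $num / $den;
}

my @series = (1, 2, 3, 4, 5);
for my $lag (0 .. 2) {
    printf "lag %d: %.2f\n", $lag, acf_sketch(\@series, $lag);
}
```

For this five-point series the sketch prints 1.00, 0.40 and -0.10 for lags 0, 1 and 2.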

Options are:

lag

An integer giving how many indices ahead or behind to start correlating the data to itself, that is, how many time-intervals separate one value from another. If lag is greater than or equal to the number of observations, an empty string is returned. If the value of lag is less than zero, the calculation is made with its absolute value, given that $\rho_k = \rho_{-k}$ for all k (so that a coefficient for a lag of -k is equal in magnitude and sign to that for +k). If a value is not given for lag, it is set to the default value of 0.

exact

Boolean value, default = 0. In calculating the autocorrelation coefficient, the convention (as in commercial statistics programs such as SPSS/PASW, published examples of autocorrelation such as those at nist.gov, and texts such as Chatfield (1975) and Box and Jenkins (1976)) is (1) to calculate the sum of products for the autocovariance (the numerator term in the autocorrelation coefficient) from the residuals for each observation x from trial t = 1 (index = 0) to N - k (where k is the lag) relative to the mean of the whole sequence:

$$\gamma_k = \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})$$

rather than the means for each sub-sequence as lagged, and (2) the sum-of-squares for the variance in the denominator as that of the whole sequence:

$$\sigma^2_k = \sum_{t=1}^{N-k} (x_t - \bar{x})^2$$

instead of using completely pairwise products. This convention assumes that the series is stationary (has no linear or curvilinear trend, no periodicity), and that the number of observations, N, in the sample is "reasonably large". With these assumptions and the above formulations, you get the autocorrelation coefficient by default; but if you specify exact => 1, then you get the coefficient as calculated by Kendall (1973) Eq. 3.35, where the sums use not the overall sample mean, but the mean of the first N - k elements, and the mean of the last N - k elements (elements k + 1 to N):

$$\bar{x}_k = \frac{1}{N-k} \sum_{t=1}^{N-k} x_t, \quad \text{and} \quad \bar{x}'_k = \frac{1}{N-k} \sum_{t=1}^{N-k} x_{t+k}$$

Taking each observation relative to these means, the autocovariance in the numerator, and variance in the denominator, are calculated as follows to give the autocorrelation coefficient:

$$\rho_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x}_k)(x_{t+k} - \bar{x}'_k)}{\left[\sum_{t=1}^{N-k} (x_t - \bar{x}_k)^2\right]^{1/2} \left[\sum_{t=1}^{N-k} (x_{t+k} - \bar{x}'_k)^2\right]^{1/2}}$$
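The exact form amounts to a Pearson correlation between the first N - k values and the last N - k values, each centred on its own mean. A minimal sketch, assuming nothing about the module's internals (acf_exact_sketch is a hypothetical name):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: coefficient between the first N - k values and
# the last N - k values, each taken relative to its own sub-series mean.
sub acf_exact_sketch {
    my ($data, $k) = @_;
    my $n = scalar @{$data};
    my $m = $n - $k;                      # number of lagged pairs
    die "Need at least two lagged pairs\n" if $m < 2;
    my @head = @{$data}[0 .. $m - 1];     # x_1 .. x_{N-k}
    my @tail = @{$data}[$k .. $n - 1];    # x_{k+1} .. x_N
    my $mean_head = sum(@head) / $m;
    my $mean_tail = sum(@tail) / $m;
    my ($num, $ss_head, $ss_tail) = (0, 0, 0);
    for my $t (0 .. $m - 1) {
        my $dh = $head[$t] - $mean_head;
        my $dt = $tail[$t] - $mean_tail;
        $num     += $dh * $dt;
        $ss_head += $dh**2;
        $ss_tail += $dt**2;
    }
    return $num / sqrt($ss_head * $ss_tail);
}

my @trend = (2, 4, 6, 8, 10);
printf "exact lag-1 coefficient: %.2f\n", acf_exact_sketch(\@trend, 1);
```

On this purely linear series the sketch prints 1.00, because the two sub-series are perfectly correlated once each is centred on its own mean; a coefficient taken about the whole-series mean would be smaller for a trending series like this one.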

unbias

Boolean, default = 1. In calculating the approximate autocovariance, it is conventional to divide the sum-product of residuals (as given above) by N, but some sources divide by N - lag for less biased estimation, so that

$$\gamma_k = \frac{1}{N-k} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})$$

For the latter, set unbias => 0. This is only effective where circular => 0 and exact => 0.
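The effect of the two divisors can be seen in a small sketch. The helper autocov_sketch and its 'N' / 'N-k' argument are hypothetical, chosen for illustration only; they are not options of this module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: autocovariance at lag k about the whole-series
# mean. $divisor is 'N' or 'N-k' (an illustrative argument only).
sub autocov_sketch {
    my ($data, $k, $divisor) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my $sp   = 0;    # sum of lagged cross-products of residuals
    $sp += ($data->[$_] - $mean) * ($data->[$_ + $k] - $mean)
        for 0 .. $n - $k - 1;
    return $sp / ($divisor eq 'N-k' ? $n - $k : $n);
}

my @series = (1, 2, 3, 4, 5);
printf "divide by N:     %.2f\n", autocov_sketch(\@series, 1, 'N');
printf "divide by N - k: %.2f\n", autocov_sketch(\@series, 1, 'N-k');
```

For this series the sum of lagged cross-products at lag 1 is 4, so the sketch prints 0.80 when dividing by N = 5 and 1.00 when dividing by N - k = 4.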

circular

Boolean value, default = 0: For circularized lagging, set circular => 1.
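The documentation does not spell out the circular calculation. One common reading, taken here purely as an assumption, is that the lagged index wraps around to the start of the series, so that all N products contribute. The hypothetical helper below sketches that reading:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: circular autocorrelation, ASSUMING that
# "circularized lagging" means x_{t+k} wraps around to the start.
sub acf_circular_sketch {
    my ($data, $k) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $num  = 0;
    # All N products are used; the index (t + k) is taken modulo N
    $num += $dev[$_] * $dev[($_ + $k) % $n] for 0 .. $n - 1;
    return $num / sum(map { $_**2 } @dev);
}

my @series = (1, 3, 2, 5, 4);
printf "circular lag 1: %.2f\n", acf_circular_sketch(\@series, 1);
printf "circular lag 4: %.2f\n", acf_circular_sketch(\@series, 4);
```

Under this reading a lag of k and a lag of N - k pair up the same observations, so both lines print the same value (-0.20 for this series).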

autocovariance

    $covar = $autocorr->autocovariance(data => \@data, lag => integer (from 1 to N-1), exact => 0|1, unbias => 1|0, circular => 1|0);
    $covar = $autocorr->autocovariance(lag => 1); # using loaded data, and default args (exact = 0, unbias = 1, circular = 0)

Alias: autocov, acvf

Returns the autocovariance; see coefficient for definition and options.

correlogram

    $href = $autocorr->correlogram(nlags => integer, exact => 1|0, unbias => 1|0, circular => 1|0); # assuming data are loaded
    $href = $autocorr->correlogram(); # use defaults, with loaded data
    $href = $autocorr->correlogram(data => \@data); # same as either of above, but give data here
    ($lags, $coeffs) = $autocorr->correlogram(); # with args as for either of the above

Alias: coeff_list

Returns the autocorrelation coefficients for lags from 0 to a limit, or (by default) over all possible lags, from 0 to N - 1. If called in array context, returns two references: to an array of the lags, and to an array of their respective coefficients. Otherwise, returns a hash-reference of the coefficients keyed by their respective lags. The limit is given by the argument nlags, which sets the number of lags to return, including the zero lag, as permitted by the data to be referenced. Options are exact, unbias and circular, as defined above for coefficient. The autocorrelation function being symmetric about lag zero, the correlogram is based only on positive lags.
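The context-dependent return described above can be sketched with Perl's wantarray. The helper correlogram_sketch is hypothetical and uses the textbook coefficient at each lag; the bound it places on the number of lags is likewise an assumption of the sketch.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: textbook coefficients for lags 0 .. nlags - 1.
# Returns (\@lags, \@coeffs) in list context, else a hash reference.
sub correlogram_sketch {
    my ($data, $nlags) = @_;
    my $n = scalar @{$data};
    $nlags = $n if !defined $nlags;    # default: all lags 0 .. N - 1
    die "nlags must be between 1 and $n\n" if $nlags < 1 || $nlags > $n;
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $den  = sum(map { $_**2 } @dev);
    my (@lags, @coeffs);
    for my $k (0 .. $nlags - 1) {
        my $num = 0;
        $num += $dev[$_] * $dev[$_ + $k] for 0 .. $n - $k - 1;
        push @lags,   $k;
        push @coeffs, $num / $den;
    }
    return (\@lags, \@coeffs) if wantarray;
    my %by_lag;
    @by_lag{@lags} = @coeffs;          # coefficients keyed by lag
    return \%by_lag;
}

my @series = (1, 2, 3, 4, 5);
my $href = correlogram_sketch(\@series, 3);    # scalar context
for my $lag (sort { $a <=> $b } keys %{$href}) {
    printf "lag %d: %.2f\n", $lag, $href->{$lag};
}
```

Calling the same helper as `my ($lags, $coeffs) = correlogram_sketch(\@series);` returns the two array references instead.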

correlogram_chart

Experimental method to print a .png file of the correlogram.

ctest_bartlett

    $bool = $acorr->ctest_bartlett(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
    ($crit, $coeff, $bool) = $acorr->ctest_bartlett(lag => integer, tails => 1|2);

Performs a 95% confidence test of the null hypothesis of no autocorrelation, assuming that the series was generated by a Gaussian white noise process. Following Bartlett (1946), it compares the value of a single correlation coefficient for a given lag with the critical values given tails => 2 (default) or 1:

$$r_{k,.95} = \frac{s}{N^{1/2}}$$

where s is a constant equalling 1.96 for a two-tailed, or 1.645 for a one-tailed test. If the absolute value of the sample correlation coefficient falls beyond this critical value, the null hypothesis is rejected at the 95% level.

Returns, if called in array context, a list comprising the critical value, the sample coefficient, and a boolean as to whether the null hypothesis is rejected; otherwise, just the latter boolean.

Accepts all the options as given for coefficient. Note that the critical value is not calculated with respect to the particular value of lag - see ctest_anderson for this.
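The comparison itself is simple arithmetic, sketched below with a hypothetical helper (bartlett_ctest_sketch is not a method of this module, and the coefficient passed in is an arbitrary example value):

```perl
use strict;
use warnings;

# Hypothetical helper: compare a coefficient with s / sqrt(N).
# Returns (critical value, rejected?) for a 95% test.
sub bartlett_ctest_sketch {
    my ($coeff, $n, $tails) = @_;
    my $s      = (defined $tails && $tails == 1) ? 1.645 : 1.96;
    my $crit   = $s / sqrt($n);          # same for every lag
    my $reject = abs($coeff) > $crit ? 1 : 0;
    return ($crit, $reject);
}

my ($crit, $reject) = bartlett_ctest_sketch(0.25, 100, 2);
printf "critical value %.3f, reject null: %d\n", $crit, $reject;
```

With N = 100 the two-tailed critical value is 0.196, so a coefficient of 0.25 leads to rejection.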

ctest_anderson

    $bool = $acorr->ctest_anderson(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
    ($crit, $coeff, $bool) = $acorr->ctest_anderson(lag => integer, tails => 1|2);

Performs a 95% confidence test of the null hypothesis of no autocorrelation, assuming that the series was generated by a Gaussian white noise process. Following Anderson (1941), it compares the value of a single correlation coefficient for a given lag with the critical values given tails => 2 (default) or 1:

$$r_{k,.95}(\text{2-tailed}) = \frac{-1 \pm 1.96\,(N - k - 1)^{1/2}}{N - k}$$

$$r_{k,.95}(\text{1-tailed}) = \frac{-1 + 1.645\,(N - k - 1)^{1/2}}{N - k}$$

If the sample correlation coefficient falls outside these bounds, the null hypothesis is rejected at the 95% level.

Returns, if called in array context, a list comprising the critical value, the sample coefficient, and a boolean as to whether the null hypothesis is rejected; otherwise, just the latter boolean.

Accepts all the options as given for coefficient. Note that the critical value is calculated with respect to the particular value of lag - unlike ztest_bartlett.
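A minimal sketch of these lag-dependent bounds, using a hypothetical helper (anderson_bounds_sketch is not part of this module):

```perl
use strict;
use warnings;

# Hypothetical helper: 95% bounds (-1 +/- s * sqrt(N - k - 1)) / (N - k).
# Returns (lower, upper) for two tails, or the upper bound for one tail.
sub anderson_bounds_sketch {
    my ($n, $k, $tails) = @_;
    my $one_tailed = defined $tails && $tails == 1;
    my $s = $one_tailed ? 1.645 : 1.96;
    my $m = $n - $k;
    die "Need N - k > 1\n" if $m <= 1;
    my $half  = $s * sqrt($m - 1);
    my $upper = (-1 + $half) / $m;
    return $upper if $one_tailed;
    my $lower = (-1 - $half) / $m;
    return ($lower, $upper);
}

my ($lower, $upper) = anderson_bounds_sketch(100, 1, 2);
printf "bounds at lag 1, N = 100: %.3f to %.3f\n", $lower, $upper;
```

The bounds are not symmetric about zero, and they change with the lag because both N - k and N - k - 1 enter the formula.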

ztest_bartlett

    $p_value = $acorr->ztest_bartlett(lag => integer, tails => 1|2); # assuming data are loaded, or see above for alternative and extra options
    ($z_value, $p_value) = $acorr->ztest_bartlett(lag => integer, tails => 1|2);

Returns the 2- or 1-tailed probability, given tails => 2 (default) or 1, respectively, for the deviation of the observed autocorrelation coefficient at the given lag from the expected value of zero, relative to the variance 1 / N, assuming that the series was generated by a Gaussian white noise process. If called in array context, returns both the actual Z-value and then the p-value. Other options, and methods of assigning the data to test, are as for coefficient.
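The arithmetic behind such a test can be sketched as follows. The helper bartlett_ztest_sketch is hypothetical; it obtains the normal tail probability from the core POSIX::erfc function (available from Perl 5.22), whereas the module itself lists Statistics::Zed for this step.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: z = r_k / sqrt(1 / N), with a normal p-value.
# POSIX::erfc requires Perl 5.22 or later.
sub bartlett_ztest_sketch {
    my ($coeff, $n, $tails) = @_;
    my $z = $coeff / sqrt(1 / $n);               # expected value is zero
    my $p = POSIX::erfc(abs($z) / sqrt(2));      # two-tailed probability
    $p /= 2 if defined $tails && $tails == 1;    # one-tailed probability
    return ($z, $p);
}

my ($z, $p) = bartlett_ztest_sketch(0.196, 100, 2);
printf "z = %.2f, p = %.3f\n", $z, $p;
```

A coefficient of 0.196 with N = 100 gives z = 1.96 and a two-tailed p-value of about 0.050, which agrees with the critical value used by ctest_bartlett.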

qtest, boxpierce

    $p_value = $acorr->qtest(nlags => integer); # assuming data are loaded, or see above for alternative and extra options
    ($q_value, $df, $p_value) = $acorr->qtest(nlags => integer);

Returns the Q statistic for testing whether a range of autocorrelation coefficients differs from zero, and so whether the series was produced by a random process (Box & Pierce, 1970). If called in array context, returns a list giving the value of Q, and, assuming a chi-square distribution, its degrees of freedom (= nlags) and p-value; returns the p-value only if called in scalar context. Other options, and methods of assigning the data to test, are as for coefficient. The range is (by default) over all possible lags from 1 to N - 1. The statistic is defined as follows:

$$Q = N \sum_{k=1}^{M} \rho_k^2$$

where M is the largest lag-value to test (= nlags).
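A minimal sketch of the Box-Pierce statistic in its usual textbook form, N times the sum of squared coefficients over lags 1 to M. The helper boxpierce_q_sketch is hypothetical and uses the textbook coefficient at each lag; it stops short of the p-value, which needs the chi-square upper tail (the regularized upper incomplete gamma function evaluated at df / 2 and Q / 2).

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Box-Pierce Q = N * sum of squared coefficients
# for lags 1 .. M, using the textbook coefficient at each lag.
sub boxpierce_q_sketch {
    my ($data, $max_lag) = @_;
    my $n    = scalar @{$data};
    my $mean = sum(@{$data}) / $n;
    my @dev  = map { $_ - $mean } @{$data};
    my $den  = sum(map { $_**2 } @dev);
    my $q    = 0;
    for my $k (1 .. $max_lag) {
        my $num = 0;
        $num += $dev[$_] * $dev[$_ + $k] for 0 .. $n - $k - 1;
        $q += ($num / $den)**2;
    }
    return ($n * $q, $max_lag);    # (Q, degrees of freedom)
}

my @series = (1, 2, 3, 4, 5);
my ($q, $df) = boxpierce_q_sketch(\@series, 2);
printf "Q = %.2f on %d df\n", $q, $df;
```

For this series the coefficients at lags 1 and 2 are 0.4 and -0.1, so Q = 5 x (0.16 + 0.01) = 0.85 on 2 degrees of freedom.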

REFERENCES

Anderson, R.L. (1941). Distribution of the serial correlation coefficients. Annals of Mathematical Statistics, 8, 1-13.

Bartlett, M.S. (1946). On the theoretical specification of sampling properties of autocorrelated time series. Journal of the Royal Statistical Society, 27.

Box, G.E., & Jenkins, G. (1976). Time series analysis: Forecasting and control. San Francisco, US: Holden-Day.

Box, G.E., & Pierce, D. (1970). Distribution of residual autocorrelations in ARIMA time series models. Journal of the American Statistical Association, 65, 1509-1526.

Chatfield, C. (1975). The analysis of time series: Theory and practice. London, UK: Chapman and Hall.

Kendall, M. G. (1973). Time-series. London, UK: Griffin.

AUTHOR

Roderick Garton, <rgarton at cpan.org>

DIAGNOSTICS

No data are available: Croaked by most methods if they do not receive data as given in the call by an array ref, or as pre-loaded as per Statistics::Data.
Value given for argument 'nlags' is not valid: Croaked by correlogram when the nlags is not valid: should be no more than the number of data elements less 1.
file opening/printing errors: Croaked by correlogram_chart when it tries to print the chart.

DEPENDENCIES

Statistics::Data - used: base

Statistics::Lite - used: mean

List::AllUtils - used: mesh

Statistics::Zed - required if calling ztest_bartlett

Math::Cephes - required for its igamc method if calling qtest

BUGS AND LIMITATIONS

Report to bug-statistics-autocorrelation-0.06 at rt.cpan.org or http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Autocorrelation-0.06.

To do: rho_ctest, rho_ztest

SUPPORT

Find documentation for this module with the perldoc command:

    perldoc Statistics::Autocorrelation

Also look for information at:

RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Autocorrelation-0.06

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Statistics-Autocorrelation-0.06

CPAN Ratings

http://cpanratings.perl.org/d/Statistics-Autocorrelation-0.06

Search CPAN

http://search.cpan.org/dist/Statistics-Autocorrelation-0.06/

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.
