# NAME

Statistics::Sequences::Joins - The Joins Test: Wishart-Hirschfeld statistics for frequency of alternations in a dichotomous sequence

# VERSION

This is documentation for Version 0.20 of Statistics::Sequences::Joins.

# SYNOPSIS

```perl
use Statistics::Sequences::Joins 0.20;
my $joins = Statistics::Sequences::Joins->new();
$joins->load([1, 0, 0, 0, 1, 1, 0, 1, 1, 1]); # bi-valued sequence
my $val = $joins->observed(); # or give "data => AREF" to stat methods
$val = $joins->expected(trials => 10, prob => .5); # sufficient, independent of data
$val = $joins->variance(trials => 10, prob => .5); # same
$val = $joins->z_value(tails => 1, ccorr => 1); # use loaded data
my ($z, $p) = $joins->z_value(tails => 1, ccorr => 1); # as above, but wantarray for z- and p-value
$p = $joins->p_value(tails => 1); # using loaded data
$val = $joins->z_value(trials => 10, observed => 4, tails => 1, ccorr => 1); # sufficient
my $href = $joins->stats_hash(values => {observed => 1, p_value => 1}); # or other methods as attribs in the hashref
# print values to STDOUT:
$joins->dump(values => {observed => 1, expected => 1, p_value => 1}, format => 'line', flag => 1, precision_s => 3, precision_p => 7);
```

# DESCRIPTION

A sequence of dichotomous, binary-valued, two-element events consists of zero or more alternations (or "joins") of those events. For example, joins are marked out with asterisks for the following sequence:

```
0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0
   * *     * *       *   *     *       *
```

So there's a join (of 0 and 1) at indices 1 and 2 (from zero), then immediately another join (of 1 and 0) at indices 2 and 3, and then another join at 5 and 6 ... for a total joincount of eight.
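The rule is easy to check by hand or in a few lines of Perl. The sketch below is a stand-alone illustration, not code from Statistics::Sequences::Joins: `join_positions` is a hypothetical helper that compares each element with its predecessor (as strings, an assumption that suits labels such as 0/1 or E/O) and returns the index pairs where they differ. For the sequence above it should list the eight pairs marked with asterisks.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences::Joins):
# returns the index pairs [i-1, i] at which the sequence alternates.
sub join_positions {
    my @seq = @_;
    my @pairs;
    for my $i (1 .. $#seq) {
        # a join occurs wherever an element differs from the one before it
        push @pairs, [ $i - 1, $i ] if $seq[$i] ne $seq[$i - 1];
    }
    return @pairs;
}

my @pairs = join_positions(qw(0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 1 1 0 0));
print join(' ', map { join('-', @$_) } @pairs), "\n";  # 1-2 2-3 5-6 6-7 10-11 12-13 15-16 19-20
print 'joincount = ', scalar(@pairs), "\n";            # joincount = 8
```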

This module provides methods to calculate and return this observed joincount, along with the expected joincount and its variance for a given number of trials and probability of each event, following the limiting form of the probability distribution of the number of joins in a binary-valued sequence given by Wishart and Hirschfeld (1936). This assumes that the probability that an event takes one or the other value is constant over all trials. The concept might seem similar to that of runs, but runs are counted for each continuous segment between alternations, whereas the joincount is blind to the length of these repetitions and even to event-probabilities.

# METHODS

Methods include those described in Statistics::Sequences, and have the same form as those in its other sub-modules, but naturally have specific operations as follows.

## new

```perl
$joins = Statistics::Sequences::Joins->new();
```

Returns a new Joins object. Expects/accepts no arguments but the classname.

## load

```perl
$joins->load(@data); # anonymously
$joins->load('sample1' => \@data); # labelled whatever
```

Loads data anonymously or by name; see load in the Statistics::Data manpage for details on the various ways data can be loaded and then retrieved (more than shown here). Here, the data are checked to ensure that they contain no more than two unique elements; if not, the method will `carp` and return 0. Every load unloads all previous loads and any additions to them.

Alternatively, skip this action; data don't have to be pre-loaded to use the stats methods here (see below).

See Statistics::Data for additional operations on data that have been loaded.
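The two-element check can be sketched as follows. This is an assumption about how such a check might look, not the module's source: `check_dichotomous` is a hypothetical name, and it only mirrors the documented behaviour of warning via `carp` and returning 0 when more than two distinct values are present.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical sketch of the two-element check described for load():
# warn via carp and return 0 if more than two distinct values occur.
sub check_dichotomous {
    my @data = @_;
    my %seen;
    $seen{$_}++ for @data;              # tally each distinct value
    my $n_unique = scalar keys %seen;
    if ($n_unique > 2) {
        carp "Data have $n_unique unique elements; no more than 2 are allowed";
        return 0;
    }
    return 1;
}

print check_dichotomous(1, 0, 0, 1), "\n";   # 1
print check_dichotomous(qw(a b c)), "\n";    # warns, then prints 0
```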

## observed

```perl
$count = $joins->observed(); # assumes data have already been loaded
$count = $joins->observed(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1]);
```

Returns the number of joins (or alternations) in a sequence - i.e., when, from the second trial onwards, the event on trial i doesn't equal the event on trial i - 1. For example, the following sequence adds up to 7 joins:

```
Sequence:  1 0 0 0 1 0 0 1 0 1 1 0
JoinCount: 0 1 1 1 2 3 3 4 5 6 6 7
```

Formally, for a sequence A = {a_i} of N elements indexed from zero,

$$
J = \sum_{i=1}^{N-1} \begin{cases} 0, & a_i = a_{i-1} \\ 1, & \text{otherwise} \end{cases}
$$

The sequence to test can have been already loaded, or it can be sent directly to this method, keyed as data. If no data are found by either of these ways, a `croak` is heard.
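The summation translates directly into a loop. In this minimal sketch, `running_joincount` is a hypothetical helper (not part of the module's API) that adds the 0/1 indicator term for each trial from the second onwards, so it reproduces the JoinCount row for the example sequence; its last value is J.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the summation for J: returns the running
# tally after each trial, as in the JoinCount row above.
sub running_joincount {
    my @seq = @_;
    my @tally = @seq ? (0) : ();        # no join is possible at trial 0
    for my $i (1 .. $#seq) {
        # indicator term: 0 if a_i equals a_{i-1}, 1 otherwise
        push @tally, $tally[-1] + ($seq[$i] eq $seq[$i - 1] ? 0 : 1);
    }
    return @tally;
}

my @tally = running_joincount(qw(1 0 0 0 1 0 0 1 0 1 1 0));
print "@tally\n";                       # 0 1 1 1 2 3 3 4 5 6 6 7
print "J = $tally[-1]\n";               # J = 7
```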

## expected

```perl
$val = $joins->expected(); # assumes data already loaded, uses default prob value (.5)
$val = $joins->expected(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]); # count these data, use default prob value (.5)
$val = $joins->expected(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1], prob => .2); # count these data, use given prob value
$val = $joins->expected(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1], state => 1); # count off trial numbers and prob. of event
$val = $joins->expected(prob => .2, trials => 10); # use this trial number and probability of one of the 2 events
```

Returns the expected number of joins between the two possible elements of the given data, or for data of the given attributes, from Wishart and Hirschfeld (1936, p. 228):

E[J] = 2(N – 1)pq

where N is the number of observations/trials, p is the expected probability of the joined event taking on its observed value, and q is (1 - p), the expected probability of the joined event not taking on its observed value.

The data to test can already have been loaded, or you can send them directly, keyed as data. The data are only needed to count off the number of trials, and the proportion of 1s (or of the other given state of the two), if the trials and prob attributes are not defined. If state is defined, then prob is worked out from the actual data (as long as there are some; otherwise 1/2 is assumed). If state is not defined, prob takes the value you give to it or, if it too is not defined, 1/2 (assuming equiprobability of the two events).

Counting up the observed number of joins needs some data to count through, but getting the expectation and variance for the joincount can just be fed with the number of trials, and the probability of one of the two events.
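A sketch of the expectation under stated assumptions: `expected_joins` and `prob_of_state` are hypothetical helpers, not the module's methods. The first applies E[J] = 2(N - 1)pq with p defaulting to .5; the second estimates p as the proportion of trials equal to a chosen state, falling back to 1/2 when there are no data, which is one reading of how the state option is described above.

```perl
use strict;
use warnings;

# Hypothetical sketch of E[J] = 2(N - 1)pq; not the module's own code.
sub expected_joins {
    my (%args) = @_;
    my $n = $args{trials};
    my $p = defined $args{prob} ? $args{prob} : 0.5;  # default: equiprobable events
    my $q = 1 - $p;
    return 2 * ($n - 1) * $p * $q;
}

# Hypothetical helper: estimate p as the proportion of trials equal to $state.
sub prob_of_state {
    my ($state, @seq) = @_;
    return 0.5 unless @seq;                # no data: assume 1/2
    my $hits = grep { $_ eq $state } @seq; # count of matching trials
    return $hits / @seq;
}

my @data = (1, 0, 0, 0, 1, 0, 0, 1, 0, 1);
print expected_joins(trials => 10), "\n";                        # 4.5
print expected_joins(trials => 10, prob => .2), "\n";            # 2.88
my $p = prob_of_state(1, @data);                                 # 0.4
print expected_joins(trials => scalar @data, prob => $p), "\n";  # 4.32
```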

## variance

```perl
$val = $joins->variance(); # assume data already "loaded" for counting
$val = $joins->variance(data => $aref); # use in-place array reference, will use default prob of 1/2
$val = $joins->variance(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]); # count off trial numbers and prob. of event
$val = $joins->variance(data => [1, 0, 0, 0, 1, 0, 0, 1, 0, 1], prob => .5); # specify the event prob (recommended)
$val = $joins->variance(trials => 10, prob => .5); # sufficient statistics
```

Returns the expected variance in the number of joins for the given data, as estimated in Wishart and Hirschfeld (1936, p. 232), with a correction for small N (the second term) given by Burdick and Kelly (1977, p. 106, Eq. 20) that is trivial for very large N:

V[J] = 4Npq(1 – 3pq) – 2pq(3 – 10pq)

with variables defined as above for expected. The default operation applies the Burdick-Kelly correction; this can be dodged by specifying ncorr => 0.

The data to test can already have been loaded, or it is given directly, keyed as data. The data are only needed to count off the number of trials, and estimate the expected probability of the joined event, if the trials and prob attributes aren't defined. If state is defined, then prob is worked out from the actual data (as long as there are some, or expect a `croak`). If state is not defined, prob takes the given value or, if it too is not defined, then 1/2 (assuming equiprobability of the two events).
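The variance formula and the effect of switching off the small-N term can be sketched like this. `joins_variance` is a hypothetical helper with an `ncorr` flag modelled on the option named above; it is not the module's implementation. With p = .5 the second term is 2(.25)(3 - 2.5) = .25, so for 10 trials the corrected variance is 2.25 rather than 2.5.

```perl
use strict;
use warnings;

# Hypothetical sketch of V[J] = 4Npq(1 - 3pq) - 2pq(3 - 10pq).
# ncorr => 0 drops the second (small-N) term.
sub joins_variance {
    my (%args) = @_;
    my $n     = $args{trials};
    my $p     = defined $args{prob}  ? $args{prob}  : 0.5;
    my $ncorr = defined $args{ncorr} ? $args{ncorr} : 1;
    my $pq    = $p * (1 - $p);
    my $var   = 4 * $n * $pq * (1 - 3 * $pq);         # limiting (large-N) term
    $var -= 2 * $pq * (3 - 10 * $pq) if $ncorr;       # Burdick-Kelly small-N term
    return $var;
}

printf "%.4f\n", joins_variance(trials => 10);              # 2.2500
printf "%.4f\n", joins_variance(trials => 10, ncorr => 0);  # 2.5000
```

As a cross-check, for independent trials with constant p, each of the N - 1 join indicators has variance 2pq(1 - 2pq), each adjacent pair of indicators has covariance pq(1 - 4pq), and non-adjacent pairs are independent; summing these terms gives exactly 4Npq(1 - 3pq) - 2pq(3 - 10pq).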

## obsdev

```perl
$v = $joins->obsdev(); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
$v = $joins->obsdev(data => [qw/blah bing blah blah blah/]); # use these data
$v = $joins->obsdev(observed => 4, trials => 10, prob => .5); # sufficient
```

Returns the observed deviation: the observed less expected joincount for the loaded/given sequence (O - E). Alias: `observed_deviation`. Alternatively, the observed value might be given (as observed => NUM), and so the method only has to get the expected value (as specified in expected).

## stdev

```perl
$v = $joins->stdev(); # use data already loaded - anonymously; or specify its "label" or "index" - see observed()
$v = $joins->stdev(data => [qw/blah bing blah blah blah/]);
```

Returns the standard deviation (square-root of the variance). Alias: `standard_deviation`.
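Both quantities follow from the formulas already given. The sketch below is illustrative only: `count_joins` and `obsdev_and_stdev` are hypothetical helpers (the latter takes a probability followed by the sequence), and it assumes the small-N term is included in the variance, as it is by default. For the 11-trial sequence used under observed (6 joins against 5 expected), it gives an observed deviation of 1 and a standard deviation equal to the square root of 2.5.

```perl
use strict;
use warnings;

# Hypothetical helper: number of alternations in the sequence.
sub count_joins {
    my @seq = @_;
    return scalar grep { $seq[$_] ne $seq[$_ - 1] } 1 .. $#seq;
}

# Hypothetical sketch of obsdev (O - E) and stdev (sqrt of V[J],
# small-N term included).
sub obsdev_and_stdev {
    my ($p, @seq) = @_;
    my $n  = @seq;
    my $pq = $p * (1 - $p);
    my $expected = 2 * ($n - 1) * $pq;
    my $variance = 4 * $n * $pq * (1 - 3 * $pq) - 2 * $pq * (3 - 10 * $pq);
    return (count_joins(@seq) - $expected, sqrt($variance));
}

my ($dev, $sd) = obsdev_and_stdev(0.5, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1);
printf "obsdev = %.3f, stdev = %.3f\n", $dev, $sd;  # obsdev = 1.000, stdev = 1.581
```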

## z_value

```perl
$val = $joins->z_value(); # data already loaded, use default prob
$val = $joins->z_value(data => $aref, prob => .5, ccorr => 1, ncorr => 1);
($zvalue, $pvalue) = $joins->z_value(data => $aref, prob => .5, ccorr => 1, tails => 2); # same but wanting an array, get the p-value too
```

Returns the Z-score from a test of joincount deviation, taking the joincount expected away from that observed and dividing by the root expected joincount variance, by default with a continuity correction (ccorr) to expectation. Called in list context, returns the Z-score with its p-value for the tails (1 or 2) specified (2 by default).

The data to test can already have been loaded, or it is given directly, keyed as data.

Other options are precision_s (for the z_value) and precision_p (for the p_value), and ncorr for the (default) correction for small N.
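A hedged sketch of the z-value computation follows. It assumes that ccorr shrinks the absolute deviation |O - E| by .5 (without letting it change sign) and that ncorr toggles the Burdick-Kelly term; `joins_z` and its argument names are hypothetical, chosen to echo the options described above rather than copied from the module. Under these assumptions, 10 joins in 16 trials gives a z of about 1.033.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code: Z = (O - E) / sqrt(V[J]).
# Assumes ccorr shrinks |O - E| by .5 (never past zero) and ncorr toggles
# the small-N variance term; prob defaults to .5.
sub joins_z {
    my (%args) = @_;
    my ($o, $n) = @args{qw(observed trials)};
    my $p     = defined $args{prob}  ? $args{prob}  : 0.5;
    my $ccorr = defined $args{ccorr} ? $args{ccorr} : 1;
    my $ncorr = defined $args{ncorr} ? $args{ncorr} : 1;
    my $pq = $p * (1 - $p);
    my $e  = 2 * ($n - 1) * $pq;
    my $v  = 4 * $n * $pq * (1 - 3 * $pq);
    $v -= 2 * $pq * (3 - 10 * $pq) if $ncorr;
    my $dev = $o - $e;
    if ($ccorr) {
        my $shrunk = abs($dev) - 0.5;                  # continuity correction
        $dev = $shrunk > 0 ? ($dev <=> 0) * $shrunk : 0;
    }
    return $dev / sqrt($v);
}

printf "%.3f\n", joins_z(observed => 10, trials => 16);              # 1.033
printf "%.3f\n", joins_z(observed => 10, trials => 16, ccorr => 0);  # 1.291
```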

## p_value

```perl
$p = $joins->p_value(); # using loaded data and default args
$p = $joins->p_value(ccorr => 0, tails => 1); # as above, with options (ccorr => 0 or 1, tails => 1 or 2)
$p = $joins->p_value(data => [1, 0, 1, 1, 0]); # directly giving data (by-passing load and read)
$p = $joins->p_value(trials => 10, observed => 4, prob => .5); # without using data
```

Returns the normal probability value for the Z-value given by taking the joincount expected away from that observed and dividing by the root expected joincount variance, by default with a continuity correction (ccorr) to expectation and with tails => 2. Data are those already loaded, or those given directly, keyed as data. In the absence of data, the sufficient statistics of observed, trials and prob are required (or, by default, prob => 1/2 is used).
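The conversion from a z-value to a normal p-value can be sketched without extra modules. Here `erfc_approx` uses the Abramowitz and Stegun 7.1.26 polynomial approximation to the complementary error function (absolute error of roughly 1.5e-7), a stand-in chosen for self-containment rather than whatever the module uses internally; `p_from_z` is a hypothetical helper returning the two-tailed value by default, or half of it for one tail.

```perl
use strict;
use warnings;

# Abramowitz & Stegun 7.1.26 approximation to erfc(x) for x >= 0
# (absolute error below about 1.5e-7); a stand-in, not the module's routine.
sub erfc_approx {
    my ($x) = @_;
    my $t = 1 / (1 + 0.3275911 * $x);
    my $poly = $t * (0.254829592 + $t * (-0.284496736 + $t * (1.421413741
             + $t * (-1.453152027 + $t * 1.061405429))));
    return $poly * exp(-$x * $x);
}

# Hypothetical helper: normal p-value for a z-value; tails is 2 (default) or 1.
sub p_from_z {
    my ($z, $tails) = @_;
    $tails = 2 unless defined $tails;
    my $p2 = erfc_approx(abs($z) / sqrt(2));   # P(|Z| >= |z|)
    return $tails == 1 ? $p2 / 2 : $p2;
}

printf "%.3f\n", p_from_z(2 / sqrt(3.75));     # 0.302
printf "%.3f\n", p_from_z(2 / sqrt(3.75), 1);  # 0.151
```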

## stats_hash

```perl
$href = $joins->stats_hash(values => {observed => 1, expected => 1, variance => 1, z_value => 1, p_value => 1}, prob => .5, ccorr => 1);
```

Returns a hashref for the counts and stats as specified in its "values" argument, and with any options for calculating them (e.g., exact for p_value). See "stats_hash" in Statistics::Sequences for details. If calling via a "joins" object, the option "stat => 'joins'" is not needed (unlike when using the parent "sequences" object).

## dump

```perl
$joins->dump(values => { observed => 1, variance => 1, p_value => 1}, exact => 1, flag => 1, precision_s => 3); # among other options
```

Print Joins-test results to STDOUT. See dump in the Statistics::Sequences manpage for details.

# EXAMPLE

## Seating at the diner

These are the data from Swed and Eisenhart (1943), also given as an example for the Runs test, Vnomes (serial) test and Turns test. They list the occupied (O) and empty (E) seats in a row at a lunch counter. Have people taken up their seats on a random basis, or do they show some social phobia (more sparsely seated than "chance"), or are they trying to pick up (more compactly seated than "chance")?

```perl
use Statistics::Sequences::Joins;
my $joins = Statistics::Sequences::Joins->new();
$joins->load([qw/E O E E O E E E O E E E O E O E/]); # as per Statistics::Data
$joins->dump(
    format      => 'labline',
    flag        => 1,
    precision_s => 3,
    precision_p => 3,
    verbose     => 1,
);
```

This prints:

```
Joins: observed = 10.000, p_value = 0.302
```

So, the observed number of joins in the seating arrangements did not differ from that expected within the bounds of chance, at the .05 level. This test is, then, more conservative for these data than the Runs, Turns, and Vnomes (trinomes) tests, which showed marginal significance. Comparing the observed joincount with the number expected (7.5) suggests only a small and inconsistent tendency for people to take their seats apart from each other.
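These figures can be reproduced by hand from the formulas given under expected, variance and z_value, with N = 16 seats and p = q = .5. This assumes that the default continuity correction subtracts .5 from |O - E| and that the Burdick-Kelly term is included; under those assumptions the arithmetic matches the printed two-tailed p_value (Φ is the standard normal distribution function):

$$
\begin{aligned}
E[J] &= 2(N-1)pq = 2(15)(.25) = 7.5 \\
V[J] &= 4Npq(1-3pq) - 2pq(3-10pq) = 4(16)(.25)(.25) - 2(.25)(.5) = 3.75 \\
Z &= \frac{|10 - 7.5| - .5}{\sqrt{3.75}} \approx 1.033 \\
p &= 2\,[1 - \Phi(1.033)] \approx .302
\end{aligned}
$$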

## Score fluctuation

Rhine et al. (1943, App. 8, p. 381) describe an application of the Wishart-Hirschfeld test for testing the consistency of a sequence of values about a criterion value. Specifically, they test for fluctuation of a set of scores derived from runs of a guessing task with a constant probability of success. In their example, there are 25 trials per run, each run with a mean chance expectation (MCE) of 5. To test whether the scores deviate about MCE more or less often than expected by chance, they count the joins as occurring when two consecutive scores fall below and then above, or above and then below, the MCE. So, with a bar marking each join in the following sequence of 15 run-scores, there are 4 joins: 78656|455012|6|45|8. The test can be made by transforming the data dichotomously (see Statistics::Data::Dichotomize). The Joins test thus becomes something akin to Kendall's Turns test, although that test is sensitive to trial-by-trial fluctuations, i.e., about neighbouring values in the sequence, rather than, as with this application of the Joins test, to fluctuations of each and every score about a criterion value (which need not even appear in the sequence).
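The dichotomize-then-count procedure can be sketched in plain Perl. It does not use Statistics::Data::Dichotomize; `joins_about_criterion` is a hypothetical helper that codes each score as above (1) or below (0) the criterion and, as an assumption, drops scores equal to the criterion, which is consistent with how the 5s are grouped in the example. For the 15 run-scores above and an MCE of 5 it should count 4 joins.

```perl
use strict;
use warnings;

# Hypothetical sketch: code each score as above (1) or below (0) a criterion,
# skipping scores equal to it (an assumption), then count the joins.
sub joins_about_criterion {
    my ($criterion, @scores) = @_;
    my @coded = map { $_ > $criterion ? 1 : 0 } grep { $_ != $criterion } @scores;
    return scalar grep { $coded[$_] != $coded[$_ - 1] } 1 .. $#coded;
}

my @scores = (7, 8, 6, 5, 6, 4, 5, 5, 0, 1, 2, 6, 4, 5, 8);  # 78656|455012|6|45|8
print joins_about_criterion(5, @scores), "\n";               # 4
```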

# REFERENCES

Burdick, D. S., & Kelly, E. F. (1977). Statistical methods in parapsychological research. In B. B. Wolman (Ed.), Handbook of parapsychology (pp. 81-130). New York, NY, US: Van Nostrand Reinhold.

Pratt, J. G., Rhine, J. B., Smith, B. M., Stuart, C. E., & Greenwood, J. A. (1940). Extra-sensory perception after sixty years. New York, NY, US: Henry Holt.

Wishart, J., & Hirschfeld, H. O. (1936). A theorem concerning the distribution of joins between line segments. Journal of the London Mathematical Society, 11, 227-235. doi:10.1112/jlms/s1-11.3.227

# SEE ALSO

Statistics::Sequences::Runs : An analogous test.

Statistics::Sequences::Pot : Another, more recent test of sequential structure.

Statistics::Data::Dichotomize for transforming numerical or categorical non-dichotomous data into a dichotomous, two-element sequence.

# SUPPORT

You can find documentation for this module with the perldoc command.

```
perldoc Statistics::Sequences::Joins
```

You can also look for information at:

# AUTHOR

Roderick Garton, `<rgarton at cpan.org>`