- NAME
- SYNOPSIS
- DESCRIPTION
- README
- INSTALLATION
- PREREQUISITES
- CHANGES
- VARIABLES
- CLASS METHODS
- INSTANCE METHODS
- COPYRIGHT
- AUTHOR
- VERSION
- SEE ALSO
- TODO

# NAME

Bio::SAGE::Comparison - Compares data from serial analysis of gene expression (SAGE) libraries.

# SYNOPSIS

```
use Bio::SAGE::Comparison;
$sage = Bio::SAGE::Comparison->new();
```

# DESCRIPTION

This module provides several tools for comparing data generated from serial analysis of gene expression (SAGE) libraries.

# README

**BACKGROUND**

Serial analysis of gene expression (SAGE) is a molecular technique for generating a near-global snapshot of a cell population's transcriptome. Briefly, the technique extracts short sequences (tags) at defined positions of transcribed mRNA. These tags are paired to form ditags, the ditags are concatenated into long sequences, and the resulting concatemers are cloned and sequenced. Bioinformatic techniques are then used to recover the original tag sequences and to identify the mRNA from which each tag was derived. The number of times a particular tag is observed can be used to quantify the abundance of the corresponding transcript. The original technique was described by Velculescu et al. (1995) and used a ~14bp sequence tag. A modified protocol introduced by Saha et al. (2002) produces ~21bp tags.

**PURPOSE**

This module facilitates the comparison of SAGE libraries. Specifically:

```
1. Calculating the statistical significance of
expression differences.
2. Dynamically converting longer-tag libraries to
a shorter type for comparison (e.g. comparing
a LongSAGE vs. a regular SAGE library).
```

Both regular SAGE (14mer tag) and LongSAGE (21mer tag) are supported by this module.
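The conversion of longer tags to a shorter type can be pictured with the sketch below. It is an illustration only, not the module's code: the helper name `shorten_tags` is hypothetical, the 17-base tags are made up, and keeping the leading bases of each tag is an assumption about how the truncation works.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SAGE::Comparison):
# collapse a tag => count hash to tags of $length bases,
# summing the counts of tags that become identical.
sub shorten_tags {
    my ( $tags, $length ) = @_;
    my %short;
    foreach my $tag ( keys %$tags ) {
        # assumption: the leading bases of the tag are kept
        my $key = substr( $tag, 0, $length );
        $short{$key} += $tags->{$tag};
    }
    return \%short;
}

# made-up 17-base tags, for illustration only
my %long = (
    'AACGACTGTTAAAAAAA' => 3,
    'AACGACTGTTCCCCCCC' => 2,
    'CAGATACAAGGGGGGGG' => 5,
);

my $short = shorten_tags( \%long, 10 );
foreach my $tag ( sort keys %$short ) {
    print join( "\t", $tag, $short->{$tag} ) . "\n";
}
```

Two long tags that share their first 10 bases end up pooled under one short tag, which is why a converted library can contain fewer distinct tags than the original.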

Statistical significance in library comparisons is calculated using the method described by Audic and Claverie (1997). Code was generated by directly porting the authors' original C source.

**REFERENCES**

```
Velculescu V, Zhang L, Vogelstein B, Kinzler KW. (1995)
Serial analysis of gene expression. Science. 270:484-487.
Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B,
Kinzler KW, Velculescu V. (2002) Using the transcriptome
to annotate the genome. Nat. Biotechnol. 20:508-512.
Audic S, Claverie JM. (1997) The significance of digital
gene expression profiles. Genome Res. 7:986-995.
```

# INSTALLATION

Follow the usual steps for installing any Perl module:

```
perl Makefile.PL
make
make test
make install
```

# PREREQUISITES

None.

# CHANGES

` 1.00 2004.05.24 - Initial release.`

# VARIABLES

**Settings**

*$DEBUG*

` Prints debugging output if value is >= 1.`

# CLASS METHODS

## new

Constructor for a new Bio::SAGE::Comparison object.

**Arguments**

` None.`

**Usage**

` my $sage = Bio::SAGE::Comparison->new();`

## calculate_significance $x, $y, $Nx, $Ny, <$signedValue>

Determines the statistical significance of the difference in tag count (expression) between two libraries. This function uses the method described by Audic and Claverie (1997). This method can be called on an instantiated object, as well as statically.

**Arguments**

*$x,$y*

```
The number of times the tag was observed in the
x- and y-axis libraries, respectively.
```

*$Nx,$Ny*

```
The total number of tags in the x- and y-axis
libraries, respectively.
```

*$signedValue* (optional)

```
A boolean value (>=1 is TRUE). If this flag is
set to TRUE, the function returns both a p-value
and either +1, -1, or 0 to indicate
up/down/same-regulation (i.e. -1 if the expression
ratio of tags in the x-axis library(s) is greater
than that of the y-axis library(s)). This flag
is FALSE by default.
```

**Returns**

```
The p-value for the observation. A lower number is
more significant. Typically, observations with
p <= 0.05 are considered statistically significant.
If $signedValue is set to TRUE, the function also
returns a 0, -1 or +1 to indicate same/down/up-regulation.
```

**Usage**

```
# the function is static, so it can be accessed directly
my $p = Bio::SAGE::Comparison::calculate_significance( 3, 10, 50000, 60000 );
# or:
my ( $p, $sign ) = Bio::SAGE::Comparison::calculate_significance( 3, 10, 50000, 60000, 1 );
if( $p <= 0.05 ) {
  if( $sign == +1 ) { print "Significantly upregulated.\n"; }
  if( $sign == -1 ) { print "Significantly downregulated.\n"; }
  if( $sign == 0 ) { die( "Same expression should never be significant!" ); }
}
```
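For readers who want to see the shape of the Audic and Claverie (1997) calculation, the sketch below evaluates their conditional probability p(y|x) in log space and sums it into a tail probability. It is not the module's code, which ports the authors' C source. The helper names `log_factorial`, `ac_probability`, and `ac_tail`, and the choice of reporting the smaller of the two tails, are assumptions made for this example, so the result need not match `calculate_significance` exactly.

```perl
use strict;
use warnings;

# log(n!) as a sum of logs, to avoid overflow for large counts
sub log_factorial {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 2 .. $n;
    return $sum;
}

# Audic and Claverie (1997):
#   p(y|x) = (Ny/Nx)^y * (x+y)! / ( x! * y! * (1 + Ny/Nx)^(x+y+1) )
sub ac_probability {
    my ( $x, $y, $Nx, $Ny ) = @_;
    my $r = $Ny / $Nx;
    return exp( $y * log($r)
              + log_factorial( $x + $y )
              - log_factorial($x)
              - log_factorial($y)
              - ( $x + $y + 1 ) * log( 1 + $r ) );
}

# Assumed convention: the smaller of P(Y <= y | x) and P(Y >= y | x)
sub ac_tail {
    my ( $x, $y, $Nx, $Ny ) = @_;
    my $lower = 0;
    $lower += ac_probability( $x, $_, $Nx, $Ny ) for 0 .. $y;
    my $upper = 1 - $lower + ac_probability( $x, $y, $Nx, $Ny );
    return $lower < $upper ? $lower : $upper;
}

# same counts as in the usage example above
printf "%.6g\n", ac_tail( 3, 10, 50000, 60000 );
```

Working in log space keeps the factorials from overflowing. The linear-time `log_factorial` is enough for a sketch but would be slow for very large tag counts.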

# INSTANCE METHODS

## load_library $handle

Reads SAGE library data from a Perl handle and prepares a tag data hash. The input is expected in the format [tag]<whitespace>[count].

**Arguments**

*$handle*

```
A Perl handle (e.g. a filehandle, STDIN, etc.) that
can be used to read in SAGE library data.
```

**Returns**

```
A hashref containing tag sequences
as keys and the number of times that tag was
observed as the value.
```

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
my %data = %{$sage->load_library( *STDIN )};
# print data in descending order of tag count
map { print join( "\t", $_, $data{$_} ) . "\n"; } sort { $data{$b} <=> $data{$a} } keys %data;
```
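To make the [tag]<whitespace>[count] input format concrete, here is a minimal stand-alone parser that does not depend on the module. The function name `read_tag_counts` is hypothetical. One entry per line, skipping of malformed lines, and summing of repeated tags are all assumptions of this sketch, not documented behaviour of `load_library`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for load_library: read "tag count" lines
# from a handle and return a hashref of tag => count.
sub read_tag_counts {
    my ($handle) = @_;
    my %counts;
    while ( my $line = <$handle> ) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ( $tag, $count ) = split( ' ', $line );
        # assumption: lines without a numeric count are ignored
        next unless defined $count && $count =~ /^\d+$/;
        $counts{$tag} += $count;
    }
    return \%counts;
}

# in-memory data so the example runs without an input file
my $text = "AACGACTGTT\t100\nCAGATACAAG 23\nAGATAAAGAC\t45\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory handle: $!";
my %data = %{ read_tag_counts($fh) };
close($fh);

# print data in descending order of tag count
foreach my $tag ( sort { $data{$b} <=> $data{$a} } keys %data ) {
    print join( "\t", $tag, $data{$tag} ) . "\n";
}
```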

## add_library $label, \%tagData

Adds a library to this object.

**Arguments**

*$label*

```
A unique label for this library data. This label can
then be used to refer to the data.
```

*\%tagData*

```
A hashref containing the library data. The keys
are tag sequences, the values are tag counts. A
properly formatted hash can be created using
load_library.
```

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
my %data = %{$sage->load_library( *STDIN )};
# or alternatively:
my %data = ( 'AACGACTGTT' => 100,
             'CAGATACAAG' => 23,
             'AGATAAAGAC' => 45 );
$sage->add_library( 'MYLIB', \%data );
```

## get_library_labels

Gets the labels that identify the currently added libraries.

**Arguments**

None.

**Returns**

` An array of library labels.`

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
my %data = %{$sage->load_library( *STDIN )};
$sage->add_library( 'MYLIB', \%data );
print "LIBRARY_NAMES: " . join( ", " , $sage->get_library_labels() ) . "\n";
```

## get_library_size $label

Gets the total number of tags (the sum of all observed tag counts) for the specified library(s).

**Arguments**

*$label*

```
This can be: a) a string denoting the library label,
b) a reference to a string denoting the library
label, or c) an arrayref containing several
library labels that are pooled to calculate total
size.
```

**Returns**

```
The total number of observed tags in the library(s)
specified.
```

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
my %data = %{$sage->load_library( *STDIN )};
$sage->add_library( 'MYLIB', \%data );
print "LIBRARY_SIZE: " . $sage->get_library_size( 'MYLIB' ) . "\n";
```
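The three accepted forms of `$label` and the pooling of several libraries can be sketched as follows. This is a stand-alone illustration, not the module's implementation: the `%libraries` store and the helpers `normalize_labels`, `pooled_size`, and `pooled_tag_sequences` are hypothetical, and counting distinct tags as the union across pooled libraries is an assumption.

```perl
use strict;
use warnings;

# Hypothetical store of libraries keyed by label
my %libraries = (
    'LIB1' => { 'AACGACTGTT' => 100, 'CAGATACAAG' => 23 },
    'LIB2' => { 'AACGACTGTT' => 7,   'AGATAAAGAC' => 45 },
);

# Accept a string, a reference to a string, or an arrayref of labels
sub normalize_labels {
    my ($arg) = @_;
    return @$arg  if ref($arg) eq 'ARRAY';
    return ($$arg) if ref($arg) eq 'SCALAR';
    return ($arg);
}

# Sum the tag counts over all pooled libraries
sub pooled_size {
    my ($arg) = @_;
    my $total = 0;
    foreach my $label ( normalize_labels($arg) ) {
        my $lib = $libraries{$label} or die "Unknown library: $label";
        $total += $_ for values %$lib;
    }
    return $total;
}

# Count distinct tag sequences (assumed: union over pooled libraries)
sub pooled_tag_sequences {
    my ($arg) = @_;
    my %seen;
    foreach my $label ( normalize_labels($arg) ) {
        my $lib = $libraries{$label} or die "Unknown library: $label";
        $seen{$_} = 1 for keys %$lib;
    }
    return scalar keys %seen;
}

printf "%d\n", pooled_size('LIB1');                        # 123
printf "%d\n", pooled_size( [ 'LIB1', 'LIB2' ] );          # 175
printf "%d\n", pooled_tag_sequences( [ 'LIB1', 'LIB2' ] ); # 3
```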

## get_number_tag_sequences $label

Gets the number of discrete tag sequences present in the specified library(s).

**Arguments**

*$label*

```
This can be: a) a string denoting the library label,
b) a reference to a string denoting the library
label, or c) an arrayref containing several
library labels that are pooled to calculate the
number of tag sequences.
```

**Returns**

```
The number of different tags in the library(s)
specified.
```

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
my %data = %{$sage->load_library( *STDIN )};
$sage->add_library( 'MYLIB', \%data );
print "TAG_SEQUENCES: " . $sage->get_number_tag_sequences( 'MYLIB' ) . "\n";
```

## get_library_comparison $x_axis_libraries, $y_axis_libraries

Creates a comparison between two libraries. This method returns a hash reference containing the library data and a p-value for determining statistical significance.

If the libraries contain tags of different sizes (e.g. when comparing a regular SAGE library vs. a LongSAGE library), the data is converted to the shortest tag length of the libraries specified prior to comparison.

**Arguments**

*$x_axis_libraries,$y_axis_libraries*

```
Library labels for the x- and y-axis, respectively.
This can be: a) a string denoting the library label,
b) a reference to a string denoting the library
label, or c) an arrayref containing several
library labels that are pooled in the comparison.
```

**Returns**

```
A hashref is returned where the keys are tag sequences,
and the values are hashrefs with the keys 'x'
(tags in x-axis library(s)), 'y' (tags in y-axis
library(s)), 'reg' (0,-1,+1 denoting same/down/up-regulation),
and 'p' (statistical significance).
e.g. $HASH{$tag}->{'p'} = 0.05;
```

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
open( LIB1, '<', "lib1.tags.txt" ) or die "Cannot open lib1.tags.txt: $!";
$sage->add_library( 'LIB1', $sage->load_library( *LIB1 ) );
close( LIB1 );
open( LIB2, '<', "lib2.tags.txt" ) or die "Cannot open lib2.tags.txt: $!";
$sage->add_library( 'LIB2', $sage->load_library( *LIB2 ) );
close( LIB2 );
my %comparison = %{$sage->get_library_comparison( 'LIB1', 'LIB2' )};
# or alternatively:
my %comparison = %{$sage->get_library_comparison( ['LIB1'], ['LIB2'] )};
# print results in ascending order of p-value (more significant first)
foreach my $tag ( sort { $comparison{$a}->{'p'} <=> $comparison{$b}->{'p'} } keys %comparison ) {
  print join( "\t", $tag, map { $comparison{$tag}->{$_} } ( 'x','y','reg','p' ) ) . "\n";
}
```
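Once a comparison hash is available, a common next step is to pull out the tags that are significant in one direction. The sketch below works on a hand-built hash in the documented shape, so it runs without the module. The helper `significant_tags` is hypothetical, the first two entries reuse figures from the sample report in this document, and the third entry is made up.

```perl
use strict;
use warnings;

# Hand-built hash in the documented shape; values are illustrative
my %comparison = (
    'AGATCAAGAT' => { 'x' => 3388, 'y' => 50,  'reg' => -1, 'p' => 0 },
    'TATAACACCA' => { 'x' => 4,    'y' => 607, 'reg' => 1,  'p' => 4.1e-306 },
    'CAGATACAAG' => { 'x' => 23,   'y' => 25,  'reg' => 1,  'p' => 0.4 },
);

# Hypothetical helper: tags with p <= $cutoff and the requested
# direction (+1 or -1), sorted by ascending p-value.
sub significant_tags {
    my ( $cmp, $direction, $cutoff ) = @_;
    $cutoff = 0.05 unless defined $cutoff;
    return sort { $cmp->{$a}->{'p'} <=> $cmp->{$b}->{'p'} }
           grep { $cmp->{$_}->{'p'} <= $cutoff
                  && $cmp->{$_}->{'reg'} == $direction }
           keys %$cmp;
}

print "Higher in y: ", join( ", ", significant_tags( \%comparison, +1 ) ), "\n";
print "Higher in x: ", join( ", ", significant_tags( \%comparison, -1 ) ), "\n";
```

The helper is meant to be called in list context, as in the `join` calls above, because `sort` has no defined behaviour in scalar context.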

## print_library_comparison \%comparison

Prints a report based on the supplied comparison result hash.

**Arguments**

*\%comparison*

```
A properly formed hashref containing library
comparison results. This structure can be created
with get_library_comparison.
A sample output looks like:
  Tag         N(x)   N(y)  reg  p
  AGATCAAGAT  3388   50    -1   0
  GATAACACTT  11481  186   -1   0
  TATAACACCA  4      607   1    4.1136713480649e-306
  ... etc.
```

**Usage**

```
my $sage = Bio::SAGE::Comparison->new();
# load library data
# ...
$sage->print_library_comparison( $sage->get_library_comparison( 'LIB1','LIB2' ) );
```

# COPYRIGHT

Copyright (c) 2004 Scott Zuyderduyn <scottz@bccrc.ca>. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

# AUTHOR

Scott Zuyderduyn <scottz@bccrc.ca> BC Cancer Research Centre

# VERSION

` 1.00`

# SEE ALSO

perl(1).

# TODO

` Nothing yet.`
