
NAME

Bio::SNP::Inherit

VERSION

version 0.002

SYNOPSIS

    use Bio::SNP::Inherit;

    my $inherit = Bio::SNP::Inherit->new(
        manifest_filename => 'manifest.tab',
        data_filename     => 'data.tab',
    );

    # Upon object construction, this writes a summary file
    #   'data.tab_summary.tab' and a detailed file 'data.tab_abh.tab'
    #   containing parental allele designations for each sample that has
    #   parents defined for it in the manifest file.
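Going by the file names in the synopsis comment, the two output names appear to be formed by appending a suffix to data_filename. The sketch below assumes that naming rule holds for any input name; output_filenames is a hypothetical helper for illustration, not part of the module's API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SNP::Inherit): derive the two
# output file names from the data file name, assuming the pattern
# 'data.tab' -> 'data.tab_summary.tab' and 'data.tab_abh.tab'.
sub output_filenames {
    my ($data_filename) = @_;
    return (
        summary => "${data_filename}_summary.tab",
        abh     => "${data_filename}_abh.tab",
    );
}

my %out = output_filenames('data.tab');
print "$out{summary}\n$out{abh}\n";
```

The helper returns a hash so that callers can name the file they want rather than rely on the order of a list.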

DESCRIPTION

This is a module for converting Single Nucleotide Polymorphism (SNP) genotype data to parental allele designations. The resulting designations help with creating files suitable for genetic mapping, with identifying and characterizing crossovers, and with quality control.

SUBROUTINES/METHODS

BUILD

    Since the integrity of the data in the manifest file is absolutely vital,
    building an object fails if there are duplicate sample ids in the
    manifest file.
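The duplicate check described above can be sketched as follows. This is a minimal illustration, not the module's BUILD code; duplicate_ids and check_manifest_ids are hypothetical names, and the sketch assumes the sample ids have already been read from the manifest into a list.

```perl
use strict;
use warnings;

# Hypothetical helper: return, in sorted order, every sample id that
# occurs more than once in the given list.
sub duplicate_ids {
    my @ids = @_;
    my %seen;
    $seen{$_}++ for @ids;
    my @dups = grep { $seen{$_} > 1 } keys %seen;
    return sort { $a cmp $b } @dups;
}

# Hypothetical check mirroring the documented behavior: refuse to
# continue (die) if any sample id is duplicated.
sub check_manifest_ids {
    my @ids  = @_;
    my @dups = duplicate_ids(@ids);
    die "Duplicate sample ids in manifest: @dups\n" if @dups;
    return 1;
}

my @dups = duplicate_ids(qw(WG0096796-DNAA05 WG0096795-DNAA01 WG0096796-DNAA05));
print "@dups\n";    # prints "WG0096796-DNAA05"
```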

ATTRIBUTES

manifest_filename

    Name of the file containing information for each sample id.

    Required in the constructor.

    The first line contains headers, and each remaining line contains
    tab-delimited fields in the following order:

        sample id     or "Institute Sample Label"    (e.g. "WG0096796-DNAA05")
        sample name   or "Sample name"               (e.g. "B73xB97")
        group name    or "Group"                     (e.g. "NAM F1")
        parentA       or "Mother"                    (e.g. "WG0096795-DNAA01")
        parentB       or "Father"                    (e.g. "WG0096796-DNAF01")
        replicate of  or "Replicate(s)"              (id of the sample that this
                                                      one replicates, e.g.
                                                      "WG0096796-DNAA05")
        AxB F1        or "F1 of parentA and parentB" (e.g. "WG0096795-DNAA02")

    The last four fields may be left blank if they are not applicable.
    However, leaving them blank when they are applicable will prevent the
    program from analyzing the data properly.
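To make the field order concrete, here is a minimal parsing sketch. It is not the module's own reader: parse_manifest and the hash keys are hypothetical names chosen for illustration, and it assumes the whole manifest has been read into a string. Blank optional fields become undef.

```perl
use strict;
use warnings;

# Documented field order; the key names themselves are hypothetical.
my @MANIFEST_FIELDS = qw(sample_id sample_name group_name
                         parentA parentB replicate_of f1);

# Hypothetical parser: skip the header line and return a hash ref of
# records keyed by sample id. Empty or absent fields are stored as undef.
sub parse_manifest {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    shift @lines;    # discard the header line
    my %by_id;
    for my $line (@lines) {
        $line =~ s/\r\z//;              # tolerate Windows line endings
        next if $line =~ /^\s*$/;       # skip blank lines
        my @values = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        my %record;
        @record{@MANIFEST_FIELDS} =
            map { defined $_ && length $_ ? $_ : undef }
            @values[ 0 .. $#MANIFEST_FIELDS ];
        $by_id{ $record{sample_id} } = \%record;
    }
    return \%by_id;
}
```

Keying the records by sample id makes it cheap to look up a sample's parents when its genotypes are processed.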

data_filename

    Name of the tab-delimited file containing the data to be processed.

    Required in the constructor.

    A line containing the text '[Data]' marks the start of the data; all
    remaining lines are data. The first line after it contains the column
    headers, which are the sample ids. Samples whose ids are missing from
    the manifest file will not be processed. Each subsequent line contains
    the name of a SNP in the first field and genotype calls in the
    remaining fields.

    Data must be in the format SNP_name{tab}AA{tab}GG{tab}...
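The layout above can be read with a short sketch like the following. It is an illustration under stated assumptions, not the module's reader: parse_data is a hypothetical name, the whole file is assumed to be in a string, the first header field is assumed to label the SNP-name column (and is ignored), and the manifest ids are passed in as a hash ref.

```perl
use strict;
use warnings;

# Hypothetical reader: returns { snp_name => { sample_id => genotype } },
# keeping only samples whose ids appear in %$known_ids (the manifest).
sub parse_data {
    my ($text, $known_ids) = @_;
    my @lines = split /\r?\n/, $text;

    # Discard everything up to and including the '[Data]' line.
    while (@lines) {
        my $line = shift @lines;
        last if index($line, '[Data]') >= 0;
    }

    # Header line: assumed SNP-name column label, then the sample ids.
    my $header = shift @lines;
    return {} unless defined $header;
    my (undef, @sample_ids) = split /\t/, $header;

    my %genotypes;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;
        my ($snp, @calls) = split /\t/, $line;
        for my $i (0 .. $#sample_ids) {
            my $id = $sample_ids[$i];
            next unless $known_ids->{$id};    # not in manifest: skip
            $genotypes{$snp}{$id} = $calls[$i];
        }
    }
    return \%genotypes;
}
```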

OUTPUT FILES

    Upon object construction, two files are produced: one that summarizes the
    input and another that describes the genotypes of samples in terms of
    their "parents". For example, a sample with a genotype of "CG" whose
    'parentA' has a genotype of "CC" and whose 'parentB' has a genotype of
    "GG" has a heterozygous genotype, labeled as 'H'.

    Here are the possible allele designations that result:

        Allele designations for informative genotypes:
            A = parentA genotype
            B = parentB genotype
            H = heterozygous genotype

        Allele designations for noninformative genotypes:
            ~ = nonpolymorphic parents (i.e. both parents have same genotype)
            - = missing data
            -- = missing data for at least one parent
            % = polymorphic parent

        Error codes:
            # = conflict of nonpolymorphic expectation, meaning both parents
                    have the same genotype, but the sample has a different
                    genotype. For example, parentA and parentB both have the
                    genotype 'CC', but the sample has a genotype of 'TT'.

            ! = nonparental genotype, meaning each parent has a different
                    genotype, but the sample has at least one allele not seen
                    in either parent. For example, getting 'AG' for the
                    offspring when the parents have 'GG' and 'TT'.
                    (This should never occur when the data were obtained
                    from a biallelic assay.)

            !! = genotype of the F1 for parentA x parentB is incongruent with
                    the genotype for parentA
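The rules for the informative codes and for the '~', '-', '--', '#' and '!' codes can be illustrated with a simplified sketch. It is not the module's implementation: designate, normalize and is_missing are hypothetical names, and the sketch assumes two-character genotypes, homozygous parents, and missing calls written as '--' or left empty. It also assumes missing parental data is checked before missing sample data, and it does not reproduce the '%' or '!!' codes.

```perl
use strict;
use warnings;

# Hypothetical helper: sort the alleles so that 'GC' and 'CG' compare equal.
sub normalize {
    my ($genotype) = @_;
    return join '', sort { $a cmp $b } split //, $genotype;
}

# Hypothetical helper: assumes missing calls are undef, '' or '--'.
sub is_missing {
    my ($genotype) = @_;
    return !defined $genotype || $genotype eq '' || $genotype eq '--';
}

# Hypothetical sketch of the designation rules for one sample at one SNP.
sub designate {
    my ($parent_a, $parent_b, $sample) = @_;
    return '--' if is_missing($parent_a) || is_missing($parent_b);
    return '-'  if is_missing($sample);

    my ($pa, $pb, $s) = map { normalize($_) } $parent_a, $parent_b, $sample;

    # Nonpolymorphic parents: the sample is expected to match them.
    return $s eq $pa ? '~' : '#' if $pa eq $pb;

    return 'A' if $s eq $pa;
    return 'B' if $s eq $pb;

    # Any allele seen in neither parent is nonparental.
    my %parental = map { $_ => 1 } split //, $pa . $pb;
    return '!' if grep { !$parental{$_} } split //, $s;

    # Heterozygous: one allele from each parent.
    my ($x, $y) = split //, $s;
    return 'H'
        if ( index($pa, $x) >= 0 && index($pb, $y) >= 0 )
        || ( index($pa, $y) >= 0 && index($pb, $x) >= 0 );

    return;    # case outside the scope of this sketch
}

print designate('CC', 'GG', 'GC'), "\n";    # prints "H"
```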

    See the bundled tests for examples.

TODO

    Output a report detailing which samples have been processed and how.
    Also report descendant and ancestor relationships.

    Document ability to process files using F1 and parentA info (i.e. in the
    absence of parentB info).

    Add a simple means of adding map information so that distances and
    chromosomes are output along with the marker names.

    Give crossover info?

    Give introgressions/regions attributable to specific ancestor(s).

    Use benchmarking to find out which (if any) to memoize:
    _nonredundant_chars
    _trim
    _is_comprised_from
    _sorted_characters
    _sort_and_join
    _chars_from
    _sorted_first_two_char
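As a starting point for this item, the core Memoize module can cache the results of a pure helper. The sketch below uses sorted_characters, a hypothetical stand-in for _sorted_characters, with a call counter to show that repeated calls with the same argument are served from the cache. Whether caching pays off for the real helpers would still need measuring, for example with the core Benchmark module's cmpthese function.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;    # counts how often the real work is actually done

# Hypothetical stand-in for a helper such as _sorted_characters:
# returns the characters of a genotype string in sorted order.
sub sorted_characters {
    my ($genotype) = @_;
    $calls++;
    return join '', sort { $a cmp $b } split //, $genotype;
}

# Replace the sub with a caching wrapper (scalar and list contexts
# are cached separately by Memoize).
memoize('sorted_characters');

my @keys = map { scalar sorted_characters('GC') } 1 .. 3;
print "computed $calls time(s)\n";    # prints "computed 1 time(s)"
```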

    Test bad file names

DIAGNOSTICS

    TODO

CONFIGURATION AND ENVIRONMENT

    TODO

CONFIGURATION AND ENVIRONMENT

    TODO

DEPENDENCIES

    TODO

INCOMPATIBILITIES

    TODO

BUGS

Please report any you find. None have been reported as of the current release.

LIMITATIONS

This is ALPHA code. Use it at your own risk. I plan to make some major changes to it.

Be conscientious when preparing your input files (i.e. the manifest file and the data file). Correct results depend on correct input files.

AUTHOR

Christopher Bottoms, <molecules at cpan.org>

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Bio::SNP::Inherit

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.

Copyright 2010 Christopher Bottoms.