NAME

Bio::Chado::Schema::Result::Sequence::Feature

DESCRIPTION

A feature is a biological sequence or a section of a biological sequence, or a collection of such sections. Examples include genes, exons, transcripts, regulatory regions, polypeptides, protein domains, chromosome sequences, sequence variations, cross-genome match regions such as hits and HSPs and so on; see the Sequence Ontology for more. The combination of organism_id, uniquename and type_id should be unique.
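The uniqueness rule can be illustrated with a small sketch in plain Perl. The rows below are hypothetical in-memory hashes standing in for feature records, and duplicate_features is an illustrative helper, not part of this module; in a real Chado database the constraint would normally be enforced by the database itself.

```perl
use strict;
use warnings;

# Hypothetical in-memory rows standing in for feature table records.
my @features = (
    { organism_id => 1, uniquename => 'gene0001', type_id => 10 },
    { organism_id => 1, uniquename => 'gene0001', type_id => 11 },  # same name, other type: allowed
    { organism_id => 1, uniquename => 'gene0001', type_id => 10 },  # repeats the first key
);

# Return the rows whose (organism_id, uniquename, type_id) key was already seen.
sub duplicate_features {
    my @rows = @_;
    my %seen;
    return grep {
        my $key = join "\0", @{$_}{qw(organism_id uniquename type_id)};
        $seen{$key}++;
    } @rows;
}

my @dups = duplicate_features(@features);
printf "%d duplicate row(s)\n", scalar @dups;
```

Only the third row is reported, because the second row differs in type_id and therefore has a different key.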

ACCESSORS

feature_id

  data_type: 'integer'
  is_auto_increment: 1
  is_nullable: 0
  sequence: 'feature_feature_id_seq'

dbxref_id

  data_type: 'integer'
  is_foreign_key: 1
  is_nullable: 1

An optional primary public stable identifier for this feature. Secondary identifiers and external dbxrefs go in the table feature_dbxref.

organism_id

  data_type: 'integer'
  is_foreign_key: 1
  is_nullable: 0

The organism to which this feature belongs. This column is mandatory.

name

  data_type: 'varchar'
  is_nullable: 1
  size: 255

The optional human-readable common name for a feature, for display purposes.

uniquename

  data_type: 'text'
  is_nullable: 0

The unique name for a feature. It need not be particularly human-readable, although this is preferred. This name must be unique for this type of feature within this organism.

residues

  data_type: 'text'
  is_nullable: 1

A sequence of alphabetic characters representing biological residues (nucleic acids, amino acids). This column does not need to be manifested for all features; it is optional for features such as exons where the residues can be derived from the featureloc. It is recommended that the value for this column be manifested for features which may have non-contiguous sublocations (e.g. transcripts), since derivation at query time is non-trivial. For expressed sequence, the DNA sequence should be used rather than the RNA sequence. The default storage method for the residues column is EXTERNAL, which will store it uncompressed to make substring operations faster.

seqlen

  data_type: 'integer'
  is_nullable: 1

The length of the residue feature. See column:residues. This column is partially redundant with the residues column, and also with featureloc. This column is required because the location may be unknown and the residue sequence may not be manifested, yet it may be desirable to store and query the length of the feature. The seqlen should always be manifested where the length of the sequence is known.

md5checksum

  data_type: 'char'
  is_nullable: 1
  size: 32

The 32-character checksum of the sequence, calculated using the MD5 algorithm. This is practically guaranteed to be unique for any feature. This column thus acts as a unique identifier on the mathematical sequence.
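As a minimal sketch of how such a checksum could be computed, the following uses the core Digest::MD5 module. The helper name residues_checksum is hypothetical, and whether residues should be normalized (for example upper-cased) before hashing is a per-project decision that this sketch does not make.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Compute the 32-character hexadecimal MD5 digest of a residue string.
# The string is hashed exactly as given, with no case normalization.
sub residues_checksum {
    my ($residues) = @_;
    return md5_hex($residues);
}

my $sum = residues_checksum('ATGGCGTAA');
print length($sum), " characters: $sum\n";
```

Because the digest depends only on the residue string, two features with identical residues get identical checksums.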

type_id

  data_type: 'integer'
  is_foreign_key: 1
  is_nullable: 0

A required reference to a table:cvterm giving the feature type. This will typically be a Sequence Ontology identifier. This column is thus used to subclass the feature table.

is_analysis

  data_type: 'boolean'
  default_value: false
  is_nullable: 0

Boolean indicating whether this feature is annotated or the result of an automated analysis. Analysis results also use the companalysis module. Note that the dividing line between analysis and annotation may be fuzzy; this should be determined on a per-project basis in a consistent manner. One requirement is that there should only be one non-analysis version of each wild-type gene feature in a genome, whereas the same gene feature can be predicted multiple times in different analyses.

is_obsolete

  data_type: 'boolean'
  default_value: false
  is_nullable: 0

Boolean indicating whether this feature has been obsoleted. Some chado instances may choose to simply remove the feature altogether; others may choose to keep an obsolete row in the table.

timeaccessioned

  data_type: 'timestamp'
  default_value: current_timestamp
  is_nullable: 0
  original: {default_value => \"now()"}

For handling object accession or modification timestamps (as opposed to database auditing data, handled elsewhere). The expectation is that these fields would be available to software interacting with chado.

timelastmodified

  data_type: 'timestamp'
  default_value: current_timestamp
  is_nullable: 0
  original: {default_value => \"now()"}

For handling object accession or modification timestamps (as opposed to database auditing data, handled elsewhere). The expectation is that these fields would be available to software interacting with chado.

RELATIONS

analysisfeatures

Type: has_many

Related object: Bio::Chado::Schema::Result::Companalysis::Analysisfeature

cell_line_features

Type: has_many

Related object: Bio::Chado::Schema::Result::CellLine::CellLineFeature

elements

Type: has_many

Related object: Bio::Chado::Schema::Result::Mage::Element

type

Type: belongs_to

Related object: Bio::Chado::Schema::Result::Cv::Cvterm

dbxref

Type: belongs_to

Related object: Bio::Chado::Schema::Result::General::Dbxref

organism

Type: belongs_to

Related object: Bio::Chado::Schema::Result::Organism::Organism

feature_cvterms

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::FeatureCvterm

feature_dbxrefs

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::FeatureDbxref

feature_expressions

Type: has_many

Related object: Bio::Chado::Schema::Result::Expression::FeatureExpression

feature_genotype_features

Type: has_many

Related object: Bio::Chado::Schema::Result::Genetic::FeatureGenotype

feature_genotype_chromosomes

Type: has_many

Related object: Bio::Chado::Schema::Result::Genetic::FeatureGenotype

featureloc_features

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::Featureloc

featureloc_srcfeatures

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::Featureloc

feature_phenotypes

Type: has_many

Related object: Bio::Chado::Schema::Result::Phenotype::FeaturePhenotype

featurepos_feature

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurepos

featurepos_map_features

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurepos

featureprops

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::Featureprop

feature_pubs

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::FeaturePub

featurerange_leftendfs

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurerange

featurerange_rightstartfs

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurerange

featurerange_rightendfs

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurerange

featurerange_leftstartfs

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurerange

featurerange_features

Type: has_many

Related object: Bio::Chado::Schema::Result::Map::Featurerange

feature_relationship_subjects

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::FeatureRelationship

feature_relationship_objects

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::FeatureRelationship

feature_synonyms

Type: has_many

Related object: Bio::Chado::Schema::Result::Sequence::FeatureSynonym

library_features

Type: has_many

Related object: Bio::Chado::Schema::Result::Library::LibraryFeature

phylonodes

Type: has_many

Related object: Bio::Chado::Schema::Result::Phylogeny::Phylonode

studyprop_features

Type: has_many

Related object: Bio::Chado::Schema::Result::Mage::StudypropFeature

ADDITIONAL RELATIONSHIPS

parent_relationships

Type: has_many

Returns a list of parent relationships.

Related object: Bio::Chado::Schema::Result::Sequence::FeatureRelationship

child_relationships

Type: has_many

Returns a list of child relationships.

Related object: Bio::Chado::Schema::Result::Sequence::FeatureRelationship

primary_dbxref

Alias for dbxref

MANY-TO-MANY RELATIONSHIPS

parent_features

Type: many_to_many

Returns a list of parent features (i.e. features that are the object of feature_relationship rows in which this feature is the subject).

Related object: Bio::Chado::Schema::Result::Sequence::Feature

child_features

Type: many_to_many

Returns a list of child features (i.e. features that are the subject of feature_relationship rows in which this feature is the object).

Related object: Bio::Chado::Schema::Result::Sequence::Feature
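The subject/object direction used by parent_features and child_features can be sketched with plain Perl data. The rows and the helpers parent_ids and child_ids below are hypothetical stand-ins for feature_relationship records and are not part of this module.

```perl
use strict;
use warnings;

# Hypothetical feature_relationship rows: each links a subject feature to an object feature.
my @feature_relationships = (
    { subject_id => 2, object_id => 1 },   # e.g. a transcript (2) related to a gene (1)
    { subject_id => 3, object_id => 2 },   # e.g. an exon (3) related to the transcript (2)
    { subject_id => 4, object_id => 2 },
);

# Parents: objects of the rows in which the given feature is the subject.
sub parent_ids {
    my ($feature_id, @rows) = @_;
    return map { $_->{object_id} } grep { $_->{subject_id} == $feature_id } @rows;
}

# Children: subjects of the rows in which the given feature is the object.
sub child_ids {
    my ($feature_id, @rows) = @_;
    return map { $_->{subject_id} } grep { $_->{object_id} == $feature_id } @rows;
}

print "parents of 2: @{[ parent_ids(2, @feature_relationships) ]}\n";
print "children of 2: @{[ child_ids(2, @feature_relationships) ]}\n";
```

For feature 2 this prints one parent (1) and two children (3 and 4).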

synonyms

Type: many_to_many

Related object: Bio::Chado::Schema::Result::Sequence::Synonym

dbxrefs_mm

Type: many_to_many

Related object: Bio::Chado::Schema::Result::General::Dbxref (i.e. dbxref table), linked through Bio::Chado::Schema::Result::Sequence::FeatureDbxref (feature_dbxref table)

secondary_dbxrefs

Alias for dbxrefs_mm

ADDITIONAL METHODS

create_featureprops

  Usage: $feature->create_featureprops({ baz => 2, foo => 'bar' });
  Desc : convenience method to create feature properties using cvterms
          from the ontology with the given name
  Args : hashref of { propname => value, ...},
         options hashref as:
          {
            autocreate => 0,
               (optional) boolean, if passed, automatically create cv,
               cvterm, and dbxref rows if one cannot be found for the
               given featureprop name.  Default false.

            cv_name => cv.name to use for the given featureprops.
                       Defaults to 'feature_property',

            db_name => db.name to use for autocreated dbxrefs,
                       default 'null',

            dbxref_accession_prefix => optional, default
                                       'autocreated:',
            definitions => optional hashref of:
                { cvterm_name => definition,
                }
             to load into the cvterm table when autocreating cvterms

             allow_duplicate_values => default false.
                If true, allow duplicate instances of the same cvterm
                and value in the properties of the feature.  Duplicate
                values will have different ranks.
          }
  Ret  : hashref of { propname => new featureprop object }

search_featureprops

  Status  : public
  Usage   : $feat->search_featureprops( 'description' )
            # OR
            $feat->search_featureprops({ name => 'description'})
  Returns : DBIx::Class::ResultSet like other search() methods
  Args    : single string to match cvterm name,
            or hashref of search criteria.  This is passed
            to $chado->resultset('Cv::Cvterm')
                     ->search({ your criteria })

  Convenience method to search the featureprops of a feature,
  returning those whose type matches Cvterms that satisfy the
  given criteria.

Bio::PrimarySeqI METHODS

The methods below are intended to provide some compatibility with BioPerl's Bio::PrimarySeqI interface, so that a feature may be used as a sequence. Note that Bio::PrimarySeqI only provides identifier, accession, and sequence information, no subfeatures, ranges, or the like.

Support for BioPerl's more complete Bio::SeqI interface, which includes those things, still needs to be implemented. If you are interested in helping with this, please contact GMOD!

id, primary_id, display_id

These are aliases for name(), which just returns the contents of the feature.name field.

seq

  Alias for $feature->residues()

subseq( $start, $end )

Same as Bio::PrimarySeq subseq method, with one important exception. If the residues column is not set (null) for this feature, it checks for a featureprop of type large_residues (irrespective of the type's CV membership), and uses its value as the sequence if it is present.

So, you can store large (i.e. megabase or greater) sequences in a large_residues featureprop, and use this subseq() method to fetch pieces of them, with the sequences never being entirely stored in memory or transferred in total from the database server to the app server. This is implemented behind the scenes by using SQL substring operations on the featureprop's value.
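As a rough illustration of the coordinate convention, Bio::PrimarySeq's subseq takes 1-based, inclusive start and end positions. The sketch below applies that convention to an in-memory string; subseq_1based is a hypothetical helper, and it does not reproduce the SQL that this module issues against a large_residues featureprop.

```perl
use strict;
use warnings;

# Extract residues using 1-based, inclusive coordinates.
sub subseq_1based {
    my ($residues, $start, $end) = @_;
    die "start must be >= 1 and <= end\n" if $start < 1 || $start > $end;
    die "end is past the end of the sequence\n" if $end > length($residues);
    # Convert to Perl's 0-based offset and a length.
    return substr( $residues, $start - 1, $end - $start + 1 );
}

print subseq_1based('ACGTACGT', 2, 4), "\n";   # prints "CGT"
```

The same start position and count (end - start + 1) are what a database-side substring operation would need, which is why only the requested slice has to leave the database server.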

trunc

Same as subseq above, but returns a sequence object rather than a bare string.

accession, accession_number

  Usage: say $feature->accession_number
  Desc : get an "<accession>.<version>"-style string.  gets this from
         either the primary dbxref, or the first secondary_dbxref
         found
  Args : none
  Ret : string of the form "accession.version" formed from the
        accession and version fields of either the primary or
        secondary dbxrefs
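The selection order described above can be sketched with plain hashes standing in for dbxref rows. The function below is a hypothetical illustration, not the module's implementation, and returning undef when no dbxref exists is an assumption made only for this sketch.

```perl
use strict;
use warnings;

# Build "accession.version" from the primary dbxref if there is one,
# otherwise from the first secondary dbxref found.
# Assumption: returns undef (scalar context) when no dbxref is available.
sub accession_number {
    my ($primary, @secondary) = @_;
    my ($dbxref) = grep { defined $_ } ($primary, @secondary);
    return unless $dbxref;
    return join '.', $dbxref->{accession}, $dbxref->{version};
}

# No primary dbxref here, so the first secondary one is used.
print accession_number( undef, { accession => 'X99999', version => 1 } ), "\n";
```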

length

Takes no arguments. Returns seqlen(), or length( $feature->residues ) if seqlen is not defined.
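The fallback can be sketched as a plain function over the two column values. feature_length is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Prefer the stored seqlen; fall back to measuring the residues string.
sub feature_length {
    my ($seqlen, $residues) = @_;
    return defined($seqlen) ? $seqlen : length($residues);
}

print feature_length( 1200, undef ), "\n";      # stored length, residues not manifested
print feature_length( undef, 'ATGGCGTAA' ), "\n"; # length measured from the residues
```

The first call shows why seqlen is useful on its own: the length is available even when the residues are not stored.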

desc, description

Takes no arguments. Returns the value of the first 'description' featureprop found for this feature.

alphabet

Returns "protein" if the feature's type name is "polypeptide". Otherwise, returns "dna". This is only a rough approximation, but it works in most of the use cases we've seen so far.
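The rule is simple enough to write out as a standalone function. alphabet_for_type is a hypothetical name; the sketch takes the type name as a string rather than looking it up through the type relationship.

```perl
use strict;
use warnings;

# "protein" for polypeptide features, "dna" for everything else.
sub alphabet_for_type {
    my ($type_name) = @_;
    return $type_name eq 'polypeptide' ? 'protein' : 'dna';
}

print "$_ => ", alphabet_for_type($_), "\n" for qw(polypeptide gene mRNA);
```

Every type other than polypeptide, including mRNA, is reported as "dna" under this rule.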