Family Type Base Class
This is the base class for family types. A FamilyType object is used by Bio::KBase::CDMI::CDMILoadFamilies.pl to determine how to load a protein family.
Protein families have a great deal of commonality, but there are variations. They tend to have additional files with data not present in all family types. Most load proteins, but some contain features instead.
The base class will assume the most common response for each question the load program needs to ask.
The following fields are present in the object.
- type: protein family type (used in the Family record)
- release: protein family release code (used in the Family record)
- feature: TRUE if this is a feature family, FALSE if it is purely a protein family
Special Methods
new
my $familyType = Bio::KBase::CDMI::FamilyType->new($type, $release, $feature);
Construct a new family type object.
- type: Protein family type name (e.g. FIGFam, equivalog).
- release: Protein family release code.
- feature (optional): TRUE if this family type stores features; FALSE if it stores proteins. The default is FALSE.
Init
$familyType->Init($loader, $directory);
Perform special initialization. This method is called after the basic data structures are created but before any data is processed from the input directory.
- loader: Bio::KBase::CDMI::CDMILoader object for the current load.
- directory: Name of the directory containing the load files.
Query Methods
typeName
my $typeName = $familyType->typeName;
Return the type name to be used for these protein families in the Family records.
release
my $release = $familyType->release;
Return the release identifier to be used for these protein families in the Family records.
featureBased
my $featureFlag = $familyType->featureBased;
Return TRUE if the family contains features, FALSE if it contains proteins only.
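The constructor and the three query methods can be pictured with a minimal sketch. This is not the KBase source: the package name Sketch::FamilyType and the release code 'demo-release' are invented for illustration, and the hash keys simply mirror the field names listed above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented base class (NOT the real module).
package Sketch::FamilyType;

sub new {
    my ($class, $type, $release, $feature) = @_;
    my $self = {
        type    => $type,
        release => $release,
        # The feature flag is optional and defaults to FALSE.
        feature => ($feature ? 1 : 0),
    };
    return bless $self, $class;
}

# Query methods: plain read-only accessors over the three fields.
sub typeName     { my ($self) = @_; return $self->{type}; }
sub release      { my ($self) = @_; return $self->{release}; }
sub featureBased { my ($self) = @_; return $self->{feature}; }

package main;

# 'demo-release' is a made-up release code.
my $familyType = Sketch::FamilyType->new('FIGFam', 'demo-release');
printf "%s %s feature=%d\n",
    $familyType->typeName, $familyType->release, $familyType->featureBased;
```

Because the third argument is omitted here, the sketch treats the family type as protein-based, matching the documented default.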
Virtual Methods
ResolveProteinMember
my $idHash = $familyType->ResolveProteinMember($loader, $memberID);
Compute the KBase ID for the specified protein member ID. The translation generally depends on the type of protein family. The default method assumes that the IDs are already in the MD5 format used by KBase and we only need to verify that the protein is already in the database.
- loader: Bio::KBase::CDMI::CDMILoader object for this load.
- memberID: Family member ID to translate.
- RETURN: Returns the KBase protein ID for the member, or undef if the protein is not in the database.
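The default behaviour described above (use the member ID as the protein ID and only check that it is loaded) might look roughly like the following sketch. The real CDMILoader interface is not shown in this document, so the Sketch::Loader stub and its has_protein method are invented names, and the hex strings are placeholder IDs.

```perl
use strict;
use warnings;

# Hypothetical loader stub; "has_protein" is an invented method name.
package Sketch::Loader;

sub new {
    my ($class, @ids) = @_;
    my %known = map { $_ => 1 } @ids;
    return bless { known => \%known }, $class;
}

sub has_protein {
    my ($self, $id) = @_;
    return exists $self->{known}{$id};
}

# Hypothetical stand-in for the base class (NOT the real module).
package Sketch::FamilyType;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Assume the member ID is already an MD5-style protein ID and only
# confirm that the protein is present; otherwise return undef.
sub ResolveProteinMember {
    my ($self, $loader, $memberID) = @_;
    return $loader->has_protein($memberID) ? $memberID : undef;
}

package main;

my $loader     = Sketch::Loader->new('d41d8cd98f00b204e9800998ecf8427e');
my $familyType = Sketch::FamilyType->new;
my $found   = $familyType->ResolveProteinMember($loader, 'd41d8cd98f00b204e9800998ecf8427e');
my $missing = $familyType->ResolveProteinMember($loader, 'ffffffffffffffffffffffffffffffff');
for my $result ($found, $missing) {
    print((defined $result ? "resolved $result" : "not in database"), "\n");
}
```

A family type whose member IDs use some other dialect would override this method and translate the ID before checking the database.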
ResolveFeatureMember
my ($kbaseID, $proteinID, $genomeID) = $familyType->ResolveFeatureMember($loader, $memberID);
Compute the KBase ID for a feature member of a family along with its associated protein sequence ID and genome ID. The method for doing this depends on the type of family, since the member IDs are usually in a dialect peculiar to the family type. Only feature-based families need to override this method.
The default presumes that all the member IDs belong to a source type that has been set as the source type of the loader object in the "Init" method.
- loader: Bio::KBase::CDMI::CDMILoader object for this load.
- memberID: The family member ID, usually a feature ID in the source's dialect.
- RETURN: Returns a list containing (0) the feature's KBase ID, (1) the ID of the associated protein sequence, and (2) the ID of the associated genome. If a member does not exist in the KBase, nothing will be returned.
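As an illustration of the return contract, here is a sketch of a feature-based family type that overrides this method. Everything in it is an assumption made for the example: the Sketch::* packages, the lookup_feature method on the stub loader, and the sample IDs. It only demonstrates returning a three-element list for a known member and an empty list otherwise.

```perl
use strict;
use warnings;

# Hypothetical loader stub; "lookup_feature" is an invented method name.
package Sketch::Loader;

sub new {
    my ($class, %features) = @_;
    return bless { features => \%features }, $class;
}

sub lookup_feature {
    my ($self, $id) = @_;
    return $self->{features}{$id};
}

# Hypothetical feature-based family type (NOT the real module).
package Sketch::FeatureFamilyType;

sub new {
    my ($class) = @_;
    return bless { feature => 1 }, $class;
}

sub ResolveFeatureMember {
    my ($self, $loader, $memberID) = @_;
    my $hit = $loader->lookup_feature($memberID);
    # Unknown member: return an empty list.
    return unless $hit;
    # Known member: (0) feature ID, (1) protein sequence ID, (2) genome ID.
    return @{$hit}{qw(kbaseID proteinID genomeID)};
}

package main;

# All IDs below are made up for the example.
my $loader = Sketch::Loader->new(
    'src|peg.1' => {
        kbaseID   => 'kb|g.0.peg.1',
        proteinID => 'd41d8cd98f00b204e9800998ecf8427e',
        genomeID  => 'kb|g.0',
    },
);
my $familyType = Sketch::FeatureFamilyType->new;
my @hit  = $familyType->ResolveFeatureMember($loader, 'src|peg.1');
my @miss = $familyType->ResolveFeatureMember($loader, 'src|peg.2');
print scalar(@hit), " values for the known member, ",
    scalar(@miss), " for the unknown one\n";
```

Callers can test the result in list context, for example by checking whether the first returned element is defined.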
ProcessAdditionalFiles
$familyType->ProcessAdditionalFiles($loader, $directory);
Process additional files in the specified directory. This method handles files aside from the two standard files used to load families. These contain additional data such as coupling information, alignments, or probability models.
- loader: Bio::KBase::CDMI::CDMILoader object for the current load.
- directory: Name of the directory containing the load files.
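A subclass that needs extra data could override this method along the following lines. This is a sketch under stated assumptions: the file name coupling.tbl, its three tab-separated columns, and the Sketch::CoupledFamilyType package are all hypothetical, and the loader argument is unused here because its API is not described in this document.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical family type that reads one extra file (NOT the real module).
package Sketch::CoupledFamilyType;

sub new {
    my ($class) = @_;
    return bless { couplings => [] }, $class;
}

sub ProcessAdditionalFiles {
    my ($self, $loader, $directory) = @_;
    # "coupling.tbl" is an invented file name for this example.
    my $fileName = File::Spec->catfile($directory, 'coupling.tbl');
    # The extra file is optional: do nothing if it is absent.
    return unless -f $fileName;
    open(my $ih, '<', $fileName) or die "Could not open $fileName: $!";
    while (my $line = <$ih>) {
        chomp $line;
        next if $line eq '';
        my ($family1, $family2, $score) = split /\t/, $line;
        push @{$self->{couplings}}, [$family1, $family2, $score];
    }
    close $ih;
    return;
}

package main;

# Build a throwaway load directory with two sample rows.
my $directory = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($directory, 'coupling.tbl');
open(my $oh, '>', $path) or die "Could not write $path: $!";
print $oh "FAM1\tFAM2\t12\n";
print $oh "FAM1\tFAM3\t7\n";
close $oh;

my $familyType = Sketch::CoupledFamilyType->new;
$familyType->ProcessAdditionalFiles(undef, $directory);
print scalar(@{$familyType->{couplings}}), " coupling rows read\n";
```

A real override would presumably pass the parsed rows to the loader rather than keep them in the object.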