NAME

ClusterRankSmotifs

VERSION

Version 0.01

SYNOPSIS

Script to rank smotifs in the database by their loop signature and chemical shift difference as compared to a query smotif. Two parallel clustering and ranking methods are used: (a) cluster on the go and rank based on population (b) Joe's clusters and rank using diversity.

INPUT ARGUMENTS

    1) $pdbcode : 4-letter name of the folder where the experimental chemical shift data is stored
    2) $smotif  : smotif number in the pdb

INPUT FILES

    In the <pdbcode> folder:

    1) shiftcands<pdbcode>_<motnum>_<looplength><smotif type>.csv : Files containing the results of comparing the query smotifs against the database. Each file (one per smotif) includes the number of residues compared, the chemical shift difference value, the RMSD (if the structure is included), the loop length, the smotif NID, the secondary structure RMSD, the secondary structure lengths, and the loop structural signatures for the query and database motif and their overlap.

OUTPUT FILES

    In the <pdbcode> folder (XX = query smotif number):

    1) <pdbcode>_motifs_best_XX.csv : File containing a list of smotif candidates for each query smotif in the unknown protein
    2) <pdbcode>_motifs_rmsd_XX.csv : File containing RMSDs of smotif candidates for each query smotif in the unknown protein

Usage:

    use ClusterRankSmotifs;

    ClusterRankSmotifs ($pdb,$smotif);

SUBROUTINES

        rank_smotifs
        findranks_by_cs_clustered
        get_cluster_on_the_go
        getseq
        checkseqblosum
        read_clusters
        get_clusters
        read_joe_clusters

rank_smotifs

        Subroutine to cluster and rank the smotifs from the library based on the
        chemical shift difference and phi/psi signature match between the library
        Smotif and the query Smotif.

    die "rank_smotifs: no file like $nam*csv was found in $pdbcode"
        unless @found;
    # Let's assume that just ONE file like $pdbcode/$nam*csv was found.
    # $nam2  = 1aab/shiftcands1aab_01_8HH.csv
    my $nam2 = $found[0];
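
The fragment above relies on a list @found that is filled elsewhere. A minimal sketch of how such a lookup could be written is shown below. The helper name find_candidate_file and the use of File::Glob::bsd_glob are assumptions made for illustration, not the module's actual code.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper (not part of ClusterRankSmotifs): return the first
# file matching "<pdbcode>/<prefix>*csv", or die if nothing matches.
sub find_candidate_file {
    my ($pdbcode, $nam) = @_;

    # bsd_glob does not split the pattern on whitespace; sorting makes the
    # choice deterministic if more than one file happens to match.
    my @found = sort( bsd_glob("$pdbcode/$nam*csv") );

    die "rank_smotifs: no file like $nam*csv was found in $pdbcode\n"
        unless @found;

    return $found[0];
}
```

Called as find_candidate_file('1aab', 'shiftcands1aab_01_'), this would return a path such as 1aab/shiftcands1aab_01_8HH.csv if that file exists.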

findranks_by_cs_clustered

        Subroutine to cluster and rank the smotifs from the library based on the
        chemical shift difference and phi/psi signature match between the library
        Smotif and the query Smotif.

getseq

    Subroutine to get the loop sequence for a given smotif by reading through 
    the <pdbcode>.out file
    
    $filename = 1aab_01_8HH
    $pdbcode  = 1aab

    more 1aab/1aab.out
    Name     Chain   Type  Start   Looplength  SS1length SS2length   Sequence
    1aab.pdb A       HH    14      8           15        12          SYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERW
    1aab.pdb A       HH    37      4           12        22          FSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREM
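
In the sample rows above, the length of the Sequence column equals SS1length + Looplength + SS2length (15 + 8 + 12 = 35 and 12 + 4 + 22 = 38). Assuming the sequence is laid out as first secondary structure, then loop, then second secondary structure, the loop could be cut out as in the sketch below. The helper name loop_from_row is hypothetical and this is not necessarily how getseq itself works.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the loop sequence from one data row of
# <pdbcode>.out, assuming Sequence = SS1 + loop + SS2 (an assumption that
# is consistent with the column lengths in the sample rows).
sub loop_from_row {
    my ($row) = @_;
    my ($name, $chain, $type, $start, $looplen, $ss1len, $ss2len, $seq)
        = split ' ', $row;

    # Skip the header line and any malformed rows.
    return unless defined $seq;
    return unless "$looplen$ss1len$ss2len" =~ /^\d+$/;
    return unless length($seq) == $ss1len + $looplen + $ss2len;

    return substr($seq, $ss1len, $looplen);
}

my $row = '1aab.pdb A       HH    14      8           15        12'
        . '          SYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERW';
print loop_from_row($row), "\n";    # prints KHPDASVN
```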

checkseqblosum

        Subroutine to find the per-residue BLOSUM62 score between two sequences
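
As an illustration of a per-residue score, the sketch below sums the substitution scores over aligned positions of two equal-length sequences and divides by the length. The function name per_residue_score, the equal-length requirement, and the way the matrix is passed in are assumptions. Only a small excerpt of BLOSUM62 is included here, so a real run would need the full matrix.

```perl
use strict;
use warnings;

# Small excerpt of the (symmetric) BLOSUM62 matrix, enough for the demo only.
my %BLOSUM62_EXCERPT = (
    'AA' => 4, 'KK' => 5, 'SS' => 4,
    'KR' => 2, 'DE' => 2, 'ST' => 1,
);

# Hypothetical helper: mean substitution score per aligned residue.
# $matrix is a hash ref keyed by two-letter residue pairs.
sub per_residue_score {
    my ($seq1, $seq2, $matrix) = @_;

    die "sequences must have equal, non-zero length\n"
        unless length($seq1) && length($seq1) == length($seq2);

    my $total = 0;
    for my $i (0 .. length($seq1) - 1) {
        my ($x, $y) = (substr($seq1, $i, 1), substr($seq2, $i, 1));

        # The matrix is symmetric, so try both key orders.
        my $score = $matrix->{"$x$y"} // $matrix->{"$y$x"};
        die "no score for pair $x/$y\n" unless defined $score;

        $total += $score;
    }
    return $total / length($seq1);
}

print per_residue_score('KS', 'RS', \%BLOSUM62_EXCERPT), "\n";    # prints 3
```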

read_clusters

        Subroutine to read smotif clusters obtained from get_cluster_on_the_go

get_clusters

        Subroutine to get clusters using Phylip

remove_nid

    Reads a two-column file with a format like:

    [brinda@everest test_brinda]$ head /tmp/motifclusters
    Cluster4: nid_376468
    Cluster6: nid_167076
    Cluster7: nid_096416 nid_371611
    Cluster8: nid_343341
    Cluster24: nid_356687 nid_318838 nid_016570 nid_229923 nid_003768 nid_091937

    The nid_ prefix is removed from the second column and the output is
    written to an output file with a format like:

    [brinda@everest test_brinda]$ head /tmp/motifclusters0
    Cluster4: 376468
    Cluster6: 167076
    Cluster7: 096416 371611
    Cluster8: nid_343341
    Cluster24: 356687 318838 016570 229923 003768 091937

        # Cluster8: nid_343341 
        # Cluster24: nid_356687 nid_318838 nid_016570 nid_229923 nid_003768 nid_091937 
        
        # my $cluster= ($line =~ /(Cluster\d+:)\s+/)[0];
        
        # If you are after a single match, assign to a scalar in
        # list context as the l-value, i.e.
        #
        # my ($cluster) = $line =~ /(Cluster\d+:)\s+/;
        #
        # Note: if you forget the ( ) around the scalar and the regex matches,
        # the scalar will contain the integer value 1. The ( ) provide the
        # list context that is needed to receive the captured text.
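
The list-context capture described in the comments above can be combined with a global match to strip the nid_ prefix from one line. The sketch below is only an illustration. The helper name strip_nid_prefix is hypothetical and the module's remove_nid may be implemented differently.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Cluster7: nid_096416 nid_371611"
# into "Cluster7: 096416 371611".
sub strip_nid_prefix {
    my ($line) = @_;

    # The parentheses around $cluster give list context, so $cluster
    # receives the captured text rather than a success flag.
    my ($cluster) = $line =~ /^(Cluster\d+:)\s+/
        or return;

    # With /g in list context, the match returns every captured number.
    my @ids = $line =~ /\bnid_(\d+)/g;

    return join ' ', $cluster, @ids;
}

print strip_nid_prefix('Cluster7: nid_096416 nid_371611'), "\n";
# prints Cluster7: 096416 371611
```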

read_joe_clusters

        Subroutine to read Joe's Smotif cluster classification (from files)

get_cluster_on_the_go

    Subroutine to obtain Smotif clusters from the top 200 Smotifs identified using 
    chemical shift difference. 

        Input: 
        1. 4-letter pdb code (directory where all files in the modeling pipeline are saved). 
        2. Smotif number of the query Smotif under consideration
        3. RMSD threshold for clustering Smotifs (default=2.0 A). 
        4. Array of library Smotifs sorted by chemical shift difference. 

        Output: 
        Array of up to 200 library Smotifs, ranked by a compound score obtained from
        cluster size and chemical shift difference
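
One way to cluster "on the go" is a greedy pass over the sorted candidates: each candidate joins the first existing cluster whose representative lies within the RMSD threshold, and otherwise starts a new cluster. The sketch below follows that idea under stated assumptions. The function name cluster_on_the_go, the RMSD callback, and the ranking rule (larger clusters first, ties broken by the best chemical shift rank) are illustrative only, since the exact compound score is not described here.

```perl
use strict;
use warnings;

# Greedy clustering sketch (an assumption, not the module's code).
#   $cands   : array ref of candidates, already sorted by chemical shift difference
#   $rmsd_of : code ref returning the RMSD between two candidates (hypothetical)
#   $cutoff  : RMSD threshold for joining a cluster (default 2.0)
#   $max     : number of top candidates to consider (default 200)
sub cluster_on_the_go {
    my ($cands, $rmsd_of, $cutoff, $max) = @_;
    $cutoff //= 2.0;
    $max    //= 200;

    my @top = @$cands > $max ? @{$cands}[0 .. $max - 1] : @$cands;

    my @clusters;    # each cluster is an array ref; element 0 is its representative
    CAND: for my $cand (@top) {
        for my $cluster (@clusters) {
            if ($rmsd_of->($cluster->[0], $cand) <= $cutoff) {
                push @$cluster, $cand;
                next CAND;
            }
        }
        push @clusters, [$cand];    # no cluster is close enough: start a new one
    }

    # Assumed ranking: larger clusters first; ties keep creation order, which
    # means the cluster holding the better chemical shift match comes first.
    my @order = sort {
        scalar(@{ $clusters[$b] }) <=> scalar(@{ $clusters[$a] }) || $a <=> $b
    } 0 .. $#clusters;

    return map { $clusters[$_] } @order;
}
```

The RMSD calculation is left to the caller, so the same routine can be tried with any distance function that takes two candidates and returns a number.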
        

AUTHOR

Fiserlab Members, <andras at fiserlab.org>

BUGS

Please report any bugs or feature requests to bug-. at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=.. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc ClusterRankSmotifs

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2015 Fiserlab Members.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.