NAME

SmotifTF Template-free Modeling Method - The pre-requisites

SYNOPSIS

SmotifTF carries out template-free structure prediction using a dynamic library of supersecondary structure fragments obtained from a set of remotely related PDB structures.

This Perl script runs all the pre-requisites (HHblits/HHsearch, Psi-Blast, Delta-Blast and Psipred) required for modeling. The input is the query protein sequence in FASTA format, and the outputs are:

1. A dynamic library of supersecondary structure motifs, tailor-made for the query protein (obtained using HHblits/HHsearch, Psi-Blast and Delta-Blast).

2. Definitions for putative Smotifs in the query protein (obtained from Psipred).

Once the pre-requisites are completed, modeling can be carried out using the script smotiftf.pl (use "perldoc smotiftf.pl" for instructions).

PRE-REQUISITES

The Smotif-based modeling algorithm requires the query protein sequence as input.
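Because the query sequence is the only input, it can help to check the file before starting a run. The following is a minimal sketch, not part of SmotifTF: the function name is_fasta is hypothetical, and it assumes a single-sequence FASTA file with one header line followed by sequence lines that contain only letters.

```shell
# Hypothetical helper (not part of SmotifTF): basic sanity check for a
# single-sequence FASTA file. Returns 0 if the first line is a '>' header
# and every following line contains only letters (or '*').
is_fasta() {
    file=$1
    [ -r "$file" ] || return 1
    # The first line must be a header line.
    head -n 1 "$file" | grep -q '^>' || return 1
    # At least one sequence line must follow the header.
    [ "$(tail -n +2 "$file" | grep -c '[A-Za-z]')" -gt 0 ] || return 1
    # Reject any character other than letters or '*' after the header
    # (this also rejects a second '>' header, i.e. multi-sequence files).
    if tail -n +2 "$file" | grep -q '[^A-Za-z*]'; then
        return 1
    fi
    return 0
}

# Example call:
#   is_fasta 1zzz/1zzz.fasta && echo "looks like FASTA"
```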

Software / data:

1. Psipred (http://bioinf.cs.ucl.ac.uk/psipred/)

2. HHSuite (ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/)

3. Psiblast and Delta-blast (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)

4. Modeller (version 9.14 https://salilab.org/modeller/)

5. DSSP (http://swift.cmbi.ru.nl/gv/dssp/)

6. Local PDB directory (central or user-designated, from http://www.rcsb.org). Many PDB structures are incomplete, with missing residues, and the SmotifTF algorithm performs best when the PDB structures are complete. Hence, we use Modeller (https://salilab.org/modeller/) to model the missing residues and obtain complete structures. The algorithm can work with incomplete PDB structures, but its performance may be reduced. The SmotifTF software can handle gzipped (.gz) or unzipped (.ent) PDB structure files.

   The software for remodeling the missing residues can be obtained from our website at:
   http://fiserlab.org/remodel_pdb.tar.gz
   This can be used to remodel missing residues in the entire PDB, and the remodeled
   structures can then be used with the SmotifTF package. The SmotifTF package can handle
   both regular and remodeled PDB databases.
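Since both gzipped and unzipped structure files are accepted, a quick lookup can confirm that a given entry is present in the local PDB directory. This is only a sketch under stated assumptions: the function name find_pdb_file is hypothetical, and it assumes the usual RCSB file names (pdbXXXX.ent or pdbXXXX.ent.gz, lower-case code) placed directly in one directory, which may not match your local layout.

```shell
# Hypothetical helper (not part of SmotifTF): print the path of the
# structure file for a PDB code, preferring the unzipped .ent file.
# ASSUMPTION: files are named pdbXXXX.ent or pdbXXXX.ent.gz and sit
# directly in the given directory.
find_pdb_file() {
    pdb_dir=$1
    # RCSB file names use the lower-case PDB code.
    code=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    for candidate in "$pdb_dir/pdb$code.ent" "$pdb_dir/pdb$code.ent.gz"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example call (the directory is a placeholder):
#   find_pdb_file /path/to/local/pdb 1abc
```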

Download and install the above-mentioned software / data according to their instructions.

Note: Psipred may require legacy BLAST, whereas Psiblast and Delta-blast are part of the BLAST+ package. A .ncbirc file may be required in the home directory for Psipred.

Databases required:

1. The PDBAA BLAST database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/).

2. The HHsuite databases NR20 and PDB70 (ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/databases/hhsuite_dbs/).
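Before configuring SmotifTF, it may be useful to confirm that the external programs are actually installed. The sketch below is not part of SmotifTF; the function name missing_tools is hypothetical, and the executable names in the example call are assumptions about typical BLAST+ and HH-suite installs that may differ on your system.

```shell
# Hypothetical helper (not part of SmotifTF): report which of the given
# executables cannot be found on PATH. Prints one missing name per line
# and returns non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call; adjust the names to match your installation:
#   missing_tools psiblast deltablast hhblits hhsearch
```

Tools installed outside PATH will be reported as missing here even if smotiftf_config.ini points to them by full path.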

SMOTIFTF DOWNLOAD AND INSTALLATION

Download the SmotifTF package from CPAN:

http://search.cpan.org/dist/SmotifTF/

Installation of the software (also available in the README file):

 tar -zxvf SmotifTF-version.tar.gz

 cd SmotifTF-version/

 perl Makefile.PL PREFIX=/home/user/MyPerlLib/

 make

 make test

 make install

SETUP CONFIGURATION FILE

The configuration file, smotiftf_config.ini, has all the information regarding the required library files and other pre-requisite software.

Set all the paths and executables in this file correctly.

Set the environment variable in the .bashrc file:

 export SMOTIFTF_CONFIG_FILE=/home/user/MyPerlLib/share/perl5/SmotifTF-version/smotiftf_config.ini
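A missing or mistyped configuration path is easy to overlook, so a small check after editing .bashrc can save a failed run. This is a minimal sketch, not part of SmotifTF; the function name check_smotiftf_config is hypothetical, and it only tests that the variable is set and the file is readable, not that the paths inside the file are correct.

```shell
# Hypothetical helper (not part of SmotifTF): return 0 if
# SMOTIFTF_CONFIG_FILE is set and names a readable file; otherwise
# print a short diagnostic to stderr and return 1.
check_smotiftf_config() {
    if [ -z "${SMOTIFTF_CONFIG_FILE:-}" ]; then
        echo "SMOTIFTF_CONFIG_FILE is not set" >&2
        return 1
    fi
    if [ ! -r "$SMOTIFTF_CONFIG_FILE" ]; then
        echo "Cannot read $SMOTIFTF_CONFIG_FILE" >&2
        return 1
    fi
    return 0
}

# Example call, in a new shell after editing .bashrc:
#   check_smotiftf_config && echo "configuration file found"
```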

PRE-REQUISITES STEPS

      -------------------------------------------------------
     | Run Pre-requisites:                                   |
     | Psipred, HHblits+HHsearch, Psiblast, Delta-blast      |
     |                                                       |
     | Single-core job                                       |
     | Usage:                                                |
     |   perl smotiftf_prereq.pl --sequence_file=1zzz.fasta  |
     |         --dir=1zzz --step=all                         |
      -------------------------------------------------------

HOW TO RUN THE PRE-REQUISITES

1. If installed locally, provide the correct path name to the SmotifTF Perl library in this Perl script (line 14).

2. Create a subdirectory with a dummy pdb file name (e.g., 1abc or 1zzz).

3. Put the query FASTA file (1zzz.fasta) in this directory.

4. Run the pre-requisites step first. This runs Psipred, HHblits+HHsearch, Psiblast and Delta-blast. It will then generate the dynamic database of Smotifs and the list of putative Smotifs in the query protein. For more information about the pre-requisites use: perl smotiftf_prereq.pl -help

   Usage: perl smotiftf_prereq.pl --sequence_file=1zzz.fasta --dir=1zzz --step=all

5. Next, run smotiftf.pl according to the instructions given there. For more information use: perl smotiftf.pl -help
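Steps 2 to 4 above can be sketched as a small shell helper. This is an illustration only, not part of SmotifTF: the function name prepare_prereq_job is hypothetical, it assumes the subdirectory is created under the current working directory, and it prints the documented command instead of running it so that the paths can be checked first.

```shell
# Hypothetical helper (not part of SmotifTF) for steps 2-4: create the
# job directory, copy the query FASTA file into it as <code>.fasta, and
# print the documented command. Nothing is executed (dry run).
prepare_prereq_job() {
    code=$1      # dummy 4-letter pdb code, e.g. 1zzz
    fasta=$2     # path to the query FASTA file
    [ -r "$fasta" ] || { echo "Cannot read $fasta" >&2; return 1; }
    # ASSUMPTION: the job directory lives in the current directory.
    mkdir -p "$code" || return 1
    # Copy only if the target is absent; an existing file is kept as-is.
    [ -e "$code/$code.fasta" ] || cp "$fasta" "$code/$code.fasta" || return 1
    printf 'perl smotiftf_prereq.pl --sequence_file=%s.fasta --dir=%s --step=all\n' \
        "$code" "$code"
}

# Example call:
#   prepare_prereq_job 1zzz /path/to/query.fasta
```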

REFERENCE

Vallat BK, Fiser A. Modularity of protein folds as a tool for template-free modeling of sequences. Manuscript under review.

AUTHORS

Brinda Vallat, Carlos Madrid and Andras Fiser.

OPTIONS

-help

Prints a brief help message and exits.

-man

Prints the manual page and exits.

--step

1, 2, 3, 4, 5, 6, 7, 8 or all (to run all steps consecutively)

--sequence_file

Give the name of the FASTA file.

--dir

Give the 4-letter dummy pdb_code, or any other directory in which the FASTA file is present.
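The options above can be combined to run one step at a time instead of --step=all. The following is a hedged sketch, not part of SmotifTF: the function names is_valid_step and prereq_command are hypothetical, and it assumes that --step takes a single step number (1 to 8) or the word all. The command is printed, not executed.

```shell
# Hypothetical helpers (not part of SmotifTF).
# ASSUMPTION: --step accepts one digit from 1 to 8, or the word "all".
is_valid_step() {
    case $1 in
        [1-8]|all) return 0 ;;
        *)         return 1 ;;
    esac
}

# Print the smotiftf_prereq.pl command for a given dummy pdb code and step.
prereq_command() {
    code=$1
    step=$2
    if ! is_valid_step "$step"; then
        echo "Invalid step: $step (expected 1-8 or all)" >&2
        return 1
    fi
    printf 'perl smotiftf_prereq.pl --sequence_file=%s.fasta --dir=%s --step=%s\n' \
        "$code" "$code" "$step"
}

# Example call:
#   prereq_command 1zzz 3
```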

DESCRIPTION

SmotifTF carries out template-free structure prediction: starting from the sequence of a protein, it models the complete structure using the Smotif library.