NAME

WWW::PDB - Perl interface to the Protein Data Bank

SYNOPSIS

  use WWW::PDB;
  my $pdb = WWW::PDB->new;
  
  my $fh = $pdb->get_structure('2ili');
  print while <$fh>;
  
  for($pdb->keyword_query('carbonic anhydrase')) {
      printf(
          "%s\t%s\t[%s]\n",
          $_,
          $pdb->get_primary_citation_title($_),
          join(', ', $pdb->get_chains($_))
      );
  }

  my $seq = q(
      VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTK
      TYFPHFDLSHGSAQVKGHGKKVADALTAVAHVDDMPNAL
  );
  print $pdb->blast($seq, 10.0, 'BLOSUM62', 'HTML');

INTRODUCTION

The Protein Data Bank (PDB) was established in 1971 as a repository of the atomic coordinates of protein structures (Bernstein et al., 1977). It has since outgrown that role, proving invaluable not only to the research community but also to students and educators (Berman et al., 2000).

DESCRIPTION

This module is an object-oriented interface to the Protein Data Bank. It provides methods for retrieving files, optionally caching them locally. Additionally, it wraps the functionality of the PDB's SOAP web services.

CONSTRUCTOR

new ( [ OPTIONS ] )

Prepares a new WWW::PDB object with the specified options, which are passed in as key-value pairs, as in a hash. Accepted options are:

uri - URI for the PDB web services. Defaults to <http://www.pdb.org/pdb/services/pdbws>.

proxy - Proxy for the PDB web services. Defaults to <http://www.pdb.org/pdb/services/pdbws>.

ftp - Host name for the PDB FTP archive. Defaults to ftp.wwpdb.org.

cache - Local cache directory. If defined, the object will look for files here first and also use this directory to store any downloads.

Options not listed above are ignored, and, appropriately, all options are optional.
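
The option handling described above can be pictured with a small stand-alone sketch. It is not the module's actual constructor code: build_options is a hypothetical helper, and only the default values are taken from the list above.

```perl
use strict;
use warnings;

# Defaults as listed in the documentation above.
my %DEFAULTS = (
    uri   => 'http://www.pdb.org/pdb/services/pdbws',
    proxy => 'http://www.pdb.org/pdb/services/pdbws',
    ftp   => 'ftp.wwpdb.org',
    cache => undef,
);

# Hypothetical helper (not part of WWW::PDB): merge the caller's key-value
# pairs over the defaults, silently dropping any key that is not documented.
sub build_options {
    my %given = @_;
    my %opts  = %DEFAULTS;
    for my $key (keys %given) {
        $opts{$key} = $given{$key} if exists $DEFAULTS{$key};
    }
    return \%opts;
}

# 'colour' is not a documented option, so it is ignored.
my $opts = build_options(cache => '/tmp/pdb-cache', colour => 'blue');
printf "%s => %s\n", $_, $opts->{$_} // '(undef)' for sort keys %$opts;
```

With the real module, the equivalent call would presumably be WWW::PDB->new(cache => '/tmp/pdb-cache'), with every other option left at its default.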

METHODS

This module is object-oriented, so all methods should be called on a WWW::PDB instance.

FILE RETRIEVAL

Each of the following methods takes a PDB ID as input and returns a file handle (or undef on failure).

get_structure ( PDBID )

Retrieves the structure in PDB format.

get_structure_factors ( PDBID )

Retrieves the structure factors file.
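
To illustrate the cache-first behaviour and the "file handle or undef" contract, here is a minimal sketch. It assumes nothing about how WWW::PDB is implemented internally: open_cached is a hypothetical helper, and the download step is replaced by a callback so that the example runs without network access.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of WWW::PDB): return a read handle for $name,
# looking in $cache_dir first and calling $fetch only on a cache miss.
# $fetch is a code ref that returns the file contents, or undef on failure.
# Returns undef if the file can be neither found nor fetched.
sub open_cached {
    my ($cache_dir, $name, $fetch) = @_;
    my $path = File::Spec->catfile($cache_dir, $name);

    unless (-e $path) {
        my $content = $fetch->($name);
        return undef unless defined $content;
        open my $out, '>', $path or return undef;
        print {$out} $content;
        close $out or return undef;
    }

    open my $fh, '<', $path or return undef;
    return $fh;
}

my $dir   = tempdir(CLEANUP => 1);
my $calls = 0;
# Stand-in for a download; the returned text is placeholder content.
my $fetch = sub { $calls++; return "HEADER    placeholder record\n" };

my $fh = open_cached($dir, 'demo.pdb', $fetch);   # miss: fetches and stores
print while <$fh>;
close $fh;

open_cached($dir, 'demo.pdb', $fetch);            # hit: served from the cache
print "fetch called $calls time(s)\n";
```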

UTILITY

This section is dedicated to utility methods.

service

Returns the backing SOAP::Lite object. Hopefully you don't need to work with it directly, but if you do, this is how.

PDB WEB SERVICES

The following methods are the interface to the PDB web services.

blast ( SEQUENCE , CUTOFF , MATRIX , OUTPUT_FORMAT )
blast ( PDBID , CHAINID, CUTOFF , MATRIX , OUTPUT_FORMAT )
blast ( SEQUENCE , CUTOFF )
blast ( PDBID , CHAINID , CUTOFF )

Performs a BLAST against sequences in the PDB and returns the output of the BLAST program. XML is used if the output format is unspecified.

fasta ( SEQUENCE , CUTOFF )
fasta ( PDBID , CHAINID , CUTOFF )

Takes either a sequence or a PDB ID and chain identifier, and runs FASTA using the specified cutoff. The results are overloaded to give PDB IDs when used as strings, but they can also be queried explicitly for the PDB ID or FASTA cutoff through the pdbid and cutoff methods:

  printf("%s %s %s\n", $_, $_->pdbid, $_->cutoff)
      for $pdb->fasta("2ili", "A");
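
For readers unfamiliar with overloaded results, the following sketch shows one way such objects can be built with Perl's overload pragma. My::FastaHit is a hypothetical class written for illustration; it is not the class WWW::PDB actually returns, and the hit values are made up.

```perl
use strict;
use warnings;

package My::FastaHit;   # hypothetical class name, for illustration only

# Stringify to the PDB ID; fall back to Perl's defaults for other operators,
# so that string comparison with eq works as well.
use overload '""' => sub { $_[0]->pdbid }, fallback => 1;

sub new {
    my ($class, %args) = @_;
    return bless { pdbid => $args{pdbid}, cutoff => $args{cutoff} }, $class;
}

sub pdbid  { $_[0]{pdbid} }
sub cutoff { $_[0]{cutoff} }

package main;

# Made-up hits standing in for the list that fasta would return.
my @hits = (
    My::FastaHit->new(pdbid => '2ILI', cutoff => 0.001),
    My::FastaHit->new(pdbid => '1CA2', cutoff => 0.01),
);

# Each hit prints as its PDB ID, yet still answers to pdbid and cutoff.
printf "%s %s %s\n", $_, $_->pdbid, $_->cutoff for @hits;
```
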

get_chain_length ( PDBID , CHAINID )

Returns the length of the specified chain.

get_chains ( PDBID )

Returns a list of all the chain identifiers for a given structure, or a reference to such a list in scalar context.
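
Several methods in this section return a list in list context and a reference in scalar context. A minimal sketch of that convention, using Perl's wantarray, is given here; chains_for is a hypothetical stand-in with placeholder data, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method such as get_chains: returns a list in
# list context and an array reference in scalar context.
sub chains_for {
    my ($pdbid) = @_;
    my @chains = ('A', 'B');   # placeholder data, not a real lookup
    return wantarray ? @chains : \@chains;
}

my @list = chains_for('2ili');   # list context: plain list
my $ref  = chains_for('2ili');   # scalar context: array reference

print join(', ', @list), "\n";
print scalar(@$ref), " chains\n";
```

The practical consequence is that my $n = $pdb->get_chains($id) would yield a reference, not a count; dereference it, or call the method in list context.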

get_cif_chain ( PDBID , CHAINID )

Converts the specified author-assigned chain identifier to its mmCIF equivalent.

get_cif_chain_length ( PDBID , CHAINID )

Returns the length of the specified chain, just like get_chain_length, except it expects the chain identifier to be the mmCIF version.

get_cif_chains ( PDBID )

Returns a list of all the mmCIF chain identifiers for a given structure, or a reference to such a list in scalar context.

get_cif_residue ( PDBID , CHAINID , RESIDUEID )

Converts the specified author-assigned residue identifier to its mmCIF equivalent.

get_current_pdbids ( )

Returns a list of the identifiers (PDB IDs) corresponding to "current" structures (i.e. not obsolete, models, etc.), or a reference to such a list in scalar context.

get_ec_nums ( PDBIDS )
get_ec_nums ( )

Retrieves the Enzyme Classification (EC) numbers associated with the specified PDB IDs or with all PDB structures if called with no arguments.

get_entities ( PDBID )

Returns a list of the entity IDs for a given structure, or a reference to such a list in scalar context.

get_genome_details ( )

Retrieves genome details for all PDB structures.

get_kabsch_sander ( PDBID , CHAINID )

Finds secondary structure for the given chain.

get_obsolete_pdbids ( )

Returns a list of the identifiers (PDB IDs) corresponding to obsolete structures, or a reference to such a list in scalar context.

get_primary_citation_title ( PDBID )

Finds the title of the specified structure's primary citation (if it has one).

get_pubmed_ids ( )

Retrieves the PubMed IDs associated with all PDB structures.

get_pubmed_id ( PDBID )

Retrieves the PubMed ID associated with the specified structure.

get_release_dates ( PDBIDS )

Maps the given PDB IDs to their release dates.

get_sequence ( PDBID , CHAINID )

Retrieves the sequence of the specified chain.

get_space_group ( PDBID )

Returns the space group of the specified structure (the _symmetry.space_group_name_H-M item in the mmCIF dictionary).

homology_reduction_query ( PDBIDS , CUTOFF )

Reduces the set of PDB IDs given as input based on sequence homology.

keyword_query ( KEYWORD_EXPR [ , EXACT_MATCH [ , AUTHORS_ONLY ] ] )

Runs a keyword query with the specified expression. Search can be made stricter by requiring an exact match or restricting the search to authors. Both boolean arguments are optional and default to false. Returns a list of PDB IDs or a reference to such a list in scalar context.

pubmed_abstract_query ( KEYWORD_EXPR )

Runs a keyword query on PubMed Abstracts. Returns a list of PDB IDs or a reference to such a list in scalar context.

PDB ID STATUS METHODS

The following methods deal with the status of PDB IDs.

get_status ( PDBID )

Finds the status of the structure with the given PDB ID. Return is one of qw(CURRENT OBSOLETE UNRELEASED MODEL UNKNOWN).

is_current ( PDBID )

Checks whether or not the specified PDB ID corresponds to a current structure. Implemented for orthogonality; all this does is check whether get_status returns CURRENT.

is_obsolete ( PDBID )

Checks whether or not the specified PDB ID corresponds to an obsolete structure. Defined by the PDB web services interface.

is_unreleased ( PDBID )

Checks whether or not the specified PDB ID corresponds to an unreleased structure. Implemented for orthogonality; all this does is check whether get_status returns UNRELEASED.

is_model ( PDBID )

Checks whether or not the specified PDB ID corresponds to a model structure. Implemented for orthogonality; all this does is check whether get_status returns MODEL.

is_unknown ( PDBID )

Checks whether or not the specified PDB ID is unknown. Implemented for orthogonality; all this does is check whether get_status returns UNKNOWN.
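
The relationship between get_status and the predicates built on it can be sketched as follows. My::StatusDemo is a hypothetical class whose get_status reads a canned table instead of calling the web service, so this shows the pattern and not the module's source. is_obsolete is left out because, as noted above, it is defined by the web services interface itself.

```perl
use strict;
use warnings;

package My::StatusDemo;   # hypothetical class, not part of WWW::PDB

sub new {
    my ($class, %status_of) = @_;
    return bless { status_of => {%status_of} }, $class;
}

# Stub for get_status: consults a canned table and reports UNKNOWN for
# anything it has not been told about.
sub get_status {
    my ($self, $pdbid) = @_;
    return $self->{status_of}{ lc($pdbid) } // 'UNKNOWN';
}

# Generate one predicate per status value; each simply compares the result
# of get_status with its own status name.
for my $status (qw(CURRENT UNRELEASED MODEL UNKNOWN)) {
    no strict 'refs';
    *{ __PACKAGE__ . '::is_' . lc($status) } = sub {
        my ($self, $pdbid) = @_;
        return $self->get_status($pdbid) eq $status;
    };
}

package main;

# The status table below is placeholder data for the demonstration.
my $demo = My::StatusDemo->new('2ili' => 'CURRENT');
print $demo->is_current('2ili') ? "current\n" : "not current\n";
print $demo->is_unknown('9zzz') ? "unknown\n" : "known\n";
```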

UNTESTED

The following methods are defined by the PDB web services interface, so they are wrapped here, but they have not been tested.

get_annotations ( STATE_FILE )

Given a string in the format of a ViewState object from Protein Workshop, returns another ViewState object.

get_atom_site ( PDBID )

Returns the first atom site object for a structure.

get_atom_sites ( PDBID )

Returns the atom site objects for a structure.

get_domain_fragments ( PDBID , CHAINID , METHOD )

Finds all structural protein domain fragments for a given structure.

get_first_struct_conf ( PDBID )

Finds the first struct_conf for the given structure.

get_first_struct_sheet_range ( PDBID )

Finds the first struct_sheet_range for the given structure.

get_struct_confs ( PDBID )

Finds the struct_confs for the given structure.

get_struct_sheet_ranges ( PDBID )

Finds the struct_sheet_ranges for the given structure.

get_structural_genomics_pdbids ( )

Finds info for structural genomics structures.

xml_query ( XML )

Runs an arbitrary query expressed as XML, which covers pretty much any query that can be constructed.

REFERENCES

  1. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28(1), 235-242.

  2. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, Jr., E. F., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, T. & Tasumi, M. (1977). Eur. J. Biochem. 80(2), 319-324.

SEE ALSO

The PDB can be accessed via the web at <http://www.pdb.org/>. The Java API documentation for the PDB's web services is located at <http://www.rcsb.org/robohelp_f/webservices/pdbwebservice.html>.

BUGS

Please report them.

AUTHOR

Miorel-Lucian Palii, <mlpalii@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2008-2009 by Miorel-Lucian Palii

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.