Bio::Phylo::Parsers::Nhx - Parser used by Bio::Phylo::IO, no serviceable parts inside
This module parses "New Hampshire eXtended" (NHX) tree descriptions in parenthetical format. The node annotations, which are described at https://sites.google.com/site/cmzmasek/home/software/forester/nhx, are stored as meta annotations in the namespace whose reserved prefix, nhx, is associated with that URI. This means that once the parser is done, you can fetch an annotation value as follows:
my $gene_name = $node->get_meta_object( 'nhx:GN' );
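For context, the following is a minimal sketch of a complete round trip, assuming that parse_tree can be imported from Bio::Phylo::IO and that the format name is 'nhx'. The tree string, the gene names and the choice of the GN and S tags are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Phylo::IO 'parse_tree';

# Illustrative NHX string: every node carries an annotation in square brackets.
my $nhx = '((ADH1:0.1[&&NHX:GN=ADH1:S=human],ADH2:0.2[&&NHX:GN=ADH2:S=mouse])'
        . ':0.05[&&NHX:D=Y],ADH3:0.3[&&NHX:GN=ADH3:S=yeast])[&&NHX:D=N];';

# Go through the Bio::Phylo::IO facade rather than calling the parser directly.
my $tree = parse_tree(
    -format => 'nhx',
    -string => $nhx,
);

# Report the gene name annotation, if any, for each node in the tree.
for my $node ( @{ $tree->get_entities } ) {
    my $gene_name = $node->get_meta_object('nhx:GN');
    next unless defined $gene_name;
    print $node->get_name, " => ", $gene_name, "\n";
}
```

Nodes without a GN tag (here, the internal nodes) are skipped by the defined check.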
This parser is called by the Bio::Phylo::IO facade; do not call it directly. In turn, this parser delegates processing of Newick strings to Bio::Phylo::Parsers::Newick. As such, several additional flags can be passed to the Bio::Phylo::IO parse and parse_tree functions to influence how complex Newick strings are handled:
-keep => [ ...list of taxa names... ]
The -keep flag retains only the taxa of interest and ignores all others while building the tree object.
-keep_whitespace => 1,
This will treat unescaped whitespace as if it is a normal taxon name character. Normally, whitespace is only retained inside quoted strings (e.g. 'Homo sapiens'); otherwise the convention is to use underscores (Homo_sapiens). This is because some programs introduce whitespace to prettify a Newick string, e.g. to indicate indentation/depth, in which case you almost certainly want to ignore it. Ignoring unquoted whitespace is therefore the default behaviour. The option to keep it is provided for dealing with incorrectly formatted data.
Note that the -ignore_comments flag, which is optional for the Newick parser, cannot be used here. This is because NHX embeds its metadata in what are normally comments (i.e. square brackets), so these must be processed in a special way.
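To illustrate why the bracketed text cannot simply be discarded, here is a minimal, self-contained sketch of how a single NHX annotation breaks down into tag/value pairs. The helper parse_nhx_comment is hypothetical and is not part of Bio::Phylo; it does not reflect how this parser is actually implemented, and it assumes values contain no colons.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Phylo): split one NHX comment
# such as [&&NHX:GN=ADH1:S=human] into a hash of tag => value pairs.
sub parse_nhx_comment {
    my ($comment) = @_;

    # Anything that is not an NHX comment yields no tags.
    return {} unless $comment =~ /^\[&&NHX:(.*)\]$/;

    my %tags;
    for my $pair ( split /:/, $1 ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        $tags{$key} = $value if defined $value;
    }
    return \%tags;
}

my $tags = parse_nhx_comment('[&&NHX:GN=ADH1:S=human:D=N]');
print "$_ => $tags->{$_}\n" for sort keys %$tags;
# prints:
# D => N
# GN => ADH1
# S => human
```

An ordinary Newick comment such as [a remark] does not start with &&NHX and produces an empty hash in this sketch.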
There is a mailing list at https://groups.google.com/forum/#!forum/bio-phylo for any user or developer questions and discussions.
The NHX parser is called by the Bio::Phylo::IO object. Look there to learn how to parse Newick strings.
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63