NAME

Bio::Palantir - core classes and utilities for Bio::Palantir

VERSION

version 0.192540

SYNOPSIS

    use feature 'say';    # the examples below rely on say
    use Bio::Palantir;

    # open and parse biosynML.xml or regions.js antiSMASH report
    my $infile = 'biosynML.xml'; 
    my $report = Bio::Palantir::Parser->new( file => $infile );

    # get main container
    my $root = $report->root;

    # explore Biosynthetic Gene Clusters (BGCs) content
    
    # Bio::Palantir::Parser
    for my $cluster ($root->all_clusters) {         # returns all clusters
        say $cluster->type;                         # returns the cluster type (e.g., nrps)

        for my $gene ($cluster->all_genes) {        # returns all genes
            say $gene->name;                        # for instance, returns the gene name
            say $gene->genomic_coordinates;         # returns DNA gene coordinates (relative to the genome)
            say $gene->coordinates;                 # returns protein gene coordinates (also relative to the genome)
            say $gene->protein_sequence;            # returns the gene protein sequence
    
            # if the BGC possesses domains (i.e., NRPS/PKS)
            for my $domain ($gene->all_domains) {   # returns all domains

                say $domain->rank;                  # for instance, returns the domain rank in the gene
                say $domain->function;              # returns the domain function (e.g., condensation) 
                say join '-', $domain->coordinates; # returns the coordinates (which are relative to the gene ones)
                say $domain->protein_sequence;      # returns the domain protein sequence

                # lowest level is Motifs (for antiSMASH 3 and 4)
                for my $motif ($domain->all_motifs) {
                    #...
                } 
            }
        }

        # looping over Module objects works the same way
        for my $module ($cluster->all_modules) {
            # ...
        }
    }


    # Bio::Palantir::Refiner
    use aliased 'Bio::Palantir::Refiner';
    use aliased 'Bio::Palantir::Refiner::ClusterPlus';
    
    # it is possible to create Bio::Palantir::Refiner objects from already existing Bio::Palantir::Parser ones
    my @cluster_plus;
    
    for my $cluster ($root->all_clusters) { 
        push @cluster_plus, ClusterPlus->new( _cluster => $cluster ); 
    }

    # but if you intend to use the Refiner part, it is more convenient to create the Refiner object directly from a file
    my $report = Refiner->new( file => 'biosynML.xml' );

    for my $cluster_plus ($report->all_clusters) {
        
        say $cluster_plus->type;

        for my $gene_plus ($cluster_plus->all_genes) {

            say $gene_plus->name;

            for my $domain_plus ($gene_plus->all_domains) {
                
                say 'Palantir version:'; 
                say $domain_plus->function; 
                say $domain_plus->coordinates; 
                say $domain_plus->evalue;
                
                # compare with antiSMASH results
                say 'antiSMASH version:';
                say $domain_plus->_domain->function;
                say $domain_plus->_domain->coordinates;
                # say $domain_plus->evalue; # only available for Palantir part

            } 

        }

    }


    # Bio::Palantir::Explorer
    use aliased 'Bio::Palantir::Explorer::ClusterFasta';
    
    # from a Bio::Palantir::Refiner object
    for my $cluster_plus ($report->all_clusters) {
        
        for my $gene_plus ($cluster_plus->all_genes) {

            for my $domain_exp ($gene_plus->all_exp_domains) {

                say $domain_exp->function; 
                say $domain_exp->coordinates; 
                say $domain_exp->evalue;

            }

        }

    }

    # from a FASTA file (containing ONLY one BGC, each sequence being interpreted as a gene from the cluster)
    my $cluster_exp = ClusterFasta->new( fasta => 'nrps_bgc.fasta' );

    for my $gene_exp ($cluster_exp->all_genes) {

        for my $domain_exp ($gene_exp->all_domains) {
                
            say $domain_exp->function;
            say $domain_exp->coordinates;
            say $domain_exp->evalue;

        }

    }

DESCRIPTION

This distribution is the base of the Bio::Palantir module collection, a toolbox for post-processing antiSMASH report data (https://antismash.secondarymetabolites.org) and for improving some aspects of its annotation of NRPS/PKS Biosynthetic Gene Clusters (BGCs). It aims to support both small- and large-scale genome mining projects.

The Palantir libraries are organized as follows:

Bio::Palantir::Parser contains classes for hierarchically storing the information of antiSMASH gene clusters.

Bio::Palantir::Refiner consists of classes (parallel to those of Parser) dedicated to improving the annotation of NRPS/PKS gene clusters.

Bio::Palantir::Explorer contains classes (also parallel to those of Parser) giving access to an exploratory version of the detected domains.

More information on their internal structure can be found in their respective files.

Here is the list of functionalities offered by Palantir libraries and bins:

Refinement of NRPS/PKS BGC annotations

- Dynamic elongation of the coordinates of core domains: enrich the information contained in the sequences (application examples: improved similarity searches and evolutionary approaches)

- Filling the gaps in BGC annotation: retrieve domains missed because of exceptions to the detection rules (application example: resolution of ambiguous or incoherent BGC annotation)

- Module delimitation: apply biological rules to group domains in modules (application example: analyses at module scale)

- BGC visualization: visualize and compare antiSMASH and Palantir annotations [bin/draw_clusters.pl]

- Exploratory mode visualization: visualize and design the domain architecture consensus from a raw view of all detected signatures (application example: manual curation of the domain architecture consensus)

BGC data manipulation

- Generation of PDF/Word reports: export customizable reports of refined BGC data (application example: manual review of many (filtered) BGCs)

- Extraction of sequences: export FASTA files from BGC data at different scales: cluster, gene, module, domain (application example: data formatting for downstream analyses)

- Generation of SQL tables: export SQL tables containing BGC data details (application example: large-scale queries and statistics)
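
As an illustration of sequence extraction at the domain scale, here is a minimal sketch. It is not the exporter shipped with this distribution: format_fasta is a hypothetical helper, the identifier scheme (gene_rank_function) is an assumption, and the stand-in records replace values that would normally be read through the accessors shown in the SYNOPSIS ($gene->name, $domain->rank, $domain->function, $domain->protein_sequence).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Palantir): format one FASTA record,
# wrapping the sequence at $width residues per line (60 by default).
sub format_fasta {
    my ($id, $seq, $width) = @_;
    $width //= 60;
    my @lines = $seq =~ /(.{1,$width})/g;
    return join "\n", ">$id", @lines, '';
}

# Stand-in data (assumption): in real use, each record would be filled from
# a gene object and its domain objects while looping as in the SYNOPSIS.
my @domains = (
    { gene => 'geneA', rank => 1, function => 'condensation', seq => 'MKT' x 25 },
    { gene => 'geneA', rank => 2, function => 'adenylation',  seq => 'ACDEFGHIKL' },
);

# Build one FASTA record per domain, identified as gene_rank_function.
my $fasta = join '', map {
    format_fasta( join( '_', @$_{qw(gene rank function)} ), $_->{seq} )
} @domains;

print $fasta;
```

The same helper could be reused at the gene scale by passing the gene name and its protein sequence instead of the domain values.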

AUTHOR

Loic MEUNIER <lmeunier@uliege.be>

CONTRIBUTOR

Denis BAURAIN <denis.baurain@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by University of Liege / Unit of Eukaryotic Phylogenomics / Loic MEUNIER and Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.