NAME

export_bgc_sql_tables.pl - Exports SQL tables of BGC data (Palantir and antiSMASH annotations)

VERSION

version 0.200290

NAME

export_bgc_sql_tables.pl - Exports SQL tables that structure the BGC data extracted from antiSMASH reports and annotated with Palantir.

USAGE

    $0 [options] --infiles [=] <report_paths>.../--file-table [=] <report.list>

REQUIRED ARGUMENTS

OPTIONAL ARGUMENTS

--infiles [=] <report_paths>...

Paths to biosynML.xml (antiSMASH 3-4) or regions.js (antiSMASH 5) files. This option can take multiple values.

--file-table [=] <tsv_file>

TSV (tab-separated values) file that unambiguously specifies the paths to the XML reports, proteomes and Quast files. Column order: XML reports (1st column), proteomes (2nd column) and Quast files (3rd column). If you only want to parse XML and Quast reports, put undef in the proteome column, as in "biosynML.xml undef quast.tsv".
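The three-column layout described above can be illustrated with a short Perl sketch. This is not the script's own parser: parse_file_table is a hypothetical helper, and it assumes that columns are separated by single tabs and that the literal string "undef" marks an unused column.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of export_bgc_sql_tables.pl): turn the
# lines of a --file-table TSV into one record per line.
sub parse_file_table {
    my @lines = @_;
    my @rows;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($report, $proteome, $quast) = split /\t/, $line;
        # Assumption: the literal string "undef" means "no file here"
        for ($proteome, $quast) {
            $_ = undef if defined $_ && $_ eq 'undef';
        }
        push @rows, {
            report   => $report,
            proteome => $proteome,
            quast    => $quast,
        };
    }
    return @rows;
}

# Example line taken from the option description (tabs assumed)
my @rows = parse_file_table("biosynML.xml\tundef\tquast.tsv\n");
printf "%s | %s | %s\n",
    map { defined $_ ? $_ : '(none)' } @{ $rows[0] }{qw(report proteome quast)};
```

Keeping the proteome column present, even when unused, means the Quast file always stays in the third column.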

--types [=] <str>...

Filter clusters by one or more specific types.

Types allowed: acyl_amino_acids, amglyccycl, arylpolyene, bacteriocin, butyrolactone, cyanobactin, ectoine, hserlactone, indole, ladderane, lantipeptide, lassopeptide, microviridin, nrps, nucleoside, oligosaccharide, otherks, phenazine, phosphonate, proteusin, PUFA, resorcinol, siderophore, t1pks, t2pks, t3pks, terpene.

Any combination of these types, such as nrps-t1pks or t1pks-nrps, is also allowed. The argument is repeatable.
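As an illustration of which values are accepted, the sketch below checks a type string by splitting it on hyphens and looking up each part in the list above. is_valid_type is a hypothetical helper, not the script's own validation, and it assumes that matching is case-sensitive (so PUFA must be upper case).

```perl
use strict;
use warnings;

# Allowed base types, copied from the list above
my %ALLOWED = map { $_ => 1 } qw(
    acyl_amino_acids amglyccycl arylpolyene bacteriocin butyrolactone
    cyanobactin ectoine hserlactone indole ladderane lantipeptide
    lassopeptide microviridin nrps nucleoside oligosaccharide otherks
    phenazine phosphonate proteusin PUFA resorcinol siderophore
    t1pks t2pks t3pks terpene
);

# Hypothetical helper: true if every hyphen-separated part is an allowed type
sub is_valid_type {
    my ($type) = @_;
    my @parts = split /-/, $type, -1;    # -1 keeps empty trailing fields
    return 0 unless @parts;              # empty string is not a type
    return ( grep { !$ALLOWED{$_} } @parts ) ? 0 : 1;
}

for my $type (qw(nrps nrps-t1pks t1pks-nrps nrps-foo)) {
    printf "%-12s %s\n", $type, is_valid_type($type) ? 'ok' : 'rejected';
}
```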

--taxdir [=] <dir>

Path to a local mirror of the NCBI Taxonomy database.

--idm[-file] [=] <file>

Path to an id mapper file to retrieve the assembly accession numbers. The file should be in tabular format with accession numbers in the second column.
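A minimal sketch of reading such a file follows. Only the position of the accession number (second column) comes from the description above; the tab delimiter, the use of the first column as the lookup key, the helper name read_id_mapper and the label my_org are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup from the first column to the
# accession number found in the second column.
sub read_id_mapper {
    my @lines = @_;
    my %accession_for;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        # Assumption: tab-delimited, key in column 1, accession in column 2
        my ($id, $accession) = split /\t/, $line;
        $accession_for{$id} = $accession;
    }
    return %accession_for;
}

# "my_org" is an invented label used only for this example
my %accession_for = read_id_mapper("my_org\tGCF_000005845.2\n");
print "$accession_for{my_org}\n";
```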

--proteomes

Use the organism proteome to predict domains with external pHMMs and include them in the SQL database.

--quast

Create an additional table "Assemblies" with Quast statistics. For this option, you need to use the transposed_report.tsv output of Quast and rename it after the basename of your report file. For example, if you use my_org.xml, name your Quast file my_org.tsv.
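The naming rule above can be sketched with File::Basename from the Perl core. quast_name_for is a hypothetical helper that derives only the expected file name; it makes no assumption about where the script looks for the file.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: expected Quast file name for a given report path.
# The last extension of the report is replaced by ".tsv".
sub quast_name_for {
    my ($report) = @_;
    my ($base) = fileparse($report, qr/\.[^.]*/);
    return "$base.tsv";
}

print quast_name_for('my_org.xml'), "\n";
```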

--contam-file [=] <file>

Add an SQL table for CheckM contamination results (tabular file). This option was devised for the interface database.

--new-db

Remove the existing SQL tables to start the database from scratch.

--db-name [=] <str>

Name of your database [default: bgc_db].

--module-delineation [=] <str>

Method for delineating the modules. Modules can be cut at either condensation domains (C and KS) or substrate-selection domains (A and AT) [default: 'substrate-selection'].

--gap-filling [=] <bool>

Try to find domains in the gaps present in clusters [default: 1].

--undef-cleaning [=] <bool>

Eliminate undef domains from the antiSMASH output that cannot be recovered [default: 1].

--undef-recov [=] <bool>

Try to recover antiSMASH undef domain values [default: 0].

--evalue-threshold [=] <n>

E-value threshold to apply in HMMER searches [default: 1e-4].

--cpu [=] <n>

Number of threads/cpus to use [default: 1].

--version
--usage
--help
--man

Print the usual program information.

AUTHOR

Loic MEUNIER <lmeunier@uliege.be>

CONTRIBUTOR

Denis BAURAIN <denis.baurain@uliege.be>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by University of Liege / Unit of Eukaryotic Phylogenomics / Loic MEUNIER and Denis BAURAIN.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.