scripts/genomes_to_taxonomies


            
              1
2
3
4
5
6
7
8
—
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
              #!perl
use strict;
use Data::Dumper;
use Carp;
#
# This is a SAS Component
#
=head1 genomes_to_taxonomies
The routine genomes_to_taxonomies can be used to retrieve taxonomic information for
each of a list of input genomes.  For each genome in the input list of genomes, a list of
taxonomic groups is returned.  Kbase will use the groups maintained by NCBI.  For an NCBI
taxonomic string like
     cellular organisms;
     Bacteria;
     Proteobacteria;
     Gammaproteobacteria;
     Enterobacteriales;
     Enterobacteriaceae;
     Escherichia;
     Escherichia coli
associated with the strain 'Escherichia coli 1412', this routine would return a list of these
taxonomic groups:
     ['Bacteria',
      'Proteobacteria',
      'Gammaproteobacteria',
      'Enterobacteriales',
      'Enterobacteriaceae',
      'Escherichia',
      'Escherichia coli',
      'Escherichia coli 1412'
     ]
That is, the initial "cellular organisms" has been deleted, and the strain ID has
been added as the last "grouping".
The output is a mapping from genome IDs to lists of the form shown above.
Example:
    genomes_to_taxonomies [arguments] < input > output
The standard input should be a tab-separated table (i.e., each line
is a tab-separated set of fields).  Normally, the last field in each
line would contain the identifer. If another column contains the identifier
use
    -c N
where N is the column (from 1) that contains the subsystem.
This is a pipe command. The input is taken from the standard input, and the
output is to the standard output.
=head2 Documentation for underlying call
This script is a wrapper for the CDMI-API call genomes_to_taxonomies. It is documented as follows:
  $return = $obj->genomes_to_taxonomies($genomes)
=over 4
=item Parameter and return types
=begin html
<pre>
$genomes is a genomes
$return is a reference to a hash where the key is a genome and the value is a taxonomic_groups
genomes is a reference to a list where each element is a genome
genome is a string
taxonomic_groups is a reference to a list where each element is a taxonomic_group
taxonomic_group is a string
</pre>
=end html
=begin text
$genomes is a genomes
$return is a reference to a hash where the key is a genome and the value is a taxonomic_groups
genomes is a reference to a list where each element is a genome
genome is a string
taxonomic_groups is a reference to a list where each element is a taxonomic_group
taxonomic_group is a string
=end text
=back
=head2 Command-Line Options
=over 4
=item -c Column
This is used only if the column containing the subsystem is not the last column.
=item -i InputFile    [ use InputFile, rather than stdin ]
=back
=head2 Output Format
The standard output is a tab-delimited file. It consists of the input
file with extra columns added. One extra column is added, consisting of the taxonomy list presented as a single string with items separated by a ":".
Input lines that cannot be extended are written to stderr.
=cut
my $usage = "usage: genomes_to_taxonomies [-c column] < input > output";
use Bio::KBase::CDMI::CDMIClient;
use Bio::KBase::Utilities::ScriptThing;
my $column;
my $input_file;
my $kbO = Bio::KBase::CDMI::CDMIClient->new_for_script('c=i' => \$column,
                                      'i=s' => \$input_file);
if (! $kbO) { print STDERR $usage; exit }
my $ih;
if ($input_file)
{
    open $ih, "<", $input_file or die "Cannot open input file $input_file: $!";
}
else
{
    $ih = \*STDIN;
}
while (my @tuples = Bio::KBase::Utilities::ScriptThing::GetBatch($ih, undef, $column)) {
    my @h = map { $_->[0] } @tuples;
    my $h = $kbO->genomes_to_taxonomies(\@h);
    for my $tuple (@tuples) {
        #
        # Process output here and print.
        #
        my ($id, $line) = @$tuple;
        my $v = $h->{$id};
        if (! defined($v))
        {
            print STDERR $line,"\n";
        }
        elsif (ref($v) eq 'ARRAY')
        {
                my $a = join(": ", @$v);
                print "$line\t$a\n";
        }
        else
        {
            print "$line\t$v\n";
        }
    }
}
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)