scripts/proteins_to_roles


            
              1
2
3
4
5
6
7
8
—
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
              #!perl
use strict;
use Data::Dumper;
use Carp;
#
# This is a SAS Component
#
=head1 proteins_to_roles
The routine proteins_to_roles allows a user to gather the set of functional
roles that are associated with specifc protein sequences.  A single protein
sequence (designated by an MD5 value) may have numerous associated functions,
since functions are treated as an attribute of the feature, and multiple
features may have precisely the same translation.  In our experience,
it is not uncommon, even for the best annotation teams, to assign
distinct functions (and, hence, functional roles) to identical
protein sequences.
For each input MD5 value, this routine gathers the set of features (fids)
that share the same sequence, collects the associated functions, expands
these into functional roles (for multi-functional proteins), and returns
the set of roles that results.
Note that, if the user wishes to see the specific features that have the
assigned functional roles, they should use proteins_to_functions instead (it
returns the fids associated with each assigned function).
Example:
    proteins_to_roles [arguments] < input > output
The standard input should be a tab-separated table (i.e., each line
is a tab-separated set of fields).  Normally, the last field in each
line would contain the identifer. If another column contains the identifier
use
    -c N
where N is the column (from 1) that contains the subsystem.
This is a pipe command. The input is taken from the standard input, and the
output is to the standard output. For each input line, there can be multiple output lines, one for each role the protein can map to. The role is added to the end of each line.
=head2 Documentation for underlying call
This script is a wrapper for the CDMI-API call proteins_to_roles. It is documented as follows:
  $return = $obj->proteins_to_roles($proteins)
=over 4
=item Parameter and return types
=begin html
<pre>
$proteins is a proteins
$return is a reference to a hash where the key is a protein and the value is a roles
proteins is a reference to a list where each element is a protein
protein is a string
roles is a reference to a list where each element is a role
role is a string
</pre>
=end html
=begin text
$proteins is a proteins
$return is a reference to a hash where the key is a protein and the value is a roles
proteins is a reference to a list where each element is a protein
protein is a string
roles is a reference to a list where each element is a role
role is a string
=end text
=back
=head2 Command-Line Options
=over 4
=item -c Column
This is used only if the column containing the subsystem is not the last column.
=item -i InputFile    [ use InputFile, rather than stdin ]
=back
=head2 Output Format
The standard output is a tab-delimited file. It consists of the input
file with extra columns added.
Input lines that cannot be extended are written to stderr.
=cut
my $usage = "usage: proteins_to_roles [-c column] < input > output";
use Bio::KBase::CDMI::CDMIClient;
use Bio::KBase::Utilities::ScriptThing;
my $column;
my $input_file;
my $kbO = Bio::KBase::CDMI::CDMIClient->new_for_script('c=i' => \$column,
                                      'i=s' => \$input_file);
if (! $kbO) { print STDERR $usage; exit }
my $ih;
if ($input_file)
{
    open $ih, "<", $input_file or die "Cannot open input file $input_file: $!";
}
else
{
    $ih = \*STDIN;
}
while (my @tuples = Bio::KBase::Utilities::ScriptThing::GetBatch($ih, undef, $column)) {
    my @h = map { $_->[0] } @tuples;
    my $h = $kbO->proteins_to_roles(\@h);
    for my $tuple (@tuples) {
        #
        # Process output here and print.
        #
        my ($id, $line) = @$tuple;
        my $v = $h->{$id};
        if (! defined($v))
        {
            print STDERR $line,"\n";
        }
        elsif (ref($v) eq 'ARRAY')
        {
            foreach $_ (@$v)
            {
                print "$line\t$_\n";
            }
        }
        else
        {
            print "$line\t$v\n";
        }
    }
}
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)