doc/slides/intro/Spork.slides


            
              1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
              ----
presentation_topic: dbic_chado
presentation_title: Bio::Chado::Schema
presentation_subtitle: a standard Perl ORM layer for Chado
presentation_place: San Diego, CA
presentation_date: January 13-14, 2010
----
= What is DBIx::Class?
* Object-relational mapping framework for Perl
* is now the de-facto standard
* nice features
----
= What is DBIx::Class?
* query building (the magic of chainable ResultSets)
* cross-database deployment (using |SQL::Translator| in the backend)
* lots and lots of plugins availabe (dates, testing, ...)
see http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Manual.pod
----
= Data Model
* a set of classes
* encapsulate the underlying storage
* providing a nicer, maintainable API for working with the data
This isn't really middleware, it's a code library.
----
= And therefore, Bio::Chado::Schema
== Approx. 180 Perl classes
* every table and view, every module
* Chado docs mirrored in the POD
http://search.cpan.org/dist/Bio-Chado-Schema
+  ( So why do we need this? )
----
= Chado needs encapsulation
== Chado's design makes for:
* complex queries
* steep learning curve
* hard to get good performance
----
= Chado needs encapsulation
== complex queries
* generating
* storing
* automating
----
= Chado needs encapsulation
== steep learning curve
* codifying best practices
* unified, high-level documentation
* can help with database administration
----
= Chado needs encapsulation
== performance
* encapsulation => performance optimizations where needed
* first step for more data management
** creative indexing
** materialized views (with their own classes)
----
= BCS Usage
* open the schema.  ( actual DB connection is deferred )
+  my $chado = Bio::Chado::Schema->connect( 'dbi:Pg:...', $user, $pass );
+* get a ResultSet object representing a set of Rows
+  my $all_features = $chado->resultset('Sequence::Feature');
+* a subset of those (another ResultSet)
+  my $other_feature = $all_features->search({ name => 'something' });
+* get an actual data object
+  my $some_feature = $all_features->find( 232432 );
  $other_feature->first;
  # or search in list context returns all resulting rows (careful!)
  my @other_features = $all_features->search({ name => 'something' });
+* get its type cvterm
+  say $some_feature->type->name;
----
= BCS Usage: Joined Select
  # get features via the potato organism, also joining in the cvterms table
  my $potato_bacs =
       $chado->resultset('Organism::Organism')
             ->search({ species => 'Solanum tuberosum' })
             ->search_related( 'features',
                               { 'type.name' => 'BAC_clone'},
                               { 'join' => 'type' },
                             );
----
= BCS Usage: Joined Select
  # the equivalent bare SQL
  my $potato_bacs = $dbh->selectall_arrayref( <<EOS, undef, 'Solanum tuberosum', 'BAC_clone');
  SELECT features.feature_id
       , features.dbxref_id
       , features.organism_id
       , features.name
       , features.uniquename
       , features.residues
       , features.seqlen
       , features.md5checksum
       , features.type_id
       , features.is_analysis
       , features.is_obsolete
       , features.timeaccessioned
       , features.timelastmodified
  FROM organism me
  LEFT JOIN feature features
         ON features.organism_id = me.organism_id
  JOIN cvterm type
         ON type.cvterm_id = features.type_id
  WHERE type.name = 'BAC_clone' AND species = 'Solanum tuberosum'
  EOS
----
= BCS Usage: Loading
  $chado->resultset( 'Cv::Cv' )
        ->find_or_create({ name => 'My Fake Ontology' })
        ->create_related(  'cvterm',
                           { name => 'MyFakeTerm' });
makes the SQL:
  SELECT me.cv_id
       , me.name
       , me.definition
  FROM cv me
  WHERE ( me.name = 'my fake ontology' )
  INSERT INTO cv ( name )
          VALUES ( ?    )
----
= BCS Usage: Transactions
  $chado->txn_do(sub {
      $chado->resultset('Cv::Cv')
            ->find_or_create({ name => 'My Fake Ontology' })
            ->create_related( 'cvterm', { name => 'MyFakeTerm' } );
  });
----
= The Real Advantages of DBIC
* easier to manipulate and assemble queries
* Don't Repeat Yourself
----
= The Real Advantages of DBIC
* it's all objects.  you can delegate to them, pass them around, etc.
* HOWEVER:
+** usually you don't want to subclass them
** but, see |DBIx::Class::Manual::Cookbook|
----
= The Real Advantages of DBIC
* complex joined queries (Chado queries) are very easy and compact
----
= The Real Advantages of DBIC
* SQL syntax errors are much more difficult to make
----
= Using DBIC with your own tables
* use |DBIx::Class::Schema::Loader| to dump a whole set
* make your own definitions
* your table: other_thing, foreign key feature_id to Chado feature table
  package My::DBIC:::Layer::OtherThing;
  use base 'DBIx::Class::Core';
  __PACKAGE__->table('other_thing');
  __PACKAGE__->add_columns(
    'other_thing_id' => { ... },
    'name'           => { ... },
    'definition'     => { ... },
    'feature_id'     => { ... },
  );
  __PACKAGE__->set_primary_key('other_thing_id');
  __PACKAGE__->add_unique_constraint('ot_c1', ['name']);
  __PACKAGE__->belongs_to(
    'feature',
    'Bio::Chado::Schema::Sequence::Feature',
    { 'foreign.feature_id' => 'self.feature_id' },
  );
----
= "Duct tape" BCS to your own schema
* make an accessor 'other_things' that ties your own DBIC class to BCS
+  Bio::Chado::Schema::Sequence::Feature->has_many(
    'other_things',
    'My::DBIC::Layer::OtherThing',
    { 'foreign.feature_id' => 'self.feature_id' },
  );
+* add it to the BCS schema dynamically
+  Bio::Chado::Schema->register_source('OtherThing', 'My::DBIC::Layer::OtherThing');
+* use it with the rest
+  $chado->resultset('Sequence::Feature')->other_things;
----
= Making a composite schema
  my $merged_schema_class =
        Bio::Chado::Schema->merge( 'My::DBIC::Layer' );
  $merged_schema_class->connect( ... );
  $chado->resultset('Sequence::Feature')->other_things;
  $chado->resultset('OtherThing')->find(...)->feature;
Note: merge() is new in BCS 0.6, releasing soon.
         
----
= Further Work
== Still need to add in some DBIx::Class relationships:
* more many_to_many relationships (must be added manually)
== More useful things are needed:
* automate more common querying and loading patterns
* compatibility with BioPerl data objects
----
== Acknowledgments
* Aure Bombarely (SGN)
* Naama Menda (SGN)
* Siddhartha Basu (dictybase)
* Lukas Mueller (SGN)
----
== That's All
* The END
	Global
`s`	Focus search bar
`?`	Bring up this help dialog
	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)
	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse
	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)