The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Webservice::InterMine::Cookbook::Recipe2 - Adding Constraints

SYNOPSIS

  # Get a list of the drosophilids in the database

  use Webservice::InterMine ('www.flymine.org/query');

  my $query = Webservice::InterMine->new_query(class => 'Organism');

  $query->select('*')->where(genus => 'Drosophila')->show();

  # Get all the information about of all publications on drosophilid 
  # published since 2010

  my $query2 = Webservice::InterMine->new_query(class => 'Gene');

  $query2->select('publications.*')
         ->where('organism.genus' => 'Drosophila', 'publications.year' => {'>=' => 2010})
         ->show();

  # Do the same thing in a more InterMine-y way:

  my $query2 = Webservice::InterMine->new_query(class => 'Gene');

  $query2->add_view('publications.*');
  $query2->add_constraint('organism.genus', '=', 'Drosophila');
  $query2->add_constraint('publications.year', '>=', 2010);
  $query2->show();

  # And more explicit yet:

  my $query2 = Webservice::InterMine->new_query(class => 'Gene');

  $query2->add_view('publications.*');
  $query2->add_constraint(
    path  => 'organism.genus', 
    op    => '=', 
    value => 'Drosophila'
  );
  $query2->add_constraint(
    path  => 'publications.year', 
    op    => '>=', 
    value =>  2010
  );
  $query2->show();

DESCRIPTION

InterMine offers a powerful and flexible set of constraints, or filters on the results you receive. The flexibility of this system is comparable to a full SQL-like query language, and allows complex, arbitrary queries over the entire database schema.

This section introduces ways constraints can be specified.

The where method:

This method takes a list of constraints, which are implicitly 'and'ed together. The syntax for describing these is hopefully relatively intuitive, and is based on that of DBIx::Class, one of the larger Perl ORM systems. In this mechanism, a constraint is usually just the path and the value it should have:

  my $query = $service->new_query(class => 'Gene');
  $query->where(
    symbol => 'zen*',
    length => 10_000,
    'organism.genus' => 'Drosophila',
    'organism.species' => ['melanogaster', 'mojavensis']
  );

Specifying a list (such as ['melanogaster', 'mojavensis']), means that the value must be one of the values in the list.

For those familiar to DBIx::Class, the search method is available:

  $query->search({
    symbol => 'zen*',
    length => 10_000,
    'organism.genus' => 'Drosophila',
    'organism.species' => ['melanogaster', 'mojavensis']
  });

In InterMine, the search method accepts either a hashref or an arrayref as its first argument. It does not take a second argument. It also does not support the 'AND'/'OR' syntax.

Where you need to be more precise about the operation, replace the value by a hash reference of operator => value:

  $query->where(length => {'<' => 10_000});

Note that there are 'legible' versions of the comparison operators which are convenient to use as hash keys:

  $query->where(length => {gt => 5_000}, length => {lt => 10_000});

The add_constraint method

A more traditionally InterMine method, this allows positional and named parameters, as well as the two parameter style above. It does not allow method chaining, instead returning the constraint just added.

Most constraints take the following parameters (or a slight variation):

  • path - the path representing the attribute to be constrained

  • op - the 'operator', which defines how to constrain the path

  • value - the value to be applied to the operator

When calling with a list of parameters, the order is path, operator, value:

  $query->add_constraint('Gene.symbol', '=', 'eve');
  $query->add_constraint('Gene.homologue', 'IS NOT NULL');
  $query->add_constraint('Gene.name', 'IN', ['Even skipped', 'Zerknullt']);

This works for all constraint types.

For the simpler constraints (Binary and Unary constraints only), it is also possible to add constraints using the following pattern:

  $query->add_constraint('Gene.organism.name = "Drosophila Yakuba"');

Note the quoting of the value and the whole string: the constraint here is a single string, which is parsed for the path, operator and value. This format is not recommended - it is always better to be explicit about your intentions.

In the examples above the constraints are all cumulative, meaning we only get results back if an item satisfies all of their requirements. We say that the logic for $query2 is "A and B". It is possible however to 'or' your constraints together as well - see below:

  # Get the authors, titles and PubMed IDs of all publications
  # since 2004 on genes in D. Yakuba or D. Melanogaster

  my $query3 = Webservice::InterMine->new_query(class => 'Gene');

  $query3->select('publications.*')
         ->where(
            'organism.genus'    => 'Drosophila',   # A
            'organism.species'  => 'yakuba',       # B
            'organism.species'  => 'melanogaster', # C
            'publications.year' => {ge => 2010},   # D
          )
         ->set_logic("A and (B or C) and D")
         ->show();

In this example the logic is specified by a string which names each constraint in reference to the order they were added to the query, and sets out the relationship between them all. The valid logical operators are "and" and "or".

The following is a slightly longer-winded equivalent:

  my $query3 = Webservice::InterMine->new_query(class => 'Gene');
  my $con1 = $query3->add_constraint(
      path  => 'Gene.organism.genus',
      op    => '=',
      value => 'Drosophila',
  );
  my $con2 = $query3->add_constraint(
      path  => 'Gene.organism.species',
      op    => '=',
      value => 'yakuba',
  );
  my $con3 = $query3->add_constraint(
      path  => 'Gene.organism.species',
      op    => '=',
      value => 'melanogaster',
  );
  my $con4 = $query3->add_constraint(
      path  => 'Gene.publications.year',
      op    => '>=',
      value => 2010,
  );

  $query3->set_logic($con1 & ($con1 | $con2) & $con3);
  $query3->show();

Note that here we keep the constraint objects returned by add_constraint, which we would normally just ignore. Then these are combined to create the logic for the query using the | and & operators. You can always inspect the logic for a query by calling $query->logic->code, which here would return "(A or B) and C". It is also possible to use string parsing to define the logic:

  $query3->logic('(A or B) and C');

The letters used here are the 'codes' associated with each constraint - to find a constraint's code you can always call $con->code, and to find out what it does you can call $con->to_string, which for constraint A would return:

  'Gene.organism.name = "Drosophila Yakuba"'

Normally the constraint codes are simply a series that increments for each constraint that is added, but if you really want to rely on a specific constraint having a specific code you can call:

  $query->add_constraint(
      path  => 'Organism.name',
      op    => '=',
      value => 'Drosophila melanogaster',
      code  => 'Q',
  );

These can even be specified during a where call:

  $query->where(name => {'=' => 'Drosophila melanogaster', code => 'Q'});

Note that in this case we must use the '=' operator and the hashref style.

CONCLUSION

A query can be made very powerful with a few basic constraints and some simple logic. There are several different kinds of constraints as well, which are detailed in Recipe3.

AUTHOR

Alex Kalderimis <dev@intermine.org>

BUGS

Please report any bugs or feature requests to dev@intermine.org.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Webservice::InterMine

You can also look for information at:

COPYRIGHT AND LICENSE

Copyright 2006 - 2010 FlyMine, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.