ElasticSearch - An API for communicating with ElasticSearch
Version 0.28, tested against ElasticSearch server version 0.15.2.
NOTE: This version has been completely refactored, to provide multiple Transport backends, and some methods have moved to subclasses.
ElasticSearch is an Open Source (Apache 2 license), distributed, RESTful Search Engine based on Lucene, and built for the cloud, with a JSON API.
Check out its features: http://www.elasticsearch.org/
This module is a thin API which makes it easy to communicate with an ElasticSearch cluster.
It maintains a list of all servers/nodes in the ElasticSearch cluster, and spreads the load across these nodes in round-robin fashion. If the current active node disappears, then it attempts to connect to another node in the list.
Forking a process triggers a server list refresh, and a new connection to a randomly chosen node in the list.
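The round-robin and failover behaviour described above can be pictured with a small sketch. This is an illustration only, not the module's own code: the helper make_round_robin() and its next_server / mark_dead closures are hypothetical names, and the real transport may track failed nodes differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ElasticSearch.pm): hands out servers in
# round-robin order and skips any server that has been marked as dead.
sub make_round_robin {
    my @servers = @_;
    my %dead;
    my $next = 0;
    return {
        next_server => sub {
            # Try each server at most once per call
            for ( 1 .. @servers ) {
                my $server = $servers[ $next++ % @servers ];
                return $server unless $dead{$server};
            }
            die "No live servers available\n";
        },
        mark_dead => sub { $dead{ $_[0] } = 1 },
    };
}

my $pool = make_round_robin(
    'es1.foo.com:9200', 'es2.foo.com:9200', 'es3.foo.com:9200'
);

# Four requests cycle through the list: es1, es2, es3, es1
my @picked = map { $pool->{next_server}->() } 1 .. 4;

# After es2 disappears, it is skipped: es3, es1, es3
$pool->{mark_dead}->('es2.foo.com:9200');
my @after_failure = map { $pool->{next_server}->() } 1 .. 3;

print "@picked\n@after_failure\n";
```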
    use ElasticSearch;

    my $e = ElasticSearch->new(
        servers      => 'search.foo.com:9200',
        transport    => 'http' | 'httplite' | 'httptiny' | 'thrift',    # default 'http'
        max_requests => 10_000,                                         # default 10_000
        trace_calls  => 'log_file',
    );

    $e->index(
        index => 'twitter',
        type  => 'tweet',
        id    => 1,
        data  => {
            user      => 'kimchy',
            post_date => '2009-11-15T14:12:12',
            message   => 'trying out Elastic Search'
        }
    );

    $data = $e->get(
        index => 'twitter',
        type  => 'tweet',
        id    => 1
    );

    $results = $e->search(
        index => 'twitter',
        type  => 'tweet',
        query => {
            term => { user => 'kimchy' },
        }
    );

    $results = $e->search(
        index => 'twitter',
        type  => 'tweet',
        query => {
            query_string => { query => 'kimchy' },
        }
    );

    $dodgy_qs = "foo AND AND bar";
    $results = $e->search(
        index => 'twitter',
        type  => 'tweet',
        query => {
            query_string => { query => $e->query_parser->filter($dodgy_qs) },
        }
    );
See the examples/ directory for a simple working example.
You can download the latest released version of ElasticSearch from http://www.elasticsearch.org/download/.
See here for setup instructions: http://www.elasticsearch.org/tutorials/2010/07/01/setting-up-elasticsearch.html
I've tried to follow the same terminology as used in the ElasticSearch docs when naming methods, so it should be easy to tie the two together.
Some methods require a specific index and a specific type, while others allow a list of indices or types, or allow you to specify all indices or types. I distinguish between them as follows:
$e->method( index => multi, type => single, ...)
multi values can be:
    index => 'twitter'              # specific index
    index => ['twitter','user']     # list of indices
    index => undef                  # (or not specified) = all indices
single values must be a scalar, and are required parameters
type => 'tweet'
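As an illustration of the multi convention, the sketch below maps each accepted form onto the comma-separated path segment that the ElasticSearch REST API expects, with _all standing for every index. The helper multi_to_path() is hypothetical; how this module builds its URLs internally is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ElasticSearch.pm): converts a "multi"
# value into a REST path segment.
sub multi_to_path {
    my ($value) = @_;
    return '_all' unless defined $value;                   # undef = all
    return join ',', @$value if ref $value eq 'ARRAY';     # list of names
    return $value;                                         # single name
}

print multi_to_path('twitter'), "\n";
print multi_to_path( [ 'twitter', 'user' ] ), "\n";
print multi_to_path(undef), "\n";
```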
If you pass as_json => 1 to any request to the ElasticSearch server, it will return the raw UTF8-decoded JSON response, rather than a Perl data structure.
Methods that query the ElasticSearch cluster return the raw data structure that the cluster returns. This may change in the future, but as these data structures are still in flux, I thought it safer not to try to interpret.
Anything that is known to be an error throws an exception, eg trying to delete a non-existent index.
    $e = ElasticSearch->new(
        transport    => 'http|httplite|httptiny|thrift',    # default 'http'
        servers      => '127.0.0.1:9200'                    # single server
                        | ['es1.foo.com:9200',
                           'es2.foo.com:9200'],             # multiple servers
        trace_calls  => 1 | '/path/to/log/file',
        timeout      => 30,
        max_requests => 10_000,                             # refresh server list
                                                            # after max_requests
    );
servers is a required parameter and can be either a single server or an ARRAY ref with a list of servers.
These servers are used in a round-robin fashion. If any server fails to connect, then the other servers in the list are tried, and if any succeeds, then a list of all servers/nodes currently known to the ElasticSearch cluster is retrieved and stored.
Every max_requests (default 10,000) this list of known nodes is refreshed automatically. To disable this automatic refresh, you can set max_requests to 0.
To force a lookup of live nodes, you can do:
$e->refresh_servers();
There are various transport backends that ElasticSearch can use: http (the default, based on LWP), httplite (based on HTTP::Lite), httptiny (based on HTTP::Tiny) or thrift (which uses the Thrift protocol).
Although the thrift interface has the right buzzwords (binary, compact, sockets), the generated Perl code is very slow. Until that is improved, I recommend one of the http backends instead.
The httplite backend is about 30% faster than the default http backend, and will probably become the default after more testing in production.
The httptiny backend is 1% faster again than httplite but has just been added and needs more testing before putting it into production.
See also: ElasticSearch::Transport, "timeout()", "trace_calls()", http://www.elasticsearch.org/guide/reference/modules/http.html and http://www.elasticsearch.org/guide/reference/modules/thrift.html
    $result = $e->index(
        index     => single,
        type      => single,
        id        => $document_id,    # optional, otherwise auto-generated
        data      => { key => value, ... },

        # optional
        create    => 0 | 1,
        parent    => $parent,
        percolate => $percolate,
        refresh   => 0 | 1,
        routing   => $routing,
        timeout   => eg '1m' or '10s'
        version   => int,
    );
eg:
$result = $e->index( index => 'twitter', type => 'tweet', id => 1, data => { user => 'kimchy', post_date => '2009-11-15T14:12:12', message => 'trying out Elastic Search' }, );
Used to add a document to a specific index as a specific type with a specific id. If the index/type/id combination already exists, then that document is updated, otherwise it is created.
Note:
If the id is not specified, then ElasticSearch autogenerates a unique ID and a new document is always created.
If version is passed, and the current version in ElasticSearch is different, then a Conflict error will be thrown.
See also: http://www.elasticsearch.org/guide/reference/api/index_.html, "bulk()" and "put_mapping()"
set() is a synonym for "index()"
    $result = $e->create(
        index     => single,
        type      => single,
        id        => $document_id,    # optional, otherwise auto-generated
        data      => { key => value, ... },

        # optional
        parent    => $parent,
        percolate => $percolate,
        refresh   => 0 | 1,
        routing   => $routing,
        timeout   => eg '1m' or '10s'
    );
$result = $e->create( index => 'twitter', type => 'tweet', id => 1, data => { user => 'kimchy', post_date => '2009-11-15T14:12:12', message => 'trying out Elastic Search' }, );
Used to add a NEW document to a specific index as a specific type with a specific id. If the index/type/id combination already exists, then a Conflict error is thrown.
If the id is not specified, then ElasticSearch autogenerates a unique ID.
See also: "index()"
    $result = $e->get(
        index          => single,
        type           => single,
        id             => single,

        # optional
        fields         => 'field' or ['field1',...]
        refresh        => 0 | 1,
        routing        => $routing,
        ignore_missing => 0 | 1,
    );
Returns the document stored at index/type/id or throws an exception if the document doesn't exist.
Example:
$e->get( index => 'twitter', type => 'tweet', id => 1)
Returns:
{ _id => 1, _index => "twitter", _source => { message => "trying out Elastic Search", post_date=> "2009-11-15T14:12:12", user => "kimchy", }, _type => "tweet", }
By default the _source field is returned. Use fields to specify a list of (stored) fields to return instead, or [] to return no fields.
Pass a true value for refresh to force an index refresh before performing the get.
If the requested index, type or id is not found, then a Missing exception is thrown, unless ignore_missing is true.
See also: "bulk()", http://www.elasticsearch.org/guide/reference/api/get.html
    $result = $e->delete(
        index          => single,
        type           => single,
        id             => single,

        # optional
        consistency    => 'quorum' | 'one' | 'all'
        ignore_missing => 0 | 1
        refresh        => 0 | 1
        routing        => $routing,
        replication    => 'sync' | 'async'
        version        => int
    );
Deletes the document stored at index/type/id or throws a Missing exception if the document doesn't exist and ignore_missing is not true.
If you specify a version and the current version of the document is different (or if the document is not found), a Conflict error will be thrown.
If refresh is true, an index refresh will be forced after the delete has completed.
$e->delete( index => 'twitter', type => 'tweet', id => 1);
See also: "bulk()", http://www.elasticsearch.org/guide/reference/api/delete.html
    $result = $e->bulk(
        [
            { create => { index => 'foo', type => 'bar', id => 123,
                          data  => { text => 'foo bar' },

                          # optional
                          routing   => $routing,
                          parent    => $parent,
                          percolate => $percolate,
            }},

            { index  => { index => 'foo', type => 'bar', id => 123,
                          data  => { text => 'foo bar' },

                          # optional
                          routing   => $routing,
                          parent    => $parent,
                          percolate => $percolate,
                          version   => $version
            }},

            { delete => { index => 'foo', type => 'bar', id => 123,

                          # optional
                          routing   => $routing,
                          parent    => $parent,
                          version   => $version
            }},
        ],
        consistency => 'quorum' | 'one' | 'all'    # optional
        refresh     => 0 | 1                       # optional
    );
Perform multiple index, create or delete operations in a single request. In my benchmarks, this is 10 times faster than serial operations.
For the above example, the $result will look like:
{ actions => [ the list of actions you passed in ], results => [ { create => { _id => 123, _index => "foo", _type => "bar", _version => 1 } }, { index => { _id => 123, _index => "foo", _type => "bar", _version => 2 } }, { delete => { _id => 123, _index => "foo", _type => "bar", _version => 3 } }, ] }
where each row in results corresponds to the same row in actions. If there are any errors for individual rows, then the $result will contain a key errors which contains an array of each error and the associated action, eg:
    $result = {
        actions => [

            ## NOTE - num is numeric
            {   index => { index => 'bar', type => 'bar', id => 123,
                           data  => { num => 123 } }
            },

            ## NOTE - num is a string
            {   index => { index => 'bar', type => 'bar', id => 123,
                           data  => { num => 'foo bar' } }
            },
        ],
        errors => [
            {   action => {
                    index => { index => 'bar', type => 'bar', id => 123,
                               data  => { num => 'foo bar' } }
                },
                error => "MapperParsingException[Failed to parse [num]]; ...",
            },
        ],
        results => [
            { index => { _id => 123, _index => "bar", _type => "bar", _version => 1 }},
            {   index => {
                    error => "MapperParsingException[Failed to parse [num]];...",
                    id    => 123,
                    index => "bar",
                    type  => "bar",
                },
            },
        ],
    };
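A minimal sketch of how a caller might report failed rows, assuming only the errors structure shown above. The helper summarise_bulk_errors() is a hypothetical name, and the sample $result is a shortened, hand-written copy of the documented structure rather than real server output.

```perl
use strict;
use warnings;

# Hand-written sample shaped like the documented bulk() result (shortened).
my $result = {
    actions => [
        { index => { index => 'bar', type => 'bar', id => 123, data => { num => 123 } } },
        { index => { index => 'bar', type => 'bar', id => 123, data => { num => 'foo bar' } } },
    ],
    errors => [
        {   action => {
                index => { index => 'bar', type => 'bar', id => 123, data => { num => 'foo bar' } }
            },
            error => 'MapperParsingException[Failed to parse [num]]',
        },
    ],
};

# Hypothetical helper: returns one summary line per failed action.
# A result without an "errors" key yields an empty list.
sub summarise_bulk_errors {
    my ($result) = @_;
    my @lines;
    for my $failure ( @{ $result->{errors} || [] } ) {
        my ($op)   = keys %{ $failure->{action} };    # create | index | delete
        my $params = $failure->{action}{$op};
        push @lines, sprintf '%s %s/%s/%s failed: %s',
            $op, @{$params}{qw(index type id)}, $failure->{error};
    }
    return @lines;
}

print "$_\n" for summarise_bulk_errors($result);
```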
NOTE: bulk() also accepts the _index, _type, _id, _source, _parent, _routing and _version parameters so that you can pass search results directly to bulk(). See "reindex.pl" in examples for an example script.
See http://www.elasticsearch.org/guide/reference/api/bulk.html for more details.
bulk_index(), bulk_create() and bulk_delete() are convenience methods which allow you to pass just the data, without the index, create or delete action for each record, eg:
$e->bulk_index([ { id => 123, index => 'bar', type => 'bar', data => { text=>'foo'} }, { id => 124, index => 'bar', type => 'bar', data => { text=>'bar'} }, ], { refresh => 1 });
is the equivalent of:
$e->bulk([ { index => { id => 123, index => 'bar', type => 'bar', data => { text=>'foo'}} }, { index => { id => 124, index => 'bar', type => 'bar', data => { text=>'bar'}} } ], { refresh => 1 });
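The equivalence above amounts to wrapping every record in a single-key hash named after the action. A sketch of just that wrapping step, using a hypothetical helper wrap_actions() that sends nothing to the server:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ElasticSearch.pm): wraps each record in
# the given bulk action, producing the list that bulk() expects.
sub wrap_actions {
    my ( $action, $records ) = @_;
    return [ map { +{ $action => $_ } } @$records ];
}

my $actions = wrap_actions(
    'index',
    [   { id => 123, index => 'bar', type => 'bar', data => { text => 'foo' } },
        { id => 124, index => 'bar', type => 'bar', data => { text => 'bar' } },
    ]
);

print join( ', ', map { keys %$_ } @$actions ), "\n";
```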
    $result = $e->analyze(
        index        => single,
        text         => $text_to_analyze,    # required

        # optional
        analyzer     => $analyzer,
        format       => 'detailed' | 'text',
        prefer_local => 1 | 0
    );
The analyze() method allows you to see how ElasticSearch is analyzing the text that you pass in, eg:
$result = $e->analyze( text => 'The Man', index => 'foo')
returns:
{ tokens => [ { end_offset => 7, position => 2, start_offset => 4, token => "man", type => "<ALPHANUM>", }, ], }
See http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html for more.
    $result = $e->search(
        index         => multi,
        type          => multi,
        query         => {query},

        # optional
        explain       => 1 | 0
        facets        => { facets }
        fields        => [$field_1,$field_n]
        from          => $start_from
        highlight     => { highlight }
        indices_boost => { index_1 => 1.5,... }
        routing       => [$routing, ...]
        script_fields => { script_fields }
        search_type   => $search_type
        size          => $no_of_results
        sort          => ['_score',$field_1]
        scroll        => '5m' | '30s'
        timeout       => '10s'
        version       => 0 | 1
    );
Searches for all documents matching the query. Documents can be matched against multiple indices and multiple types, eg:
$result = $e->search( index => undef, # all type => ['user','tweet'], query => { term => {user => 'kimchy' }} );
For all of the options that can be included in the query parameter, see http://www.elasticsearch.org/guide/reference/api/search and http://www.elasticsearch.org/guide/reference/query-dsl
$result = $e->scroll( scroll_id => $scroll_id, scroll => '5m' | '30s', );
If a search has been executed with a scroll parameter, then the returned scroll_id can be used like a cursor to scroll through the rest of the results.
If a further scroll request will be issued, then the scroll parameter should be passed as well. For instance:
    my $result = $e->search(
        query  => { match_all => {} },
        scroll => '5m'
    );

    while (1) {
        my $hits = $result->{hits}{hits};
        last unless @$hits;    # if no hits, we're finished

        do_something_with($hits);

        $result = $e->scroll(
            scroll_id => $result->{_scroll_id},
            scroll    => '5m'
        );
    }
See http://www.elasticsearch.org/guide/reference/api/search/scroll.html
    $result = $e->count(
        index   => multi,
        type    => multi,

        # optional
        routing => [$routing,...]

        # one of:
        bool | constant_score | custom_score | dis_max | field
        | field_masking_span | filtered | flt | flt_field | has_child
        | fuzzy | match_all | mlt | mlt_field | query_string | prefix
        | range | span_term | span_first | span_near | span_not
        | span_or | term | top_children | wildcard
    );
Counts the number of documents matching the query. Documents can be matched against multiple indices and multiple types, eg
$result = $e->count( index => undef, # all type => ['user','tweet'], term => {user => 'kimchy' }, );
See also "search()", http://www.elasticsearch.org/guide/reference/api/count.html and http://www.elasticsearch.org/guide/reference/query-dsl
    $result = $e->delete_by_query(
        index       => multi,
        type        => multi,

        # optional
        consistency => 'quorum' | 'one' | 'all'
        replication => 'sync' | 'async'
        routing     => [$routing,...]

        # one of:
        bool | constant_score | custom_score | dis_max | field
        | field_masking_span | filtered | flt | flt_field | has_child
        | fuzzy | match_all | mlt | mlt_field | query_string | prefix
        | range | span_term | span_first | span_near | span_not
        | span_or | term | top_children | wildcard
    );
Deletes any documents matching the query. Documents can be matched against multiple indices and multiple types, eg
$result = $e->delete_by_query( index => undef, # all type => ['user','tweet'], term => {user => 'kimchy' } );
See also "search()", http://www.elasticsearch.org/guide/reference/api/delete-by-query.html and http://www.elasticsearch.org/guide/reference/query-dsl
    # mlt == more_like_this

    $results = $e->mlt(
        index              => single,    # required
        type               => single,    # required
        id                 => $id,       # required

        # optional more-like-this params
        boost_terms        => float
        mlt_fields         => 'scalar' or ['scalar_1', 'scalar_n']
        max_doc_freq       => integer
        max_query_terms    => integer
        max_word_len       => integer
        min_doc_freq       => integer
        min_term_freq      => integer
        min_word_len       => integer
        pct_terms_to_match => float
        stop_words         => 'scalar' or ['scalar_1', 'scalar_n']

        # optional search params
        explain            => {explain}
        facets             => {facets}
        fields             => {fields}
        from               => {from}
        highlight          => {highlight}
        indices_boost      => { index_1 => 1.5,... }
        routing            => [$routing,...]
        script_fields      => { script_fields }
        scroll             => '5m' | '30s'
        search_type        => $search_type
        size               => {size}
        sort               => {sort}
        timeout            => '10s'
        version            => 0 | 1
    )
More-like-this (mlt) finds related/similar documents. It is possible to run a search query with a more_like_this clause (where you pass in the text you're trying to match), or to use this method, which uses the text of the document referred to by index/type/id.
This gets transformed into a search query, so all of the search parameters are also available.
See http://www.elasticsearch.org/guide/reference/api/more-like-this.html and http://www.elasticsearch.org/guide/reference/query-dsl/mlt-query.html
$result = $e->index_status( index => multi, );
Returns the status of all indices, or of the specified index or indices:
    $result = $e->index_status();                                 # all
    $result = $e->index_status( index => ['twitter','buzz'] );
    $result = $e->index_status( index => 'twitter' );
Throws a Missing exception if the specified indices do not exist.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-status.html
    $result = $e->create_index(
        index    => single,

        # optional
        settings => {...},
        mappings => {...},
    );
Creates a new index, optionally passing index settings and mappings, eg:
$result = $e->create_index( index => 'twitter', settings => { number_of_shards => 3, number_of_replicas => 2, analysis => { analyzer => { default => { tokenizer => 'standard', char_filter => ['html_strip'], filter => [qw(standard lowercase stop asciifolding)], } } } }, mappings => { tweet => { properties => { user => { type => 'string' }, content => { type => 'string' }, date => { type => 'date' } } } } );
Throws an exception if the index already exists.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html
    $result = $e->delete_index(
        index          => single,
        ignore_missing => 0 | 1    # optional
    );
Deletes an existing index, or throws a Missing exception if the index doesn't exist and ignore_missing is not true:
$result = $e->delete_index( index => 'twitter' );
See http://www.elasticsearch.org/guide/reference/api/admin-indices-delete-index.html
$result = $e->update_index_settings( index => multi, settings => { ... settings ...}, );
Update the settings for all, one or many indices. Currently only the number_of_replicas is exposed:
$result = $e->update_index_settings( settings => { number_of_replicas => 1 } );
See http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
$result = $e->aliases( actions => [actions] | {actions} )
Adds or removes an alias for an index, eg:
$result = $e->aliases( actions => [ { remove => { index => 'foo', alias => 'bar' }}, { add => { index => 'foo', alias => 'baz' }} ]);
actions can be a single HASH ref, or an ARRAY ref containing multiple HASH refs.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-aliases.html
$result = $e->get_aliases( index => multi )
Returns a hashref listing all indices and their corresponding aliases, and all aliases and their corresponding indices, eg:
{ aliases => { bar => ["foo"], baz => ["foo"], }, indices => { foo => ["baz", "bar"] }, }
If you pass in the optional index argument, which can be an index name or an alias name, then it will only return the indices and aliases related to that argument.
Note: get_aliases() does not support "as_json"
$result = $e->open_index( index => single);
Opens a closed index.
The open and close index APIs allow you to close an index, and later on open it.
A closed index has almost no overhead on the cluster (except for maintaining its metadata), and is blocked for read/write operations. A closed index can be opened which will then go through the normal recovery process.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close.html for more
$result = $e->close_index( index => single);
Closes an open index. See http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close.html for more
    $result = $e->create_index_template(
        name     => single,
        template => $template,    # required
        mappings => {...},        # optional
        settings => {...},        # optional
    );
Index templates allow you to define templates that will automatically be applied to newly created indices. You can specify both settings and mappings, and a simple pattern template that controls whether the template will be applied to a new index.
For example:
$result = $e->create_index_template( name => 'my_template', template => 'small_*', settings => { number_of_shards => 1 } );
See http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html for more.
$result = $e->index_template( name => single );
Retrieves the named index template.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html#GETting_a_Template
    $result = $e->delete_index_template(
        name           => single,
        ignore_missing => 0 | 1    # optional
    );
Deletes the named index template.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-templates.html#Deleting_a_Template
    $result = $e->flush_index(
        index   => multi,
        full    => 0 | 1,    # optional
        refresh => 0 | 1,    # optional
    );
Flushes one or more indices, which frees memory from the index by flushing data to the index storage and clearing the internal transaction log. By default, ElasticSearch uses memory heuristics in order to automatically trigger flush operations as required in order to clear memory.
$result = $e->flush_index( index => 'twitter' );
See http://www.elasticsearch.org/guide/reference/api/admin-indices-flush.html
$result = $e->refresh_index( index => multi, );
Explicitly refreshes one or more indices, making all operations performed since the last refresh available for search. The (near) real-time capabilities depend on the index engine used. For example, the robin engine requires refresh to be called, but by default a refresh is scheduled periodically.
$result = $e->refresh_index( index => 'twitter' );
See http://www.elasticsearch.org/guide/reference/api/admin-indices-refresh.html
    $result = $e->optimize_index(
        index            => multi,
        only_deletes     => 0 | 1,    # only_expunge_deletes
        flush            => 0 | 1,    # flush after optimization
        refresh          => 0 | 1,    # refresh after optimization
        wait_for_merge   => 1 | 0,    # wait for merge to finish
        max_num_segments => int,      # number of segments to optimize to
    )
See http://www.elasticsearch.org/guide/reference/api/admin-indices-optimize.html
$result = $e->gateway_snapshot( index => multi, );
Explicitly performs a snapshot through the gateway of one or more indices (backs them up). By default, each index gateway periodically snapshots changes, though this can be disabled and controlled completely through this API.
$result = $e->gateway_snapshot( index => 'twitter' );
See http://www.elasticsearch.org/guide/reference/api/admin-indices-gateway-snapshot.html and http://www.elasticsearch.org/guide/reference/modules/gateway
snapshot_index() is a synonym for "gateway_snapshot()"
$result = $e->clear_cache( index => multi, bloom => 0 | 1, field_data => 0 | 1, filter => 0 | 1, id => 0 | 1, );
Clears the caches for the specified indices. By default, clears all caches, but if any of id, filter, field_data or bloom are true, then it clears just the specified caches.
See http://www.elasticsearch.org/guide/reference/api/admin-indices-clearcache.html
    $result = $e->put_mapping(
        index             => multi,
        type              => single,
        properties        => { ... },    # required

        # optional
        _all              => { ... },
        _analyzer         => { ... },
        _boost            => { ... },
        _id               => { ... },
        _index            => { ... },
        _meta             => { ... },
        _parent           => { ... },
        _routing          => { ... },
        _source           => { ... },
        dynamic           => 1 | 0 | 'strict',
        dynamic_templates => [ ... ],
        ignore_conflicts  => 0 | 1,
    );
A mapping is the data definition of a type. If no mapping has been specified, then ElasticSearch tries to infer the type of each field in a document by looking at its contents, eg
    'foo' => string
    123   => integer
    1.23  => float
However, these heuristics can be confused, so it is safer (and much more powerful) to specify an official mapping instead, eg:
$result = $e->put_mapping( index => ['twitter','buzz'], type => 'tweet', _source => { compress => 1 }, properties => { user => {type => "string", index => "not_analyzed"}, message => {type => "string", null_value => "na"}, post_date => {type => "date"}, priority => {type => "integer"}, rank => {type => "float"} } );
See also: http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping.html and http://www.elasticsearch.org/guide/reference/mapping
$result = $e->delete_mapping( index => multi, type => single, ignore_missing => 0 | 1, );
Deletes a mapping/type in one or more indices. See also http://www.elasticsearch.org/guide/reference/api/admin-indices-delete-mapping.html
Throws a Missing exception if the indices or type don't exist and ignore_missing is false.
$mapping = $e->mapping( index => single, type => multi );
Returns the mappings for all types in an index, or the mapping for the specified type(s), eg:
    $mapping  = $e->mapping( index => 'twitter', type => 'tweet' );
    $mappings = $e->mapping( index => 'twitter', type => ['tweet','user'] );
    # { twitter => { tweet => {mapping}, user => {mapping}} }
Note: the index name which is used in the results is the actual index name. If you pass an alias name as the index name, then this key will be the index (or indices) that the alias points to.
See also: http://www.elasticsearch.org/guide/reference/api/admin-indices-get-mapping.html
See http://www.elasticsearch.org/guide/reference/river/ and http://www.elasticsearch.org/guide/reference/river/twitter.html.
    $result = $e->create_river(
        river => $river_name,    # required
        type  => $type,          # required
        $type => {...},          # depends on river type
        index => {...},          # depends on river type
    );
Creates a new river with name $river_name, eg:
$result = $e->create_river( river => 'my_twitter_river', type => 'twitter', twitter => { user => 'user', password => 'password', }, index => { index => 'my_twitter_index', type => 'status', bulk_size => 100 } )
    $result = $e->get_river(
        river          => $river_name,
        ignore_missing => 0 | 1    # optional
    );
Returns the river details eg
$result = $e->get_river ( river => 'my_twitter_river' )
Throws a Missing exception if the river doesn't exist and ignore_missing is false.
$result = $e->delete_river( river => $river_name );
Deletes the corresponding river, eg:
$result = $e->delete_river ( river => 'my_twitter_river' )
See http://www.elasticsearch.org/guide/reference/river/.
    $result = $e->river_status(
        river          => $river_name,
        ignore_missing => 0 | 1    # optional
    );
Returns the status doc for the named river.
See also: http://www.elasticsearch.org/guide/reference/api/percolate.html and http://www.elasticsearch.org/blog/2011/02/08/percolator.html
    $e->create_percolator(
        index      => single,
        percolator => $percolator,
        query      => {query},    # required
        data       => {data}      # optional
    )
Create a percolator, eg:
$e->create_percolator( index => 'myindex', percolator => 'mypercolator', query => { field => { text => 'foo' }}, data => { color => 'blue' } )
    $e->get_percolator(
        index          => single,
        percolator     => $percolator,
        ignore_missing => 0 | 1,
    )
Retrieves a percolator, eg:
$e->get_percolator( index => 'myindex', percolator => 'mypercolator', )
Throws a Missing exception if the specified index or percolator does not exist, and ignore_missing is false.
    $e->delete_percolator(
        index          => single,
        percolator     => $percolator,
        ignore_missing => 0 | 1,
    )
Deletes a percolator, eg:
$e->delete_percolator( index => 'myindex', percolator => 'mypercolator', )
    $result = $e->percolate(
        index        => single,
        type         => single,
        doc          => { doc to percolate },

        # optional
        query        => { query to filter percolators },
        prefer_local => 1 | 0,
    )
Check for any percolators which match a document, optionally filtering which percolators could match by passing a query param, for instance:
$result = $e->percolate( index => 'myindex', type => 'mytype', doc => { text => 'foo' }, query => { term => { color => 'blue' }} );
{ ok => 1, matches => ['mypercolator'] }
    $result = $e->cluster_state(
        # optional
        filter_blocks        => 0 | 1,
        filter_nodes         => 0 | 1,
        filter_metadata      => 0 | 1,
        filter_routing_table => 0 | 1,
        filter_indices       => [ 'index_1', ... 'index_n' ],
    );
Returns cluster state information.
See http://www.elasticsearch.org/guide/reference/api/admin-cluster-state.html
    $result = $e->cluster_health(
        index                        => multi,
        level                        => 'cluster' | 'indices' | 'shards',
        timeout                      => $seconds
        wait_for_status              => 'red' | 'yellow' | 'green',
        | wait_for_relocating_shards => $number_of_shards,
        | wait_for_nodes             => eg '>=2',
    );
Returns the status of the cluster, or index|indices or shards, where the returned status means:
red
yellow
green
It can block to wait for a particular status (or better), or can block to wait until the specified number of shards have been relocated (where 0 means all) or the specified number of nodes have been allocated.
If waiting, then a timeout can be specified.
$result = $e->cluster_health( wait_for_status => 'green', timeout => '10s')
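The phrase "a particular status (or better)" relies on the ordering red < yellow < green. A small sketch of that comparison, using a hypothetical helper status_at_least() that is not part of this module (the waiting itself is done by the server):

```perl
use strict;
use warnings;

# Cluster health statuses, ranked from worst to best.
my %rank = ( red => 0, yellow => 1, green => 2 );

# Hypothetical helper: true if $status is $wanted or better.
sub status_at_least {
    my ( $status, $wanted ) = @_;
    die "Unknown status\n"
        unless exists $rank{$status} && exists $rank{$wanted};
    return $rank{$status} >= $rank{$wanted} ? 1 : 0;
}

print status_at_least( 'green',  'yellow' ) ? "ok\n" : "still waiting\n";
print status_at_least( 'yellow', 'green' )  ? "ok\n" : "still waiting\n";
```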
See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-health.html
    $result = $e->nodes(
        nodes    => multi,
        settings => 1 | 0    # optional
    );
Returns information about one or more nodes or servers in the cluster. If settings is true, then it includes the node settings information.
See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-info.html
$result = $e->nodes_stats( node => multi, );
Returns various statistics about one or more nodes in the cluster.
See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats.html
    $result = $e->shutdown(
        node  => multi,
        delay => '5s' | '10m'    # optional
    );
Shuts down one or more nodes (or the whole cluster if no nodes specified), optionally with a delay.
node can also have the values _local, _master or _all.
See: http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-shutdown.html
    $result = $e->restart(
        node  => multi,
        delay => '5s' | '10m'    # optional
    );
Restarts one or more nodes (or the whole cluster if no nodes specified), optionally with a delay.
See: "KNOWN ISSUES"
$version = $e->current_server_version()
Returns a HASH containing the version number string, the build date and whether or not the current server is a snapshot_build.
    $es->trace_calls(1);            # log to STDERR
    $es->trace_calls($filename);    # log to $filename.$PID
    $es->trace_calls(0 | undef);    # disable logging
trace_calls() is used for debugging. All requests to the cluster are logged either to STDERR or the specified filename, with the current $PID appended, in a form that can be rerun with curl.
The cluster response will also be logged, and commented out.
Example: $e->cluster_health is logged as:
    # [Tue Oct 19 15:32:31 2010] Protocol: http, Server: 127.0.0.1:9200
    curl -XGET 'http://127.0.0.1:9200/_cluster/health'

    # [Tue Oct 19 15:32:31 2010] Response:
    # {
    #    "relocating_shards" : 0,
    #    "active_shards" : 0,
    #    "status" : "green",
    #    "cluster_name" : "elasticsearch",
    #    "active_primary_shards" : 0,
    #    "timed_out" : false,
    #    "initializing_shards" : 0,
    #    "number_of_nodes" : 1,
    #    "unassigned_shards" : 0
    # }
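To show how a request maps onto the logged form, here is a rough sketch that produces the header and curl lines for a simple GET. The helper trace_lines() is hypothetical and only mirrors the layout of the sample log; the module's own logging code is not shown in this document and may differ (for instance in how request bodies are written).

```perl
use strict;
use warnings;

# Hypothetical formatter, for illustration only.
sub trace_lines {
    my (%args) = @_;

    # Commented header with a timestamp such as "Tue Oct 19 15:32:31 2010"
    my $header = sprintf '# [%s] Protocol: %s, Server: %s',
        scalar( localtime() ), $args{protocol}, $args{server};

    # A curl command that can be pasted into a shell to rerun the request
    my $curl = sprintf "curl -X%s '%s://%s%s'",
        $args{method}, $args{protocol}, $args{server}, $args{path};

    return ( $header, $curl );
}

print "$_\n" for trace_lines(
    protocol => 'http',
    server   => '127.0.0.1:9200',
    method   => 'GET',
    path     => '/_cluster/health',
);
```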
$qp = $e->query_parser(%opts);
Returns an ElasticSearch::QueryParser object for tidying up query strings so that they won't cause an error when passed to ElasticSearch.
See ElasticSearch::QueryParser for more information.
$transport = $e->transport
Returns the Transport object, eg ElasticSearch::Transport::HTTP.
$timeout = $e->timeout($timeout)
Convenience method which does the same as:
$e->transport->timeout($timeout)
$e->refresh_servers()
Convenience method which does the same as:
$e->transport->refresh_servers()
This tries to retrieve a list of all known live servers in the ElasticSearch cluster by connecting to each of the last known live servers (and the initial list of servers passed to new()) until it succeeds.
This list of live servers is then used in a round-robin fashion.
refresh_servers() is called on the first request and every max_requests. This automatic refresh can be disabled by setting max_requests to 0:
$e->transport->max_requests(0)
Or:
$e = ElasticSearch->new( servers => '127.0.0.1:9200', max_requests => 0, );
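The refresh schedule can be sketched as a simple counter. This is an assumption-laden illustration, not the transport's real code: make_refresh_check() is a hypothetical helper, and it assumes that the first request still triggers a refresh when max_requests is 0.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that is called once per request and
# reports whether the server list should be refreshed before that request.
sub make_refresh_check {
    my ($max_requests) = @_;
    my $since_refresh;    # undef until the first request
    return sub {
        if ( !defined $since_refresh ) {    # first request always refreshes
            $since_refresh = 0;
            return 1;
        }
        $since_refresh++;
        return 0 unless $max_requests;      # 0 disables the periodic refresh
        return 0 if $since_refresh < $max_requests;
        $since_refresh = 0;                 # refresh and restart the count
        return 1;
    };
}

# With max_requests => 3, a refresh happens on requests 1, 4 and 7
my $needs_refresh = make_refresh_check(3);
my @flags = map { $needs_refresh->() } 1 .. 8;
print "@flags\n";
```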
$bool = $e->camel_case($bool)
Gets/sets the camel_case flag. If true, then all JSON keys returned by ElasticSearch are in camelCase, instead of with_underscores. This flag does not apply to the source document being indexed or fetched.
Defaults to false.
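For readers unfamiliar with the two naming styles, the sketch below converts a with_underscores key to camelCase. The conversion itself is done by the ElasticSearch server when the flag is set; to_camel_case() is a hypothetical helper shown purely to illustrate the difference, and leaving a leading underscore untouched is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: turns "number_of_nodes" into "numberOfNodes".
# An underscore is only folded when it follows a letter or digit, so a
# leading underscore (as in "_index") is left untouched.
sub to_camel_case {
    my ($key) = @_;
    $key =~ s/(?<=[a-z0-9])_([a-z])/\u$1/g;
    return $key;
}

print to_camel_case($_), "\n"
    for qw(number_of_nodes active_primary_shards timed_out);
```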
$bool = $e->error_trace($bool)
If the ElasticSearch server is returning an error, setting error_trace to true will return some internal information about where the error originates. Mostly useful for debugging.
$ElasticSearch::DEBUG = 0 | 1;
If $ElasticSearch::DEBUG is set to true, then ElasticSearch exceptions will include a stack trace.
Clinton Gormley, <drtech at cpan.org>
The _source key that is returned from a "get()" contains the original JSON string that was used to index the document initially. ElasticSearch parses JSON more leniently than JSON::XS, so if invalid JSON is used to index the document (eg unquoted keys) then $e->get(....) will fail with a JSON exception.
Any documents indexed via this module will not be susceptible to this problem.
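The difference between strict and lenient parsing can be reproduced without a server. The sketch below uses the core JSON::PP module as a stand-in for JSON::XS (an assumption made so that the example runs on a stock Perl); its allow_barekey option imitates the kind of leniency described above.

```perl
use strict;
use warnings;
use JSON::PP;

# Unquoted key: accepted by a lenient parser, but not valid JSON
my $lenient_json = '{user: "kimchy"}';

# A strict parser throws an exception, which is what get() would surface
my $strict = eval { JSON::PP->new->decode($lenient_json) };
print "strict parse failed\n" unless defined $strict;

# A relaxed parser accepts the same text
my $relaxed = JSON::PP->new->allow_barekey->decode($lenient_json);
print "user = $relaxed->{user}\n";
```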
restart() is currently disabled in ElasticSearch as it doesn't work correctly. Instead you can "shutdown()" one or all nodes and then start them up from the command line.
This is a beta module, so there will be bugs, and the API is likely to change in the future, as the API of ElasticSearch itself changes.
If you have any suggestions for improvements, or find any bugs, please report them to http://github.com/clintongormley/ElasticSearch.pm/issues. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
Hopefully I'll be adding an ElasticSearch::Abstract (similar to SQL::Abstract) which will make it easier to generate valid queries for ElasticSearch.
Also, a non-blocking AnyEvent module has been written, but needs integrating with the new ElasticSearch::Transport.
This version is missing tests for parent, routing and percolator. Will follow soon.
You can find documentation for this module with the perldoc command.
perldoc ElasticSearch
You can also look for information at:
GitHub
http://github.com/clintongormley/ElasticSearch.pm
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=ElasticSearch
AnnoCPAN: Annotated CPAN documentation
http://annocpan.org/dist/ElasticSearch
CPAN Ratings
http://cpanratings.perl.org/d/ElasticSearch
Search CPAN
http://search.cpan.org/dist/ElasticSearch/
Thanks to Shay Banon, the ElasticSearch author, for producing an amazingly easy to use search engine.
Copyright 2010 - 2011 Clinton Gormley.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.