The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Apporo - Perl binding for Apporo(Approximate String Matching Engine)

SYNOPSIS

  use Apporo;

DESCRIPTION

Apporo is one of the Approximate String Matching Engine. In example, it can use to correct the miss spellings of search query of a medium scale web service.

This module enable to use Apporo from the Perl scripts. You shoule see also http://code.google.com/p/apporo/ to install the Apporo C++ Library.

First, you have to make the indexes of a target data for apporo. If your data is written in single byte character language, you should use ASCII mode.

  - ASCII mode example
  % apporo_indexer -i [your TSV file] -bt
  % apporo_indexer -i [your TSV file] -d

If your data is written in UTF-8, you should use UTF-8 mode.

  - UTF-8 char mode example
  % apporo_indexer -i [your TSV file] -u -bt
  % apporo_indexer -i [your TSV file] -d

After indexing, You have to write a configure file of Apporo. This file is written as TSV format. You can set the search options. See also Search Options section of document on GoogleCode (http://code.google.com/p/apporo/)

  % cat ./sample.conf
  ngram_length    2
  is_pre          true
  is_suf          true
  is_utf8         false
  dist_threshold  0.6
  index_path      path to your file which already indexed.
  dist_func       edit
  entry_buf_len   1024
  engine          tsubomi
  result_num      10
  bucket_size     2000
  is_surface      true
  is_kana         false
  is_roman        false
  is_mecab        false
  is_juman        false
  is_kytea        false

The Options which are is_kana, is_roman, is_mecab, is_juman and is_kytea will be able to use in the near future.

If you finish to write the configure file, you can use Apporo in following way.

  #!/usr/bin/env perl

  use strict;
  use warnings;
  use utf8;
  use YAML;

  use Apporo;

  my $config_path = "/path/to/config file/of/apporo";
  my $query = "/string/of/search/query";
  my $app = Apporo->new($config_path); #reusable
  my @arr = $app->retrieve($query);
  print Dump \@arr;

You can do approximate strigng matching from your target data using your query string.

That's all.

AUTHOR

Toshinori Satou <overlasting {at} gmail.com>

SEE ALSO

  - http://code.google.com/p/apporo/

LICENSE

This Perl module is free software. you can redistribute it and/or modify it under the same terms as Perl itself.

All code of Apporo C++ Library is provided under the New BSD license.