NAME
Lingua::YALI::Identifier - Module for language identification with custom models.
VERSION
version 0.016
SYNOPSIS
This module identifies languages using models provided by the user. If you want to use pretrained models, use Lingua::YALI::LanguageIdentifier.
Models trained on texts from a specific domain outperform the general ones.
    # create models
    my $builder_a = Lingua::YALI::Builder->new(ngrams=>[2]);
    $builder_a->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
    $builder_a->store("model_a.2_all.gz", 2);
    my $builder_b = Lingua::YALI::Builder->new(ngrams=>[2]);
    $builder_b->train_string("bbbbbb bbbb bbbb bbb bbbb bbbb bbb");
    $builder_b->store("model_b.2_all.gz", 2);
    # create identifier and load models
    my $identifier = Lingua::YALI::Identifier->new();
    $identifier->add_class("a", "model_a.2_all.gz");
    $identifier->add_class("b", "model_b.2_all.gz");
    # identify strings
    my $result1 = $identifier->identify_string("aaaaaaaaaaaaaaaaaaa");
    print $result1->[0]->[0] . "\t" . $result1->[0]->[1];
    # prints out a 1
    my $result2 = $identifier->identify_string("bbbbbbbbbbbbbbbbbbb");
    print $result2->[0]->[0] . "\t" . $result2->[0]->[1];
    # prints out b 1
More examples are presented in Lingua::YALI::Examples.
METHODS
BUILD
Initializes internal variables.
    # create identifier
    my $identifier = Lingua::YALI::Identifier->new();
add_class
    $added = $identifier->add_class($class, $model)
Adds model stored in file $model with class $class and returns whether it was added or not.
    print $identifier->add_class("a", "model.a1.gz") . "\n";
    # prints out 1
    print $identifier->add_class("a", "model.a2.gz") . "\n";
    # prints out 0 - class a was already added
remove_class
    my $removed = $identifier->remove_class($class);
Removes the model for class $class and returns whether it was removed or not.
    $identifier->add_class("a", "model.a1.gz");
    print $identifier->remove_class("a") . "\n";
    # prints out 1
    print $identifier->remove_class("a") . "\n";
    # prints out 0 - class a was already removed
get_classes
    my \@classes = $identifier->get_classes();
Returns all registered classes.
identify_file
    my $result = $identifier->identify_file($file)
Identifies class for file $file.
It returns undef if $file is undef.
It croaks if the file $file does not exist or is not readable.
Otherwise, see method "identify_handle" for more details.
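Because identify_file croaks on a missing or unreadable file, callers may want to trap the error rather than let the program die. The following is a minimal sketch, not part of the module: safe_identify_file is a hypothetical helper, and StubIdentifier is a stand-in that only mimics the behaviour described above so the example runs without the module or any model files.

```perl
use strict;
use warnings;

# Stand-in for Lingua::YALI::Identifier (an assumption for this sketch);
# it only mimics the documented undef/croak behaviour of identify_file.
package StubIdentifier {
    use Carp qw(croak);
    sub new { return bless {}, shift }
    sub identify_file {
        my ( $self, $file ) = @_;
        return undef unless defined $file;
        croak "File $file does not exist or is not readable" unless -r $file;
        return [ [ 'a', 1 ] ];
    }
}

# Hypothetical helper: returns (result, error) instead of dying.
sub safe_identify_file {
    my ( $identifier, $file ) = @_;
    my $result = eval { $identifier->identify_file($file) };
    return ( undef, $@ ) if $@;
    return ( $result, undef );
}

# "no-such-file.txt" is assumed not to exist in the working directory.
my ( $result, $error ) =
    safe_identify_file( StubIdentifier->new(), "no-such-file.txt" );
print defined $error ? "failed: $error" : "ok\n";
```

With the real module, the same helper should accept an object created by Lingua::YALI::Identifier->new() in place of the stub.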
identify_string
    my $result = $identifier->identify_string($string)
Identifies class for string $string.
It returns undef if $string is undef.
Otherwise, see method "identify_handle" for more details.
identify_handle
    my $result = $identifier->identify_handle($fh)
Identifies class for file handle $fh.
It returns undef if $fh is undef.
It croaks if $fh is not a file handle.
Otherwise, it returns an array reference in the format [ ['class1', score1], ['class2', score2], ...] sorted by score in descending order, so the most probable class comes first.
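The result format described above can be processed with plain Perl. The sketch below uses a hand-written result instead of calling the module; the scores are made up for illustration, and best_class and format_result are hypothetical helper names, not part of Lingua::YALI.

```perl
use strict;
use warnings;

# Hypothetical result in the documented format:
# [ ['class1', score1], ['class2', score2], ... ], most probable class first.
my $result = [ [ 'a', 0.9 ], [ 'b', 0.1 ] ];

# Return the top class name, or undef when the result is undef or empty.
sub best_class {
    my ($res) = @_;
    return undef unless defined $res && @{$res};
    return $res->[0]->[0];
}

# Format every (class, score) pair as a tab-separated line.
sub format_result {
    my ($res) = @_;
    return map { join( "\t", @{$_} ) } @{ $res // [] };
}

print "best: ", best_class($result), "\n";
print "$_\n" for format_result($result);
```

Checking for undef first matters because the identify_* methods return undef for undefined input.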
SEE ALSO
The identifier with pretrained models for language identification is Lingua::YALI::LanguageIdentifier.
The builder for these models is Lingua::YALI::Builder.
There is also a command-line tool, yali-identifier, with similar functionality.
Source code is available at https://github.com/martin-majlis/YALI.
AUTHOR
Martin Majlis <martin@majlis.cz>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Martin Majlis.
This is free software, licensed under:
The (three-clause) BSD License