NAME

Algorithm::NGram

SYNOPSIS

use strict;
use warnings;
use Algorithm::NGram;

my $ng = Algorithm::NGram->new(ngram_width => 3); # use trigrams

# feed in text
my $text1 = "the quick brown fox jumps over the lazy dog";
my $text2 = "the lazy dog sleeps in the sun";
$ng->add_text($text1); # analyze $text1
$ng->add_text($text2); # analyze $text2

# feed in arbitrary sequence of tokens
$ng->add_start_token;
$ng->add_tokens(qw/token1 token2 token3/);
$ng->add_end_token;

my $output = $ng->generate_text;
print "$output\n";

DESCRIPTION

This is a module for analyzing token sequences with n-grams. You can use it to split a block of text into tokens or feed in your own tokens, and then generate new token sequences from what has been fed in.

EXPORT

None.

METHODS

new

Creates a new n-gram analyzer instance.

Options:

ngram_width: The "window size," i.e. how many consecutive tokens the analyzer keeps track of. An ngram_width of 2 builds bigrams, an ngram_width of 3 builds trigrams, and so on.
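To make the window size concrete, here is a small standalone Perl sketch (not the module's code; the helper ngrams() is hypothetical) that lists every window of n consecutive tokens. With a width of 2 it yields bigrams, and with a width of 3 it yields trigrams.

```perl
use strict;
use warnings;

# Return every contiguous window of $n tokens from @tokens,
# each window as an array reference. Returns nothing if @tokens < $n.
sub ngrams {
    my ($n, @tokens) = @_;
    my @windows;
    for my $i (0 .. @tokens - $n) {
        push @windows, [ @tokens[$i .. $i + $n - 1] ];
    }
    return @windows;
}

my @tokens = qw(the cat sat on the mat);
print join(' ', @$_), "\n" for ngrams(2, @tokens);   # 5 bigrams
print join(' ', @$_), "\n" for ngrams(3, @tokens);   # 4 trigrams
```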

ngram_width

Returns the token window size (i.e. the "n" in n-gram).

token_table

Returns the n-gram table.

add_text

Splits a block of text on whitespace and processes each word as a token. Automatically calls add_start_token() at the beginning of the text and add_end_token() at the end.
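The following standalone sketch mimics the documented behavior of add_text() on a plain string: split on whitespace, then wrap the words in start and end markers. The helper tokenize_text() and the '<START>'/'<END>' marker values are hypothetical; this documentation does not say how the module represents its start and end tokens.

```perl
use strict;
use warnings;

# Hypothetical sentinel values; the module's real start/end tokens may differ.
my $START = '<START>';
my $END   = '<END>';

# Split text on runs of whitespace and wrap the words in start/end markers.
sub tokenize_text {
    my ($text) = @_;
    my @words = split ' ', $text;   # ' ' splits on whitespace and drops leading blanks
    return ($START, @words, $END);
}

print join('|', tokenize_text("  the cat\tsat\n")), "\n";
# prints: <START>|the|cat|sat|<END>
```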

add_tokens

Adds an arbitrary list of tokens.

add_start_token

Adds the "start token." Marking the beginning and end of each token sequence lets the generator learn which tokens can start a sequence and when a generated sequence should end.

add_end_token

Adds the "end token." See add_start_token().

analyze

Generates an n-gram frequency table. Returns a hashref of the form N => tokens => count, where N is the number of tokens in each n-gram (from 2 to ngram_width). You will not normally need to call this unless you want direct access to the frequency table.
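For illustration, the hypothetical helper ngram_counts() below builds a table with the same N => tokens => count shape from a flat token list, counting every window of length 2 through the chosen width. It joins each n-gram's tokens with a single space to form the inner key; that key format is an assumption, not necessarily what token_key() produces.

```perl
use strict;
use warnings;

# Build { N => { key => count } } for N = 2 .. $width.
# Keys are space-joined tokens (an assumption for this sketch).
sub ngram_counts {
    my ($width, @tokens) = @_;
    my %table;
    for my $n (2 .. $width) {
        for my $i (0 .. @tokens - $n) {
            my $key = join ' ', @tokens[$i .. $i + $n - 1];
            $table{$n}{$key}++;
        }
    }
    return \%table;
}

my $t = ngram_counts(3, qw(a b a b));
# $t->{2} is { 'a b' => 2, 'b a' => 1 }
# $t->{3} is { 'a b a' => 1, 'b a b' => 1 }
```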

generate_text

After text or tokens have been added, returns a new block of text generated from them.

generate

Generates a new sequence of tokens based on whatever tokens have previously been fed in.
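As a rough sketch of the idea behind generate(), assuming a width of 2 (bigrams) and hypothetical '<START>'/'<END>' markers, the code below records the observed successors of each token and then walks from the start marker, choosing a random successor at each step until the end marker comes up. The helpers train() and generate_seq() are illustrative only and are not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical sentinels marking the start and end of a sequence.
my ($START, $END) = ('<START>', '<END>');

# Record every observed successor of each token. Keeping duplicates means a
# uniform pick from the list is automatically weighted by frequency.
sub train {
    my (@sequences) = @_;
    my %next;
    for my $seq (@sequences) {
        my @toks = ($START, @$seq, $END);
        push @{ $next{ $toks[$_] } }, $toks[$_ + 1] for 0 .. $#toks - 1;
    }
    return \%next;
}

# Walk from the start marker, picking a random successor each step,
# until the end marker is drawn or $max_len tokens have been emitted.
sub generate_seq {
    my ($next, $max_len) = @_;
    my ($cur, @out) = ($START);
    while (@out < $max_len) {
        my $choices = $next->{$cur} or last;
        $cur = $choices->[ int rand @$choices ];
        last if $cur eq $END;
        push @out, $cur;
    }
    return @out;
}

my $model = train([qw(the cat sat)], [qw(the dog sat)]);
print join(' ', generate_seq($model, 20)), "\n";   # "the cat sat" or "the dog sat"
```

Storing duplicate successors in a list is one simple way to get frequency-weighted choices; a count table combined with a weighted draw is another.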

next_tok

Given a list of tokens, picks a token that may plausibly come next.
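Choosing the next token is typically a weighted random draw over the tokens that have followed the current context. The hypothetical weighted_pick() below does that for a { token => count } hashref; the optional second argument stands in for rand() so the choice can be reproduced. Whether next_tok() weights its choice exactly this way is not stated in this documentation.

```perl
use strict;
use warnings;

# Pick a key from %$counts with probability proportional to its count.
# $rand is an optional number in [0, 1) used instead of rand().
sub weighted_pick {
    my ($counts, $rand) = @_;
    my @toks = sort keys %$counts;          # sort for a stable order
    return undef unless @toks;              # nothing to choose from
    my $total = 0;
    $total += $counts->{$_} for @toks;
    my $target = (defined $rand ? $rand : rand()) * $total;
    for my $tok (@toks) {
        $target -= $counts->{$tok};
        return $tok if $target < 0;
    }
    return $toks[-1];   # guard against floating-point rounding at the top end
}

my %follows = (cat => 3, dog => 1);
print weighted_pick(\%follows), "\n";   # "cat" about 75% of the time
```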

token_lookup

Given a sequence of tokens, returns a hashref mapping each token that has followed that sequence to its count.
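The kind of data token_lookup() is described as returning can be sketched by counting, for every run of n - 1 tokens, which token comes next. In the hypothetical follow_counts() below, prefixes are space-joined strings; the module's own key format may differ.

```perl
use strict;
use warnings;

# For every run of ($n - 1) tokens, count which token follows it.
# Returns { prefix_key => { next_token => count } }.
sub follow_counts {
    my ($n, @tokens) = @_;
    my %follows;
    for my $i (0 .. @tokens - $n) {
        my $prefix = join ' ', @tokens[$i .. $i + $n - 2];
        $follows{$prefix}{ $tokens[$i + $n - 1] }++;
    }
    return \%follows;
}

my $f = follow_counts(2, qw(the cat and the dog and the cat));
# $f->{'the'} is { cat => 2, dog => 1 }
```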

token_key

Serializes a sequence of tokens for use as a key into the n-gram table. You will not normally need to call this.
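A token key only has to turn a token sequence into one string that can be split back apart. The sketch below is hypothetical: key_for() and tokens_for() are illustrative names, and the ASCII unit separator (\x1F) is an assumed delimiter, chosen because tokens passed to add_tokens() may themselves contain spaces.

```perl
use strict;
use warnings;

# Assumed delimiter: ASCII unit separator, unlikely to appear in a token.
my $SEP = "\x{1F}";

sub key_for    { return join $SEP, @_ }
sub tokens_for { return split /\Q$SEP\E/, $_[0], -1 }   # -1 keeps trailing empty tokens

my %table;
$table{ key_for(qw(the cat)) }++;
my @back = tokens_for( key_for(qw(the cat)) );   # ('the', 'cat')
```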

serialize

Returns the tokens and the n-gram table (if one has been generated) as a string.

deserialize($string)

Deserializes a string produced by serialize() and returns an Algorithm::NGram instance.
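A serialize/deserialize pair is essentially a round trip through a string. This sketch assumes JSON as the format and uses the core JSON::PP module; serialize_state() and deserialize_state() are hypothetical names, and unlike the documented deserialize() the helper returns a plain hashref rather than an Algorithm::NGram instance. The actual string format used by serialize() is not described here.

```perl
use strict;
use warnings;
use JSON::PP ();

# Encode the state as JSON; canonical() sorts keys so output is stable.
sub serialize_state {
    my (%state) = @_;
    return JSON::PP->new->canonical->encode(\%state);
}

# Decode a string produced by serialize_state() back into a hashref.
sub deserialize_state {
    my ($string) = @_;
    return JSON::PP->new->decode($string);
}

my $s = serialize_state(tokens => [qw(a b)], table => { 2 => { 'a b' => 1 } });
my $back = deserialize_state($s);
print $s, "\n";
```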

AUTHOR

Mischa Spiegelmock, <mspiegelmock@gmail.com>

COPYRIGHT AND LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
