NAME

PDL::Ngrams - N-Gram utilities for PDL

SYNOPSIS

 use PDL;
 use PDL::Ngrams;

 ##---------------------------------------------------------------------
 ## Basic Data
 $toks = rint(10*random(10));

 ##---------------------------------------------------------------------
 ## ... stuff happens

DESCRIPTION

PDL::Ngrams provides basic utilities for tracking N-grams over PDL vectors.

FUNCTIONS

Counting N-Grams over PDLs

ng_cofreq

  Signature: (toks(@adims,N,NToks); %args)

  Returns: (int [o]ngramfreqs(NNgrams); [o]ngramids(@adims,N,NNgrams))

Keyword arguments (optional):

  norotate => $bool,                      ##-- if true, $toks() will NOT be rotated along $N
  boffsets => $boffsets(NBlocks),         ##-- block-offsets in $toks() along $NToks
  delims   => $delims(@adims,N,NDelims),  ##-- delimiters to splice in at block boundaries

Count co-occurrences (especially N-grams) over a token vector $toks. This function is a thin wrapper around ng_delimit(), ng_rotate(), vv_qsortvec(), and rlevec().
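As a rough plain-Perl picture of that pipeline (window the tokens, sort the resulting N-grams, then run-length encode the sorted list), the following sketch may help. It does not call PDL or PDL::Ngrams: cofreq_sketch is a hypothetical helper, the delimiting step is left out, and a string sort stands in for the vector sort that vv_qsortvec() performs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Ngrams): count N-grams by
# windowing, sorting, and run-length encoding the sorted list.
sub cofreq_sketch {
    my ($n, @toks) = @_;

    # One string key per window of $n consecutive tokens.
    my @keys;
    for my $t (0 .. @toks - $n) {
        push @keys, join(' ', @toks[$t .. $t + $n - 1]);
    }

    # Sorting makes equal N-grams adjacent (any consistent order works for
    # grouping); run-length encoding then yields one count per distinct N-gram.
    my (@ids, @freqs);
    for my $key (sort @keys) {
        if (@ids && $ids[-1] eq $key) {
            $freqs[-1]++;            # same run as the previous key
        }
        else {
            push @ids,   $key;       # start a new run
            push @freqs, 1;
        }
    }
    return (\@freqs, \@ids);
}

my ($freqs, $ids) = cofreq_sketch(2, 1, 2, 1, 2, 3);
print "$ids->[$_]: $freqs->[$_]\n" for 0 .. $#$ids;
```

Here the bigram "1 2" occurs twice and the bigrams "2 1" and "2 3" once each, mirroring the paired frequency and ID outputs in the signature above.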

ng_rotate

  Signature: (toks(@adims,N,NToks); [o]rtoks(@adims,N,NToks-N+1))

Create a co-occurrence matrix by rotating a (delimited) token vector $toks(). Returns a matrix $rtoks() suitable for passing to ng_cofreq().
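The shape change in the signature (NToks tokens in, NToks-N+1 N-grams out) can be illustrated in plain Perl. This is only a sketch under the assumption that component i of output N-gram t is taken from input position t+i; rotate_sketch is a hypothetical helper and does not use PDL.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Ngrams): build the list of N-grams
# that a rotation of the token list would line up, assuming N-gram $t holds
# tokens $t .. $t+$n-1.
sub rotate_sketch {
    my ($n, @toks) = @_;
    my @ngrams;
    for my $t (0 .. @toks - $n) {      # NToks-N+1 windows in total
        push @ngrams, [ @toks[$t .. $t + $n - 1] ];
    }
    return @ngrams;
}

my @ngrams = rotate_sketch(2, 1, 2, 3, 4);
print join(' ', map { "(@$_)" } @ngrams), "\n";   # prints "(1 2) (2 3) (3 4)"
```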

Delimiter Insertion and Removal

The following functions add delimiters to, or remove them from, a PDL vector. This is useful for inserting beginning- and/or end-of-word markers before constructing a vector of N-gram vectors, and for stripping those markers again afterwards.

ng_delimit

  Signature: (toks(NToks); indx boffsets(NBlocks); delims(NDelims); [o]dtoks(NDToks))

Add block-delimiters (e.g. BOS,EOS) to a vector of raw tokens.

See "ng_delimit" in PDL::Ngrams::ngutils.

ng_undelimit

  Signature: (dtoks(NDToks); indx boffsets(NBlocks); int NDelims(); [o]toks(NToks))

Remove block-delimiters (e.g. BOS,EOS) from a vector of delimited tokens.

See "ng_undelimit" in PDL::Ngrams::ngutils.
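To make the round trip concrete, here is a plain-Perl sketch of delimiter insertion and removal. It is illustrative only: delimit_sketch and undelimit_sketch are hypothetical helpers, and the sketch assumes that each block offset is the start index of a block in the raw token list and that the delimiters are spliced in immediately before each block. See PDL::Ngrams::ngutils for the actual behavior.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL::Ngrams): insert @$delims before
# every block start listed in @$boffsets (an assumption about placement).
sub delimit_sketch {
    my ($toks, $boffsets, $delims) = @_;
    my %is_start;
    $is_start{$_} = 1 for @$boffsets;

    my @dtoks;
    for my $i (0 .. $#$toks) {
        push @dtoks, @$delims if $is_start{$i};
        push @dtoks, $toks->[$i];
    }
    return \@dtoks;
}

# Hypothetical inverse: remove $ndelims items from in front of each block.
sub undelimit_sketch {
    my ($dtoks, $boffsets, $ndelims) = @_;
    my @toks   = @$dtoks;
    my @starts = sort { $a <=> $b } @$boffsets;

    # Block $j is preceded by $j earlier delimiter groups, so its own group
    # starts at $starts[$j] + $j*$ndelims. Removing from the last block
    # backwards keeps the earlier positions valid.
    for my $j (reverse 0 .. $#starts) {
        splice @toks, $starts[$j] + $j * $ndelims, $ndelims;
    }
    return \@toks;
}

my $toks  = [1, 2, 3, 4, 5];
my $dtoks = delimit_sketch($toks, [0, 3], [0]);
print "@$dtoks\n";                                  # prints "0 1 2 3 0 4 5"
print "@{ undelimit_sketch($dtoks, [0, 3], 1) }\n"; # prints "1 2 3 4 5"
```

Under these assumptions the delimited vector has NToks + NBlocks*NDelims elements, and removing the delimiters restores the original token list.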

Low-Level Functions

Some additional low-level functions are provided in the PDL::Ngrams::ngutils package. See PDL::Ngrams::ngutils for details.

ACKNOWLEDGEMENTS

perl by Larry Wall.

PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT

Copyright (c) 2007-2015, Bryan Jurish. All rights reserved.

This package is free software. You may redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl(1), PDL(3perl), PDL::Ngrams::ngutils(3perl)