Changes for version 0.006 - 2012-12-27

  • Stanislaw Pusep <creaktive@gmail.com>
    • minor fixes
    • documented multibyte parsing tricks
    • break test
    • leak test
    • more optimizations
    • better utf8/latin1 tests
    • updated benchmark results
    • added cosine_sim utility
    • reallocation fixed
    • graduated the uniq_wc tool
    • make use of PerlIO::mmap layer
    • examples cleanup
    • optimizations
    • implemented variable codetable (+ raw ASCII support)
    • variable size arrays
    • Dist::Zilla profile update
  • Stanislaw Pusep <stanislav.poussep@buscapecompany.com>
    • replaced File::Slurp with File::Map

Documentation

compute cosine similarity between two documents
uses MinHash & SpeedyFx to compare large text data
efficiently count unique tokens from a file

Modules

tokenize/hash a large number of strings efficiently

Examples