NAME

analyze - Analyze a corpus with TF-IDF

SYNOPSIS

analyze --dir=/some/corpus [options]

Options:

  --help      help message
  --man       full documentation
  --dir       corpus of text documents
  --size      ngram size
  --top       top TF-IDF ngrams
  --stop      exclude stopwords
  --phrase    search phrase
  --type      file extension

Examples:

  perl analyze --dir=/Users/you/Documents/lit/inaugural --top=5
  perl analyze --dir=/Users/you/Documents/lit/inaugural --phrase='public good'
  perl analyze --dir=/Users/you/Documents/lit/inaugural --dir=/Users/you/Documents/lit/SOTU --top=5
  perl analyze --dir=/Users/you/Documents/lit/Shakespeare --size=3 --top=5
  perl analyze --dir=/Users/you/perl5/perlbrew/perls/perl-5.27.7/lib/site_perl/5.27.7/Music --size=1 --type=pm

OPTIONS

--help

Brief help message

--man

Full manual page

--dir

Required. Directory holding the corpus of text documents. As the examples show, it can be given more than once.

--size

Ngram phrase size. Default = 2

--top

Show the top N ngrams seen. Default = 0

--stop

Constrain the ngrams by excluding stopwords. Default = 1

--phrase

Search the corpus for the phrase and show its TF-IDF values. Default = ''

--type

Read corpus files with this file extension. Default = 'txt'

DESCRIPTION

This program analyzes the given corpus with the TF-IDF measure for ngrams.
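
As a rough illustration of what a TF-IDF score over ngrams means, here is a minimal sketch in plain Perl. It is not the program's own code: the in-memory %corpus, the ngrams() and tfidf() helpers, and the choice of the natural logarithm for the inverse document frequency are all assumptions made for this example, and stopword filtering is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical in-memory corpus standing in for the files found under --dir.
my %corpus = (
    'a.txt' => 'The public good is the common good',
    'b.txt' => 'The common law serves the public',
    'c.txt' => 'A law for the public good',
);

# Split text into lowercase word ngrams of the given size.
sub ngrams {
    my ($text, $size) = @_;
    my @words = map { lc $_ } $text =~ /(\w+)/g;
    return () if @words < $size;
    return map { join ' ', @words[ $_ .. $_ + $size - 1 ] } 0 .. @words - $size;
}

# Score every ngram in every document (assumed weighting, see the text above).
# tf  = occurrences of the ngram in the document / ngrams in the document
# idf = log( number of documents / documents containing the ngram )
sub tfidf {
    my ($docs, $size) = @_;
    my (%count, %total, %doc_freq);
    for my $name (keys %$docs) {
        my @grams = ngrams($docs->{$name}, $size);
        $total{$name} = @grams;
        $count{$name}{$_}++ for @grams;
        $doc_freq{$_}++ for keys %{ $count{$name} || {} };
    }
    my $n = keys %$docs;
    my %score;
    for my $name (keys %count) {
        for my $gram (keys %{ $count{$name} }) {
            my $tf  = $count{$name}{$gram} / $total{$name};
            my $idf = log( $n / $doc_freq{$gram} );
            $score{$name}{$gram} = $tf * $idf;
        }
    }
    return \%score;
}

my $top   = 3;                    # plays the role of --top
my $score = tfidf(\%corpus, 2);   # 2 matches the documented --size default

# Print the highest-scoring ngrams of each document.
for my $name (sort keys %$score) {
    my $s      = $score->{$name};
    my @ranked = sort { $s->{$b} <=> $s->{$a} || $a cmp $b } keys %$s;
    splice @ranked, $top if @ranked > $top;
    printf "%s  %-14s %.4f\n", $name, $_, $s->{$_} for @ranked;
}
```

Under this weighting, an ngram that occurs in every document (here "the public") scores zero, while an ngram concentrated in a few documents scores higher, which is why the top-ranked ngrams tend to be the ones that distinguish a document from the rest of the corpus.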