NAME

dcdb-query.perl - query a DiaColloDB diachronic collocation database

SYNOPSIS

 dcdb-query.perl [OPTIONS] DBURL QUERY1 [QUERY2]

 General Options:
   -help                 # display a brief usage summary
   -version              # display program version
   -[no]time             # do/don't report operation timing (default=do)
   -iters NITERS         # benchmark NITERS iterations of query

 Query Options:
   -col, -ug, -ddc, -tdf # select profile type (collocations, unigrams, ddc client, tdf matrix; default=-col)
   -(a|b)?date DATES     # set target DATE or /REGEX/ or MIN-MAX
   -(a|b)?slice SLICE    # set target date slice (default=1)
   -groupby GROUPBY      # set result aggregation (default=l)
   -kbest KBEST          # return only KBEST items per date-slice (default=10)
   -nokbest              # disable k-best pruning
   -cutoff CUTOFF        # set minimum score for returned items (default=none)
   -nocutoff             # disable cutoff pruning
   -[no]global           # do/don't trim profiles globally (vs. locally by date-slice; default=don't)
   -[no]strings          # debug: do/don't stringify returned profile (default=do)
   -1pass, -2pass        # do/don't use fast but incorrect 1-pass method (default=don't)
   -O  KEY=VALUE         # set DiaColloDB::Client option
   -SO KEY_=VALUE        # set sub-client option (for list:// clients)

 Scoring Options:
   -f                    # score by raw frequency
   -lf                   # score by log-frequency
   -fm                   # score by frequency per million tokens
   -lfm                  # score by log-frequency per million tokens
   -milf                 # score by pointwise mutual information x log-frequency product
   -mi1                  # score by raw pointwise mutual information
   -mi3                  # score by pointwise mutual information^3 (Rychlý 2008)
   -ld                   # score by scaled log-Dice coefficient (Rychlý 2008; default)
   -ll                   # score by 1-sided log-likelihood ratio (Evert 2008)
   -eps EPS              # smoothing constant (default=0)
   -diff DIFFOP          # diff operation (adiff|diff|sum|min|max|avg|havg|gavg; default=adiff)

 I/O Options:
   -user USER[:PASSWD]   # user credentials for HTTP queries
   -text                 # use text output (default)
   -json                 # use json output
   -html                 # use HTML output
   -null                 # don't output profile at all
   -[no]pretty           # do/don't pretty-print json output (default=do)
   -score-format FORMAT  # sprintf-format for scores (text and HTML output)
   -log-level LEVEL      # set minimum DiaColloDB log-level

 Arguments:
   DBURL                # DB URL (file://, rcfile://, http://, or list://)
   QUERY1               # space-separated target1 string(s) LIST or /REGEX/ or DDC-query
   QUERY2               # space-separated target2 string(s) LIST or /REGEX/ or DDC-query (for diff profiles)

 Grouping and Filtering:
   GROUPBY is a space- or comma-separated list of the form ATTR1[=FILTER1] ..., where:
   - ATTR is the name or alias of a supported attribute (e.g. 'lemma', 'pos', etc.), and
   - FILTER is either a |-separated LIST of literal values or a /REGEX/[gimsadlu]*

 Diff Operations:
   DIFFOP is one of: adiff diff sum min max avg havg gavg lavg

DESCRIPTION

dcdb-query.perl is a command-line utility for querying a DiaColloDB diachronic collocation database.

OPTIONS AND ARGUMENTS

Arguments

DBURL

URL identifying the DiaColloDB database to be queried, in a form accepted by DiaColloDB::Client->open(). In particular, DBURL can be a local DiaColloDB database directory, in which case it will be queried via the DiaColloDB::Client::file class. A local DiaColloDB::Client configuration file RCFILE can be specified using the rcfile://RCFILE syntax.
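The URL forms listed above can be illustrated with a small classifier. This is a minimal Python sketch, not the logic of DiaColloDB::Client->open(): the helper name classify_dburl is hypothetical, it recognizes only the schemes named in the usage summary, and it treats a scheme-less DBURL as a local database directory (file client), as described above.

```python
# Schemes named in the usage summary of this manual.
LISTED_SCHEMES = ("file", "rcfile", "http", "list")


def classify_dburl(dburl):
    """Return (scheme, remainder) for a DBURL (hypothetical helper).

    A DBURL without "://" is treated as a local DiaColloDB directory,
    i.e. it would be opened via the file client.
    """
    scheme, sep, rest = dburl.partition("://")
    if not sep:
        # Bare path: local database directory.
        return ("file", dburl)
    if scheme not in LISTED_SCHEMES:
        raise ValueError(f"DBURL scheme not listed in this manual: {scheme!r}")
    return (scheme, rest)
```

For example, classify_dburl("rcfile://client.rc") yields ("rcfile", "client.rc"), i.e. a client configuration file to be read before connecting.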

QUERY1

Primary target query as accepted by DiaColloDB->parseQuery: usually a space-separated LIST of target string(s), a target /REGEX/, or a DDC query string.

QUERY2

Optional comparison target query. If specified, a "diff" profile is computed as for DiaColloDB::compare(); otherwise a unary profile is computed as for DiaColloDB::profile().

General Options

-help

Display a brief help message and exit.

-version

Display version information and exit.

-time
-notime

Do/don't report operation timing (default=do).

-iters NITERS

Benchmark NITERS iterations of query (default=1).
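The interplay of -iters and -time amounts to a simple timing loop. The Python sketch below is only an illustration under stated assumptions: run_query is a hypothetical zero-argument callable standing in for a single profile request, and the actual timing report printed by dcdb-query.perl may be formatted differently.

```python
import time


def benchmark(run_query, niters=1):
    """Run a (hypothetical) query callable NITERS times.

    Returns the last result, the total elapsed wall-clock time in seconds,
    and the mean time per iteration.
    """
    if niters < 1:
        raise ValueError("niters must be at least 1")
    start = time.perf_counter()
    result = None
    for _ in range(niters):
        result = run_query()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed / niters
```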

Query Options

-col

Request "collocation" profiling via DiaColloDB::Relation::Cofreqs (default).

-ug

Request "unigram" profiling via DiaColloDB::Relation::Unigrams.

-ddc

Request profiling via DiaColloDB::Relation::DDC. Slow and generally inefficient, but very flexible. Requires that the underlying DB be associated with a DDC server, e.g. by means of the ddcServer DB key.

-tdf

Request (term x document) matrix profiling via DiaColloDB::Relation::TDF. Requires TDF support in the underlying DB.

-date DATES
-adate DATES

Set the primary target DATE, /REGEX/, or date range MIN:MAX. Either MIN or MAX may be an asterisk (*) to indicate the minimum or maximum date indexed in the corpus, respectively.
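The three DATES forms can be sketched as follows. This Python helper (parse_dates, hypothetical) is not DiaColloDB's own date parser; the corpus bounds corpus_min and corpus_max are assumed to be known, and since the usage summary writes the range as MIN-MAX while this section writes MIN:MAX, the sketch accepts either separator.

```python
import re


def parse_dates(spec, corpus_min, corpus_max):
    """Resolve a DATES argument (hypothetical helper).

    Returns ("regex", compiled_pattern) or ("range", lo, hi); a single
    DATE becomes the one-element range (DATE, DATE). An asterisk stands
    for the corpus minimum (as MIN) or maximum (as MAX).
    """
    spec = spec.strip()
    if len(spec) >= 2 and spec.startswith("/") and spec.endswith("/"):
        return ("regex", re.compile(spec[1:-1]))
    m = re.fullmatch(r"(\d+|\*)\s*[:-]\s*(\d+|\*)", spec)
    if m:
        lo = corpus_min if m.group(1) == "*" else int(m.group(1))
        hi = corpus_max if m.group(2) == "*" else int(m.group(2))
        return ("range", lo, hi)
    if re.fullmatch(r"\d+", spec):
        return ("range", int(spec), int(spec))
    raise ValueError(f"unrecognized DATES argument: {spec!r}")
```

For example, with a corpus spanning 1800 to 2000, "1900:*" resolves to the range 1900 to 2000.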

-bdate DATES

As for -adate, but specifies the date(s) for the comparison target.

-slice SLICE
-aslice SLICE

Set the primary target date slice (default=1).

-bslice SLICE

Set the comparison target date slice (default=1).
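A date slice groups individual dates into fixed-width intervals. The Python sketch below assumes that slices are aligned at multiples of SLICE (e.g. 1990 to 1999 for SLICE=10) and are labeled by their first date; both the alignment and the helper names slice_of and bucket_counts are assumptions for illustration, and only positive slice widths are handled.

```python
from collections import Counter


def slice_of(date, slice_size):
    """Map a date to the first date of its slice (assumed alignment)."""
    if slice_size <= 0:
        raise ValueError("this sketch only handles positive slice sizes")
    return (date // slice_size) * slice_size


def bucket_counts(dated_counts, slice_size):
    """Sum per-date frequencies {date: n} into per-slice frequencies."""
    out = Counter()
    for date, n in dated_counts.items():
        out[slice_of(date, slice_size)] += n
    return dict(out)
```

With the default SLICE=1, every date forms its own slice.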

-groupby GROUPBY

Aggregate collocates by the attributes specified in GROUPBY, which should be a list of indexed attributes with optional restriction clauses as accepted by DiaColloDB->parseQuery, or (in -ddc mode only) a DDC count-by list enclosed in square brackets [ l_countkeys ].
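The ATTR[=FILTER] syntax from the usage summary can be illustrated with a small parser. This Python sketch (parse_groupby, hypothetical) is not DiaColloDB->parseQuery: attribute names and aliases are passed through unresolved, it splits naively on commas and whitespace (so regexes containing those characters are not supported here), and the -ddc bracketed count-by form is not handled.

```python
import re


def parse_groupby(spec):
    """Parse "ATTR1[=FILTER1] ..." into (attr, filter) pairs (sketch).

    filter is None, ("list", [values...]) for a |-separated literal list,
    or ("regex", pattern, flags) for /REGEX/[gimsadlu]*.
    """
    clauses = []
    for item in re.split(r"[\s,]+", spec.strip()):
        if not item:
            continue
        attr, eq, filt = item.partition("=")
        if not eq:
            clauses.append((attr, None))
            continue
        m = re.fullmatch(r"/(.*)/([gimsadlu]*)", filt)
        if m:
            clauses.append((attr, ("regex", m.group(1), m.group(2))))
        else:
            clauses.append((attr, ("list", filt.split("|"))))
    return clauses
```

For instance, "l,pos=ADJA|NN" groups by lemma and by part-of-speech, keeping only the two listed tags.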

-kbest KBEST

Return only KBEST items per date-slice (default=10).

-nokbest

Disable k-best pruning.

-cutoff CUTOFF

Set minimum score for returned items (unary profiles only; default=none).

-nocutoff

Disable cutoff pruning.

-[no]global

Do/don't trim profiles globally (vs. locally by date-slice; default=don't).
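The pruning options above can be sketched over a profile represented as {slice: {item: score}}. In this Python illustration, kbest=None models -nokbest and cutoff=None models -nocutoff. The global branch encodes an assumed reading of -global (rank items by their best score in any slice, then keep those items in every slice), and ties are broken by item name; neither detail is taken from DiaColloDB, and the helper name prune is hypothetical.

```python
def prune(profiles, kbest=10, cutoff=None, global_=False):
    """Apply k-best and cutoff trimming to {slice: {item: score}} (sketch)."""

    def passes(score):
        return cutoff is None or score >= cutoff

    def top(scores):
        # Highest score first; ties broken by item name for determinism.
        ranked = sorted(scores, key=lambda item: (-scores[item], item))
        return ranked if kbest is None else ranked[:kbest]

    if not global_:
        # Local trimming: each date slice keeps its own k best items.
        return {
            sl: {i: scores[i] for i in top({i: s for i, s in scores.items() if passes(s)})}
            for sl, scores in profiles.items()
        }

    # Global trimming (assumed semantics): choose k items overall by their
    # best score in any slice, then keep exactly those items per slice.
    best = {}
    for scores in profiles.values():
        for i, s in scores.items():
            if passes(s):
                best[i] = max(s, best.get(i, s))
    keep = set(top(best))
    return {
        sl: {i: s for i, s in scores.items() if i in keep and passes(s)}
        for sl, scores in profiles.items()
    }
```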

-[no]strings

Debug: do/don't stringify returned profile (default=do).

-1pass

Use fast but incorrect single-pass frequency acquisition method.

-2pass

Use slower but correct 2-pass frequency acquisition method (default).

-O KEY=VALUE

Set a DiaColloDB::Client option.

-SO KEY_=VALUE

Set a sub-client option (for list:// clients).

Scoring Options

See DiaColloDB::Profile for supported scoring functions.

-f

score by raw frequency

-lf

score by log-frequency

-fm

score by frequency per million tokens

-lfm

score by log-frequency per million tokens

-milf

score by pointwise mutual information x log-frequency product

-mi1

score by raw pointwise mutual information

-mi3

score by pointwise mutual information^3 (Rychlý 2008)

-ld

score by scaled log-Dice coefficient (Rychlý 2008; default)

-ll

score by 1-sided log-likelihood ratio (Evert 2008)

-eps EPS

score function smoothing constant (default=0.5)
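For orientation, the Python sketch below computes textbook forms of several of the scores above from a co-frequency f12, marginal frequencies f1 and f2, and corpus size n: MI = log2(f12·n/(f1·f2)), MI3 cubes f12 in the numerator, and log-Dice = 14 + log2(2·f12/(f1+f2)) as in Rychlý (2008). It is not DiaColloDB::Profile's implementation: adding eps to every frequency and using base-2 logarithms throughout are assumptions, the helper name scores is hypothetical, and the log-likelihood score (-ll) is not sketched.

```python
import math


def scores(f12, f1, f2, n, eps=0.0):
    """Textbook association scores (illustration, not DiaColloDB::Profile).

    f12: co-frequency of target and collocate; f1, f2: their marginal
    frequencies; n: corpus size in tokens. eps is added to each frequency
    (an assumed smoothing scheme); with eps=0, f12 must be positive.
    """
    f12, f1, f2 = f12 + eps, f1 + eps, f2 + eps
    fm = f12 * 1e6 / n                       # frequency per million tokens
    mi1 = math.log2(f12 * n / (f1 * f2))     # pointwise mutual information
    return {
        "f": f12,
        "lf": math.log2(f12),
        "fm": fm,
        "lfm": math.log2(fm),
        "mi1": mi1,
        "mi3": math.log2(f12 ** 3 * n / (f1 * f2)),
        "milf": mi1 * math.log2(f12),
        "ld": 14 + math.log2(2 * f12 / (f1 + f2)),
    }
```

With f12=8, f1=f2=16 and n=1024, this gives MI 5, MI3 11 and log-Dice 13. Note that log-Dice does not depend on n, which is one reason it is comparable across corpora of different sizes.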

-diff DIFFOP

diff operation to use for comparison profiles. Known values:

 adiff  # absolute score difference (default)
 diff   # raw score difference
 sum    # sum
 min    # minimum
 max    # maximum
 avg    # average
 havg   # pseudo-harmonic average
 gavg   # pseudo-geometric average
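The diff operations combine a primary score a with a comparison score b for each collocate. The Python sketch below uses the plain arithmetic, harmonic, and geometric means, which are defined here only for positive scores; the "pseudo-" variants in DiaColloDB presumably treat other values differently, which is not modeled. Scoring an item missing from one profile as 0 is likewise an assumption, and the names OPS and compare_profiles are hypothetical.

```python
import math

# Plain forms of the listed diff operations (havg/gavg: positive scores only).
OPS = {
    "adiff": lambda a, b: abs(a - b),
    "diff": lambda a, b: a - b,
    "sum": lambda a, b: a + b,
    "min": min,
    "max": max,
    "avg": lambda a, b: (a + b) / 2,
    "havg": lambda a, b: 2 * a * b / (a + b),
    "gavg": lambda a, b: math.sqrt(a * b),
}


def compare_profiles(pa, pb, op="adiff", missing=0.0):
    """Combine {item: score} profiles item-wise with the chosen operation."""
    f = OPS[op]
    return {i: f(pa.get(i, missing), pb.get(i, missing)) for i in pa.keys() | pb.keys()}
```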

I/O and Logging Options

-user USER[:PASSWD]

Specify user credentials for HTTP queries.
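How a USER[:PASSWD] argument might be turned into request credentials is sketched below in Python. This manual does not state which HTTP authentication scheme is used, so HTTP Basic authentication is an assumption here, as is the treatment of an omitted PASSWD as an empty password; the helper name basic_auth_header is hypothetical. Splitting at the first colon only lets passwords contain colons.

```python
import base64


def basic_auth_header(user_spec):
    """Build an HTTP Basic Authorization header from USER[:PASSWD] (sketch)."""
    user, _, passwd = user_spec.partition(":")  # split at the first colon only
    token = base64.b64encode(f"{user}:{passwd}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```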

-text

generate text output (default).

-json

generate json output.

-html

generate HTML output.

-null

don't output profile data at all (for timing and debugging).

-[no]pretty

do/don't pretty-print json output (default=do)

-score-format FORMAT

sprintf-format for score formatting, used by text and HTML output modes.

-log-level LEVEL

set minimum DiaColloDB::Logger log-level.

BUGS AND LIMITATIONS

Probably many.

ACKNOWLEDGEMENTS

Perl by Larry Wall.

AUTHOR

Bryan Jurish <moocow@cpan.org>

SEE ALSO

DiaColloDB(3pm), dcdb-create.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), perl(1).