NAME
dcdb-query.perl - query a DiaColloDB diachronic collocation database
SYNOPSIS
dcdb-query.perl [OPTIONS] DBURL QUERY1 [QUERY2]
General Options:
-help # display a brief usage summary
-version # display program version
-[no]time # do/don't report operation timing (default=do)
-iters NITERS # benchmark NITERS iterations of query
Query Options:
-col, -ug, -ddc, -tdf # select profile type (collocations, unigrams, ddc client, tdf matrix; default=-col)
-(a|b)?date DATES # set target DATE or /REGEX/ or MIN-MAX
-(a|b)?slice SLICE # set target date slice (default=1)
-groupby GROUPBY # set result aggregation (default=l)
-kbest KBEST # return only KBEST items per date-slice (default=10)
-nokbest # disable k-best pruning
-cutoff CUTOFF # set minimum score for returned items (default=none)
-nocutoff # disable cutoff pruning
-[no]global # do/don't trim profiles globally (vs. locally by date-slice; default=don't)
-[no]strings # debug: do/don't stringify returned profile (default=do)
-1pass, -2pass # do/don't use fast but incorrect 1-pass method (default=don't)
-O KEY=VALUE # set DiaColloDB::Client option
-SO KEY=VALUE # set sub-client option (for list:// clients)
Scoring Options:
-f # score by raw frequency
-lf # score by log-frequency
-fm # score by frequency per million tokens
-lfm # score by log-frequency per million tokens
-milf # score by pointwise mutual information x log-frequency product
-mi1 # score by raw pointwise mutual information
-mi3 # score by pointwise mutual information^3 (Rychlý 2008)
-ld # score by scaled log-Dice coefficient (Rychlý 2008)
-ll # score by 1-sided log-likelihood ratio (Evert 2008)
-eps EPS # smoothing constant (default=0)
-diff DIFFOP # diff operation (adiff|diff|sum|min|max|avg|havg|gavg; default=adiff)
I/O Options:
-user USER[:PASSWD] # user credentials for HTTP queries
-text # use text output (default)
-json # use json output
-null # don't output profile at all
-[no]pretty # do/don't pretty-print json output (default=do)
-log-level LEVEL # set minimum DiaColloDB log-level
Arguments:
DBURL # DB URL (file://, rcfile://, http://, or list://)
QUERY1 # space-separated target1 string(s) LIST or /REGEX/ or DDC-query
QUERY2 # space-separated target2 string(s) LIST or /REGEX/ or DDC-query (for diff profiles)
Grouping and Filtering:
GROUPBY is a space- or comma-separated list of the form ATTR1[=FILTER1] ..., where:
- ATTR is the name or alias of a supported attribute (e.g. 'lemma', 'pos', etc.), and
- FILTER is either a |-separated LIST of literal values or a /REGEX/[gimsadlu]*
Diff Operations:
DIFFOP is one of: adiff diff sum min max avg havg gavg lavg
DESCRIPTION
dcdb-query.perl is a command-line utility for querying a DiaColloDB diachronic collocation database.
OPTIONS AND ARGUMENTS
Arguments
- DBURL
-
URL identifying the DiaColloDB database to be queried, in a form accepted by DiaColloDB::Client->open(). In particular, DBURL can be a local DiaColloDB database directory, in which case it will be queried via the DiaColloDB::Client::file class. A local DiaColloDB::Client configuration file RCFILE can be specified using the rcfile://RCFILE syntax.
- QUERY1
-
Primary target query as accepted by DiaColloDB->parseQuery, usually a space-separated LIST of target string(s), a target /REGEX/, or a DDC-query string.
- QUERY2
-
Optional comparison target query. If specified, a "diff" profile is computed as for DiaColloDB::compare(); otherwise a unary profile is computed as for DiaColloDB::profile().
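The two invocation shapes described above (unary profile with QUERY1 only, diff profile with QUERY1 and QUERY2) can be sketched as follows. The database directory and the target lemmas are hypothetical placeholders, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical local database directory (an assumption for illustration).
DBURL="/path/to/corpus.d"

# Unary profile: only QUERY1 ("Haus", a made-up target) is given.
unary_cmd="dcdb-query.perl -kbest 5 $DBURL Haus"

# Diff profile: QUERY1 and QUERY2 are both given, so a comparison is computed.
diff_cmd="dcdb-query.perl -diff adiff $DBURL Haus Wohnung"

# Print the commands rather than run them, since no database is assumed to exist here.
echo "$unary_cmd"
echo "$diff_cmd"
```

To query a real database, replace the placeholder path with an actual DBURL and run the printed command.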
General Options
- -help
-
Display a brief help message and exit.
- -version
-
Display version information and exit.
- -time
- -notime
-
Do/don't report operation timing (default=do).
- -iters NITERS
-
Benchmark NITERS iterations of query (default=1).
Query Options
- -col
-
Request "collocation" profiling via DiaColloDB::Relation::Cofreqs (default).
- -ug
-
Request "unigram" profiling via DiaColloDB::Relation::Unigrams.
- -ddc
-
Request profiling via DiaColloDB::Relation::DDC. Slow and generally inefficient, but very flexible. Requires that the underlying DB be associated with a DDC server, e.g. by means of the ddcServer DB key.
- -tdf
-
Request (term x document) matrix profiling via DiaColloDB::Relation::TDF. Requires TDF support in the underlying DB.
- -date DATES
- -adate DATES
-
Set primary target date DATE or /REGEX/ or date-range MIN:MAX. Either MIN or MAX may be an asterisk (*) to indicate the minimum or maximum date, respectively, indexed in the corpus.
- -bdate DATES
-
As for -adate, but specifies date for the comparison target.
- -slice SLICE
- -aslice SLICE
-
Set the primary target date slice (default=1).
- -bslice SLICE
-
Set the comparison target date slice (default=1).
- -groupby GROUPBY
-
Aggregate collocates by the attributes specified in GROUPBY, which should be a list of indexed attributes with optional restriction clauses as accepted by DiaColloDB->parseQuery, or (in -ddc mode only) a DDC count-by list enclosed in square brackets [ l_countkeys ].
- -kbest KBEST
-
Return only KBEST items per date-slice (default=10).
- -nokbest
-
Disable k-best pruning.
- -cutoff CUTOFF
-
Set minimum score for returned items (unary profiles only; default=none).
- -nocutoff
-
Disable cutoff pruning.
- -[no]global
-
Do/don't trim profiles globally (vs. locally by date-slice; default=don't).
- -[no]strings
-
Debug: do/don't stringify returned profile (default=do).
- -1pass
-
Use fast but incorrect single-pass frequency acquisition method.
- -2pass
-
Use slower but correct 2-pass frequency acquisition method (default).
- -O KEY=VALUE
-
Set a DiaColloDB::Client option.
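Several of the query options above can be combined in one call. The sketch below uses the MIN:MAX date-range form with an asterisk, a date slice, a GROUPBY list with a /REGEX/ filter, and k-best pruning. The database directory, the target lemma, and the filter value are hypothetical; the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical database directory and target lemma (assumptions for illustration).
DBURL="/path/to/corpus.d"
QUERY="Haus"

# -date uses the MIN:MAX range form; '*' stands for the maximum indexed date.
# -slice 10 requests a date slice of 10; -kbest 5 keeps 5 items per date-slice.
# -groupby aggregates by lemma and by pos restricted to a /REGEX/ filter;
# the filter value is a made-up example and depends on the corpus tagset.
cmd="dcdb-query.perl -date '1900:*' -slice 10 -groupby 'lemma,pos=/^N/' -kbest 5 $DBURL $QUERY"

# Print the command rather than run it, since no database is assumed to exist here.
echo "$cmd"
```

The single quotes keep the shell from expanding the asterisk and from splitting the GROUPBY list.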
Scoring Options
See DiaColloDB::Profile for supported scoring functions.
- -f
-
score by raw frequency
- -lf
-
score by log-frequency
- -fm
-
score by frequency per million tokens
- -lfm
-
score by log-frequency per million tokens
- -milf
-
score by pointwise mutual information x log-frequency product
- -mi1
-
score by raw pointwise mutual information
- -mi3
-
score by pointwise mutual information^3 (Rychlý 2008)
- -ld
-
score by scaled log-Dice coefficient (Rychlý 2008; default)
- -ll
-
score by 1-sided log-likelihood ratio (Evert 2008)
- -eps EPS
-
score function smoothing constant (default=0.5)
- -diff DIFFOP
-
diff operation to use for comparison profiles. Known values:
adiff # absolute score difference (default)
diff # raw score difference
sum # sum
min # minimum
max # maximum
avg # average
havg # pseudo-harmonic average
gavg # pseudo-geometric average
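For orientation on the -ld option, the sketch below evaluates the log-Dice coefficient as published by Rychlý (2008), 14 + log2(2*f12 / (f1 + f2)). This is the published formula only: how DiaColloDB scales the score and applies the EPS smoothing constant is not specified here, and the variable names and frequencies are made up for illustration.

```shell
#!/bin/sh
# Made-up frequencies for illustration (not taken from any real corpus):
#   f1  = frequency of the target, f2 = frequency of the collocate,
#   f12 = co-occurrence frequency of both.
f1=20
f2=20
f12=10

# Published log-Dice (Rychlý 2008): 14 + log2(2*f12 / (f1 + f2)).
# awk's log() is the natural logarithm, so divide by log(2) to get log2.
# LC_ALL=C keeps the decimal separator a period regardless of locale.
logdice=$(LC_ALL=C awk -v f1="$f1" -v f2="$f2" -v f12="$f12" \
  'BEGIN { printf "%.2f\n", 14 + log(2 * f12 / (f1 + f2)) / log(2) }')

echo "logDice = $logdice"   # prints "logDice = 13.00"
```

With these numbers the Dice ratio is 0.5, so the score is one point below the theoretical maximum of 14.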
I/O and Logging Options
- -user USER[:PASSWD]
-
Specify user credentials for HTTP queries.
- -text
-
generate text output (default).
- -json
-
generate json output.
- -html
-
generate HTML output.
- -null
-
don't output profile data at all (for timing and debugging).
- -[no]pretty
-
do/don't pretty-print json output (default=do)
- -score-format FORMAT
-
sprintf-format for score formatting, used by text and HTML output modes.
- -log-level LEVEL
-
set minimum DiaColloDB::Logger log-level.
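The I/O options above might be combined for a remote query as in the following sketch. The server URL, the credentials, and the target lemma are hypothetical placeholders, and the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical server URL, credentials, and target lemma (assumptions for illustration).
DBURL="http://example.org/dcdb/"
DCDB_USER="alice:secret"
QUERY="Haus"

# -user supplies USER[:PASSWD] for the HTTP query; -json selects JSON output,
# -nopretty turns off pretty-printing, and -notime suppresses the timing report.
cmd="dcdb-query.perl -user $DCDB_USER -json -nopretty -notime $DBURL $QUERY"

# Print the command rather than run it, since no server is assumed to exist here.
echo "$cmd"
```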
BUGS AND LIMITATIONS
Probably many.
ACKNOWLEDGEMENTS
Perl by Larry Wall.
AUTHOR
Bryan Jurish <moocow@cpan.org>
SEE ALSO
DiaColloDB(3pm), dcdb-create.perl(1), dcdb-info.perl(1), dcdb-export.perl(1), perl(1).