NAME

nat-ptd - concentrates a set of PTD commands in a common interface

SYNOPSIS

  nat-ptd [-v] <command> [command-args]

DESCRIPTION

nat-ptd supports the following commands. In most places where a PTD (probabilistic translation dictionary) is expected, you can supply a bzip2-compressed PTD, as long as the filename ends in .bz2.

help

This command can be invoked without arguments, in which case it prints the list of available commands.

If the name of a command is supplied as an optional argument, it prints detailed help for that command (taken from this man page).

    nat-ptd help [command-name]

intersect

Intersects the domains of the supplied PTDs, keeping the lower counts and translation probabilities.

As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
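The following minimal Python sketch makes the intersection concrete. It assumes a hypothetical in-memory model of a PTD as a mapping from each word to a pair (occurrence count, {translation: probability}). This is not the NATools API or file format. It also assumes that a translation present in only one of the PTDs is dropped, which the description above does not state.

```python
from functools import reduce
from typing import Dict, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def intersect(a: PTD, b: PTD) -> PTD:
    """Keep words present in both PTDs, with the lower count and probabilities."""
    result: PTD = {}
    for word in a.keys() & b.keys():
        count_a, trans_a = a[word]
        count_b, trans_b = b[word]
        # Assumption: translations missing from either side are dropped.
        trans = {t: min(trans_a[t], trans_b[t])
                 for t in trans_a.keys() & trans_b.keys()}
        result[word] = (min(count_a, count_b), trans)
    return result


def intersect_all(*ptds: PTD) -> PTD:
    """Fold the pairwise intersection over any number of PTDs."""
    return reduce(intersect, ptds)
```

For example, intersecting an entry casa with count 10 and P(house) = 0.7 against one with count 6 and P(house) = 0.8 yields count 6 and P(house) = 0.7.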

toSQLite

This command converts a PTD to the SQLite format. The first argument is the PTD filename; an optional second argument specifies the output filename.

toDmp

This command converts a PTD to the Dumper format. The first argument is the PTD filename; an optional second argument specifies the output filename.

toDmpBz

This command converts a PTD to a bzip2-compressed Dumper format. The first argument is the PTD filename; an optional second argument specifies the output filename.

stats

Prints some basic statistics about a PTD.

compare

Given two PTDs, prints some basic statistics comparing their sizes, domains, etc.

query

This command lets you query a PTD interactively.

grep

Greps a PTD for entries matching a given pattern. Supply a pattern and a PTD file. By default, it dumps a subset PTD containing the matching entries. With the -compact option, it instead prints a small table with each matching entry's best translation.

    nat-ptd grep [-compact] [-o=outfile] <pattern> <ptd-file>
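The two output modes can be illustrated with a short Python sketch. It assumes the same hypothetical in-memory PTD model (word -> (count, {translation: probability})). It also assumes the pattern is a regular expression searched anywhere in the entry word; how nat-ptd itself interprets the pattern is not stated here. The function names are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def grep(ptd: PTD, pattern: str) -> PTD:
    """Default mode: the subset PTD whose entry words match the pattern."""
    rx = re.compile(pattern)
    return {word: entry for word, entry in ptd.items() if rx.search(word)}


def compact(ptd: PTD) -> List[Tuple[str, str, float]]:
    """-compact mode: one row per entry with its most probable translation."""
    rows = []
    for word in sorted(ptd):
        _, trans = ptd[word]
        if trans:
            best = max(trans, key=trans.get)
            rows.append((word, best, trans[best]))
    return rows
```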

compose

This command receives two or more dictionaries.

When given a pair of dictionaries (the target language of the first dictionary should be the source language of the second), it composes them, resulting in a PTD from the source language of the first dictionary to the target language of the second.

It can also be given more than two dictionaries, to compute a full transitive dictionary.

You can specify the output filename with the -o switch.

As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
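A common way to compose probabilistic dictionaries is to sum over the pivot words: P(c|a) is the sum over all b of P(b|a) * P(c|b). The Python sketch below applies that rule over the hypothetical in-memory PTD model used above. It is not necessarily the exact computation nat-ptd performs. Keeping the first dictionary's occurrence count is also an assumption.

```python
from functools import reduce
from typing import Dict, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def compose(ab: PTD, bc: PTD) -> PTD:
    """Compose A->B with B->C: P(c|a) = sum over b of P(b|a) * P(c|b)."""
    result: PTD = {}
    for a, (count, trans_ab) in ab.items():
        trans_ac: Dict[str, float] = {}
        for b, p_ab in trans_ab.items():
            if b not in bc:
                continue  # pivot word unknown to the second dictionary
            for c, p_bc in bc[b][1].items():
                trans_ac[c] = trans_ac.get(c, 0.0) + p_ab * p_bc
        if trans_ac:
            # Assumption: the occurrence count is taken from the first PTD.
            result[a] = (count, trans_ac)
    return result


def compose_all(*ptds: PTD) -> PTD:
    """Transitive composition of a chain of dictionaries, left to right."""
    return reduce(compose, ptds)
```

With P(house|casa) = 0.6, P(home|casa) = 0.4, P(maison|house) = 1.0 and P(maison|home) = P(foyer|home) = 0.5, composing gives P(maison|casa) = 0.8 and P(foyer|casa) = 0.2.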

filter

This command filters a dictionary (or a dictionary pair) according to a set of thresholds whose default values can be adjusted.

If the supplied name is a directory, it is assumed to be a NATools object (a NATools alignment folder). In this case, the files source-target.dmp and target-source.dmp are searched for inside it.

If the supplied name is not a directory, it is assumed to be the name of a PTD dump file. The command then checks whether it is alone (a single direction) or whether a second filename was supplied. If two were supplied, they are treated as a bidirectional pair (source-target and target-source).

Therefore, there are three possible usages:

    nat-ptd filter <natools-obj-dir>
    nat-ptd filter <file.dmp>
    nat-ptd filter <file-s-t.dmp> <file-t-s.dmp>

The following switches can be used:

-numbers

By default, filtering removes terms (entries and translations) that are numbers (digits only, with optional digit separators: space, comma, period, colon). Use this switch to preserve them.

-symbols

Any term that is neither a standard word (possibly with a dash or apostrophe) nor a number (as described above) is considered to contain unusual symbols and is discarded. Use this switch to preserve such terms.

-none

By default, the 'no translation', also known as 'none', is removed. You can force it to be preserved with this switch.

-occs=n

Defines the minimum occurrence count for entries to be preserved. The default value is 2 (that is, entries with a single occurrence are discarded). Use 0 to keep all entries regardless of their occurrence count.

-prob=p

Defines the minimum probability for translations to be preserved. By default the value is 1% (0.01). Define the value as 0 to preserve all translations.

-bidir

Defines whether the filtering should check for bidirectional translations, that is, preserve only terms whose translations' translations include the term itself. Mathematically, t is preserved if

    t   in   Translations ( Translations ( t ) )

Note that this is only available for NATools objects or dictionary pairs. By default this switch is on. Turn it off by assigning 0 to the switch: -bidir=0

Also, the -o switch can be used to define an output filename. When using a pair of dictionaries, specify the output filenames separated by a comma: -o=outputfile1,outputfile2.

As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
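The filtering rules above can be approximated with the following Python sketch over the hypothetical in-memory PTD model (word -> (count, {translation: probability})). Several details are assumptions:

- The marker spelling "(none)".
- The regular expressions, which are one reading of "standard word" and "number".
- Dropping entries that are left with no translations.

The bidir function is a literal reading of the formula t in Translations(Translations(t)). None of these names come from NATools.

```python
import re
from typing import Dict, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]

NONE = "(none)"  # assumed spelling of the 'no translation' marker
# Digits only, optionally grouped by space, comma, period or colon.
NUMBER = re.compile(r"\d+(?:[ ,.:]\d+)*")
# Letters only, optionally joined by a dash or an apostrophe.
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def keep_term(term: str, numbers: bool = False, symbols: bool = False,
              none: bool = False) -> bool:
    """Mirror -numbers, -symbols and -none: decide whether a term survives."""
    if term == NONE:
        return none
    if NUMBER.fullmatch(term):
        return numbers
    if WORD.fullmatch(term):
        return True
    return symbols


def filter_ptd(ptd: PTD, occs: int = 2, prob: float = 0.01,
               **switches: bool) -> PTD:
    """Apply -occs, -prob and the term-type switches to one direction."""
    out: PTD = {}
    for word, (count, trans) in ptd.items():
        if count < occs or not keep_term(word, **switches):
            continue
        kept = {t: p for t, p in trans.items()
                if p >= prob and keep_term(t, **switches)}
        if kept:  # assumption: entries left without translations are dropped
            out[word] = (count, kept)
    return out


def bidir(st: PTD, ts: PTD) -> Tuple[PTD, PTD]:
    """Keep entry s only if s is in Translations(Translations(s))."""
    def back(a: PTD, b: PTD) -> PTD:
        return {s: entry for s, entry in a.items()
                if any(t in b and s in b[t][1] for t in entry[1])}
    return back(st, ts), back(ts, st)
```

With the defaults, an entry such as "1,000" is removed as a number, "$$" as a symbol, and "(none)" as the empty translation. Calling filter_ptd(ptd, numbers=True) would keep the number.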

lowercase

This command lowercases all terms of a dictionary, sums the occurrences of terms that become identical, and recomputes the probabilities accordingly.

    nat-ptd lowercase [-o=outputfile] <ptd-filename>

As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
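The following Python sketch shows one plausible way to merge case variants over the hypothetical in-memory PTD model. Counts are summed. Each translation probability is recomputed as a count-weighted average, treating count * probability as the estimated number of times that pair occurred. That weighting is an assumption, not a description of the NATools code.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def lowercase(ptd: PTD) -> PTD:
    """Merge case variants: sum counts, count-weighted translation probabilities."""
    counts: DefaultDict[str, int] = defaultdict(int)
    # Estimated joint occurrences of (word, translation): count * probability.
    mass: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
        lambda: defaultdict(float))
    for word, (count, trans) in ptd.items():
        lw = word.lower()
        counts[lw] += count
        for t, p in trans.items():
            mass[lw][t.lower()] += count * p
    return {
        lw: (total, {t: m / total for t, m in mass[lw].items()} if total else {})
        for lw, total in counts.items()
    }
```

For example, "Casa" (2 occurrences, P(House) = 1.0) and "casa" (6 occurrences, P(house) = P(home) = 0.5) merge into "casa" with 8 occurrences, P(house) = 5/8 and P(home) = 3/8.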

reprob

This command recomputes the probabilities of a dictionary. For each entry, it sums the probabilities of all possible translations, treats that total as 100% (1), and rescales each probability accordingly.

It takes a required argument, the name of the PTD dump file. Optionally, you can supply an output file with the -o switch.

    nat-ptd reprob [-o=outputfile] <ptd-filename>

As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
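The renormalization described above amounts to dividing each probability by the entry's total. The sketch below shows this over the hypothetical in-memory PTD model (word -> (count, {translation: probability})). The function name is illustrative, and leaving entries with a zero total untouched is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def reprob(ptd: PTD) -> PTD:
    """Rescale each entry's translation probabilities so they sum to 1."""
    result: PTD = {}
    for word, (count, trans) in ptd.items():
        total = sum(trans.values())
        if total > 0:  # assumption: entries with a zero total are left as-is
            trans = {t: p / total for t, p in trans.items()}
        result[word] = (count, dict(trans))
    return result


print(reprob({"casa": (10, {"house": 0.375, "home": 0.125})}))
# {'casa': (10, {'house': 0.75, 'home': 0.25})}
```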

add

Adds two or more PTD files into a single PTD file. They should share the same source and target languages. You can use the -o switch to specify an output filename.

As of recent NATools versions, you can supply an option -type to specify the type of output file (dmp or sqlite are supported, and dmp is the default).
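The description does not say how probabilities are combined when PTDs are added. The Python sketch below assumes one reasonable choice over the hypothetical in-memory PTD model. Occurrence counts are summed, and each translation probability becomes the count-weighted average of the inputs. Treat it as an illustration, not as the NATools behavior.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def add(*ptds: PTD) -> PTD:
    """Sum counts; combine probabilities as count-weighted averages (assumed)."""
    counts: DefaultDict[str, int] = defaultdict(int)
    mass: DefaultDict[str, DefaultDict[str, float]] = defaultdict(
        lambda: defaultdict(float))
    for ptd in ptds:
        for word, (count, trans) in ptd.items():
            counts[word] += count
            for t, p in trans.items():
                mass[word][t] += count * p  # estimated joint occurrences
    return {
        word: (n, {t: m / n for t, m in mass[word].items()} if n else {})
        for word, n in counts.items()
    }
```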

ucts

Creates unambiguous-concept translation sets.

  nat-ptd ucts [-m=<number>] [-M=<number>] [-p=<probability>] [-P=<probability>] [-r=0|1] <ptd-filename> <ptd-filename>

The following options can be used:

-m=n

Minimum number of occurrences of each token. n must be an integer (default: 10).

-M=n

Maximum number of occurrences of each token. n must be an integer (default: 100).

-p=p

Minimum probability for translation. p must be a probability in the interval [0,1] (default: 0.2).

-P=p

Minimum probability for the inverse translations. p must be a probability in the interval [0,1] (default: 0.8).

-r=0|1

Print rank (default: 0).
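The options above suggest the following selection criteria, sketched in Python over the hypothetical in-memory PTD model for both directions. Under this reading, a pair (w, t) qualifies when both tokens occur between m and M times, P(t|w) >= p, and the inverse P(w|t) >= P. This is only an interpretation of the option descriptions. The exact definition of an unambiguous-concept translation set used by nat-ptd may differ, and rank printing (-r) is not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory PTD model: word -> (count, {translation: probability}).
PTD = Dict[str, Tuple[int, Dict[str, float]]]


def ucts(st: PTD, ts: PTD, m: int = 10, M: int = 100,
         p: float = 0.2, P: float = 0.8) -> List[Tuple[str, str]]:
    """Pairs (w, t) passing the -m/-M, -p and -P thresholds (assumed reading)."""
    pairs = []
    for w, (count_w, trans) in st.items():
        if not m <= count_w <= M:
            continue
        for t, p_wt in trans.items():
            if p_wt < p or t not in ts:
                continue
            count_t, inverse = ts[t]
            if m <= count_t <= M and inverse.get(w, 0.0) >= P:
                pairs.append((w, t))
    return sorted(pairs)
```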

bws

Creates bi-word sets.

  nat-ptd bws [-m=<number>] [-p=<probability>] [-r=0|1] <ptd-filename> <ptd-filename>

The following options are available:

-m=n

Minimum number of occurrences of each token. n must be an integer (default: 10).

-p=p

Minimum probability for translation. p must be a probability in the interval [0,1] (default: 0.4).

-r=0|1

Print rank (default: 0).

SEE ALSO

NATools, perl(1)

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

Nuno Alexandre Carvalho, <smash@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010-2012 by Natura Project