NAME

nat-ptd - concentrates a set of PTD commands in a common interface

SYNOPSIS

  nat-ptd [-v] <command> [command-args]

DESCRIPTION

nat-ptd supports the following commands. In most places where a PTD needs to be specified, you can use a bzip2-compressed PTD, as long as the filename ends in bz2.
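The filename rule above can be illustrated with a small Python sketch. `open_ptd_file` is a hypothetical helper, not part of NATools. It decompresses transparently when the name ends in bz2 and otherwise opens the file as plain text.

```python
import bz2


def open_ptd_file(path, mode="rt", encoding="utf-8"):
    """Open a PTD file, using bzip2 decompression when the name ends in bz2.

    Hypothetical helper illustrating the filename rule; not a NATools API.
    """
    if path.endswith("bz2"):
        return bz2.open(path, mode, encoding=encoding)
    return open(path, mode, encoding=encoding)
```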

help

When invoked without arguments, this command prints a list of the available commands.

If the name of a command is supplied as an optional argument, detailed help for that command (taken from this man page) is printed.

    nat-ptd help [command-name]

intersect

Intersects the domains of the supplied PTDs, keeping the lower counts and translation probabilities.

As of recent NATools versions, you can use the -type option to choose the output file type (dmp and sqlite are supported; dmp is the default).
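The exact semantics of intersect are not spelled out above. The Python sketch below gives one reading of them. It assumes a PTD can be modelled as a dict that maps each term to {"count": int, "trans": {translation: probability}}. The sketch keeps only the terms present in every PTD and takes the lowest count. As a further assumption, it keeps only the translations shared by all PTDs, each with its lowest probability.

```python
def intersect_ptds(*ptds):
    """Keep terms present in every PTD, with the lowest count and probabilities."""
    common = set(ptds[0]).intersection(*ptds[1:])
    result = {}
    for term in common:
        entries = [ptd[term] for ptd in ptds]
        # Assumption: only translations present in every PTD survive.
        shared = set(entries[0]["trans"]).intersection(
            *(e["trans"] for e in entries[1:]))
        result[term] = {
            "count": min(e["count"] for e in entries),
            "trans": {t: min(e["trans"][t] for e in entries) for t in shared},
        }
    return result
```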

toSQLite

Converts a PTD to the SQLite format. The first argument is the PTD filename; an optional second argument specifies the output filename.

toDmp

Converts a PTD to the Dumper format. The first argument is the PTD filename; an optional second argument specifies the output filename.

toJson

Converts a PTD to a JSON file. The first argument is the PTD filename; an optional second argument specifies the output filename.

toDmpBz

Converts a PTD to a bzip2-compressed Dumper format. The first argument is the PTD filename; an optional second argument specifies the output filename.

toDmpXz

Converts a PTD to an xz-compressed Dumper format. The first argument is the PTD filename; an optional second argument specifies the output filename.

toTSV

Exports a PTD to a plain text file in Tab Separated Values format. The first column holds each term, the second column a possible translation, and the third column the probability of that translation. The resulting file can be used directly as a glossary in OmegaT.

Usage:

  nat-ptd toTSV [-m=<p>] <ptd-filename> <dst-filename>

The following options can be used:

-m=p

Minimum probability for a translation to be exported. p must be a probability in the interval [0,1] (default: 0.5).
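The TSV layout described above can be sketched in Python. The sketch assumes a hypothetical dict model of a PTD, where each term maps to {"count": ..., "trans": {translation: probability}}. Two details are also assumptions: that the -m threshold is inclusive, and that translations are listed best first.

```python
def ptd_to_tsv(ptd, min_prob=0.5):
    """Render term<TAB>translation<TAB>probability lines, best translations first."""
    lines = []
    for term in sorted(ptd):
        ranked = sorted(ptd[term]["trans"].items(), key=lambda kv: kv[1], reverse=True)
        for trans, prob in ranked:
            if prob >= min_prob:  # assumption: the -m threshold is inclusive
                lines.append(f"{term}\t{trans}\t{prob}")
    return "".join(line + "\n" for line in lines)
```

Writing the returned string to a UTF-8 file gives the kind of three-column file that OmegaT reads as a glossary.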

toStarDict

Exports a PTD to the StarDict dictionary format.

Usage:

  nat-ptd toStarDict [-m=<p>] [-d=<directory>] <ptd-filename> <dst-dict-name>

The following options can be used:

-m=p

Minimum probability for a translation to be exported. p must be a probability in the interval [0,1] (default: 0.4).

-d=directory

Destination directory for the created dictionary (default: .).

stats

Prints some basic statistics about a PTD.

compare

Given two PTDs, prints some basic statistics comparing them (size, domains, etc.).

query

Queries a PTD interactively.

grep

Greps entries matching a specific pattern from a PTD. Supply a pattern and a PTD file. By default it dumps a PTD subset containing the matching entries. With the -compact option, it prints a small table with each entry's best translation.

    nat-ptd grep [-compact] [-o=outfile] <pattern> <ptd-file>
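Both output modes can be sketched in Python. The sketch assumes a PTD is modelled as a dict mapping each term to {"count": ..., "trans": {...}}, and that the pattern is a regular expression searched anywhere in the term. The real tool is written in Perl, so its pattern syntax may differ. The two-decimal probability column is a hypothetical table layout.

```python
import re


def grep_ptd(ptd, pattern):
    """Return the subset PTD whose terms match the regular expression."""
    rx = re.compile(pattern)
    return {term: entry for term, entry in ptd.items() if rx.search(term)}


def compact_table(ptd):
    """One line per entry: term, best translation, its probability."""
    rows = []
    for term in sorted(ptd):
        trans = ptd[term]["trans"]
        if trans:
            best = max(trans, key=trans.get)
            rows.append(f"{term}\t{best}\t{trans[best]:.2f}")
    return "\n".join(rows)
```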

compose

This command receives two or more dictionaries.

When given a pair of dictionaries (the target language of the first dictionary should be the source language of the second), it composes them, resulting in a PTD from the first dictionary's source language to the second dictionary's target language.

This command can also be used with more than two dictionaries to compute a full transitive dictionary.

You can specify the output filename with the -o switch.

As of recent NATools versions, you can use the -type option to choose the output file type (dmp and sqlite are supported; dmp is the default).
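Composition is commonly computed by chaining probabilities through the pivot language: P(c|a) = sum over b of P(b|a) * P(c|b). The Python sketch below applies that formula pairwise, from left to right. It is not confirmed that NATools uses exactly this rule, and which count each composed entry keeps is also an assumption. A PTD is modelled, hypothetically, as a dict mapping each term to {"count": ..., "trans": {...}}.

```python
from collections import defaultdict
from functools import reduce


def compose_pair(ab, bc):
    """Chain A->B with B->C: P(c|a) = sum over b of P(b|a) * P(c|b)."""
    result = {}
    for a, entry in ab.items():
        probs = defaultdict(float)
        for b, p_ba in entry["trans"].items():
            if b in bc:  # pivot words missing from B->C contribute nothing
                for c, p_cb in bc[b]["trans"].items():
                    probs[c] += p_ba * p_cb
        if probs:
            # Assumption: the composed entry keeps the source entry's count.
            result[a] = {"count": entry["count"], "trans": dict(probs)}
    return result


def compose(*ptds):
    """Transitive composition of two or more PTDs, applied left to right."""
    if len(ptds) < 2:
        raise ValueError("compose needs at least two dictionaries")
    return reduce(compose_pair, ptds)
```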

filter

This command filters a dictionary (or dictionary pair) according to some default thresholds, which can be adjusted.

If the supplied name is a directory, it is assumed to be a NATools object (a NATools alignment folder). In this case, the files source-target.dmp and target-source.dmp are looked up inside it.

If the supplied name is not a directory, it is assumed to be the name of a PTD dump file. The command then checks whether it was given alone (a single direction) or together with a second filename. If two filenames are supplied, they are treated as the two directions of a pair (source-target and target-source).

Therefore, there are three possible usages:

    nat-ptd filter <natools-obj-dir>
    nat-ptd filter <file.dmp>
    nat-ptd filter <file-s-t.dmp> <file-t-s.dmp>

The following switches can be used:

-numbers

By default, filtering removes terms (entries and translations) that are numbers (only digits, with possible digit separators: space, comma, point, colon). Use this switch to preserve them.

-symbols

Any term that is neither a standard word (possibly with a dash or apostrophe) nor a number (as described above) is considered to include strange symbols, and is removed. Use this switch to preserve such terms.

-none

By default, the 'no translation', also known as 'none', is removed. You can force it to be preserved with this switch.

-occs=n

Defines the minimum occurrence count for entries to be preserved. By default the used value is 2 (that is, entries with 1 occurrence are discarded). Use 0 to not discard any entry by occurrence count.

-prob=p

Defines the minimum probability for translations to be preserved. By default the value is 1% (0.01). Define the value as 0 to preserve all translations.

-bidir

Defines whether the filtering should check for bidirectional translations, that is, preserve only terms whose translations' translations include that term. Mathematically, preserve t if

    t   in   Translations ( Translations ( t ) )

Note that this is only available for NATools objects or dictionary pairs. By default this switch is ON. Turn it OFF by assigning 0 to it: -bidir=0

Also, the -o switch can be used to define an output filename. When using a pair of dictionaries, specify the output filenames separated by a comma: -o=outputfile1,outputfile2.

As of recent NATools versions, you can use the -type option to choose the output file type (dmp and sqlite are supported; dmp is the default).
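The Python sketch below approximates the default filter and the -bidir check on a hypothetical dict model of a PTD, where each term maps to {"count": ..., "trans": {...}}. Several details are assumptions rather than documented behaviour:

- The regular expressions paraphrase the -numbers and -symbols descriptions.
- The string "(none)" marks the 'no translation' entry.
- Entries left without translations are dropped.
- The thresholds are applied before the bidirectional check.

```python
import re

# Assumed term classes, paraphrasing the -numbers and -symbols descriptions.
NUMBER = re.compile(r"\d+(?:[ ,.:]\d+)*")          # e.g. "1.000", "12:30"
WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")  # letters, inner dash/apostrophe
NONE = "(none)"                                    # assumed 'no translation' marker


def keep_term(term, numbers=False, symbols=False):
    """Decide whether a term survives the number/symbol filters."""
    if NUMBER.fullmatch(term):
        return numbers
    if WORD.fullmatch(term):
        return True
    return symbols


def filter_ptd(ptd, occs=2, prob=0.01, numbers=False, symbols=False, none=False):
    """One direction: drop rare entries, weak translations and unwanted terms."""
    out = {}
    for term, entry in ptd.items():
        if entry["count"] < occs or not keep_term(term, numbers, symbols):
            continue
        trans = {}
        for t, p in entry["trans"].items():
            if p < prob:
                continue
            if none if t == NONE else keep_term(t, numbers, symbols):
                trans[t] = p
        if trans:  # assumption: entries left without translations are dropped
            out[term] = {"count": entry["count"], "trans": trans}
    return out


def translations_of_translations(term, fwd, bwd):
    """Translations(Translations(term)) for a source-target/target-source pair."""
    result = set()
    for x in fwd[term]["trans"]:
        if x in bwd:
            result.update(bwd[x]["trans"])
    return result


def bidir_filter(st, ts):
    """Keep t only if t is in Translations(Translations(t)), in both directions."""
    new_st = {s: e for s, e in st.items() if s in translations_of_translations(s, st, ts)}
    new_ts = {t: e for t, e in ts.items() if t in translations_of_translations(t, ts, st)}
    return new_st, new_ts
```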

lowercase

This command lowercases all terms of a dictionary, sums up the occurrences of terms that become identical, and recomputes the probabilities.

    nat-ptd lowercase [-o=outputfile] <ptd-filename>

As of recent NATools versions, you can use the -type option to choose the output file type (dmp and sqlite are supported; dmp is the default).
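The description above does not say how the merged probabilities are recomputed. The Python sketch below assumes a count-weighted average:

1. Multiply each probability by its entry's count.
2. Sum these products across the entries that collapse to the same lowercase form.
3. Divide by the summed count.

A PTD is modelled, hypothetically, as a dict mapping each term to {"count": ..., "trans": {...}}. Entries whose summed count is not positive are skipped.

```python
from collections import defaultdict


def lowercase_ptd(ptd):
    """Lowercase terms and translations, merging entries that collide."""
    counts = defaultdict(int)
    mass = defaultdict(lambda: defaultdict(float))
    for term, entry in ptd.items():
        key = term.lower()
        counts[key] += entry["count"]
        for trans, prob in entry["trans"].items():
            # Weight each probability by how often its entry occurred.
            mass[key][trans.lower()] += prob * entry["count"]
    return {
        key: {"count": total,
              "trans": {t: m / total for t, m in mass[key].items()}}
        for key, total in counts.items()
        if total > 0
    }
```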

reprob

This command recomputes the probabilities of a dictionary. For each entry, it sums up the probabilities of all possible translations, considers that total to be 100% (1), and rescales each probability accordingly.

It takes a required argument, the name of the PTD dump file. Optionally, you can supply an output file with the -o switch.

    nat-ptd reprob [-o=outputfile] <ptd-filename>

As of recent NATools versions, you can use the -type option to choose the output file type (dmp and sqlite are supported; dmp is the default).
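Put as a formula, every probability of an entry w is rescaled as p'(t|w) = p(t|w) / (sum over u of p(u|w)). The minimal Python sketch below performs that rescaling. It assumes a PTD is modelled as a dict mapping each term to {"count": ..., "trans": {...}}. Normalising per entry is our reading of the description above.

```python
def reprob_ptd(ptd):
    """Rescale each entry's translation probabilities so they sum to 1."""
    result = {}
    for term, entry in ptd.items():
        total = sum(entry["trans"].values())
        if total > 0:
            trans = {t: p / total for t, p in entry["trans"].items()}
        else:
            trans = dict(entry["trans"])  # nothing to rescale
        result[term] = {"count": entry["count"], "trans": trans}
    return result
```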

add

Adds two or more PTD files into a single PTD file. They should have the same source and target language. You can use the -o switch to specify an output filename.

As of recent NATools versions, you can use the -type option to choose the output file type (dmp and sqlite are supported; dmp is the default).
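One plausible way to add PTDs is sketched in Python below: sum the counts of each term, and combine translation probabilities as a count-weighted average. This merging rule is an assumption, not a description of NATools internals. So is the dict model of a PTD, where each term maps to {"count": ..., "trans": {...}}.

```python
from collections import defaultdict


def add_ptds(*ptds):
    """Merge PTDs: sum counts, average probabilities weighted by count."""
    counts = defaultdict(int)
    # Expected number of times each term was seen with each translation.
    mass = defaultdict(lambda: defaultdict(float))
    for ptd in ptds:
        for term, entry in ptd.items():
            counts[term] += entry["count"]
            for trans, prob in entry["trans"].items():
                mass[term][trans] += prob * entry["count"]
    result = {}
    for term, total in counts.items():
        if total > 0:  # skip entries with no occurrences to avoid dividing by zero
            result[term] = {
                "count": total,
                "trans": {t: m / total for t, m in mass[term].items()},
            }
    return result
```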

ucts

Creates unambiguous-concept translation sets.

  nat-ptd ucts [-m=<number>] [-M=<number>] [-p=<probability>] [-P=<probability>] <ptd-filename> <ptd-filename>

The following options can be used:

-m=n

Minimum number of occurrences of each token. n must be an integer (default: 10).

-M=n

Maximum number of occurrences of each token. n must be an integer (default: 100).

-p=p

Minimum probability for a translation. p must be a probability in the interval [0,1] (default: 0.2).

-P=p

Minimum probability for the inverse translations. p must be a probability in the interval [0,1] (default: 0.8).

-r=0|1

Print rank (default: 0).

bws

Creates bi-word sets.

  nat-ptd bws [-m=<number>] [-p=<probability>] <ptd-filename> <ptd-filename>

The following options are available:

-m=n

Minimum number of occurrences of each token. n must be an integer (default: 10).

-p=p

Minimum probability for a translation. p must be a probability in the interval [0,1] (default: 0.4).

-r=0|1

Print rank (default: 0).

SEE ALSO

NATools, perl(1)

AUTHOR

Alberto Manuel Brandão Simões, <ambs@cpan.org>

Nuno Alexandre Carvalho, <smash@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010-2014 by Natura Project