Alberto Simões 🐪


cwb-align-import - Import existing sentence alignment into a CWB corpus


  cwb-align-import [options] <alignment_beads.txt>


  -r <dir>, --registry=<dir>    use registry directory <dir>
  -i, --inverse                 encode inverse alignment (target -> source)
  -p, --prune                   ignore alignment beads with ID errors
  -e, --empty                   allow 1:0 and 0:1 alignments (not encoded)
  -v, --verbose                 show progress messages during processing
  -h, --help                    display short help page

  -nh, --no-header              alignment file without header; must specify:
  -l1 <name>, --source=<name>   CWB name of source corpus
  -l2 <name>, --target=<name>   CWB name of target corpus
  -s <att>,   --grid=<att>      alignment grid (s-attribute, usually sentences)
  -k <spec>,  --key=<spec>      pattern for constructing unique sentence IDs


cwb-align-import imports an existing sentence alignment, given as a file of alignment beads, into a CWB corpus.


--help, -h

Show usage and options summary.

--verbose, -v

Verbose mode (shows progress messages during processing).

--registry=dir, -r dir

Locate corpora in CWB registry directory dir, overriding the default directory and the environment variable CORPUS_REGISTRY.
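
The lookup order described above can be modelled with a short Python sketch. It is illustrative only and not taken from the tool itself: the function name resolve_registry and the fallback path are hypothetical, and it assumes that CORPUS_REGISTRY, when set, takes precedence over the built-in default directory.

```python
import os

# Hypothetical fallback path; the real default is fixed when CWB is installed.
ASSUMED_DEFAULT_REGISTRY = "/usr/local/share/cwb/registry"


def resolve_registry(cli_dir=None, environ=None):
    """Return the registry directory under the assumed precedence."""
    environ = os.environ if environ is None else environ
    if cli_dir:
        # -r / --registry overrides both the environment and the default.
        return cli_dir
    if environ.get("CORPUS_REGISTRY"):
        # Assumption: the environment variable beats the built-in default.
        return environ["CORPUS_REGISTRY"]
    return ASSUMED_DEFAULT_REGISTRY
```

Passing an explicit directory always wins, which mirrors the statement that -r overrides both other sources.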

--inverse, -i

Encode inverse alignment (from target language to source language).

--prune, -p

Automatically ignore alignment beads if sentence IDs are not found, either in the source or the target corpus. Without -p, cwb-align-import will abort with an error message in this case. Note that the -p option implies -e (see below).

--empty, -e

Allow 1:0 and 0:1 alignment beads, which will be silently ignored (without -e, they cause a fatal error).
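
How -p and -e interact can be sketched in Python as follows. This is a model of the documented behaviour, not the tool's implementation: the function filter_beads, the representation of a bead as a pair of ID lists, and the order in which the two checks run are all assumptions made for illustration.

```python
def filter_beads(beads, source_ids, target_ids, prune=False, empty=False):
    """Model -p / -e handling for beads given as (source_ids, target_ids) pairs."""
    empty = empty or prune  # -p implies -e
    kept = []
    for src, tgt in beads:
        if not src or not tgt:
            # 1:0 or 0:1 bead: skipped silently with -e, fatal otherwise.
            if empty:
                continue
            raise ValueError(f"empty alignment bead: {src!r} / {tgt!r}")
        unknown = [i for i in src if i not in source_ids]
        unknown += [i for i in tgt if i not in target_ids]
        if unknown:
            # Unknown sentence ID: skipped with -p, fatal otherwise.
            if prune:
                continue
            raise ValueError(f"unknown sentence IDs: {unknown!r}")
        kept.append((src, tgt))
    return kept
```

With prune=True the function never raises; with neither flag set, the first empty bead or unknown ID stops processing, matching the abort described for the command-line tool.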

--no-header, -nh

Alignment file does not contain a header line. In this case, the header information must be provided on the command line with the -l1, -l2, -s and -k flags (documented below).
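
A wrapper script might check this requirement before calling the tool. The Python sketch below only assembles an argument list and does not run anything; the helper name build_command is hypothetical, and any corpus names or key pattern passed to it are placeholders chosen by the caller.

```python
def build_command(bead_file, source=None, target=None, grid=None, key=None,
                  no_header=False):
    """Assemble an argument list for cwb-align-import (not executed here)."""
    given = {"-l1": source, "-l2": target, "-s": grid, "-k": key}
    if no_header:
        # Without a header line, all four settings must come from the flags.
        missing = [flag for flag, value in given.items() if value is None]
        if missing:
            raise ValueError("-nh requires " + ", ".join(missing))
    cmd = ["cwb-align-import"]
    if no_header:
        cmd.append("-nh")
    for flag, value in given.items():
        if value is not None:
            cmd += [flag, value]
    cmd.append(bead_file)
    return cmd
```

The resulting list can be handed to subprocess.run on a system where cwb-align-import is installed.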

--source=ID, -l1 ID

CWB corpus ID of the source language corpus. Overrides information in alignment file header, if present.

--target=ID, -l2 ID

CWB corpus ID of the target language corpus. Overrides information in alignment file header, if present.

--grid=attribute, -s attribute

CWB attribute used as alignment grid (i.e., each alignment bead links n grid regions in the source language to m grid regions in the target language). For the most common case of sentence alignment, the grid attribute will usually be s. Note that the same attribute is used for both source and target language corpus.
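
To make the n:m notation concrete, a bead can be pictured as two groups of grid regions. The Python sketch below is purely illustrative: the Bead class and the region labels are hypothetical and say nothing about how the tool stores alignments internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bead:
    """One alignment bead linking n source regions to m target regions."""
    source_regions: tuple
    target_regions: tuple

    @property
    def kind(self):
        # Bead type in n:m notation, e.g. "2:1".
        return f"{len(self.source_regions)}:{len(self.target_regions)}"
```

For example, Bead(("s1", "s2"), ("t1",)) describes two source sentences aligned with one target sentence, so its kind is "2:1".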

--key=pattern, -k pattern

Pattern for constructing unique sentence IDs.



Stefan Evert


Copyright (C) 2007-2010 Stefan Evert

This software is provided AS IS and the author makes no warranty as to its use and performance. You may use the software, redistribute and modify it under the same terms as Perl itself.