NAME

Text::GaleChurch - Perl extension for aligning translated sentences

SYNOPSIS

    use strict;
    use warnings;
    use utf8;    # the French sentences below contain non-ASCII characters
    use Text::GaleChurch;

    binmode STDOUT, ':encoding(UTF-8)';

    # One sentence per array element.
    my @eParagraph = ();
    push @eParagraph, "According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products.";
    push @eParagraph, "Cola drink manufacturers in particular achieved above-average growth rates.";
    push @eParagraph, "The higher turnover was largely due to an increase in the sales volume.";
    push @eParagraph, "Employment and investment levels also climbed.";
    push @eParagraph, "Following a two-year transitional period, the new Foodstuffs Ordinance for Mineral Water came into effect on April 1, 1988.";
    push @eParagraph, "Specifically, it contains more stringent requirements regarding quality consistency and purity guarantees.";

    my @fParagraph = ();
    push @fParagraph, "Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d'adeptes.";
    push @fParagraph, "En effet, notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment.";
    push @fParagraph, "La progression des chiffres d'affaires résulte en grande partie de l'accroissement du volume des ventes.";
    push @fParagraph, "L'emploi et les investissements ont également augmenté.";
    push @fParagraph, "La nouvelle ordonnance fédérale sur les denrées alimentaires concernant entre autres les eaux minérales, entrée en vigueur le 1er avril 1988 après une période transitoire de deux ans, exige surtout une plus grande constance dans la qualité et une garantie de la pureté.";

    # align() takes two array references and returns two array references
    # whose elements correspond index by index.
    my ($eAlignedRef, $fAlignedRef) = Text::GaleChurch::align(\@eParagraph, \@fParagraph);

    for my $i (0 .. $#{$eAlignedRef}) {
        print "E:", $eAlignedRef->[$i], "\tis aligned to\tF:", $fAlignedRef->[$i], "\n";
    }

DESCRIPTION

This module aligns the sentences of two paragraphs written in different languages so that the aligned sentences are likely translations of each other. This is useful for machine translation and other applications that need sentence-aligned parallel corpora. The algorithm is described in the paper "A Program for Aligning Sentences in Bilingual Corpora" by William A. Gale and Kenneth W. Church (Computational Linguistics, 1993). The input to the align function is two references to arrays holding the sentences of the source-language and target-language text, with exactly one sentence per array element. To split paragraphs into sentences, the module Lingua::Sentence can be used.
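The length-based model behind the algorithm can be sketched in a few lines of Perl. This is a minimal, self-contained illustration, not the source of Text::GaleChurch: the helper names (pnorm, match_cost, align_sentences) are hypothetical. The parameters c = 1 and s² = 6.8 and the alignment-type priors (1-1: 0.89, 1-0 or 0-1: 0.0099, 2-1 or 1-2: 0.089, 2-2: 0.011) are the values reported by Gale and Church. Scaling the variance by the mean of the two lengths and joining merged sentences with a single space are assumptions of this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal sketch of length-based sentence alignment after Gale and Church.
# NOT the Text::GaleChurch implementation; all sub names are hypothetical.

my ($C, $S2) = (1, 6.8);    # expected target chars per source char, and variance

# Alignment types: [source sentences, target sentences, prior probability].
my @TYPES = ([1, 1, 0.89], [1, 0, 0.0099], [0, 1, 0.0099],
             [2, 1, 0.089], [1, 2, 0.089], [2, 2, 0.011]);

# Standard normal CDF for z >= 0 (Abramowitz & Stegun 26.2.17 approximation).
sub pnorm {
    my ($z) = @_;
    my $t = 1 / (1 + 0.2316419 * $z);
    return 1 - 0.3989423 * exp(-$z * $z / 2)
        * ((((1.330274429 * $t - 1.821255978) * $t + 1.781477937) * $t
            - 0.356563782) * $t + 0.319381530) * $t;
}

# Cost = -log(prior * P(length difference | match)), lengths in characters.
sub match_cost {
    my ($l1, $l2, $prior) = @_;
    my $mean = ($l1 + $l2 / $C) / 2;    # assumption: variance scaled by mean length
    my $pd   = 1;
    if ($mean > 0) {
        my $z = abs($C * $l1 - $l2) / sqrt($S2 * $mean);
        $pd = 2 * (1 - pnorm($z));      # two-sided tail probability
    }
    return $pd > 0 ? -log($prior * $pd) : 1e9;    # 1e9 = effectively impossible
}

# Dynamic programming over sentence lengths. Returns two array refs of equal
# length whose i-th elements are aligned; merged sentences are joined by a space.
sub align_sentences {
    my ($src, $tgt) = @_;
    my ($n, $m) = (scalar @$src, scalar @$tgt);
    my (@cost, @back);
    $cost[0][0] = 0;
    for my $i (0 .. $n) {
        for my $j (0 .. $m) {
            next if $i == 0 && $j == 0;
            for my $type (@TYPES) {
                my ($di, $dj, $prior) = @$type;
                next if $i < $di || $j < $dj;
                my ($l1, $l2) = (0, 0);
                $l1 += length $src->[$_] for $i - $di .. $i - 1;
                $l2 += length $tgt->[$_] for $j - $dj .. $j - 1;
                my $total = $cost[$i - $di][$j - $dj] + match_cost($l1, $l2, $prior);
                if (!defined $cost[$i][$j] || $total < $cost[$i][$j]) {
                    $cost[$i][$j] = $total;
                    $back[$i][$j] = [$di, $dj];
                }
            }
        }
    }
    my (@s, @t);
    my ($i, $j) = ($n, $m);
    while ($i > 0 || $j > 0) {    # trace the cheapest path back to (0, 0)
        my ($di, $dj) = @{ $back[$i][$j] };
        unshift @s, join ' ', @{$src}[$i - $di .. $i - 1];
        unshift @t, join ' ', @{$tgt}[$j - $dj .. $j - 1];
        ($i, $j) = ($i - $di, $j - $dj);
    }
    return (\@s, \@t);
}

my ($s, $t) = align_sentences(
    ["Hello world.", "This is a test sentence."],
    ["Bonjour le monde.", "Ceci est une phrase de test."],
);
print "$s->[$_]\t$t->[$_]\n" for 0 .. $#$s;
```

Dynamic programming keeps the search proportional to the product of the two sentence counts instead of enumerating every possible segmentation. The module itself may differ in details such as tie-breaking or how merged and unmatched sentences are represented.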

EXPORT

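align($sourceRef,$targetRef)

Align the bilingual sentences in the arrays referenced by the two arguments. The function returns two references to arrays of equal length: the element at index i of the first array is aligned to the element at index i of the second. When the inputs contain different numbers of sentences, an aligned element may correspond to more than one input sentence.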
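A common next step is writing the aligned pairs out as a parallel corpus. The sketch below assumes only the index-by-index correspondence of the two returned arrays; the helper name aligned_to_tsv and the tab-separated output format are hypothetical choices, and literal arrays stand in for the return value of align().

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: format two index-aligned array references as
# "source<TAB>target" lines, one aligned pair per line.
sub aligned_to_tsv {
    my ($srcRef, $tgtRef) = @_;
    die "aligned arrays differ in length\n" unless @$srcRef == @$tgtRef;
    return map { join("\t", $srcRef->[$_], $tgtRef->[$_]) . "\n" } 0 .. $#$srcRef;
}

# Stand-in data shaped like the two array references returned by align().
my @lines = aligned_to_tsv(
    ["Employment and investment levels also climbed."],
    ["L'emploi et les investissements ont également augmenté."],
);
print @lines;
```

Checking the lengths up front turns a mismatched pair of arrays into an immediate error rather than silently misaligned output.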

SUPPORT

Bugs should always be submitted via the CPAN bug tracker:

http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-GaleChurch

For other issues, contact the maintainer.

SEE ALSO

Lingua::Sentence

Google code project: http://code.google.com/p/corpus-tools/

AUTHOR

Achim Ruopp, <achimru@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2010 by Digital Silk Road

Portions Copyright (C) 2005 by Philipp Koehn and Josh Schroeder (used with permission)

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.