Lingua::NL::FactoidExtractor - A tool for extracting factoids from Dutch texts
    use strict;
    use lib "./lib";
    use Lingua::NL::FactoidExtractor;

    my $inputfile = "alpino.xml";
    my $verbose = 1; # boolean
    my $factoids = extract($inputfile, $verbose);
    print "$factoids\n";
The Dutch parser Alpino is a prerequisite for this module. Alpino is available under the conditions of the GNU Lesser General Public License. See the Alpino home page.
MEN|open|de luchthaven|op 8 juli 1964
de bandleden|speel_in|de instrumenten|opnieuw
Rome|IS|de hoofdstad van Italië|opnieuw
de behandeling van Crohn|IS|symptomatisch|
de voornaamste vertegenwoordiger|IS|Rembrandt
Rembrandt|schilder|veel Bijbelse taferelen|
Bangalore|IS|een belangrijke industriestad|Voor de onafhankelijkheid
het|IS|een belangrijk centrum van de informatietechnologie in India|meer recent
MEN|noem|het & de Silicon Valley van India|meer recent & wel
het|IS|de Silicon Valley van India
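The factoids shown above appear to be pipe-separated records. As a minimal sketch, the following splits one such line into named fields. The helper `parse_factoid` and the field names (subject, predicate, object, modifier) are assumptions inferred from the examples; they are not part of the module's documented interface.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::NL::FactoidExtractor.
# Field names are an assumption inferred from the example factoids.
sub parse_factoid {
    my ($line) = @_;
    chomp $line;

    # A limit of 4 keeps a trailing empty field and leaves any
    # further '|' characters inside the last field.
    my @fields = split /\|/, $line, 4;

    my %factoid;
    @factoid{qw(subject predicate object modifier)} = @fields;

    # Lines with only three fields get an empty modifier.
    $factoid{modifier} //= '';
    return \%factoid;
}

my $f = parse_factoid("MEN|open|de luchthaven|op 8 juli 1964");
print "$f->{subject} / $f->{predicate} / $f->{object} / $f->{modifier}\n";
```

If the string returned by extract() holds one factoid per line (also an assumption), it could be split on newlines first and each line passed to this helper.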
If punctuation such as a full stop or a comma is attached to a word in the Alpino output, that punctuation also ends up in the factoids extracted from the sentence. A workaround is to run a tokenizer that separates punctuation from words with whitespace before parsing the sentence.
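As a rough illustration of the workaround, the sketch below inserts a space before sentence punctuation that is attached to the end of a word. The helper `separate_punctuation` is hypothetical and deliberately naive: it does not handle abbreviations, quotes, or brackets, and whether its output matches what Alpino expects is an assumption. A proper tokenizer is preferable for real input.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::NL::FactoidExtractor or Alpino.
sub separate_punctuation {
    my ($text) = @_;

    # Insert a space before a run of punctuation that follows a
    # non-space character and is followed by whitespace or end of string.
    # Punctuation inside a token, such as "8.5", is left alone.
    $text =~ s/(?<=\S)([.,;:!?]+)(?=\s|$)/ $1/g;
    return $text;
}

print separate_punctuation("De luchthaven werd geopend op 8 juli 1964, zei hij."), "\n";
```

The text would be passed through such a step before it is handed to the parser, so that the resulting factoids contain words without trailing punctuation.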
Suzan Verberne, http://sverberne.ruhosting.nl
Copyright (C) 2012 by Suzan Verberne
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.
This work was funded by Google by means of a European Digital Humanities Award.
To install Lingua::NL::FactoidExtractor, copy and paste the appropriate command into your terminal.

Using cpanm:

    cpanm Lingua::NL::FactoidExtractor

Using the CPAN shell:

    perl -MCPAN -e shell
    install Lingua::NL::FactoidExtractor
For more information on module installation, please visit the detailed CPAN module installation guide.