NAME

nlquestion2sparqlquery - Perl script for converting natural language questions into SPARQL queries

SYNOPSIS

nlquestion2sparqlquery [options] --input <FILENAME>

OPTIONS AND ARGUMENTS

--input=filename, -i filename

This option defines the input file to load. If the filename is - (or the option is not specified), the input data is read from STDIN.

--output=filename, -o filename

This option defines the output file to write. If the filename is - (or the option is not specified), the output data is printed to STDOUT.

--rcfile=file, -c file

Load the given configuration file.

--answer, -a

With this option, the answers to the query are returned; otherwise, the SPARQL query itself is returned.

--format [XML|SPARQL], -f [XML|SPARQL]

This option defines the format of the output:
- XML: the output is in XML, as required by the QALD challenge
- SPARQL: the output is the SPARQL query or the list of answers

--help

Print the help message for nlquestion2sparqlquery.

--man

Print the man page of nlquestion2sparqlquery.

--verbose, -v

Switch to verbose mode. The verbosity level can be increased by repeating the option.

--debug, -D

Switch to debug mode for the script nlquestion2sparqlquery (the switch has no influence on the object code).

DESCRIPTION

This script queries RDF knowledge bases with questions expressed in natural language. Natural language questions are converted into SPARQL queries. The method is based on rules and resources. Resources are provided for querying the Drugbank (<http://www.drugbank.ca>), Diseasome (<http://diseasome.eu>) and Sider (<http://sideeffects.embl.de>).

The natural language question must already be annotated with linguistic and semantic information. The input file provides this information (see the section INPUT FORMAT for details on the format).

If you use this software, please cite:

Natural Language Question Analysis for Querying Biomedical Linked Data. Thierry Hamon, Natalia Grabar, and Fleur Mougin. Natural Language Interfaces for Web of Data (NLIWod 2014). 2014. To appear.

EXAMPLES OF USE

To run the script, a configuration file is needed (usually nlquestion.rc in /etc/nlquestion; see the section CONFIGURATION FILE FORMAT for more details). An example configuration file is available in etc/nlquestion/nlquestion.rc in the archive directory.

The most common command line to run nlquestion2sparqlquery is:

nlquestion2sparqlquery -i example1.qald

It is assumed that the directory containing the program nlquestion2sparqlquery is in your PATH variable and that the configuration file is /etc/nlquestion/nlquestion.rc.

The SPARQL query is printed on STDOUT in QALD XML format.

If you are not allowed to copy the configuration file nlquestion.rc into the directory /etc/nlquestion (or to create this directory), or if you want to use your own configuration file, you can specify the file and its path with the option --rcfile:

nlquestion2sparqlquery --rcfile nlquestion2.rc -i example1.qald

You can also change the format and record the results in a file:

nlquestion2sparqlquery --rcfile nlquestion2.rc -i example1.qald -f SPARQL -a -o example1.out

INPUT FORMAT

The input file is composed of several parts providing linguistic and semantic information on the natural language question:

The identifier of the question is introduced by DOC: on one line. For instance:
```
 DOC: question1
```
The end of the information associated with the document is marked by the keyword _END_DOC_.
The language of the question is defined with language: on one line. For instance:
```
 language: EN
```
The list of sentences is introduced by the keyword sentence: and ends with the keyword _END_SENT_ (each on its own line). For instance:
```
 sentence:
 Which diseases is Cetuximab used for?
 _END_SENT_
```
The morpho-syntactic information associated with each word is introduced by the keyword word information: and ends with the keyword _END_POSTAG_ (each on its own line). Each line contains four tab-separated fields: the inflected form of the word, its part-of-speech tag, its lemma and its offset (in number of characters). For instance:
```
 word information:
 Which  WDT     which   10      
 diseases       NNS     disease 16      
 is     VBZ     be      25      
 Cetuximab      VBN     Cetuximab       28      
 used   VBN     use     38      
 for    IN      for     43      
 ?      SENT    ?       46      
 _END_POSTAG_
```
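As an illustration of the four-field layout described above, here is a minimal Python sketch of a reader for the word information block. It is not part of nlquestion2sparqlquery; the names `WordInfo` and `parse_word_information` are hypothetical, and the sketch assumes that fields are separated by single tab characters and that trailing whitespace can be discarded.

```python
from typing import Iterable, List, NamedTuple


class WordInfo(NamedTuple):
    form: str    # inflected form as it appears in the question
    pos: str     # part-of-speech tag
    lemma: str   # lemma of the word
    offset: int  # offset of the word, in number of characters


def parse_word_information(lines: Iterable[str]) -> List[WordInfo]:
    """Collect the entries between 'word information:' and '_END_POSTAG_'."""
    words = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comments are ignored
        if line == "word information:":
            inside = True
        elif line == "_END_POSTAG_":
            break
        elif inside:
            # Assumption: exactly four tab-separated fields per entry.
            form, pos, lemma, offset = line.split("\t")[:4]
            words.append(WordInfo(form, pos, lemma, int(offset)))
    return words


sample = [
    " word information:",
    " Which\tWDT\twhich\t10\t",
    " diseases\tNNS\tdisease\t16\t",
    " _END_POSTAG_",
]
words = parse_word_information(sample)
print(words[0].lemma, words[1].offset)
```

Keeping the offset as an integer makes it easier to align words with the semantic units, which are also located by character offsets.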

The semantic entities and their associated semantic information are introduced by the keyword semantic units: and end with the keyword _END_SEM_UNIT_ (each on its own line). Each line contains five tab-separated fields: the semantic entity, its canonical form, its semantic types (separated by colons), its start offset and its end offset (in number of characters). For instance:

```
 semantic units:
 # term form<tab>term canonical form<tab>semantic features<tab>offset start<tab>offset end (ended by _END_SEM_UNIT_)
 diseases       diseas  disease:disease 16      23
 Cetuximab      Cetuximab       drug/drugbank/gen/DB00002:drug/drugbank/gen/DB00002     28      36
 used for       used for        possibleDrug:possibleDrug       38      45
 Cetuximab      Cetuximab       drug/drugbank/gen/DB00002:drug/drugbank/gen/DB00002     28      36
 diseases       diseas  disease:disease 16      23
 used for       used for        possibleDrug:possibleDrug       38      45
 _END_SEM_UNIT_
```

Semantic types can be decomposed into subtypes. They are written in the same way as a Unix file path, with subtypes separated by slashes (for example drug/drugbank/gen/DB00002).
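
The following Python sketch shows one way to split a semantic unit line into its five fields and to decompose each semantic type into its subtypes. It is only an illustration under the assumptions stated in the comments; `SemanticUnit` and `parse_semantic_unit` are hypothetical names, not functions provided by the script.

```python
from typing import List, NamedTuple


class SemanticUnit(NamedTuple):
    form: str                # semantic entity as found in the question
    canonical: str           # canonical form of the entity
    types: List[List[str]]   # each semantic type, split into its subtypes
    start: int               # start offset, in number of characters
    end: int                 # end offset, in number of characters


def parse_semantic_unit(line: str) -> SemanticUnit:
    """Parse one entry of the 'semantic units:' block."""
    # Assumption: fields are separated by tabs only, so entities
    # containing spaces (such as "used for") stay in one field.
    form, canonical, types, start, end = line.strip().split("\t")[:5]
    # Types are separated by colons; subtypes by slashes, like path components.
    split_types = [t.split("/") for t in types.split(":")]
    return SemanticUnit(form, canonical, split_types, int(start), int(end))


unit = parse_semantic_unit(
    "Cetuximab\tCetuximab\t"
    "drug/drugbank/gen/DB00002:drug/drugbank/gen/DB00002\t28\t36"
)
print(unit.types[0])  # ['drug', 'drugbank', 'gen', 'DB00002']
```

Representing each type as a list of subtypes makes prefix tests (for instance, "is this entity a drug?") a simple comparison on the first element.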

NB: Comments are introduced by the character #. Empty lines are ignored.

Example input files are included in the archive.
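
To show how the parts described in this section fit together, here is a minimal Python sketch that groups the lines of an input file by document and by block. It is not the reader used by nlquestion2sparqlquery: `read_documents` and the dictionary keys are hypothetical, and the sketch assumes that a file may contain several DOC: ... _END_DOC_ groups and that block keywords appear alone on their line.

```python
from typing import Dict, Iterable, List

# Opening keyword -> (key used in the result, closing keyword)
BLOCKS = {
    "sentence:": ("sentences", "_END_SENT_"),
    "word information:": ("words", "_END_POSTAG_"),
    "semantic units:": ("semantic_units", "_END_SEM_UNIT_"),
}


def read_documents(lines: Iterable[str]) -> List[Dict]:
    """Group the raw lines of an input file by document and by block."""
    docs, doc = [], None
    block, end = None, None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comments and empty lines are ignored
        if block is not None:
            if line == end:
                block = None          # closing keyword of the current block
            else:
                doc[block].append(line)
        elif line.startswith("DOC:"):
            doc = {"id": line[len("DOC:"):].strip(), "language": None,
                   "sentences": [], "words": [], "semantic_units": []}
        elif doc is None:
            continue                  # ignore anything before the first DOC:
        elif line.startswith("language:"):
            doc["language"] = line[len("language:"):].strip()
        elif line in BLOCKS:
            block, end = BLOCKS[line]
        elif line == "_END_DOC_":
            docs.append(doc)
            doc = None
    return docs


sample = """\
DOC: question1
language: EN
sentence:
Which diseases is Cetuximab used for?
_END_SENT_
word information:
Which\tWDT\twhich\t10
_END_POSTAG_
semantic units:
# a comment line
diseases\tdiseas\tdisease:disease\t16\t23
_END_SEM_UNIT_
_END_DOC_
""".splitlines()

docs = read_documents(sample)
print(docs[0]["id"], docs[0]["language"])
```

The block contents are kept as raw tab-separated lines here, so that the per-line parsing of words and semantic units can be handled separately.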

CONFIGURATION FILE FORMAT

The configuration file format is similar to the Apache configuration format. The module Config::General is used to read the file. There is one section named NLQUESTION per language (identified with the attribute language). Each section sets the following variables, which control the behaviour of the script:

VERBOSE: it defines the verbosity level, similarly to the option --verbose. It is overridden by this option.
REGEXFORM: this boolean variable indicates whether, when a regex is used, the inflected form (value 1) or the canonical form (value 0) is matched.
UNION: this boolean variable indicates whether the union is used.
SEMANTICTYPECORRESPONDANCE: this variable defines the file containing the semantic information (rewriting rules, semantic correspondence, etc.) used to generate the SPARQL queries.
URL_PREFIX: it specifies the beginning of the URL (before the SPARQL query) when the query is sent to a Virtuoso server.
URL_SUFFIX: it specifies the end of the URL (after the SPARQL query) when the query is sent to a Virtuoso server.
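
The role of URL_PREFIX and URL_SUFFIX can be sketched in Python as follows, assuming the final URL is the prefix, then the SPARQL query, then the suffix. The function `build_query_url`, the endpoint address and the suffix value are hypothetical, and percent-encoding the query is an assumption of this sketch rather than documented behaviour of the script.

```python
from urllib.parse import quote


def build_query_url(prefix: str, query: str, suffix: str) -> str:
    """Concatenate URL_PREFIX, the encoded SPARQL query and URL_SUFFIX."""
    # Assumption: the query is percent-encoded before being inserted.
    return prefix + quote(query, safe="") + suffix


# Hypothetical values for URL_PREFIX and URL_SUFFIX
prefix = "http://localhost:8890/sparql?query="
suffix = "&format=text%2Fhtml"

url = build_query_url(prefix, "SELECT ?s WHERE { ?s ?p ?o }", suffix)
print(url)
```

Splitting the URL into a prefix and a suffix keeps server-specific parameters in the configuration file, so the same generated query can be sent to a different endpoint without changing the script.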

AUTHOR

Thierry Hamon, <hamon@limsi.fr>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

