NAME
WWW::Wikipedia::LangTitles - get interwiki links from Wikipedia.
SYNOPSIS
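    use utf8;
    use WWW::Wikipedia::LangTitles ':all';
    # Print the non-ASCII titles as UTF-8.
    binmode STDOUT, ':encoding(UTF-8)';
    my $title = 'Three-phase electric power';
    my $links = get_wiki_titles ($title);
    print "$title is '$links->{de}' in German.\n";
    my $film = '東京物語';
    my $flinks = get_wiki_titles ($film, lang => 'ja');
    # In English: "The film <title> is called <Italian title> in Italian."
    print "映画「$film」はイタリア語で「$flinks->{it}」と名付けた。\n";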
produces output
    Three-phase electric power is 'Dreiphasenwechselstrom' in German.
    映画「東京物語」はイタリア語で「Viaggio a Tokyo」と名付けた。
(This example is included as synopsis.pl in the distribution.)
VERSION
This documents version 0.04 of WWW::Wikipedia::LangTitles corresponding to git commit cd5d0156c401472bc424421159fca7d3c0f769fe released on Thu Jul 20 13:15:53 2017 +0900.
DESCRIPTION
This module retrieves the Wikipedia interwiki link titles from the web site wikidata.org. It can be used, for example, to translate a term in English into other languages, or to get near equivalents.
FUNCTIONS
get_wiki_titles
    my $ref = get_wiki_titles ('Helium');
Given the title of a Wikipedia article as its argument, get_wiki_titles returns a hash reference whose keys are language codes and whose values are the titles of the equivalent Wikipedia articles in those languages. For example, in the above case of Helium, $ref->{th} will be equal to ฮีเลียม, the Thai title of the Wikipedia article on helium.
The language of the original page can be specified like this:
    use utf8;
    my $from_th = get_wiki_titles ('ฮีเลียม', lang => 'th');
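The URL is encoded using "uri_escape_utf8" in URI::Escape, so pass the title as a character string, not a byte string: put "use utf8;" in scripts which contain literal non-ASCII titles, and decode titles read from other sources before passing them in.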
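The difference between character and byte strings can be seen by calling "uri_escape_utf8" directly. The following is a small sketch, not code from this module; the variable names are only for illustration. It escapes the Thai title once as characters and once as already-encoded UTF-8 bytes, which end up encoded a second time.

```perl
use strict;
use warnings;
use utf8;
use Encode 'encode_utf8';
use URI::Escape 'uri_escape_utf8';

# With "use utf8;", this literal is a string of seven characters.
my $chars = 'ฮีเลียม';
# The same text as UTF-8 bytes, as it might arrive from a file or
# from @ARGV without decoding.
my $bytes = encode_utf8 ($chars);

# Characters are converted to UTF-8 once, then percent-encoded.
my $good = uri_escape_utf8 ($chars);
# Each byte is treated as a character and converted to UTF-8 again,
# so the result no longer corresponds to the Thai title.
my $double = uri_escape_utf8 ($bytes);

print "characters: $good\n";
print "bytes:      $double\n";
```

The first line printed should match the "titles" parameter shown in the make_wiki_url example in this document; the second is longer and would not find the article.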
As of version 0.04, get_wiki_titles deletes the non-encyclopedia sites like Wikiquote and Wikiversity from the list of returned values.
make_wiki_url
    my $url = make_wiki_url ('helium');
Make a URL for the Wikidata page. You will then need to retrieve the page and parse the JSON yourself. Use a second argument to specify the language of the page:
    use utf8;
    use WWW::Wikipedia::LangTitles ':all';
    print make_wiki_url ('ฮีเลียม', 'th'), "\n";
produces output
    https://www.wikidata.org/w/api.php?action=wbgetentities&sites=thwiki&titles=%E0%B8%AE%E0%B8%B5%E0%B9%80%E0%B8%A5%E0%B8%B5%E0%B8%A2%E0%B8%A1&props=sitelinks/urls|datatype&format=json
(This example is included as thai-url.pl in the distribution.)
If no language is specified, the default is en for English.
This method was added in version 0.02 of the module.
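For readers who want to parse the JSON themselves, the following is a rough sketch of one way to do it. It is not this module's own code. It assumes the response has the shape "entities", then an entity ID, then "sitelinks", with a "title" under each site key. It uses the core module JSON::PP rather than JSON::Parse, and the name titles_from_json is hypothetical. The retrieval step is left out so that the sketch runs without a network connection; the sample response is hand-written and abbreviated.

```perl
use strict;
use warnings;
use JSON::PP 'decode_json';

# Sketch only: turn the JSON text of a wbgetentities response into a
# hash reference of language code => article title. This sub is
# hypothetical and is not part of WWW::Wikipedia::LangTitles.
sub titles_from_json
{
    my ($json) = @_;
    my $data = decode_json ($json);
    my %titles;
    for my $entity (values %{$data->{entities}}) {
        # An entity without sitelinks (e.g. a missing title) is skipped.
        my $sitelinks = $entity->{sitelinks} or next;
        for my $site (keys %$sitelinks) {
            # Keep keys such as "enwiki" or "thwiki". Keys such as
            # "enwikiquote" do not end in "wiki" and are dropped.
            next unless $site =~ /^(.+)wiki$/;
            my $lang = $1;
            # "commonswiki" is not a language edition of Wikipedia.
            next if $lang eq 'commons';
            $titles{$lang} = $sitelinks->{$site}{title};
        }
    }
    return \%titles;
}

# A hand-written, abbreviated response used purely for illustration.
my $sample = <<'EOF';
{"entities":{"Q560":{"type":"item","id":"Q560","sitelinks":{
"enwiki":{"site":"enwiki","title":"Helium"},
"frwiki":{"site":"frwiki","title":"H\u00e9lium"},
"enwikiquote":{"site":"enwikiquote","title":"Helium"},
"commonswiki":{"site":"commonswiki","title":"Category:Helium"}
}}},"success":1}
EOF

my $titles = titles_from_json ($sample);
binmode STDOUT, ':encoding(UTF-8)';
for my $lang (sort keys %$titles) {
    print "$lang: $titles->{$lang}\n";
}
```

In real use the JSON text would come from fetching the URL returned by make_wiki_url, for example with LWP::UserAgent. The filter here is a simplification: other special sites whose keys end in "wiki" would still pass through it.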
SEE ALSO
- Locale::Codes
  This module enables one to convert the language key names given by this module into the English-language names of the languages.
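    use utf8;
    use WWW::Wikipedia::LangTitles ':all';
    use Locale::Codes::Language;
    binmode STDOUT, ':encoding(UTF-8)';
    my $article = 'King Kong';
    my $titles = get_wiki_titles ($article);
    for my $lang (keys %$titles) {
        # Fall back to the bare code if Locale::Codes does not know it.
        my $l2c = code2language ($lang);
        if (! $l2c) {
            $l2c = $lang;
        }
        my $name = $titles->{$lang};
        # Skip languages whose title is identical to the English one.
        if ($name ne $article) {
            print "$name in $l2c.\n";
        }
    }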
produces output
king.kong in jbo.
קינג קונג in Hebrew.
Кинг Конг in Bulgarian.
キングコング in Japanese.
كينغ كونغ in Arabic.
Кінг-Конг in Ukrainian.
King Kong (hahmo) in Finnish.
金剛 (怪獸) in Chinese.
Քինգ Քոնգ in Armenian.
คิงคอง in Thai.
کینگ کونگ in Persian.
Кинг-Конг in Russian.
킹콩 in Korean.
კინგ კონგი in Georgian.
(This example is included as locale-codes.pl in the distribution.)
DEPENDENCIES
- Carp
  Carp is used to report errors.
- LWP::UserAgent
  LWP::UserAgent is used to retrieve the data from Wikidata.
- JSON::Parse
  JSON::Parse is used to parse the JSON data from Wikidata.
- URI::Escape
  URI::Escape is used to make the URLs for Wikidata from the input titles.
EXPORTS
Nothing is exported by default. The export tag ':all' exports all the functions of the module.
TESTING
The default tests of the module do not attempt to connect to the internet. To test using an internet connection, run xt/scrape.t like this:
    prove -I lib xt/scrape.t
from the top directory of the distribution.
HISTORY
This module was a collection of small scripts I had been using to scrape multilingual article names related to physics from Wikipedia. I made the scripts into a CPAN module because I thought it could be useful to other people. Specifically, I used my scripts to add some Japanese element names to Chemistry::Elements, and I thought this method might be useful for someone else.
Version 0.02 added "make_wiki_url" for people who want to retrieve and parse the output themselves.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2016-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.