NAME
Catmandu::Importer - Namespace for packages that can import
SYNOPSIS
    # From the command line

    # JSON is an importer and YAML an exporter
    $ catmandu convert JSON to YAML < data.json

    # OAI is an importer and JSON an exporter
    $ catmandu convert OAI --url http://biblio.ugent.be/oai to JSON

    # Fetch remote content
    $ catmandu convert JSON --file http://example.com/data.json to YAML

    # From Perl

    use Catmandu;
    use Data::Dumper;

    my $importer = Catmandu->importer('JSON', file => 'data.json');

    $importer->each(sub {
        my $item = shift;
        print Dumper($item);
    });

    my $num = $importer->count;

    my $first_item = $importer->first;

    # Convert OAI to JSON in Perl
    $importer = Catmandu->importer('OAI', url => 'http://biblio.ugent.be/oai');
    my $exporter = Catmandu->exporter('JSON');

    $exporter->add_many($importer);
    $exporter->commit;
DESCRIPTION
A Catmandu::Importer is a Perl package that can generate structured data from sources such as JSON, YAML, XML and RDF, from network protocols such as Atom, OAI-PMH and SRU, and even from DBI databases. Given a Catmandu::Importer, a programmer can read data from it using one of the many Catmandu::Iterable methods:
    $importer->to_array;
    $importer->count;
    $importer->each(\&callback);
    $importer->first;
    $importer->rest;
    ...etc...
Every Catmandu::Importer is also a Catmandu::Fixable and thus inherits a 'fix' parameter that can be set in the constructor. When a 'fix' parameter is given, each item returned by the generator is automatically fixed using one or more Catmandu::Fix-es. E.g.
    my $importer = Catmandu->importer('JSON', fix => ['upcase(title)']);

    $importer->each(sub {
        my $item = shift;
        # Every $item->{title} is now upcased...
    });

    # or via a Fix file
    my $importer = Catmandu->importer('JSON', fix => ['/my/fixes.txt']);

    $importer->each(sub {
        my $item = shift;
        # Every $item has now been processed by the fixes in /my/fixes.txt...
    });
CONFIGURATION
- file
    Read input from a local file given by its path. If the path looks like a URL, the content is fetched first and then passed to the importer. Alternatively, a scalar reference can be passed to read from a string.
- fh
    Read input from an IO::Handle. If not specified, Catmandu::Util::io is used to create the input stream from the file argument or by using STDIN.
- encoding
    Binmode of the input stream fh. Set to :utf8 by default.
- fix
    An ARRAY of one or more Fix-es or Fix scripts to be applied to imported items.
- data_path
    The data at data_path is imported instead of the original data.

        # given this imported item:
        {abc => [{a=>1},{b=>2},{c=>3}]}
        # with data_path 'abc', this item gets imported instead:
        [{a=>1},{b=>2},{c=>3}]
        # with data_path 'abc.*', 3 items get imported:
        {a=>1}
        {b=>2}
        {c=>3}
- variables
    Variables given here will interpolate the file and http_body options. The syntax is the same as URI::Template.

        # named arguments
        my $importer = Catmandu->importer('JSON',
            variables => {server => 'biblio.ugent.be', path => 'file.json'},
        );

        # positional arguments
        my $importer = Catmandu->importer('JSON',
            variables => 'biblio.ugent.be,file.json',
        );

        # or
        my $importer = Catmandu->importer('JSON',
            variables => ['biblio.ugent.be','file.json'],
        );

        # or via the command line
HTTP CONFIGURATION
These options are only relevant if file is a URL. See LWP::UserAgent for details about these options.
- http_body
    Set the GET/POST message body.
- http_method
    Set the type of HTTP request: 'GET', 'POST', ...
- http_headers
    A reference to an HTTP::Headers object.
Set your own HTTP client
Alternatively, set the parameters of the default client
- http_agent
    A string containing the name of the HTTP client.
- http_max_redirect
    Maximum number of HTTP redirects allowed.
- http_timeout
    Maximum execution time.
- http_verify_hostname
    Verify the SSL certificate.
- http_retry
    Maximum number of times to retry the HTTP request if it temporarily fails. The default is not to retry. See LWP::UserAgent::Determined for the HTTP status codes that initiate a retry.
- http_timing
    Maximum number of times and the timeouts to retry the HTTP request if it temporarily fails. The default is not to retry. See LWP::UserAgent::Determined for the HTTP status codes that initiate a retry and for the format of the timing value.
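As a minimal sketch of how the HTTP options above might be combined when file is a URL: the option names are the ones listed in this section, while the agent string and the numeric values are made up for illustration. The timeout is assumed to be in seconds, as in LWP::UserAgent.

```perl
use strict;
use warnings;
use Catmandu;

# The URL is the example URL from the SYNOPSIS; the agent name and the
# numeric values are hypothetical and only illustrate the option names.
my $importer = Catmandu->importer(
    'JSON',
    file              => 'http://example.com/data.json',
    http_method       => 'GET',
    http_agent        => 'my-harvester/0.1',
    http_timeout      => 30,    # assumed to be seconds, as in LWP::UserAgent
    http_max_redirect => 3,
);

# Read the fetched items like those of any other importer.
$importer->each(sub {
    my $item = shift;
    # process $item here
});
```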
METHODS
first, each, rest, ...
    See Catmandu::Iterable for all inherited methods.
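The inherited methods can be tried without any input file. The sketch below assumes the Mock importer that ships with Catmandu and assumes that it yields the items {n => 0}, {n => 1}, ... up to the given size. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Catmandu;

# Assumption: Catmandu::Importer::Mock generates { n => 0 }, { n => 1 }, ...
my $importer = Catmandu->importer('Mock', size => 3);

my $count = $importer->count;             # number of items
my $first = $importer->first;             # the first item
my $rest  = $importer->rest->to_array;    # every item after the first

# select and map return new iterables that can be chained
my $doubled = $importer
    ->select(sub { $_[0]{n} > 0 })
    ->map(sub { return { n => $_[0]{n} * 2 } })
    ->to_array;

print "count: $count\n";
print "first n: $first->{n}\n";
print "doubled: ", join(',', map { $_->{n} } @$doubled), "\n";
```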
CODING
Create your own importer by creating a Perl package in the Catmandu::Importer namespace that implements Catmandu::Importer. Basically, you need to create a method 'generator' which returns a callback that creates one Perl hash for each call:
    my $importer = Catmandu::Importer::Hello->new;
    my $next     = $importer->generator;

    $next->();    # record
    $next->();    # next record
    $next->();    # undef = end of stream
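The callback contract can also be seen in isolation, without Catmandu. The following is a plain Perl sketch in which make_generator is a hypothetical stand-in for an importer's generator method. It returns a closure that yields one hash reference per call and undef once the input is exhausted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a generator method: returns a callback that
# yields one hash reference per call and undef at the end of the stream.
sub make_generator {
    my @names = @_;
    my $i     = 0;
    return sub {
        return $i < @names ? { hello => $names[$i++] } : undef;
    };
}

my $next = make_generator('alice', 'bob');

# Drain the generator until it signals the end of the stream with undef.
my @records;
while (defined(my $record = $next->())) {
    push @records, $record;
}

print scalar(@records), " records\n";    # prints "2 records"
```

A consumer such as each or to_array works along the same lines: it keeps calling the callback until undef is returned.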
Here is an example of a simple Hello importer:
    package Catmandu::Importer::Hello;

    use Catmandu::Sane;
    use Moo;

    # Consume the role that provides fh, log, fix, etc.
    with 'Catmandu::Importer';

    sub generator {
        my ($self) = @_;
        my $fh = $self->fh;
        my $n  = 0;
        return sub {
            $self->log->debug("generating record " . ++$n);
            # getline returns undef at end of input
            my $name = $fh->getline;
            return defined $name ? {"hello" => $name} : undef;
        };
    }

    1;
This importer can be called via the command line as:

    $ catmandu convert Hello to JSON < /tmp/names.txt
    $ catmandu convert Hello to YAML < /tmp/names.txt
    $ catmandu import Hello to MongoDB --database_name test < /tmp/names.txt
Or via Perl:

    use Catmandu;

    my $importer = Catmandu->importer('Hello', file => '/tmp/names.txt');

    $importer->each(sub {
        my $item = shift;
    });
SEE ALSO
Catmandu::Iterable, Catmandu::Fix, Catmandu::Importer::CSV, Catmandu::Importer::JSON, Catmandu::Importer::YAML