The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

XRD::Parser - parse XRD and host-meta files into RDF::Trine models

SYNOPSIS

  use RDF::Query;
  use XRD::Parser;
  
  my $parser = XRD::Parser->new(undef, "http://example.com/foo.xrd");
  $parser->consume;
  
  my $results = RDF::Query->new(
    "SELECT * WHERE {?who <http://spec.example.net/auth/1.0> ?auth.}")
    ->execute($parser->graph);
        
  while (my $result = $results->next)
  {
    print $result->{'auth'}->uri . "\n";
  }

  my $data = XRD::Parser->hostmeta('gmail.com')
                          ->consume
                            ->graph
                              ->as_hashref;

VERSION

0.05

DESCRIPTION

While XRD has a rather different history, it turns out it can mostly be thought of as a serialisation format for a limited subset of RDF.

This package ignores the order of <Link> elements, as RDF is a graph format with no concept of statements coming in an "order". The XRD spec says that grokking the order of <Link> elements is only a SHOULD. That said, if you're concerned about the order of <Link> elements, the callback routines allowed by this package may be of use.

This package aims to be roughly compatible with RDF::RDFa::Parser's interface.

Constructors

$p = XRD::Parser->new($content, $uri, \%options, $store)

This method creates a new XRD::Parser object and returns it.

The $content variable may contain an XML string, or a XML::LibXML::Document. If a string, the document is parsed using XML::LibXML::Parser, which may throw an exception. XRD::Parser does not catch the exception.

$uri the supposed URI of the content; it is used to resolve any relative URIs found in the XRD document. Also, if $content is empty, then XRD::Parser will attempt to retrieve $uri using LWPx::ParanoidAgent.

Options [default in brackets]:

  * link_prop       - How to handle <Property> in <Link>? [0]
                      0=skip, 1=reify, 2=subproperty, 3=both.
  * tdb_service     - thing-described-by.org when possible. [0]

$storage is an RDF::Trine::Storage object. If undef, then a new temporary store is created.

$p = XRD::Parser->hostmeta($uri)

This method creates a new XRD::Parser object and returns it.

The parameter may be a URI (from which the hostname will be extracted) or just a bare host name (e.g. "example.com"). The resource "/.well-known/host-meta" will then be fetched from that host using an appropriate HTTP Accept header, and the parser object returned.

Public Methods

$p->uri($uri)

Returns the base URI of the document being parsed. This will usually be the same as the base URI provided to the constructor.

Optionally it may be passed a parameter - an absolute or relative URI - in which case it returns the same URI which it was passed as a parameter, but as an absolute URI, resolved relative to the document's base URI.

This seems like two unrelated functions, but if you consider the consequence of passing a relative URI consisting of a zero-length string, it in fact makes sense.

$p->dom

Returns the parsed XML::LibXML::Document.

$p->set_callbacks(\%callbacks)

Set callback functions for the parser to call on certain events. These are only necessary if you want to do something especially unusual.

  $p->set_callbacks({
    'pretriple_resource' => sub { ... } ,
    'pretriple_literal'  => sub { ... } ,
    'ontriple'           => undef ,
    });

Either of the two pretriple callbacks can be set to the string 'print' instead of a coderef. This enables built-in callbacks for printing Turtle to STDOUT.

For details of the callback functions, see the section CALLBACKS. set_callbacks must be used before consume. set_callbacks itself returns a reference to the parser object itself.

NOTE: the behaviour of this function was changed in version 0.05.

$p->consume

This method processes the input DOM and sends the resulting triples to the callback functions (if any).

Returns the parser object itself.

$p->graph

This method will return an RDF::Trine::Model object with all statements of the full graph.

It makes sense to call consume before calling graph. Otherwise you'll just get an empty graph.

Utility Functions

$uri = XRD::Parser::host_uri($uri)

Returns a URI representing the host. These crop up often in host-meta files.

$uri = XRD::Parser::template_uri($relationship_uri)

Returns a URI representing not a normal relationship, but the relationship between a host and a template URI literal.

CALLBACKS

Several callback functions are provided. These may be set using the set_callbacks function, which taskes a hashref of keys pointing to coderefs. The keys are named for the event to fire the callback on.

pretriple_resource

This is called when a triple has been found, but before preparing the triple for adding to the model. It is only called for triples with a non-literal object value.

The parameters passed to the callback function are:

  • A reference to the XRD::Parser object

  • A reference to the XML::LibXML::Element being parsed

  • Subject URI or bnode (string)

  • Predicate URI (string)

  • Object URI or bnode (string)

The callback should return 1 to tell the parser to skip this triple (not add it to the graph); return 0 otherwise.

pretriple_literal

This is the equivalent of pretriple_resource, but is only called for triples with a literal object value.

The parameters passed to the callback function are:

  • A reference to the XRD::Parser object

  • A reference to the XML::LibXML::Element being parsed

  • Subject URI or bnode (string)

  • Predicate URI (string)

  • Object literal (string)

  • Datatype URI (string or undef)

  • Language (string or undef)

The callback should return 1 to tell the parser to skip this triple (not add it to the graph); return 0 otherwise.

ontriple

This is called once a triple is ready to be added to the graph. (After the pretriple callbacks.) The parameters passed to the callback function are:

  • A reference to the XRD::Parser object

  • A reference to the XML::LibXML::Element being parsed

  • An RDF::Trine::Statement object.

The callback should return 1 to tell the parser to skip this triple (not add it to the graph); return 0 otherwise. The callback may modify the RDF::Trine::Statement object.

SEE ALSO

RDF::Trine, RDF::Query, RDF::RDFa::Parser.

http://www.perlrdf.org/.

AUTHOR

Toby Inkster, <tobyink@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2010 by Toby Inkster

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8 or, at your option, any later version of Perl 5 you may have available.