
NAME

RDF::TrineX::Merge::Bnodes - Merge blank nodes that obviously refer to the same resource

VERSION

version 0.1.1

SYNOPSIS

    use RDF::TrineX::Merge::Bnodes;

    $model = merge_bnodes($model_or_iterator, %options);

To give an example, applying merge_bnodes on this graph:

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @base   <http://example.org/> .

    <Alice> foaf:knows [ a foaf:Person ; foaf:name "Bob" ] .
    <Alice> foaf:knows [ a foaf:Person ; foaf:name "Bob" ] . # obviously the same

will remove the second Bob.

DESCRIPTION

This module exports the function merge_bnodes, which merges blank nodes that obviously refer to the same resource in an RDF graph. The function takes an RDF::Trine::Model or an RDF::Trine::Iterator. The model or iterator should contain only RDF-compatible statements (e.g. no blank node predicates).

The function can be applied to get rid of obviously duplicated statements. Obviously duplicated statements are defined as follows:

  • The statements include either a blank node subject or a blank node object.

  • The statements only differ by their blank node identifier.

  • The blank nodes are not part of any other statement that includes two blank nodes.

In other words, the algorithm first finds all star subgraphs whose internal node is the only blank node in the subgraph. Each subgraph is assigned a digest value calculated from all of its triples and nodes except the blank node. Duplicate subgraphs with the same digest are then removed.
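
The idea described above can be illustrated with a minimal sketch in plain Perl. It is not the module's actual implementation: triples are simplified to arrays of strings, blank nodes are assumed to be strings starting with "_:", and the names is_bnode, mask and merge_bnodes_sketch are hypothetical helpers introduced only for this example.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: a term is treated as a blank node if it starts with "_:".
sub is_bnode { return $_[0] =~ /^_:/ }

# Hypothetical helper: serialize a triple with its blank node label masked out.
sub mask { return join ' ', map { is_bnode($_) ? '_:' : $_ } @{ $_[0] } }

# Sketch only: takes triples as [subject, predicate, object] array references.
sub merge_bnodes_sketch {
    my @triples = @_;

    # Collect the star of each blank node; blank nodes that share a
    # statement with another blank node are excluded from merging.
    my (%star, %excluded);
    for my $t (@triples) {
        my @bnodes = grep { is_bnode($_) } $t->[0], $t->[2];
        if (@bnodes == 2) {
            $excluded{$_} = 1 for @bnodes;
        } elsif (@bnodes == 1) {
            push @{ $star{ $bnodes[0] } }, $t;
        }
    }

    # Digest each star without its blank node label; a repeated digest
    # marks the blank node as a duplicate.
    my (%seen, %drop);
    for my $node (sort keys %star) {
        next if $excluded{$node};
        my @lines = map { mask($_) } @{ $star{$node} };
        my $digest = md5_hex(join "\n", sort @lines);
        $drop{$node} = 1 if $seen{$digest}++;
    }

    # Keep every triple that does not mention a dropped blank node.
    return grep { !$drop{ $_->[0] } && !$drop{ $_->[2] } } @triples;
}

# The graph from the SYNOPSIS, written out as six triples.
my @graph = (
    [ '<Alice>', 'foaf:knows', '_:b1' ],
    [ '_:b1',    'rdf:type',   'foaf:Person' ],
    [ '_:b1',    'foaf:name',  '"Bob"' ],
    [ '<Alice>', 'foaf:knows', '_:b2' ],
    [ '_:b2',    'rdf:type',   'foaf:Person' ],
    [ '_:b2',    'foaf:name',  '"Bob"' ],
);
my @merged = merge_bnodes_sketch(@graph);
```

In this sketch the three triples of the first Bob are kept and the three triples of the second Bob are dropped, since both stars produce the same digest once the blank node label is masked.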

LIMITATIONS

Statements that involve multiple blank nodes or blank nodes that are connected to another blank node are never removed.

Don't expect the algorithm to understand what is actually meant by the existence of blank nodes in your data.

CONFIGURATION

Options can be passed as key-value pairs:

digest

A Digest object or the name of a Digest module, e.g. "MD4". The default digest is Digest::MD5.
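
To illustrate the two accepted forms of this option, the following sketch shows how a value that is either a Digest object or an algorithm name could be turned into a Digest object with the core Digest module. The helper resolve_digest is hypothetical and only assumed for this example; the module may handle the option differently internally.

```perl
use strict;
use warnings;
use Digest;
use Scalar::Util qw(blessed);

# Hypothetical helper, not part of RDF::TrineX::Merge::Bnodes:
# accept a Digest object or an algorithm name, defaulting to MD5.
sub resolve_digest {
    my ($digest) = @_;
    $digest = 'MD5' unless defined $digest;

    # An object is used as given; a name is handed to Digest->new.
    return blessed($digest) ? $digest : Digest->new($digest);
}

my $by_default = resolve_digest();                        # Digest::MD5 object
my $by_name    = resolve_digest('SHA-256');               # created from a name
my $by_object  = resolve_digest( Digest->new('SHA-1') );  # passed through as is
```

With the module itself, the option is passed as a key-value pair, for instance merge_bnodes($model, digest => 'MD4'), which requires the corresponding Digest module to be installed.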

Options not implemented yet:

  • Option to skolemize blank nodes (IRIs with .well-known/genid/).

  • Option to also remove entailed statements with blank nodes:

        <Alice> foaf:knows [ a foaf:Person ; foaf:name "Bob" ] .
        <Alice> foaf:knows [ a foaf:Person ] . # could also be removed

AUTHOR

Jakob Voß

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Jakob Voß.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.