NAME

HTML::Microformats - parse microformats in HTML

SYNOPSIS

 use HTML::Microformats;
 
 my $doc = HTML::Microformats
             ->new_document($html, $uri)
             ->assume_profile(qw(hCard hCalendar));
 print $doc->json(pretty => 1);
 
 use RDF::TrineShortcuts qw(rdf_query);
 my $results = rdf_query($sparql, $doc->model);
 

VERSION

0.100

DESCRIPTION

The HTML::Microformats module is a wrapper for parser and handler modules of various individual microformats (each of those modules has a name like HTML::Microformats::Format::Foo).

The general pattern of usage is to create an HTML::Microformats object (which corresponds to an HTML document) using the new_document method; then ask for the data, as a Perl hashref, a JSON string, or an RDF::Trine model.

Constructor

$doc = HTML::Microformats->new_document($html, $uri, %opts)

Constructs a document object.

$html is the HTML or XHTML source (string) or an XML::LibXML::Document.

$uri is the document URI, important for resolving relative URL references.

%opts are additional parameters. Currently only one option is defined: $opts{'type'}, which may be set to 'text/html' or 'application/xhtml+xml' to control how $html is parsed.
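
As a rough illustration of the constructor, the following sketch assumes the HTML::Microformats distribution is installed. The markup and the http://example.com/ URI are placeholders invented for this example; only new_document and its type option come from the description above.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Placeholder markup containing a relative link, which is why $uri matters.
my $html = <<'HTML';
<html>
  <head><title>Example</title></head>
  <body>
    <p class="vcard"><a class="fn url" href="/alice">Alice Example</a></p>
  </body>
</html>
HTML

# Placeholder document URI, used as the base for relative references.
my $uri = 'http://example.com/people/';

# Ask for the source to be parsed as HTML rather than XHTML.
my $doc = HTML::Microformats->new_document($html, $uri, type => 'text/html');

print "Created a ", ref($doc), " object\n";
```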

Profile Management

HTML::Microformats uses HTML profiles (i.e. the profile attribute on the HTML <head> element) to detect which Microformats are used on a page. Any microformats which do not have a profile URI declared will not be parsed.

Because many pages fail to properly declare which profiles they use, there are various profile management methods to tell HTML::Microformats to assume the presence of particular profile URIs, even if they're actually missing.

$doc->profiles

This method returns a list of profile URIs declared by the document.

$doc->has_profile(@profiles)

This method returns true if and only if one or more of the profile URIs in @profiles is declared by the document.

$doc->add_profile(@profiles)

Using add_profile you can add one or more profile URIs, and they are treated as if they were found on the document.

For example:

 $doc->add_profile('http://microformats.org/profile/rel-tag')

This is useful for adding profile URIs declared outside the document itself (e.g. in HTTP headers).

Returns a reference to the document.

$doc->assume_profile(@microformats)

This method acts similarly to add_profile, but allows you to use names of microformats rather than URIs.

For example:

 $doc->assume_profile(qw(hCard adr geo))

Microformat names are case-sensitive, and must match HTML::Microformats::Format::Foo module names.

Returns a reference to the document.

$doc->assume_all_profiles

This method is equivalent to calling assume_profile for all known microformats.

Returns a reference to the document.
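
Because the profile management methods return the document, they can be chained. The sketch below assumes the module is installed and uses an invented one-line page with no profile attribute; it is meant only to show how the methods described above fit together.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Placeholder page that declares no profiles in its <head>.
my $html = '<html><head><title>Tags</title></head>'
         . '<body><a rel="tag" href="/tags/perl">perl</a></body></html>';
my $doc  = HTML::Microformats->new_document($html, 'http://example.com/');

# Inspect what the document itself declares.
my @declared = $doc->profiles;
print "Profiles declared by the page: ", scalar(@declared), "\n";

# Add one profile by URI and several by microformat name, in one chain.
$doc->add_profile('http://microformats.org/profile/rel-tag')
    ->assume_profile(qw(hCard adr geo));

if ($doc->has_profile('http://microformats.org/profile/rel-tag')) {
    print "rel-tag is now treated as declared\n";
}
```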

Parsing Microformats

Generally speaking, you can skip this step: the data, json and model methods will parse the document for you automatically.

$doc->parse_microformats

Scans through the document, finding microformat objects.

On subsequent calls, does nothing (as everything is already parsed).

Returns a reference to the document.

$doc->clear_microformats

Forgets information gleaned by parse_microformats and thus allows parse_microformats to be run again. This is useful if you've added some profiles between runs of parse_microformats.

Returns a reference to the document.
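
A possible re-parsing workflow, sketched under the assumption that the module is installed; the markup is a placeholder. It parses once, adds a further assumed profile, then clears and parses again so that the new profile is taken into account.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Placeholder markup: an hCard with a nested adr.
my $html = <<'HTML';
<html>
  <head><title>Example</title></head>
  <body>
    <div class="vcard">
      <span class="fn">Alice Example</span>
      <span class="adr"><span class="locality">Lewes</span></span>
    </div>
  </body>
</html>
HTML

my $doc = HTML::Microformats->new_document($html, 'http://example.com/');

# First pass, with only hCard assumed.
$doc->assume_profile('hCard')->parse_microformats;

# A second parse_microformats call alone would do nothing, so clear first.
my $ret = $doc->assume_profile('adr')
              ->clear_microformats
              ->parse_microformats;

my @cards = $doc->objects('hCard');
print "hCards after re-parsing: ", scalar(@cards), "\n";
```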

Retrieving Data

These methods allow you to retrieve the document's data, and do things with it.

$doc->objects($format)

$format is, for example, 'hCard', 'adr' or 'RelTag'.

Returns a list of objects of that type. (If called in scalar context, returns an arrayref.)

Each object is, for example, an HTML::Microformats::Format::hCard object, or an HTML::Microformats::Format::RelTag object, etc. See the relevant documentation for details.

$doc->all_objects

Returns a hashref of data. Each hashref key is the name of a microformat (e.g. 'hCard', 'RelTag', etc), and the values are arrayrefs of objects.

Each object is, for example, an HTML::Microformats::Format::hCard object, or an HTML::Microformats::Format::RelTag object, etc. See the relevant documentation for details.
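
The difference between objects and all_objects can be sketched as follows, assuming the module is installed. The markup is a placeholder; the methods available on each returned object depend on the individual format module and are not shown here.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Placeholder markup with a single hCard.
my $html = '<html><head><title>Example</title></head><body>'
         . '<div class="vcard"><span class="fn">Alice Example</span></div>'
         . '</body></html>';

my $doc = HTML::Microformats
            ->new_document($html, 'http://example.com/')
            ->assume_profile('hCard');

# List context gives a list; scalar context gives an arrayref.
my @cards = $doc->objects('hCard');
my $cards = $doc->objects('hCard');
printf "hCards: %d (list), %d (arrayref)\n", scalar(@cards), scalar(@$cards);

# all_objects groups the objects by microformat name.
my $all = $doc->all_objects;
for my $format (sort keys %$all) {
    printf "%s: %d object(s)\n", $format, scalar @{ $all->{$format} };
}
```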

$doc->json(%opts)

Returns data roughly equivalent to the all_objects method, but as a JSON string.

%opts is a hash of options, suitable for passing to the JSON module's to_json function. The 'convert_blessed' and 'utf8' options are enabled by default, but can be disabled by explicitly setting them to 0, e.g.

  print $doc->json( pretty=>1, canonical=>1, utf8=>0 );

$doc->model

Returns data as an RDF::Trine::Model, suitable for serialising as RDF or running SPARQL queries.

$doc->serialise_model(as => $format)

As per model, but returns the serialised data as a string.

$doc->add_to_model($model)

Adds data to an existing RDF::Trine::Model.

Returns a reference to the document.
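
The three RDF-related methods might be combined as in this sketch, which assumes both HTML::Microformats and RDF::Trine are installed. The markup is a placeholder, and the format name 'Turtle' is an assumption about what serialise_model accepts; check the serialiser names supported by your RDF::Trine version.

```perl
use strict;
use warnings;
use HTML::Microformats;
use RDF::Trine;

# Placeholder markup with a single hCard.
my $html = '<html><head><title>Example</title></head><body>'
         . '<div class="vcard"><span class="fn">Alice Example</span></div>'
         . '</body></html>';

my $doc = HTML::Microformats
            ->new_document($html, 'http://example.com/')
            ->assume_profile('hCard');

# A fresh model containing just this document's data.
my $model = $doc->model;
printf "Triples in document model: %d\n", $model->size;

# Alternatively, merge the data into a model you already have.
my $store = RDF::Trine::Model->temporary_model;
$doc->add_to_model($store);

# Or get the serialised data as a string ('Turtle' is an assumed format name).
my $turtle = $doc->serialise_model(as => 'Turtle');
print $turtle;
```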

Utility Functions

HTML::Microformats->modules

Returns a list of Perl modules, each of which implements a specific microformat.

HTML::Microformats->formats

As per modules, but strips 'HTML::Microformats::Format::' off the module name, and sorts alphabetically.
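
For instance, the two class methods can be used to check which formats an installed copy of the distribution knows about. This is only a sketch; the exact list printed depends on the installed version.

```perl
use strict;
use warnings;
use HTML::Microformats;

# Full module names, e.g. names beginning HTML::Microformats::Format::
my @modules = HTML::Microformats->modules;

# Short names, as accepted by assume_profile and objects.
my @formats = HTML::Microformats->formats;

printf "%d format modules available\n", scalar @modules;
print "  $_\n" for @formats;
```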

WHY ANOTHER MICROFORMATS MODULE?

There already exist two microformats packages on CPAN (see Text::Microformat and Data::Microformat), so why create another?

Firstly, HTML::Microformats isn't being created from scratch. It's actually a fork/clean-up of a non-CPAN application (Swignition), and in that sense predates Text::Microformat (though not Data::Microformat).

It has a number of other features that distinguish it from the existing packages:

  • It supports more formats.

    HTML::Microformats supports hCard, hCalendar, rel-tag, geo, adr, rel-enclosure, rel-license, hReview, hResume, hRecipe, xFolk, XFN, hAtom, hNews and more.

  • It supports more patterns.

    HTML::Microformats supports the include pattern, abbr pattern, table cell header pattern, value excerpting and other intricacies of microformat parsing better than the other modules on CPAN.

  • It offers RDF support.

    One of the key features of HTML::Microformats is that it makes data available as RDF::Trine models. This allows your application to benefit from a rich, feature-laden Semantic Web toolkit. Data gleaned from microformats can be stored in a triple store; output in RDF/XML or Turtle; queried using the SPARQL or RDQL query languages; and more.

    If you're not comfortable using RDF, HTML::Microformats also makes all its data available as native Perl objects.

BUGS

Please report any bugs to http://rt.cpan.org/.

SEE ALSO

HTML::Microformats::Documentation::Notes.

Individual format modules, each named like HTML::Microformats::Format::Foo.

Similar modules: RDF::RDFa::Parser, HTML::HTML5::Microdata::Parser, XML::Atom::Microformats, Text::Microformat, Data::Microformat.

Related web sites: http://microformats.org/, http://www.perlrdf.org/.

AUTHOR

Toby Inkster <tobyink@cpan.org>.

COPYRIGHT

Copyright 2008-2010 Toby Inkster

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
