++ed by:

1 PAUSE user
1 non-PAUSE user.

Toby Inkster


Web::Magic - HTTP dwimmery


 use Web::Magic -feature => 'JSON';
 say Web::Magic->new('http://json-schema.org/card')->{description};


 use Web::Magic -sub => 'W', -feature => 'JSON';
 say W('http://json-schema.org/card')->{description};


On the surface of it, Web::Magic appears to just perform HTTP requests, but it's more than that. A URL blessed into the Web::Magic package can be interacted with in all sorts of useful ways.


new ([$method,] $uri [, %args])

$method is the HTTP method to use with the URI, such as 'GET', 'POST', 'PUT' or 'DELETE'. The HTTP method must be capitalised to avoid it being interpreted by the constructor as a URI. It defaults to 'GET'.

The URI should be an HTTP or HTTPS URL. Other URI schemes may work to varying degress of success (e.g. "ftp://" supports GET, HEAD and PUT requests, but bails on other methods; "mailto:" supports POST, but bails on others). It may be given as a plain string, or an object blessed into the URI or RDF::Trine::Node::Resource classes. (Objects blessed into XML::Feed::Entry, XML::LibXML::Attr and XML::LibXML::Element will often work too, though this is somewhat obscure magic.)

The %args hash is a convenience for constructing HTTP query strings. Hash values should be scalars, or at least overload stringification. The following are all equivalent...

 Web::Magic->new(GET => 'http://www.google.com/search', q => 'kittens');
 Web::Magic->new('http://www.google.com/search', q => 'kittens');
 Web::Magic->new(GET => 'http://www.google.com/search?q=kittens');

Note that %args always sets a URI query string, and does not set the request body, even in the case of the POST method. To set the request body, see the set_request_body method.

It is also possible to use the syntax:


Where $http_request_object is a HTTP::Request object. This uses not only the URI of the HTTP::Request object, but also its method, headers and body content.

new_from_data ($media_type, @data)

Allows you to instantiate a Web::Magic object from a string (actually a list of strings, that will be joined using the empty string). But, if you're not actually doing HTTP with Web::Magic, then you're probably missing the point of Web::Magic.

This works by passing the media type and data through to URI::data, and then using the new constructor. The object returned is in an already-requested state (i.e. is_requested is true; pre-request methods will fail).


You can import a sub to act as a shortcut for new:

 use Web::Magic -sub => 'W';
 W(GET => 'http://www.google.com/search', q => 'kittens');
 W('http://www.google.com/search', q => 'kittens');
 W(GET => 'http://www.google.com/search?q=kittens');

There is experimental support for a quote-like operator similar to q() or qq():

 use Web::Magic -quotelike => 'magic';
 my $kittens = magic <http://www.google.com/search?q=kittens>;

The quote-like operator does support interpolation, but requires the entire URL to be on one line (not that URLs generally contain line breaks).

No shortcut is provided for new_from_data.

In Perl one-liners (that is, using the "-e" or "-E" command-line options), use Web::Magic -sub => 'web' is automatically exported into main. So this works:

 perl -MWeb::Magic -E'web(q<http://example.com/>) \
   -> make_absolute_urls \
   -> findnodes("~links") \
   -> foreach(sub { say $_->{href} })'

Pre-Request Methods

Constructing a Web::Magic object doesn't actually perform a request for the URI. Web::Magic defers requesting the URI until the last possible moment. (Which in some cases will be when it slips out of scope, or even not at all.)

Pre-request methods are those that can be called before the request is made. Unless otherwise noted they will not themselves trigger the request to be made. Unless otherwise noted, they return a reference to the Web::Magic object itself, so can be chained:

  my $magic = Web::Magic
    ->new(GET => 'http://www.google.com/')

The following methods are pre-request.

set_request_method($method, [$body])

Sets the HTTP request method (e.g. 'GET' or 'POST'). You can optionally set the HTTP request body at the same time.

As a shortcut, you can use the method name as an object method. That is, the following are equivalent:

  $magic->set_request_method(POST => $body);

Using the latter technique, methods need to conform to this regular expression: /^[A-Z][A-Z0-9]{0,19}$/. (And certain Perl built-ins like $magic->DESTROY, $magic->AUTOLOAD, etc will use their Perl built-in meaning. However, currently there are no conflicts between Perl built-ins and officially defined HTTP methods. If in doubt, the set_request_method method will always work, as will the first parameter to the constructor.)

This will throw a Web::Magic::Exception::BadPhase::SetRequestMethod exception if called on a Web::Magic object that has already been requested.

set_request_header($header, $value)

Sets an HTTP request header (e.g. 'User-Agent').

As a shortcut, you can use the header name as an object method, substituting hyphens for underscores. That is, the following are equivalent:

  $magic->set_request_header('User-Agent', 'MyBot/0.1');

Using the latter technique, methods need to begin with a capital letter and contain at least one lower-case letter.

This will throw a Web::Magic::Exception::BadPhase::SetRequestHeader exception if called on a Web::Magic object that has already been requested.


Sets the body for a POST, PUT or other request that needs a body.

$body may be a string, but can be a hash or array reference, an XML::LibXML::Document or an RDF::Trine::Model, in which case they'll be serialised appropriately based on the Content-Type header of the request.

  my $magic = W('http://www.example.com/document-submission')

Yes, that's right. Even though the content-type is set *after* the body, it is still serialised appropriately. This is because serialisation is deferred until just before the request is made.

This will throw a Web::Magic::Exception::BadPhase::SetRequestBody exception if called on a Web::Magic object that has already been requested.

A Web::Magic::Exception::BadPhase::Cancel exception will be thrown if the body can't be serialised, but not until the request is actually performed.

Attaching files to a form submission: to attach files, you need to use a Content-Type of "multipart/form-data".

  my $magic = W('http://example.ie/song-competition-entry-form')
          title   => 'My Lovely Horse',
          singer  => 'Ted Krilly',
          attach  => ['dir/horse.mp3',
                      Content_Type => 'audio/mp3',
                      X_Encoding_Rate => '192 kbps',

Note that the key "attach" is not especially significant. It's equivalent to the name attribute of an HTML file submission control:

  <input type="file" name="attach">

What is significant is the use of an arrayref as attach's value. The first element in the array specifies a filename to load the data from (yes, a file handle might be nice, but it's not supported yet). The second element is the file name that you'd like to inform the server. Everything else is additional headers to submit with the file. "Content-Type" is just about the only additional header worth bothering with.

set_auth($username, $password)

Set username and password for HTTP Basic authentication.


A variant of the user_agent method (see below), which returns the invocant so can be used for chaining.


This method may be called to show you do not intend for this object to be requested. Attempting to request an object that has been cancelled will throw a Web::Magic::Exception::BadPhase::Cancel exception.

  my $magic = W('http://www.google.com/');
  $magic->do_request; # throws

Why is this needed? Because even if you don't explicitly call do_request, the request will be made implicitly in some cases. cancel allows you to avoid the implicit request.


Actually performs the HTTP request. You rarely need to call this method explicitly, as calling any Post-Request method will automatically call do_request.

do_request will be called automatically (via DESTROY) on any Web::Magic object that gets destroyed (e.g. goes out of scope) unless the request has been cancelled, or the request is unlikely to have had side-effects (i.e. its method is 'GET', 'HEAD', 'OPTIONS', 'TRACE' or 'SEARCH').

This will throw a Web::Magic::Exception::BadPhase::WillNotRequest exception if called on a Web::Magic object that has been cancelled.

Post-Request Methods

The following methods can be called after a request has been made, and will implicitly call do_request if called on an object which has not yet been requested.

These do not typically return a reference to the invocant Web::Magic object, so cannot always easily be chained. However, Web::Magic provides a tap method to force chaining even with these methods. (See Object::Tap.)


The response, as an HTTP::Response object.


The response body, as a string. This is a shortcut for:


Web::Magic overloads stringification calling this method. Thus:

  print W('http://www.example.com/');

will print the content returned from 'http://www.example.com/'.


The response headers, as an HTTP::Headers object. This is a shortcut for:


A response header, as a string. This is a shortcut for:


Parses the response body as XML or HTML (depending on Content-Type header) and returns the result as an XML::LibXML::Document.

If XML::LibXML::Augment is installed and already loaded, then this method will also call XML::LibXML::Augment->rebless on the resultant DOM tree. In particular, if HTML::HTML5::DOM is already loaded, this will supplement XML::LibXML's existing XML DOM support with most of the HTML5 DOM.

When to_dom is called on an unrequested Web::Magic object, it implicitly sets the HTTP Accept header to include XML and HTML unless the Accept header has already been set.

Additionally, the following methods can be called which implicitly call to_dom (see XML::LibXML::Document):

  • getElementsByTagName

  • getElementsByTagNameNS

  • getElementsByLocalName

  • getElementsById

  • documentElement

  • cloneNode

  • firstChild

  • lastChild

  • findnodes (but see below)

  • find

  • findvalue

  • exists

  • childNodes

  • attributes

  • getNamespaces

  • querySelector

  • querySelectorAll

So, for example, the following are equivalent:

  my @titles = W('http://example.com/')
  my @titles = W('http://example.com/')

I'll just draw your attention to querySelector and querySelectorAll which were mentioned in the previous list, but are hidden gems. See XML::LibXML::QuerySelector for further details.

This will throw a Web::Magic::Exception::BadReponseType exception if the HTTP response has a Content-Type that cannot be converted to a DOM.


If $xpath matches /^[~]\w+$/ then, it is looked up in %Web::Magic::XPaths which is a hash of useful XPaths.

 $magic->findnodes('~links')     # //a[@href], //link[@href], etc
 $magic->findnodes('~images')    # //img[@src]
 $magic->findnodes('~resources') # //img[@src], //video[@src], etc

Future versions of Web::Magic should add more.

make_absolute_urls($xpath_context, @xpaths)

Replaces relative URLs with absolute ones. Currently this only affects the data you get back from to_dom and friends. (That is, if you call content you'll see the original relative URLs.)

$xpath_context should be an XML::LibXML::XPathContext object. If undefined, a suitable default will be created automatically based on the namespaces defined in the document. If the document was served with a media type matching the regular expression /html/i then this automatic context will include the XHTML namespace bound to the prefix "xhtml".

@xpaths is a list of XPaths which should select attributes and/or text nodes. (Any other nodes selected, such as element nodes, will be ignored.) If called with the empty list, then for media types matching /html/i, a default list is used:


Because of the defaults, using this method with (X)HTML is very easy:

 my $dom = W(GET => 'http://example.com/document.html')
   -> User_Agent('MyFoo/0.001')
   -> assert_success
   -> make_absolute_urls
   -> to_dom;

make_absolute_urls defers to XML::LibXML and HTTP::Response to determine the correct base URL. These should do the right thing with xml:base attributes, the HTML <base> element and the Content-Location and Content-Base HTTP headers, but I'm sure if you try hard enough, you could trick it.

Note that hunting for every single relative URL in the DOM, and replacing them all with absolute URLs is often overkill. It may be more efficient to find just the URLs you need and make them absolute like this:

 my $link = do { ... some code that selects an element ... };
 my $abs  = URI->new_abs(
   $link->getAttribute('href'),  # or in XML::LibXML 1.91+: $link->{href}
   ($link->baseURI // $link->ownerDocument->URI),

You could even consider monkey-patching XML::LibXML to do the work for you:

 sub XML::LibXML::Element::getAttributeURI
   my ($self, $attr) = @_;
     ($self->baseURI // $self->ownerDocument->URI),

But make_absolute_urls is nice for quick scripts where efficient coding is more important than efficient execution.

This method returns the invocant, so is suitable for chaining.


Parses the response body as JSON or YAML (depending on Content-Type header) and returns the result as a hashref (or arrayref).

Actually, technically it returns an JSON::JOM object which can be accessed as if it were a hashref or arrayref.

When a Web::Magic object is accessed as a hashref, this implicitly calls to_hashref. So the following are equivalent:


When to_hashref is called on an unrequested Web::Magic object, it implicitly sets the HTTP Accept header to include JSON and YAML unless the Accept header has already been set.

This will throw a Web::Magic::Exception::BadReponseType exception if the HTTP response has a Content-Type that cannot be converted to a hashref.


Finds nodes in the structure returned by to_hashref using JsonPath. This is actually a shortcut for:


Earlier versions of Web::Magic supported this functionality using a method called findNodes, but this was not documented because the idea of two different functions which differed only in case (findNodes and findnodes) was so horrible. Thus it has become json_findnodes and is now documented.

See also JSON::JOM::Plugins::JsonPath and JSON::Path.


Parses the response body as RDF/XML, Turtle, RDF/JSON or RDFa (depending on Content-Type header) and returns the result as an RDF::Trine::Model.

When to_model is called on an unrequested Web::Magic object, it implicitly sets the HTTP Accept header to include RDF/XML and Turtle unless the Accept header has already been set.

Additionally, the following methods can be called which implicitly call to_model (see RDF::Trine::Model):

  • subjects

  • predicates

  • objects

  • objects_for_predicate_list

  • get_pattern

  • get_statements

  • count_statements

  • get_sparql

  • as_stream

So, for example, the following are equivalent:


Returns a hashref of Open Graph Protocol data for the page, or if unable to, an empty hashref.

See also http://ogp.me/.


Parses the response body as Atom or RSS (depending on Content-Type header) and returns the result as an XML::Feed.

When to_feed is called on an unrequested Web::Magic object, it implicitly sets the HTTP Accept header to include Atom and RSS unless the Accept header has already been set.

Additionally, the following methods can be called which implicitly call to_feed (see XML::Feed):

  • entries

So, for example, the following are equivalent:


Saves the raw content retrieved from the URL to a file. May be passed a file handle or a file name.

If called pre-request, then will trigger do_request.

If called pre-request with a file name, will set the If-Modified-Since HTTP request header to that file's mtime.

If passed a file name, then additionally sets the file's mtime and atime to the date from the HTTP Last-Modified response header.

If the response is not a sucess (HTTP 2xx code) then acts as a no-op.

This method returns the invocant, so may be chained.

LWP::UserAgent's mirror method is perhaps somewhat more sophisticated, but only supports the HTTP GET method.

Any Time Methods

These can be called either before or after the request, and do not trigger the request to be made. They do not usually return the invocant Web::Magic object, so are not usually suitable for chaning.


Returns the original URI, as a URI object.

Additionally, the following methods can be called which implicitly call uri (see URI):

  • scheme

  • authority

  • path

  • query

  • host

  • port

So, for example, the following are equivalent:


If you need a copy of the URI as a string, two methods are:

  my $magic = W('http://example.com/');
  my $str_1 = $magic->uri->as_string;
  my $str_2 = $$magic;

The former perhaps makes for easier to read code; the latter is maybe slightly faster code.


Returns true if the invocant has already been requested.


Returns true if the invocant has been cancelled.

assert_response($name, $coderef)

Checks an assertion about the HTTP response. Web::Magic will blithely allow you to call to_hashref on a non-JSON/YAML response, or getElementsByTagName on an HTTP error page. This may not be what you want. assert_response allows you to check things are as expected before continuing, throwing a Web::Magic::Exception::AssertionFailure otherwise.

$coderef should be a subroutine that returns true if everything is OK, and false if something bad has happened. $name is just a label for the assertion, to provide a more helpful error message if the assertion fails.

 print W('http://example.com/data.json')
   ->assert_response(correct_type => sub { $_->content_type =~ /json/i })

Your subroutine is called with the Web::Magic object as $_[0] (this was changed between Web::Magic 0.003 and 0.004). Additionally, $_ is set to the HTTP::Response object.

An assertion can be made at any time. If made before the request, then it is queued up for checking later. If the assertion is made after the request, it is checked immediately.

This method returns the invocant, so may be chained.


A shortcut for:

  assert_response(success => sub { $_->is_success })

This checks the HTTP response has a 2XX HTTP status code.


Another shortcut for a common assertion - checks that the response HTTP Content-Type header is as expected.

If called before the request has been issued, then this method will also set an HTTP Accept header for the request. (But if you've set one manually, it will not over-ride it.)

 $magic->assert_content_type(qw{ text/html application/xhtml+xml })

Returns true if the Web::Magic object has had any response assertions made. (In fact, returns the number of such assertions.)


This method can be called as an object method or a class method, with slightly different semantics for each.

Object method: Get/set the LWP::UserAgent that will be used (or has been used) to issue this request.

  $magic->user_agent($ua);      # set
  my $ua = $magic->user_agent;  # get

If called as a setter on a Web::Magic object that has already been requested, then throws a Web::Magic::Exception::BadPhase exception.

If passed a hashref instead of a blessed user agent, Web::Magic will keep the existing user agent but use the hashref to set attributes for it.

    from          => 'tobyink@cpan.org',
    max_redirect  => 3,
  # the above is a shortcut for:

Class method: In usual Web::Magic usage, a new user agent is instantiated for each request. However, it is possible to create a global user agent to use as the default UA for all future requests.

  Web::Magic->user_agent( LWP::UserAgent->new(...) );

This may be useful for caching, retaining cookies between requests, etc. When a global user agent is defined, it is still possible to set user_agent on individual user_agent instances, using the user_agent object method. You can clear the global user agent using:

  Web::Magic->user_agent( undef );

Throws a Web::Magic::Exception if called as a setter and passed a defined but non-LWP::UserAgent value. (Unlike the object method, the class method does not accept a plain hashref.)

Package variable: As an alternative way of accessing the global user agent, you can use the package variable.

  $Web::Magic::user_agent = LWP::UserAgent->new(...);

This has the advantage that changes can be localised using Perl's local keyword, but it skips validation logic in the getter/setter so needs to be used with caution.


Returns the string 'Acme::24'.

Additionally, the following methods can be called which implicitly call acme_24:

  • random_jackbauer_fact

So, for example, the following are equivalent:


This method exists to emphasize the whimsical and experimental status of the current release of Web::Magic. If Web::Magic ever becomes ready for serious production use, expect the following to evaluate to false:



  • NAMESPACE_XHTML = 'http://www.w3.org/1999/xhtml'

Private Methods

The following methods should not normally be used, but may be useful for people wishing to subclass Web::Magic:

  • _stash

    A hashref for storing useful data.

  • _ua_string

    User-Agent header string to use for HTTP requests.

  • _request_object

    The (mutable) HTTP::Request object that can/will be used to issue the request.

  • _final_request_object(%default_headers)

    Returns the HTTP::Request object that will be used to issue the request. Sets %default_headers as HTTP request headers only if they are not already set. Serialises the request body from $self->_stash->{request_body}. Once this method has been called, it is assumed that no further changes will be made to the request object.

  • _check_assertions($reponse, @assertions)

    Each assertion is a [name, coderef] arrayref. Checks each assertion against the HTTP response, throwing exceptions as necessary.

  • _cancel_progress

    A no-op in this implementation. This method is sometimes called just prior to an exception being thrown. Thus, in an asynchronous implementation which performs HTTP requests in a background thread, you can use this callback to tidy up HTTP connections prior to the exception being thrown.

  • _blessed_thing_to_uri($thing)

    Class method called by Web::Magic's constructor to convert a blessed object to a URI string.


Web::Magic's exceptions are subclasses of Exception::Class::Base - the documentation for that class lists several useful functions, such as:

 Web::Magic::Exception->Trace(1); # enable full stack traces


Cause: a general Web::Magic error has occurred.


Cause: an assertion failed.

Additional fields: assertion_name, assertion_coderef, http_request, http_response.


Cause: cannot coerce from a Perl object to HTTP message body.

Additional fields: body.


Cause: a method has been called on a Web::Magic object which is in the wrong state to perform that method.


Cause: attempt to cancel a request that has already been performed.


Cause: attempt to set request body for a request that has already been performed.

Additional fields: attempted_body, used_body.


Cause: attempt to set a request header for a request that has already been performed.

Additional fields: attempted_value, used_value, header.


Cause: attempt to set request method for a request that has already been performed.

Additional fields: attempted_method, used_method.


Cause: attempt to perform a request that was explicitly cancelled.

Additional fields: cancellation.


Cause: cannot coerce from an HTTP message body to a Perl object, because is is of the wrong type.

Additional fields: content_type.


Use of upper/lower case in method names

At first glance, Web::Magic seems a little chaotic...

    ->foreach(sub{ say $_->{href} })

But there is actually a logic to it.

  • Methods that set HTTP request methods (as well as certain Perl built-ins - DESTROY, etc) are UPPERCASE. That is, they follow the case conventionally used in HTTP over the wire.

  • Methods that set HTTP request headers are Title_Case_With_Underscores. That is, they follow the case conventionally used in HTTP over the wire, just substituting hyphens for underscores.

  • Methods delegated to other Perl modules (e.g. getElementsByTagName from XML::LibXML::Node and get_statements from RDF::Trine::Model) are named exactly as they are in their parent package. This is usually lowerCameCase or lower_case_with_underscores.

  • All other methods use the One True Way: lower_case_with_underscores.

Haven't I seen something like this before?

Web::Magic is inspired in equal parts by http://jquery.com/|jQuery, by modules that take good advantage of chaining (such as Class::Path::Rule), and by the great modules Web::Magic depends on (LWP, RDF::Trine, URI, XML::LibXML, etc).

Some parts of it are quite jQuery-like, such as the ability to select XML and HTML nodes using CSS selectors, and you may recognise this ability from other jQuery-inspired Perl modules such as pQuery, Web::Query, HTML::Query and App::scrape. But while these modules focus on HTML (and to a certain extent XML), Web::Magic aims to offer a similar level coolness for RDF, JSON and feeds. (That is, it is not only useful for the Web of Documents, but also the Web of Data, and REST APIs.)

Web::Magic may also seem to share some of the properties of WWW::Mechanize, in that it downloads stuff and does things with it. But Web::Magic and WWW::Mechanize use quite different models of the world. WWW::Mechanize gives you a single object that is used for multiple HTTP requests, maintaining state between them; Web::Magic uses an object per HTTP request, and by default no state is kept between them (for RESTful resources, there should be no need to). It should be quite easy to use them together, as Web::Magic allows you to set a custom user agent for HTTP requests, and WWW::Mechanize is a subclass of LWP::UserAgent:

 local $Web::Magic::user_agent = WWW::Mechanize->new(...);

Use with HTML::HTML5::DOM

HTML::HTML5::DOM is not a dependency of Web::Magic, but if it's available, then calling the to_dom method on an HTML Web::Magic object will return an HTML::HTML5::DOM::HTMLDocument object.

Why does Web::Magic have so many dependencies?

Mostly because it has so many features but I don't like to reinvent the wheel.

Web::Magic does quite a lot, and if you're only using a small part of its functionality, then this list of dependencies may seem daunting. (However, it's worth noting that many of the dependencies aren't loaded until they're needed.)

That said, there is work underway to split some of the current functionality out into plugins. It is strongly suggested that you indicate which features you are using on import. This will help you avoid surprises when the splits start.

  use Web::Magic -feature => [qw( JSON RDF )];

Currently valid feature names are: HTML, XML, JSON, YAML, RDF, Feeds and Acme. They should be considered case-sensitive.


  • Reduce dependencies.

    See the NOTES above.

  • Make non HTTP Basic authentication easier.

    For example, HTTP Digest auth, WebID. OAuth might be within this scope, but probably not.


Inumerable, almost certainly.

Have a go at enumerating them here: http://rt.cpan.org/Dist/Display.html?Queue=Web-Magic.



LWP::UserAgent, URI, HTTP::Request, HTTP::Response.

XML::LibXML, XML::LibXML::QuerySelector, JSON::JOM, RDF::Trine, XML::Feed, XML::LibXML::Augment, HTML::HTML5::DOM.


Toby Inkster <tobyink@cpan.org>.


This software is copyright (c) 2011-2012 by Toby Inkster.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.