NAME

HTML::ParseBrowser - Simple interface for User Agent string parsing.

SYNOPSIS

  use HTML::ParseBrowser;
  my $ua = HTML::ParseBrowser->new($ENV{HTTP_USER_AGENT});
  my $browsername = $ua->name;

  my $browser = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)';
                    # BTW: That's IE 5.5 on Windows ME
  $ua->Parse($browser);
  $browsername = $ua->name;
  my $os = $ua->ostype;

  $browser = 'Mozilla 3.0 - Mozilla/3.0 (Linux 2.2.19 i686; U) Opera 5.0  [en]';
                 # BTW: that's Opera 5.0 on Linux, English
  $ua->Parse($browser);
  my $lingo = $ua->language;

DESCRIPTION

HTML::ParseBrowser is an object-oriented interface for parsing a User Agent string. It provides simple autoloaded methods for retrieving both the actual values stored in the string and the interpreted (and, so far, correct) information that these wildly varying and nonstandardised strings attempt to convey.

It provides the following methods:

new() (constructor method)

Accepts an optional User Agent string as an argument. If present, the string will be parsed and the object populated. Either way the base object will be created.

Parse()

Intended to be given a User Agent string as an argument. If present, it will be parsed and the object repopulated.

If called without a true argument, or with the argument '-', Parse() will simply depopulate the object and return undef. (This is useful for parsing logs, which often fill in a '-' for a null value.)

Case-insensitive Access Methods and properties.

Any of the methods below may be called. Properties (->{whatever}) are case sensitive and are lowercase. When called as methods (->whatever(), the preferred way), they are NOT case sensitive. As a result you can say $ua->NAME, $ua->name, $ua->Name, or $ua->nAMe if you feel so inclined.

If an item cannot be parsed, the methods will return undef. Calling things the method way will not cause autovivification, while checking them as properties without first using exists() in a conditional will cause autovivification (and, in the case of the version subproperties, even exists() will do so - Ack!)
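The difference between the two access styles can be seen with a toy class. This is only a sketch: Toy::UA is a made-up stand-in written for this illustration, not HTML::ParseBrowser's actual code, and it assumes nothing more than an AUTOLOAD that lowercases the method name before looking it up.

```perl
use strict;
use warnings;

# Toy stand-in class for illustration only (hypothetical, NOT HTML::ParseBrowser).
package Toy::UA;

our $AUTOLOAD;

sub new {
    my ($class, %fields) = @_;
    return bless { %fields }, $class;
}

sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;    # strip the package prefix
    return if $name eq 'DESTROY';           # do not treat DESTROY as a field
    my $key = lc $name;                     # NAME, Name, nAMe all become 'name'
    return exists $self->{$key} ? $self->{$key} : undef;
}

package main;

my $ua = Toy::UA->new(name => 'Opera');

print $ua->NAME, "\n";    # prints "Opera"
print $ua->nAMe, "\n";    # prints "Opera"

# Method access: nothing is created for a field that was never set.
my $missing = $ua->version;
print exists $ua->{version} ? "created\n" : "not created\n";

# Property access to a subproperty: Perl creates the intermediate hash.
my $major = $ua->{version}->{major};
print exists $ua->{version} ? "created\n" : "not created\n";
```

The last two checks are the point: the method call leaves the object alone, while reaching through ->{version} as a property quietly creates an empty hash there.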

Note that in some cases it is absolutely impossible to tell certain details. Nothing is guaranteed to be present -- not even 'name'.

It is also possible for someone to make their browser lie about the operating system they are using (especially with spiders) -- and in some cases, they may even be using more than one at the same time (like running Konqueror through an X-Windows client on a Windows box).

user_agent()

The actual original User Agent string you passed to Parse() or new().

languages()

Returns an arrayref of all languages recognised by placement and context in the User Agent string. Uses the English name of each language where it is understood, and the ANSI code otherwise. Feel free to add to the hash to cover more languages.

language()

Returns the language of the browser, interpreted as an English language name if possible, as above. If more than one language is found in the string, chooses the one repeated most often, or the first one encountered in the case of a tie.
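The selection rule (most repeated wins, first encountered breaks a tie) can be sketched as follows. The helper pick_language() is hypothetical and written only for this illustration; the module's internal code may well do it differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): pick the most repeated
# item in a list, with the earliest occurrence winning any tie.
sub pick_language {
    my @seen = @_;
    my (%count, %first_pos);
    for my $i (0 .. $#seen) {
        $count{ $seen[$i] }++;
        $first_pos{ $seen[$i] } = $i unless exists $first_pos{ $seen[$i] };
    }
    my ($best) = sort {
        $count{$b} <=> $count{$a}                  # most repeated first
            or $first_pos{$a} <=> $first_pos{$b}   # then earliest seen
    } keys %count;
    return $best;    # undef when the list is empty
}

# 'en' and 'de' both appear twice; 'en' was seen first, so it wins.
print pick_language(qw(en de de en fr)), "\n";    # prints "en"
```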

langs()

Like languages() above, except uses ANSI standard language codes always.

lang()

Like language() above, but containing only the ANSI language code.

detail()

The stuff inside any parentheses encountered. (Note that if for some really weird reason some User Agent string has two sets of parens, this string will contain the entire contents from the first paren to the last, including any intervening close and open parens. Anyway, they aren't supposed to do that, and such a case would likely only exist in cases of spiders and homebrewed browsers.)

useragents()

Returns an arrayref of all intelligible standard User Agent engine/version pairs, and Opera's too, if applicable. (Please note that this is despite the fact that Opera's is _not_ intelligible.)

properties()

Returns an arrayref of the stuff in detail() broken up by /;\s+/
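As a rough picture of what detail() and properties() describe, here is a minimal sketch. The helper detail_and_properties() is hypothetical; it only assumes a greedy match from the first open paren to the last close paren, followed by a split on /;\s+/.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not the module's own code).
sub detail_and_properties {
    my ($ua_string) = @_;
    # Greedy match: everything from the first '(' to the last ')'.
    my ($detail) = $ua_string =~ /\((.*)\)/;
    return (undef, []) unless defined $detail;
    my @properties = split /;\s+/, $detail;
    return ($detail, \@properties);
}

my ($detail, $props) = detail_and_properties(
    'Mozilla/4.0 (compatible; MSIE 5.5; Windows 98; Win 9x 4.90)'
);
print "$detail\n";           # compatible; MSIE 5.5; Windows 98; Win 9x 4.90
print "$_\n" for @$props;    # compatible / MSIE 5.5 / Windows 98 / Win 9x 4.90
```

With two sets of parens, the greedy match keeps the intervening close and open parens, which is the behaviour described for detail().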

name()

The _interpreted_ name of the browser. This value may not actually appear anywhere inside the string you handed it. Netscape Communicator provides a good example of this oddness.

version()

Returns a hashref containing v, major, and minor, as explained below and keyed as such.

v()

The full version of the useragent (e.g. '5.6.0'). To access as a property, grab $ua->{version}->{v}

major()

The Major version number (e.g. '5'). To access as a property, grab $ua->{version}->{major}

minor()

The Minor version number (e.g. '6.0'). To access as a property, grab $ua->{version}->{minor}
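To make the relationship between v, major, and minor concrete, here is a small sketch. split_version() is a hypothetical helper, and its handling of a version with no dot (minor left undefined) is an assumption of this example rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: split a version string at the
# first dot into the three keys described above.
sub split_version {
    my ($v) = @_;
    return unless defined $v && $v =~ /^(\d+)(?:\.(.+))?$/;
    # Assumption: minor stays undef when there is no dot in the version.
    return { v => $v, major => $1, minor => $2 };
}

my $version = split_version('5.6.0');
print "v=$version->{v} major=$version->{major} minor=$version->{minor}\n";
# prints "v=5.6.0 major=5 minor=6.0"
```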

os()

The Operating System the browser is running on.

ostype()

The _interpreted_ type of the Operating System. For instance, 'Windows' rather than 'Windows 9x 4.90'

osvers()

The _interpreted_ version of the Operating System. For instance, 'ME' rather than '9x 4.90'

Note: Windows NT versions below 5 will show up with ostype 'Windows NT' and osvers as appropriate. Windows NT version 5 will show up as ostype 'Windows NT' and osvers '2000'. Windows NT 5.1+ will show up as osvers 'XP', until it gets to 6, where it will become Vista, until 6.06 which will be reported as 'Server 2008'.

osarc()

While rarely defined, some User Agent strings happily announce some detail or another about the Architecture they are running under. If this happens, it will be reflected here. Linux ('i686') and Mac ('PPC') are more likely than Windows to do this, strangely.

It should be noted, and is of great and vast world-shattering importance, that Firefox 3 reports the wrong OS version on Vista, so it's impossible to tell FF3 on Vista from FF3 on XP. It is suspected that this was done deliberately by the Mozilla group to avoid embarrassing Vista users by exposing how they ended up stuck with that piece of shit.

SEE ALSO

Modules

HTTP::BrowserDetect (similar goal but with an opposite approach)

I'm thinking 'see also' in the sense of bad example. No offence to that module's writer, but "Is this IE? Yay! Is this 7? Yay" is a bass-ackwards approach to how to detect useragents. It's inherently unuseful. I wrote this deliberately because I couldn't stand that approach, because 'What is this?' made more sense to me, and moreover because it's robust. It's been seven years since the last update, and it just finally really kinda needed one (because Konqueror and a few others weren't detecting quite right on the name() results).

Web Sites

AUTHOR

Dodger (aka Sean Cannon)

COPYRIGHT

The HTML::ParseBrowser module and code therein is Copyright (c)2001 Sean Cannon, Bensalem, Pennsylvania, 2008 Sean Cannon, San Jose, California

All rights reserved. All rites reversed.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.