HTML::ParseBrowser - Simple interface for User-Agent string parsing
    use HTML::ParseBrowser;

    # Opera 6 on Windows 98, French
    my $uastring = 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 6.0 [fr]';

    my $ua = HTML::ParseBrowser->new($uastring);
    print "Browser  : ", $ua->name, "\n";
    print "Version  : ", $ua->v, "\n";
    print "OS       : ", $ua->os, "\n";
    print "Language : ", $ua->language, "\n";
HTML::ParseBrowser is a module for parsing a User-Agent string, and providing access to parts of the string, such as browser name, version, and operating system. Some of the returned values are exactly as they appeared in the User-Agent string, and others are interpreted; for example Internet Explorer identifies itself as MSIE, but the name method will return Internet Explorer.
It provides the following methods:
- new() (constructor method)
Accepts an optional User-Agent string as an argument. If present, the string is parsed and the object populated. Either way, the base object is created.
- Parse()
Intended to be given a User-Agent string as an argument. If present, it is parsed and the object repopulated.
If called without a true argument, or with the argument '-', Parse() simply depopulates the object and returns undef. (This is useful for parsing logs, which often fill in a '-' for a null value.)
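As a rough sketch of the log-parsing case, one object can be reused for every line of a log. The sample strings and the describe() helper are made up for illustration; only new(), Parse(), name and os are taken from this documentation.

```perl
use strict;
use warnings;
use HTML::ParseBrowser;

# Hypothetical sample values, as they might be pulled out of an access log.
my @agents = (
    'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 6.0 [fr]',
    '-',    # many log formats write '-' when no User-Agent was sent
);

# One object, reused for every line.
my $ua = HTML::ParseBrowser->new;

# describe() is an illustrative helper, not part of the module.
sub describe {
    my ($string) = @_;
    $ua->Parse($string);    # repopulates, or depopulates for '-' / empty input
    my $name = $ua->name;
    return 'no user agent' unless defined $name;
    my $os = $ua->os;
    return defined $os ? "$name on $os" : $name;
}

print describe($_), "\n" for @agents;
```

Checking the accessors for undef, rather than relying on the return value of Parse(), keeps the sketch within what is documented here.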
- Access methods
The following methods are used to access different parts of the User-Agent string.
If the particular piece of information wasn't included in the User-Agent string provided, or it couldn't be parsed, then the relevant method will return undef.
Also, note that some browsers let the user change the User-Agent string, as do many libraries. So there is no guarantee that a User-Agent string you find in a logfile is valid, or makes sense.
The original User-Agent string you passed to Parse() or new().
Returns an arrayref of all languages recognised by placement and context in the User-Agent string. Uses English names of languages encountered where comprehended, or the ISO two-letter language code otherwise.
Returns the language of the browser, interpreted as an English language name if possible, as above. If more than one language is found in the string, chooses the one most repeated, or the first encountered in case of a tie.
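The "most repeated, first on a tie" rule can be sketched as follows. pick_language() is a hypothetical helper written for this illustration, not the module's own code, and the input lists are invented.

```perl
use strict;
use warnings;

# pick_language() is a hypothetical helper, not the module's own code.
# It returns the most frequent entry, preferring the earliest one on a tie.
sub pick_language {
    my @found = @_;
    my (%count, $best);
    $count{$_}++ for @found;
    for my $lang (@found) {
        # Strict '>' keeps the earlier entry when the counts are equal.
        $best = $lang if !defined $best || $count{$lang} > $count{$best};
    }
    return $best;    # undef when nothing was found
}

print pick_language(qw(French English French)), "\n";    # French
print pick_language(qw(German English)), "\n";           # German
```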
Like languages() above, except that it always uses ISO standard language codes.
Like language() above, but only containing the ISO language code.
The stuff inside any parentheses encountered. If the User-Agent string contains more than one set of parentheses, this method will return the result of concatenating all of them. This seems sub-optimal, but works for the moment.
Returns an arrayref of all intelligible standard User-Agent engine/version pairs, and Opera's, too, if applicable. (Please note that this is despite the fact that Opera's is not intelligible.)
Returns an arrayref of the stuff in details() broken up by /;\s+/
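To make the split concrete, here is a minimal sketch that pulls out the parenthesised text and breaks it up on /;\s+/. split_details() is a hypothetical helper, not the module's own code, and joining multiple parenthesised groups with '; ' is an assumption (the documentation only says they are concatenated).

```perl
use strict;
use warnings;

my $uastring = 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 6.0 [fr]';

# split_details() is a hypothetical helper, not the module's own code.
sub split_details {
    my ($string) = @_;
    # Collect the text inside each set of parentheses and join the pieces.
    # The '; ' separator is an assumption made for this sketch.
    my $detail = join '; ', $string =~ /\(([^)]*)\)/g;
    # Break the joined text up on a semicolon followed by whitespace.
    return [ split /;\s+/, $detail ];
}

print "$_\n" for @{ split_details($uastring) };
```

For the sample string this prints compatible, MSIE 5.0 and Windows 98, one per line.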
The interpreted name of the browser. This value may not actually appear anywhere inside the string you handed it. For example, Internet Explorer identifies itself in the User-Agent string as MSIE, but this method will return Internet Explorer.
Returns a hashref containing v, major, and minor, as explained below and keyed as such.
The full version of the user agent (e.g. '5.6.0').
The Major version number. For Safari 5.1 this method would return 5.
The Minor version number. For Opera 9.0.1, this method would return 0.
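The relationship between v, major and minor can be illustrated with a small sketch. split_version() is a hypothetical helper that mimics the documented results for a dotted version string; it is not how the module necessarily derives them.

```perl
use strict;
use warnings;

# split_version() is a hypothetical helper, not the module's own code.
sub split_version {
    my ($v) = @_;
    # First run of digits is the major number, the second (if any) the minor.
    my ($major, $minor) = $v =~ /^(\d+)(?:\.(\d+))?/;
    return { v => $v, major => $major, minor => $minor };
}

my $version = split_version('9.0.1');
print "v=$version->{v} major=$version->{major} minor=$version->{minor}\n";
# prints: v=9.0.1 major=9 minor=0
```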
The Operating System the browser is running on.
The interpreted type of the Operating System. For instance, 'Windows' rather than 'Windows 9x 4.90'. For 'Android',
os() returns 'Android' and
The interpreted version of the Operating System. For instance, 'ME' rather than '9x 4.90'.
Note: Windows NT versions below 5 will show up with ostype 'Windows NT' and osvers as appropriate. Windows NT version 5 will show up as ostype 'Windows NT' and osvers '2000'. Windows NT 5.1+ will show up as osvers 'XP', until it gets to 6, where it will become Vista, until 6.06 which will be reported as 'Server 2008'.
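Read literally, the note above amounts to a set of version thresholds. The sketch below encodes that reading; nt_osvers() is a hypothetical helper, not the module's own code, and returning the raw number for versions below 5 is an assumption about what "as appropriate" means.

```perl
use strict;
use warnings;

# nt_osvers() is a hypothetical helper that follows the note literally.
sub nt_osvers {
    my ($nt) = @_;               # e.g. '5.1' taken from 'Windows NT 5.1'
    return $nt     if $nt < 5;   # assumption: the number is reported unchanged
    return '2000'  if $nt < 5.1;
    return 'XP'    if $nt < 6;
    return 'Vista' if $nt < 6.06;
    return 'Server 2008';
}

print "NT $_ -> ", nt_osvers($_), "\n" for qw(4.0 5.0 5.1 6.0 6.06);
```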
While rarely defined, some User-Agent strings happily announce some detail or another about the Architecture they are running under. If this happens, it will be reflected here. Linux ('i686') and Mac ('PPC') are more likely than Windows to do this, strangely.
Apparently, Firefox 3 reports the wrong OS version on Vista, so it's impossible to tell FF3 on Vista from FF3 on XP.
I have done a review of all CPAN modules for parsing the User-Agent string. If you have a specific need, it may be worth reading the review, to find the best match:
In brief, the following modules are worth considering.
Parse::HTTP::UserAgent has the best overall coverage of different browsers and other user agents.
HTTP::DetectUserAgent doesn't have as good coverage, but handles modern browsers well, and is the fastest module, so if you're processing large logfiles, this might be the best choice.
HTTP::UserAgentString::Parser is by far the fastest, and has good coverage of modern browsers.
Woothee is available for a number of programming languages, not just Perl. It is faster than most of the modules, and has good coverage of the most popular browsers, but not as good overall coverage.
HTTP::BrowserDetect has the poorest coverage of the modules listed here, and doesn't do well at recognising version numbers. It's the best module for detecting whether a given agent is a robot/crawler, though.
Dodger (aka Sean Cannon)
Recent changes by Neil Bowers.
COPYRIGHT AND LICENSE
The HTML::ParseBrowser module and code therein is Copyright (c) 2001-2008 Sean Cannon
Changes in 1.01 and later are Copyright (C) 2012-2014, Neil Bowers.
All rights reserved. All rights reversed.
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.