
NAME

Lingua::EN::TitleParse - Parse titles in people's names

SYNOPSIS

  use Lingua::EN::TitleParse;

  # Functional interface
  my ($title, $name) = Lingua::EN::TitleParse->parse("Mr Joe Bloggs");

  # $title = "Mr", $name = "Joe Bloggs"

  # OO interface
  my $title_obj   = Lingua::EN::TitleParse->new();
  ($title, $name) = $title_obj->parse("Mr Joe Bloggs");

  # $title = "Mr", $name = "Joe Bloggs"

  # Use your own titles with the OO interface
  #
  my @titles = ('Master', 'International Master', 'Grandmaster');
  $title_obj  = Lingua::EN::TitleParse->new( titles => \@titles );
  ($title, $name) = $title_obj->parse("Grandmaster Joe Bloggs");

  # $title = "Grandmaster", $name = "Joe Bloggs"

  # Retrieve the list of titles
  @titles = $title_obj->titles;

  # Optionally get cleaned titles on output
  $title_obj      = Lingua::EN::TitleParse->new( clean => 1 );
  ($title, $name) = $title_obj->parse("mR. Joe Bloggs");

  # $title = "Mr", $name = "Joe Bloggs"

  # Without 'clean' turned on
  $title_obj      = Lingua::EN::TitleParse->new();
  ($title, $name) = $title_obj->parse("mR. Joe Bloggs");

  # $title = "mR.", $name = "Joe Bloggs"

DESCRIPTION

This module parses strings containing people's names to identify titles such as "Mr" or "Mrs", so that the title and the rest of the name can be separated.

For example, "Mr Joe Bloggs" is split into "Mr" and "Joe Bloggs".

The module handles "fuzziness" such as changes of case and punctuation characters: "Mr", "MR", "Mr.", and "mr" will all be recognised correctly.
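One way to picture this tolerance is a normalisation step that reduces every variant to the same key. The following is a minimal sketch under that assumption; normalise_title is a hypothetical helper written for illustration and is not part of the module's documented interface.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Lingua::EN::TitleParse's documented API.
# Reduces a title to a canonical key: lower-cased, punctuation removed,
# whitespace collapsed and trimmed.
sub normalise_title {
    my ($text) = @_;
    my $key = lc $text;
    $key =~ s/[[:punct:]]//g;    # drop punctuation such as "."
    $key =~ s/\s+/ /g;           # collapse runs of whitespace
    $key =~ s/^ | $//g;          # trim a leading or trailing space
    return $key;
}

# All four spellings collapse to a single key.
my %seen;
$seen{ normalise_title($_) }++ for ('Mr', 'MR', 'Mr.', 'mr');
print join(', ', sort keys %seen), "\n";    # prints "mr"
```

Because the comparison is made on the normalised key only, the original spelling can still be returned to the caller unchanged.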

It differs from another CPAN module, Lingua::EN::NameParse, in two key respects:

Firstly, Lingua::EN::TitleParse performs well irrespective of the number of titles being matched against. Lingua::EN::NameParse loops through a series of regular expressions and slows down when the set of titles is large, whereas Lingua::EN::TitleParse "normalises" each name string and then uses hash lookups, giving consistently good performance.

Secondly, it focuses only on parsing titles in names, whereas Lingua::EN::NameParse attempts much more. The extra intelligence of Lingua::EN::NameParse can come at the cost of predictability. Lingua::EN::TitleParse is more conservative: by default it makes no changes to the case or content of the input (apart from compressing extra white-space), and effectively only splits the input string in two. There is, however, an option to output cleaned titles.

The default titles are those used by Lingua::EN::NameParse (its "extended set") with minor additions, but you can supply your own set of titles instead.
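The hash-lookup approach described above can be sketched roughly as follows. This is an illustration of the general technique, not the module's actual implementation: the title list, the normalise and split_title helpers, and the choice to return an empty title when nothing matches are all assumptions made for the example.

```perl
use strict;
use warnings;

# Illustrative title list only; the module's default list is much longer.
my @known_titles = ('Mr', 'Mrs', 'Dr', 'Rear Admiral');

# Hypothetical helper: canonical form used as the hash key.
sub normalise {
    my ($text) = @_;
    my $key = lc $text;
    $key =~ s/[[:punct:]]//g;
    return $key;
}

# Build the lookup table once: normalised title => canonical title.
my %lookup = map { normalise($_) => $_ } @known_titles;

# The longest title, counted in words, bounds how many prefixes to try.
my $max_words = 0;
for my $known (@known_titles) {
    my $count = () = $known =~ /\S+/g;
    $max_words = $count if $count > $max_words;
}

# Hypothetical split: try the longest word prefix first, so that
# "Rear Admiral" is preferred over any shorter match.
sub split_title {
    my ($input) = @_;
    my @words = split ' ', $input;    # also compresses extra white-space
    my $limit = @words < $max_words ? scalar @words : $max_words;
    for (my $n = $limit; $n >= 1; $n--) {
        my $candidate = join ' ', @words[0 .. $n - 1];
        if (exists $lookup{ normalise($candidate) }) {
            return ($candidate, join ' ', @words[$n .. $#words]);
        }
    }
    return ('', join ' ', @words);    # assumption: empty title if no match
}

my ($found_title, $rest) = split_title('rear admiral. Horatio  Nelson');
print "title=[$found_title] name=[$rest]\n";
# prints "title=[rear admiral.] name=[Horatio Nelson]"
```

The number of hash lookups per input depends only on the length of the longest title in words, not on how many titles are in the table, which is why this style of matching stays fast as the title list grows.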

METHODS

parse

This method identifies a title in a name and splits the string into the title and the rest of the name.

  # e.g. via the functional interface
  my ($title, $name) = Lingua::EN::TitleParse->parse("Mr Joe Bloggs");

  # e.g. via the Object-Oriented interface
  $title_obj      = Lingua::EN::TitleParse->new();
  ($title, $name) = $title_obj->parse("Mr Joe Bloggs");

titles

This method returns the list of titles in use: either the default titles, or the custom titles supplied at construction.

  # e.g. via the functional interface
  @titles = Lingua::EN::TitleParse->titles;

  # e.g. via the Object-Oriented interface
  $title_obj = Lingua::EN::TitleParse->new( titles => \@custom_titles );
  @titles = $title_obj->titles;

EXPORT

None.

SEE ALSO

Lingua::EN::NameParse

AUTHOR

Philip Abrahamson, <PhilipAbrahamson@venda.com>

COPYRIGHT AND LICENSE

Copyright (C) 2013 by Venda Ltd

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.4 or, at your option, any later version of Perl 5 you may have available.