NAME

Genealogy::Gedcom::Date - Parse GEDCOM dates in French r/German/Gregorian/Hebrew/Julian

Synopsis

A script (scripts/synopsis.pl):

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use Genealogy::Gedcom::Date;

        # --------------------------

        sub process
        {
                my($count, $parser, $date) = @_;

                print "$count: $date: ";

                my($result) = $parser -> parse(date => $date);

                print "Canonical date @{[$_ + 1]}: ", $parser -> canonical_date($$result[$_]), ". \n" for (0 .. $#$result);
                print 'Canonical form: ', $parser -> canonical_form($result), ". \n";
                print "\n";

        } # End of process.

        # --------------------------

        my($parser) = Genealogy::Gedcom::Date -> new(maxlevel => 'debug');

        process(1, $parser, 'Julian 1950');
        process(2, $parser, '@#dJulian@ 1951');
        process(3, $parser, 'From @#dJulian@ 1952 to Gregorian 1953/54');
        process(4, $parser, 'From @#dFrench r@ 1955 to 1956');
        process(5, $parser, 'From @#dJulian@ 1957 to German 1.Dez.1958');

One-liners:

        perl scripts/parse.pl -max debug -d 'Between Gregorian 1701/02 And Julian 1703'

Output:

        Return value from parse():
        [
          {
            canonical => "1701/02",
            flag => "BET",
            kind => "Date",
            suffix => "02",
            type => "Gregorian",
            year => 1701
          },
          {
            canonical => "\@#dJULIAN\@ 1703",
            flag => "AND",
            kind => "Date",
            type => "Julian",
            year => 1703
          }
        ]

        perl scripts/parse.pl -max debug -d 'Int 10 Nov 1200 (Approx)'

Output:

        [
          {
            canonical => "10 Nov 1200 (Approx)",
            day => 10,
            flag => "INT",
            kind => "Date",
            month => "Nov",
            phrase => "(Approx)",
            type => "Gregorian",
            year => 1200
          }
        ]

        perl scripts/parse.pl -max debug -d '(Unknown)'

Output:

        Return value from parse():
        [
          {
            canonical => "(Unknown)",
            kind => "Phrase",
            phrase => "(Unknown)",
            type => "Phrase"
          }
        ]

See the "FAQ" for the explanation of the output arrayrefs.

See also scripts/parse.pl and scripts/compare.pl for sample code.

Lastly, you are strongly encouraged to peruse t/*.t.

Description

Genealogy::Gedcom::Date provides a Marpa-based parser for GEDCOM dates.

Calendar escapes supported are (case-insensitive): French r/German/Gregorian/Hebrew/Julian.

Gregorian is the default, and does not need to be used at all.

Comparison of 2 Genealogy::Gedcom::Date-based objects is supported: call the "compare($other_object)" method on one object, passing the other object as the parameter.

Note: compare() can return any one of four (4) values.

See the GEDCOM Specification, p 45.

Installation

Install Genealogy::Gedcom::Date as you would for any Perl module:

Run:

        cpanm Genealogy::Gedcom::Date

or run:

        sudo cpan Genealogy::Gedcom::Date

or unpack the distro, and then either:

        perl Build.PL
        ./Build
        ./Build test
        sudo ./Build install

or:

        perl Makefile.PL
        make (or dmake or nmake)
        make test
        make install

Constructor and Initialization

new() is called as my($parser) = Genealogy::Gedcom::Date -> new(k1 => v1, k2 => v2, ...).

It returns a new object of type Genealogy::Gedcom::Date.

Key-value pairs accepted in the parameter list (see corresponding methods for details [e.g. "date([$date])"]):

o canonical => $integer

Note: Nothing is printed unless maxlevel is set to debug.

o canonical => 0

Data::Dumper::Concise's Dumper() prints the output of the parse.

o canonical => 1

canonical_form() is called on the output of parse() to print a string.

o canonical => 2

canonical_date() is called on each element in the result from parse(), to print strings on separate lines.

Default: 0.

o date => $date

The string to be parsed.

Each ',' is replaced by a space. See the "FAQ" for details.

Default: ''.

o logger => $aLoggerObject

Specify a logger compatible with Log::Handler, for the lexer and parser to use.

Default: A logger of type Log::Handler which writes to the screen.

To disable logging, just set 'logger' to the empty string (not undef).

o maxlevel => $logOption1

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Typical values are: 'error', 'notice', 'info' and 'debug'.

Default: 'notice', which produces no output.

o minlevel => $logOption2

This option affects Log::Handler.

See the Log::Handler::Levels docs.

Default: 'error'.

No lower levels are used.

Note: The parameters canonical and date can also be passed to "parse([%args])".

Methods

canonical([$integer])

Here, the [] indicate an optional parameter.

Gets or sets the canonical option, which controls what exactly "parse([%args])" prints when "maxlevel([$string])" is set to debug.

By default nothing is printed.

See "canonical_date($hashref)", next, for sample code.

canonical_date($hashref)

$hashref is any element of the arrayref returned by "parse([%args])". The hashref may be empty.

Returns a date string (or the empty string) normalized in various ways:

o If Gregorian (in any form) was in the original string, it is discarded

This is done because it's the default.

o If any other calendar escape was in the original string, it is preserved

And it's output in all caps.

And as a special case, 'FRENCHR' is returned as 'FRENCH R'.

o If About, etc were in the original string, they are discarded

This means the flag key in the hashref is ignored.

Note: This method is called by "parse([%args])" to populate the canonical key in the arrayref of hashrefs returned by parse().

Try:

        perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015'

        perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 0

        perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 1

        perl scripts/parse.pl -max debug -d 'From 21 Jun 1950 to @#dGerman@ 05.Mär.2015' -c 2
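
The escape rules listed above can be sketched in a few lines of plain Perl. The sub escape_for() is a hypothetical helper written for this illustration; it is not part of the module and does not show how canonical_date() is implemented. It only mirrors the documented behaviour: Gregorian is dropped, other escapes are upper-cased, and 'French r' becomes 'FRENCH R'.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: build the calendar-escape
# prefix from the 'type' key of one parsed date.
sub escape_for
{
	my($type) = @_;

	# Gregorian is the default, so no escape; phrases have no calendar.
	return '' if ($type eq 'Gregorian' || $type eq 'Phrase');

	# Other escapes are written in upper-case: 'French r' => 'FRENCH R'.
	return '@#d' . uc($type) . '@';
}

print escape_for('Julian'), "\n";   # prints @#dJULIAN@
print escape_for('French r'), "\n"; # prints @#dFRENCH R@
```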

canonical_form($arrayref)

Returns a date string containing zero, one or two dates.

This method calls "canonical_date($hashref)" for each element in the $arrayref. The arrayref may be empty.

Then it adds information from the flag key in each element, if present.

For sample code, see "canonical_date($hashref)" just above.

compare($other_object)

Returns an integer 0 .. 3 (sic) indicating the temporal relationship between the invoking object ($self) and $other_object.

Returns one of these values:

        0 if the dates have different date escapes.
        1 if $date_1 < $date_2.
        2 if $date_1 = $date_2.
        3 if $date_1 > $date_2.

Note: Gregorian years like 1510/02 are converted into 1510 before the dates are compared. Create a sub-class and override "normalize_date($date_hash)" if desired.

See scripts/compare.pl for sample code.

See also "normalize_date($date_hash)".
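
Since the return value is a bare integer, calling code may find it convenient to translate it into text. The following is a minimal sketch; describe_comparison() and the label wording are assumptions made for this example, and only the four integer values come from the list above.

```perl
use strict;
use warnings;

# Labels for the four documented return values of compare().
# The wording of the labels is an assumption for this example.
my %relationship =
(
	0 => 'different calendar escapes',
	1 => 'earlier',
	2 => 'same',
	3 => 'later',
);

# Hypothetical helper: translate a compare() result into text.
sub describe_comparison
{
	my($code) = @_;

	return $relationship{$code} // "unexpected value: $code";
}

print describe_comparison(1), "\n"; # prints earlier
```

A typical call would pass in the value returned by $parser_1 -> compare($parser_2), where both objects are assumed to have been given their dates already.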

date([$date])

Here, [ and ] indicate an optional parameter.

Gets or sets the date to be parsed.

The date in parse(date => $date) takes precedence over both new(date => $date) and date($date).

This means if you call parse() as parse(date => $date), then the value $date is stored so that if you subsequently call date(), that value is returned.

Note: date is a parameter to new().

error()

Gets the last error message.

Returns '' (the empty string) if there have been no errors.

If Marpa::R2 throws an exception, it is caught by a try/catch block, and the Marpa error is returned by this method.

See "parse([%args])" for more about error().
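
One way to combine parse() and error() is a small wrapper that dies when nothing could be parsed. This is a sketch only: parse_or_die() is a hypothetical name, and $parser is assumed to be a Genealogy::Gedcom::Date object (any object with parse() and error() methods behaves the same way here).

```perl
use strict;
use warnings;

# Hypothetical helper: parse a date and die with the parser's own
# error message when the returned arrayref is empty.
sub parse_or_die
{
	my($parser, $date) = @_;
	my($result)        = $parser -> parse(date => $date);

	if (scalar @$result == 0)
	{
		my($message) = $parser -> error;

		# error() returns '' when no error was recorded, e.g. for an
		# empty input string, so supply a fallback text.
		$message = 'no error reported' if ($message eq '');

		die "Cannot parse '$date': $message\n";
	}

	return $result;
}
```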

log($level, $s)

If a logger is defined, this logs the message $s at level $level.

logger([$logger_object])

Here, the [] indicate an optional parameter.

Get or set the logger object.

To disable logging, just set 'logger' to the empty string (not undef), in the call to "new()".

This logger is passed to other modules.

'logger' is a parameter to "new()". See "Constructor and Initialization" for details.

maxlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.

Typical values are: 'notice', 'info' and 'debug'. The default, 'notice', produces no output.

The code emits a message with log level 'error' if Marpa throws an exception, and it displays the result of the parse at level 'debug' if maxlevel is set that high. The latter display uses Data::Dumper::Concise's function Dumper().

'maxlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

minlevel([$string])

Here, the [] indicate an optional parameter.

Get or set the value used by the logger object.

This option is only used if an object of type Log::Handler is created. See Log::Handler::Levels.

'minlevel' is a parameter to "new()". See "Constructor and Initialization" for details.

new([%args])

The constructor. See "Constructor and Initialization".

normalize_date($date_hash)

Normalizes $date_hash for each date during a call to "compare($other_object)".

Override in a sub-class if you wish to change the normalization technique.

parse([%args])

Here, [ and ] indicate an optional parameter.

parse() returns an arrayref. See the "FAQ" for details.

If the arrayref is empty, call "error()" to retrieve the error message.

In particular, the arrayref will be empty if the input date is the empty string.

parse() takes the same parameters as new().

Warning: The array can contain 1 element when 2 are expected. This can happen if your input contains 'From ... To ...' or 'Between ... And ...', and one of the dates is invalid. That is, the return value from parse() will contain the valid date but no indicator of the invalid one.
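
A caller can guard against part of this problem by inspecting the flag key. The sketch below only detects a lone BET or AND element, because GEDCOM allows 'From ...' and 'To ...' to stand alone, so a single FROM or TO element is not necessarily an error. The sub half_of_between() is hypothetical, and it assumes the surviving element keeps its flag.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a parse() result looks like one half of
# a 'Between ... And ...' pair, i.e. a single element flagged BET or AND.
sub half_of_between
{
	my($result) = @_;

	return 0 if (scalar @$result != 1);

	# The flag key is optional, so default to the empty string.
	my($flag) = $$result[0]{flag} // '';

	return ($flag eq 'BET' || $flag eq 'AND') ? 1 : 0;
}
```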

Extensions to the Gedcom specification

This chapter lists exactly how this code differs from the Gedcom spec.

o Input may be in Unicode
o Input may be in any case
o Input may omit calendar escapes when the date is unambiguous
o Any of the following tokens may be used
o abt, about, circa
o aft, after
o and
o bc, b.c., bce
o bef, before
o bet, between
o cal, calculated
o french r, frenchr, german, gregorian, hebrew, julian
o est, estimated
o from
o German BCE

vc, v.c., v.chr., vchr, vuz, v.u.z.

o German month names

jan, feb, mär, maer, mrz, apr, mai, jun, jul, aug, sep, sept, okt, nov, dez

o Gregorian month names

jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec

o Hebrew month names

tsh, csh, ksl, tvt, shv, adr, ads, nsn, iyr, svn, tmz, aav, ell

o int, interpreted
o to

FAQ

What is the format of the value returned by parse()?

It is always an arrayref.

If the date is like '1950' or 'Bef 1950 BCE', there will be 1 element in the arrayref.

If the date contains both 'From' and 'To', or both 'Between' and 'And', then the arrayref will contain 2 elements.

Each element is a hashref, with various combinations of the following keys. You need to check the existence of some keys before processing the date.

This means missing values (day, month, bce) are never fabricated. These keys only appear in the hashref if such a token was found in the input.

Keys:

o bce

If the input contains any (case-insensitive) BCE indicator, under any calendar escape, the bce key will hold the exact indicator.

o canonical => $string

"parse([%args])" calls "canonical_date($hashref)" to populate this key.

o day => $integer

If the input contains a day, then the day key will be present.

o flag => $string

If the input contains any of the following (case-insensitive), then the flag key will be present:

o Abt or About
o Aft or After
o And
o Bef or Before
o Bet or Between
o Cal or Calculated
o Est or Estimated
o From
o Int or Interpreted
o To

$string will take one of these values (case-sensitive):

o ABT
o AFT
o AND
o BEF
o BET
o CAL
o EST
o FROM
o INT
o TO

o kind => 'Date' or 'Phrase'

The kind key is always present, and always takes the value 'Date' or 'Phrase'.

If the value is 'Phrase', see the phrase and type keys.

During processing, there can be another - undocumented - element in the arrayref. It represents the calendar escape, and in that case kind takes the value 'Calendar'. This element is discarded before the final arrayref is returned to the caller.

o month => $string

If the input contains a month, then the month key will be present. The case of $string will be exactly whatever was in the input.

o phrase => "($string)"

If the input contains a date phrase, then the phrase key will be present. The case of $string will be exactly whatever was in the input.

parse(date => 'Int 10 Nov 1200 (Approx)') returns:

        [
          {
            day => 10,
            flag => "INT",
            kind => "Date",
            month => "Nov",
            phrase => "(Approx)",
            type => "Gregorian",
            year => 1200
          }
        ]

parse(date => '(Unknown)') returns:

        [
          {
            kind => "Phrase",
            phrase => "(Unknown)",
            type => "Phrase"
          }
        ]

See also the kind and type keys.

o suffix => $two_digits

If the year contains a suffix (/00), then the suffix key will be present. The '/' is discarded.

Obviously, this key can only appear when the year is of the Gregorian form 1700/00.

See also the year key below.

o type => $string

The type key is always present, and takes one of these case-sensitive values:

o 'French r'
o German
o Gregorian
o Hebrew
o Julian
o Phrase

See also the kind and phrase keys.

o year => $integer

If the input contains a year, then the year key is present.

If the year contains a suffix (/00), see also the suffix key, above. This means the value of the year key is never "$integer/$two_digits".
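
Because most keys are optional, code that consumes the hashrefs needs to test for each key before using it. The following is a minimal sketch of that pattern; describe_date() is a hypothetical helper, and the sample hashref is copied from the 'Int 10 Nov 1200 (Approx)' example above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a display string from one element of the
# arrayref returned by parse(), using only the keys that are present.
sub describe_date
{
	my($date) = @_;

	# A phrase-only element carries no day, month or year.
	return $$date{phrase} if ($$date{kind} eq 'Phrase');

	my(@part);

	push @part, $$date{day}   if (exists $$date{day});
	push @part, $$date{month} if (exists $$date{month});

	if (exists $$date{year})
	{
		# The suffix is stored without its '/', so restore it here.
		push @part, exists $$date{suffix} ? "$$date{year}/$$date{suffix}" : $$date{year};
	}

	push @part, $$date{bce}    if (exists $$date{bce});
	push @part, $$date{phrase} if (exists $$date{phrase});

	return join(' ', @part);
}

print describe_date({day => 10, flag => 'INT', kind => 'Date', month => 'Nov', phrase => '(Approx)', type => 'Gregorian', year => 1200}), "\n";
# prints: 10 Nov 1200 (Approx)
```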

When should I use a calendar escape?

o In theory, for every non-Gregorian date

In practice, if the month name is unique to a specific language, then the escape is not needed, since Marpa::R2 and this code automatically handle ambiguity.

Likewise, if you use a Gregorian year in the form 1700/01, then the calendar escape is obvious.

The escape is, of course, always inserted into the values returned by the canonical pair of methods when they process non-Gregorian dates. That makes their output compatible with other software. And no matter what case you use specifying the calendar escape, it is always output in upper-case.

o When you wish to force the code to provide an unambiguous result

All Gregorian and Julian dates are ambiguous, unless they use the year format 1700/01.

So, to resolve the ambiguity, add the calendar escape.

Why is '@' escaped with '\' when Data::Dumper::Concise's Dumper() prints things?

That's just how that module handles '@'.

Does this module accept Unicode?

Yes.

See t/German.t for sample code.

Can I change the default calendar?

No. It is always Gregorian.

Are dates massaged before being processed?

Yes. Commas are replaced by spaces.
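
The effect is the same as the following sketch. The sub massage_date() is a hypothetical stand-in written for this example; the module's own code for this step is not shown here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the documented massaging step:
# every ',' in the date string becomes a single space.
sub massage_date
{
	my($date) = @_;

	$date =~ tr/,/ /;

	return $date;
}

# The comma becomes a space, leaving two spaces before the year.
print massage_date('1 Jan, 1950'), "\n";
```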

French month names

See "Extensions to the Gedcom specification".

German month names

See "Extensions to the Gedcom specification".

Hebrew month names

See "Extensions to the Gedcom specification".

What happens if parse() is given a string like 'To 2000 From 1999'?

The code does not reorder the dates.

Why was this module renamed from DateTime::Format::Gedcom?

The DateTime suite of modules isn't designed, IMHO, for GEDCOM-like applications. It was a mistake to use that name in the first place.

By releasing under the Genealogy::Gedcom::* namespace, I can be much more targeted in the data types I choose as method return values.

Why did you choose Moo over Moose?

My policy is to use the lightweight Moo for all modules and applications.

Trouble-shooting

Things to consider:

o Error message: Marpa exited at (line, column) = ($line, $column) within the input string

Consider the possibility that the parse ends without succeeding, but the input is the prefix of some longer input that could lead to a successful parse.

Marpa is not reporting a problem during the read(), because you can add more to the input string, and Marpa does not know that you do not plan to do this.

o You tried to enter the German month name 'Mär' via the shell

Read more about this by running 'perl scripts/parse.pl -h', where it discusses '-d'.

o You mistyped the calendar escape

Check: Are any of these valid?

o @#FRENCH@
o @#JULIAN@
o @#djulian
o @#juliand
o @#djuliand
o @#dJulian@
o Julian
o @#dJULIAN@

Yes, the last 3 are accepted by this module, and the last one is accepted by other software.

o The date is in American format (month day year)
o You used a Julian calendar with a Gregorian year

Dates - such as 1900/01 - which do not fit the Gedcom definition of a Julian year, are filtered out.

See Also

File::Bom::Utils.

Genealogy::Gedcom

DateTime

DateTimeX::Lite

Time::ParseDate

Time::Piece is in Perl core. See http://perltricks.com/article/59/2014/1/10/Solve-almost-any-datetime-need-with-Time-Piece

Time::Duration is more sophisticated than Time::Elapsed

Time::Moment implements ISO 8601

http://blogs.perl.org/users/buddy_burden/2015/09/a-date-with-cpan-part-1-state-of-the-union.html

http://blogs.perl.org/users/buddy_burden/2015/10/a-date-with-cpan-part-2-target-first-aim-afterwards.html

http://blogs.perl.org/users/buddy_burden/2015/10/-a-date-with-cpan-part-3-paving-while-driving.html

Machine-Readable Change Log

The file Changes was converted into Changelog.ini by Module::Metadata::Changes.

Version Numbers

Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.

Repository

https://github.com/ronsavage/Genealogy-Gedcom-Date.

Support

Email the author, or log a bug on RT:

https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom::Date.

Credits

Thanx to Eugene van der Pijll, the author of the Gedcom::Date::* modules.

Thanx also to the authors of the DateTime::* family of modules. See http://datetime.perl.org/wiki/datetime/dashboard for details.

Thanx to Mike Elston on the perl-gedcom mailing list for providing French month abbreviations, amongst other information pertaining to the French language.

Thanx to Michael Ionescu on the perl-gedcom mailing list for providing the grammar for German dates and German month abbreviations.

Author

Genealogy::Gedcom::Date was written by Ron Savage <ron@savage.net.au> in 2011.

Homepage: http://savage.net.au/index.html.

Copyright

Australian copyright (c) 2011, Ron Savage.

        All Programs of mine are 'OSI Certified Open Source Software';
        you can redistribute them and/or modify them under the terms of
        The Perl License, a copy of which is available at:
        http://dev.perl.org/licenses/