DateTime::Format::Alami - Parse human date/time expression (base class)
This document describes version 0.16 of DateTime::Format::Alami (from Perl distribution DateTime-Format-Alami), released on 2017-07-10.
For English:
use DateTime::Format::Alami::EN;
my $parser = DateTime::Format::Alami::EN->new();
my $dt = $parser->parse_datetime("2 hours 13 minutes from now");
You can also call it as a class method:
my $dt = DateTime::Format::Alami::EN->parse_datetime("yesterday");
To parse duration:
my $dtdur = DateTime::Format::Alami::EN->parse_datetime_duration("2h"); # 2 hours
For Indonesian:
use DateTime::Format::Alami::ID;
my $parser = DateTime::Format::Alami::ID->new();
my $dt = $parser->parse_datetime("5 jam lagi");
my $dt = DateTime::Format::Alami::ID->parse_datetime("hari ini");
my $dtdur = DateTime::Format::Alami::ID->parse_datetime_duration("2h"); # 2 days ("h" is short for "hari", Indonesian for day)
This class parses a human/natural date/time/duration string and returns a DateTime (or DateTime::Duration) object. It currently supports English and Indonesian. The goal of this module is to make it easier to add support for other human languages.
To actually use this class, you must use one of its subclasses for each human language that you want to parse.
There are already some other DateTime human language parsers on CPAN and elsewhere, see "SEE ALSO".
DateTime::Format::Alami is a base class. Each human language is implemented in a separate DateTime::Format::Alami::<ISO_CODE> module (e.g. DateTime::Format::Alami::EN and DateTime::Format::Alami::ID), which is a subclass.
Parsing is done using a single recursive regex (i.e. containing (?&NAME) and (?(DEFINE)) patterns, see perlre). This regex is composed from pieces of pattern strings in the p_* and o_* methods, to make it easier to override in an OO-fashion.
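As a rough illustration of the (?(DEFINE)) and (?&NAME) constructs mentioned above, here is a minimal sketch in core Perl. It is not the module's actual regex: the pattern names and the find_expr helper are invented for this example, and only the general technique (named sub-patterns defined once and referenced by name) is shown.

```perl
use strict;
use warnings;

# Hypothetical mini-grammar. Named sub-patterns live in a (?(DEFINE)...) block
# and are referenced with (?&NAME). None of these names come from the module.
my $re = qr{
    \b (?<match> (?&p_offset) | (?&p_today) ) \b

    (?(DEFINE)
        (?<o_int>    \d+ )
        (?<o_unit>   (?: hours? | minutes? ) )
        (?<p_today>  (?: today | this \s+ day ) )
        (?<p_offset> (?&o_int) \s* (?&o_unit) \s+ from \s+ now )
    )
}xi;

# Return the matched expression, or undef if nothing matched.
sub find_expr {
    my ($str) = @_;
    return $str =~ $re ? $+{match} : undef;
}

for my $str ("see you 2 hours from now", "due This Day", "no date here") {
    my $hit = find_expr($str);
    print "'$str' => ", (defined $hit ? "matched '$hit'" : "no match"), "\n";
}
```

Note that captures made inside a (?&NAME) call are not visible after the call returns, which is why the sketch wraps the alternation in its own named group.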
A pattern string returned by a p_* method is a normal regex pattern string that will be compiled with the /x and /i regex modifiers. The pattern string can also refer to patterns in other o_* or p_* methods using the syntax <o_foo> or <p_foo>. For example, p_today for English might be something like:
sub p_today { '(?: today | this \s+ day )' }
Other examples:
sub p_yesterday { "(?: yesterday )" }
sub p_dateymd { join(
    "",
    '(?: <o_dayint> \\s* ?<o_monthname> | <o_monthname> \\s* <o_dayint>\\b|<o_monthint>[ /-]<o_dayint>\\b )',
    '(?: \\s*[,/-]?\\s* <o_yearint>)?'
)}
sub o_date { "(?: <p_today>|<p_yesterday>|<p_dateymd>)" }
sub p_time { '(?: <o_hour>:<o_minute>(?:<o_second>)? \s* <o_ampm> )' }
sub p_date_time { '(?: <o_date> (?:\s+ at)? <o_time> )' }
When a pattern from p_* matches, a corresponding action method a_* will be invoked. Usually the method will set or modify a DateTime object in $self->{_dt}. For example, this is code for a_today:
sub a_today {
    my $self = shift;
    $self->{_dt} = DateTime->today;
}
The patterns from all p_* methods will be combined in an alternation to form the final pattern.
An o_* pattern is just like a p_* pattern, but it is not combined into the final pattern, and matching it does not execute a corresponding a_* method.
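The interplay between p_*, o_* and a_* methods can be sketched with a small stand-alone class. This is only an illustration under stated assumptions: My::MiniParser, _expand, parse and the _result slot are hypothetical names, and the sketch inlines the <o_foo> placeholders as plain text rather than building a recursive regex the way the module is described as doing.

```perl
package My::MiniParser;   # hypothetical class, not part of the distribution
use strict;
use warnings;

sub new { return bless {}, shift }

# o_*: helper pattern, only referenced from other patterns.
sub o_int   { '(?: \d+ )' }

# p_*: top-level patterns; each one has a matching a_* action.
sub p_today { '(?: today | this \s+ day )' }
sub p_ago   { '(?: (?<n> <o_int> ) \s* days? \s+ ago )' }

sub a_today { my ($self, $m) = @_; $self->{_result} = 'today' }
sub a_ago   { my ($self, $m) = @_; $self->{_result} = "$m->{n} day(s) ago" }

# Replace <o_foo>/<p_foo> placeholders with the text returned by that method.
sub _expand {
    my ($self, $pat) = @_;
    while ($pat =~ /<([op]_\w+)>/) {
        my $name = $1;
        my $sub  = $self->$name();
        $pat =~ s/<\Q$name\E>/$sub/g;
    }
    return $pat;
}

sub parse {
    my ($self, $str) = @_;
    my $class = ref $self;

    # Collect the names of all p_* methods defined in this class.
    my @names = do {
        no strict 'refs';
        sort grep { /^p_/ && defined &{"${class}::$_"} } keys %{"${class}::"};
    };

    # Combine them into one alternation, one named group per p_* pattern.
    my $alt = join ' | ',
        map { "(?<$_> " . $self->_expand($self->$_()) . " )" } @names;
    my $re = qr/\b (?: $alt ) \b/xi;

    return unless $str =~ $re;
    my %m = %+;                                  # copy the named captures
    my ($name) = grep { defined $m{$_} } @names; # which p_* pattern matched
    (my $action = $name) =~ s/^p_/a_/;
    $self->$action(\%m);                         # run the paired a_* action
    return $self->{_result};
}

package main;

my $parser = My::MiniParser->new;
print $parser->parse("it happened 3 days ago"), "\n";   # 3 day(s) ago
print $parser->parse("due today"), "\n";                # today
```

Because o_int has no a_* counterpart and is never added to the alternation, it only contributes text to the patterns that reference it, which mirrors the distinction drawn above.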
There are also w_* methods, which return an array of strings.
Parsing duration is similar, except the method names are pdur_*, odur_* and adur_*.
See an existing DateTime::Format::Alami::* module for an example. Basically, you just need to supply the necessary patterns in the p_* methods. If you introduce a new p_* method, don't forget to supply the corresponding action in an a_* method.
Constructor (new). You must instantiate a subclass rather than this base class.
parse_datetime parses/extracts a date/time expression in $str. It dies if the expression cannot be parsed. Otherwise it returns a DateTime object (or a string/number if the format option is verbatim/epoch, or a hash if the format option is combined), or an array of objects/strings/numbers (if the returns option is all/all_cron).
Known options:
time_zone => str
Will be passed to the DateTime constructor.
format => str (DateTime|verbatim|epoch|combined)
The default is DateTime, which returns a DateTime object. Other choices include verbatim (returns the original text), epoch (returns a Unix timestamp), and combined (returns a hash containing keys like DateTime, verbatim, epoch, plus extra information: pos [position of the pattern in the string], pattern [pattern name], m [raw named capture groups], uses_time [whether the date involves time of day]).
You might think that choosing epoch or verbatim could avoid the overhead of DateTime, but actually you can't since DateTime is used as the primary format during parsing. The epoch is retrieved from the DateTime object using the epoch method.
prefers => str (nearest|future|past)
NOT YET IMPLEMENTED.
This option decides what happens when an ambiguous date appears in the input. For example, "Friday" may refer to any number of Fridays. Possible choices are: nearest (prefer the nearest date, the default), future (prefer the closest future date), past (prefer the closest past date).
returns => str (first|last|earliest|latest|all|all_cron)
If the text has multiple possible dates, then this argument determines which date will be returned. Possible choices are: first (return the first date found in the string, the default), last (return the final date found in the string), earliest (return the date found in the string that chronologically precedes any other date in the string), latest (return the date found in the string that chronologically follows any other date in the string), all (return all dates found in the string, in the order they were found in the string), all_cron (return all dates found in the string, in chronological order).
When all or all_cron is chosen, function will return array(ref) of results instead of a single result, even if there is only a single actual result.
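The selection rules for the returns option can be sketched in plain Perl. This is an illustration of the semantics described above, not the module's code: select_hits is a hypothetical helper, and each hit is assumed to be a hash with a pos key (offset in the input string) and an epoch key (Unix timestamp).

```perl
use strict;
use warnings;

# Hypothetical helper: pick results according to a "returns" choice.
# Each hit is assumed to look like { pos => <offset>, epoch => <unix time> }.
sub select_hits {
    my ($returns, @hits) = @_;

    my @by_pos  = sort { $a->{pos}   <=> $b->{pos}   } @hits;  # order found
    my @by_time = sort { $a->{epoch} <=> $b->{epoch} } @hits;  # chronological

    my %pick = (
        first    => sub { $by_pos[0] },
        last     => sub { $by_pos[-1] },
        earliest => sub { $by_time[0] },
        latest   => sub { $by_time[-1] },
        all      => sub { [@by_pos] },    # always an arrayref
        all_cron => sub { [@by_time] },   # always an arrayref
    );

    my $code = $pick{$returns} or die "Unknown returns option '$returns'";
    return $code->();
}

# Two dates found in a string: the later date appears first in the text.
my @hits = (
    { pos => 0,  epoch => 200 },
    { pos => 10, epoch => 100 },
);

print select_hits('first',    @hits)->{pos}, "\n";   # 0
print select_hits('earliest', @hits)->{pos}, "\n";   # 10
print join(",", map { $_->{pos} } @{ select_hits('all_cron', @hits) }), "\n";   # 10,0
```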
parse_datetime_duration parses/extracts a duration expression in $str. It dies if the expression cannot be parsed. Otherwise it returns a DateTime::Duration object (or a string/number if the format option is verbatim/seconds, or a hash if the format option is combined), or an array of objects/strings/numbers (if the returns option is all/all_sorted).
format => str (Duration|verbatim|seconds|combined)
The default is Duration, which returns a DateTime::Duration object. Other choices include verbatim (returns the original text), seconds (returns the number of seconds, approximated), and combined (returns a hash containing keys like Duration, verbatim, seconds, plus extra information: pos [position of the pattern in the string], pattern [pattern name], m [raw named capture groups]).
You might think that choosing seconds or verbatim could avoid the overhead of DateTime::Duration, but actually you can't since DateTime::Duration is used as the primary format during parsing. The number of seconds is calculated from the DateTime::Duration object using an approximation (for example, "1 month" does not convert exactly to seconds).
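To see why the seconds value is only an approximation, consider this small sketch. The conversion factors are assumptions chosen for illustration (a 30-day month and a 365-day year); the factors the module actually uses are not stated here, and approx_seconds is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumed conversion factors. Months and years have no fixed length, so the
# 30-day month and 365-day year below are illustrative choices only.
my %SECONDS_PER = (
    seconds => 1,
    minutes => 60,
    hours   => 3_600,
    days    => 86_400,
    weeks   => 7 * 86_400,
    months  => 30 * 86_400,
    years   => 365 * 86_400,
);

# Hypothetical helper: sum duration components into a number of seconds.
sub approx_seconds {
    my (%dur) = @_;
    my $total = 0;
    for my $unit (keys %dur) {
        die "Unknown unit '$unit'" unless exists $SECONDS_PER{$unit};
        $total += $dur{$unit} * $SECONDS_PER{$unit};
    }
    return $total;
}

print approx_seconds(hours => 2, minutes => 13), "\n";   # 7980 (exact)
print approx_seconds(months => 1), "\n";                 # 2592000 (only under the 30-day assumption)
```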
returns => str (first|last|smallest|largest|all|all_sorted)
If the text has multiple possible durations, then this argument determines which duration will be returned. Possible choices are: first (return the first duration found in the string, the default), last (return the final duration found in the string), smallest (return the smallest duration), largest (return the largest duration), all (return all durations found in the string, in the order they were found in the string), all_sorted (return all durations found in the string, in smallest-to-largest order).
When all or all_sorted is chosen, function will return array(ref) of results instead of a single result, even if there is only a single actual result.
"Alami" is an Indonesian word meaning "natural".
DateTime::Format::Natural (DF:Natural) is a more established module (first released in 2006) and can understand a few more English expressions, like 'last day of Sep'. Aside from English, it does not yet support other languages.
The parse_datetime_duration() method of DateTime::Format::Alami::EN (DFA:EN) produces a DateTime::Duration object, while DF:Natural's parse_datetime_duration() returns two DateTime objects instead. In other words, DF:Natural can parse "from 23 Jun to 29 Jun" in addition to "for 2 weeks".
DF:Natural in general is slightly more strict about the formats it accepts, e.g. it rejects Jun 23st (the error message even gives hints that the suffix must be 'rd'). DF:Natural can give a detailed error message on why parsing has failed (see its error() method).
DateTime::Format::Flexible (DF:Flexible) is another established module (first released in 2007) that, aside from parsing human expressions (like 'tomorrow', 'sep 1st'), can also parse date/time in several other formats like RFC 822, making it a convenient 'one-stop' solution for parsing dates. Compared to DF:Natural, it has better support for time zones but cannot parse some English expressions. Aside from English, it currently supports German and Spanish. It does not support parsing duration expressions.
This module itself, DateTime::Format::Alami (DF:Alami), is yet another implementation. Internally, it uses a recursive regex to make parsing simpler and adding more languages easier. It requires perl 5.14.0 or newer due to the use of (?{ ... }) code blocks inside regular expressions (while DF:Natural and DF:Flexible can run on perl 5.8+). It currently supports English and Indonesian. It supports parsing duration expressions and returns a DateTime::Duration object. It has the smallest startup time (see Bencher::Scenario::DateTimeFormatAlami::Startup).
Performance-wise, all the modules are within the same order of magnitude (see Bencher::Scenario::DateTimeFormatAlami::Parsing).
Please visit the project's homepage at https://metacpan.org/release/DateTime-Format-Alami.
Source repository is at https://github.com/perlancar/perl-DateTime-Format-Alami.
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=DateTime-Format-Alami
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.
Date::Extract. DateTime::Format::Alami has some features of Date::Extract so it can be used to replace Date::Extract.
DateTime::Format::Flexible. See "FAQ".
For Indonesian: DateTime::Format::Indonesian, Date::Extract::ID (currently this module uses DateTime::Format::Alami::ID as its backend).
For English: DateTime::Format::Natural. See "FAQ".
DateTime::Format::Human deals with formatting and not parsing.
Natty Java library, which the last time I tried sometimes gave weird answers, e.g. "32 Oct" becomes 1 Oct in the far future. http://natty.joestelmach.com/
Duckling Clojure library, which can parse date/time as well as numbers with some other units like temperature. https://github.com/wit-ai/duckling
perlancar <perlancar@cpan.org>
This software is copyright (c) 2017, 2016, 2014 by perlancar@cpan.org.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.