Genealogy::Gedcom - An OS-independent processor for GEDCOM data
See Genealogy::Gedcom::Reader::Lexer.
Genealogy::Gedcom provides a processor for GEDCOM data.
See the GEDCOM Specification, Ged551-5.pdf.
This module is available as a Unix-style distro (*.tgz).
See http://savage.net.au/Perl-modules/html/installing-a-module.html for help on unpacking and installing distros.
Install Genealogy::Gedcom as you would for any Perl module:
Run:
cpanm Genealogy::Gedcom
or run:
sudo cpan Genealogy::Gedcom
or unpack the distro, and then either:
perl Build.PL
./Build
./Build test
sudo ./Build install
or:
perl Makefile.PL
make (or dmake or nmake)
make test
make install
UTF-8 is supported: input files are assumed to be in UTF-8, and files in ISO-8859-1 work automatically, too.
The default output log also handles UTF-8.
ANSEL is not supported, since it predates Unicode. Instead, just create a UTF-8 encoded file, such as data/sample.7.ged.
That file was generated from data/GEDCOMANSELTable.xhtml by scripts/parse.sample.7.pl.
Thanks to Tamura Jones for creating that web page.
User-defined tags are handled in the same way as GEDCOM tags.
They are identified by a leading '_', and otherwise have the same syntax as GEDCOM tags. In particular:
Each user-defined tag is stand-alone, meaning it can't be extended with CONC or CONT tags in the way some GEDCOM tags can.
See data/sample.4.ged.
Nothing is currently done with CONC and CONT tags, meaning e.g. text flowing from a NOTE onto a CONC or CONT line is not concatenated.
So, currently, even GEDCOM tags are stand-alone.
Items are stored in an arrayref. This arrayref is available via the "items()" method.
This method returns the same data as does "items()" in Genealogy::Gedcom::Reader.
Each element in the array is a hashref of the form:
{
	count      => $n,
	data       => $a_string,
	level      => $n,
	line_count => $n,
	tag        => $a_tag,
	type       => $a_string,
	xref       => $a_string,
}
Key-value pairs are:
count: Items are numbered from 1 up, so this is the array index + 1.
Note: Blank lines in the input file are skipped.
data: This is any data associated with the tag.
Given the GEDCOM record:
1 NAME Given Name /Surname/
then data will be 'Given Name /Surname/', i.e. the text after the tag.
Given the GEDCOM record:
1 SUBM @SUBM1@
then data will be 'SUBM1'.
As with xref (below), the '@' characters are stripped.
level: This is the level from the GEDCOM data.
line_count: This is the line number from the GEDCOM data.
tag: This is the GEDCOM tag.
type: This is a string indicating what broad class the tag refers to. Values:
Used for various cases.
If the type is 'Date', then the date has been successfully parsed.
If parsing failed, the type will be 'Invalid date'.
xref: Given the GEDCOM record:
0 @I82@ INDI
then xref will be 'I82'.
As with data (above), the '@' characters are stripped.
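To make the item layout concrete, here is a minimal, self-contained Perl sketch that splits raw GEDCOM lines into hashrefs with the keys listed above. It is not the lexer's code: the helper name lines_to_items() is hypothetical, 'type' is simply left empty because the sketch does no date parsing or classification, and the regex only handles the simple line shapes used in the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Genealogy::Gedcom: turn GEDCOM lines into
# hashrefs shaped like the documented items. 'type' is left empty here.
sub lines_to_items {
    my (@lines) = @_;
    my @items;
    my $line_count = 0;

    for my $line (@lines) {
        $line_count++;                 # physical line number in the input
        next if $line =~ /^\s*$/;      # blank lines are skipped

        # level, optional @xref@, tag, optional data
        my ($level, $xref, $tag, $data) =
            $line =~ /^\s*(\d+)\s+(?:\@([^\@]+)\@\s+)?(\S+)(?:\s+(.*?))?\s*$/
            or next;                   # ignore lines which do not match

        $data = '' unless defined $data;
        $data =~ s/^\@(.+)\@$/$1/;     # strip the '@' around a pointer

        push @items, {
            count      => scalar(@items) + 1,    # array index + 1
            data       => $data,
            level      => $level,
            line_count => $line_count,
            tag        => $tag,
            type       => '',
            xref       => defined $xref ? $xref : '',
        };
    }
    return \@items;
}

my $items = lines_to_items(
    '0 @I82@ INDI',
    '1 NAME Given Name /Surname/',
    '',
    '1 SUBM @SUBM1@',
);

printf "%d %s %s [%s] [%s]\n", @{$_}{qw(count level tag xref data)} for @$items;
```

Note how the blank third line advances line_count but not count, matching the descriptions of those two keys.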
There is no perfect answer as to what should be a warning and what should be an error.
So, the author's philosophy is that unrecoverable states are errors, and the code calls 'die'. See "Under what circumstances does the code call 'die'?".
Also, the log level 'error' is not used. All validation failures are logged at level 'warning', leaving interpretation up to the user. See "How does logging work?".
Details:
Xrefs (pointers) are checked to ensure they point to an xref which exists. Each dangling xref is reported only once.
Xrefs which are (potentially) pointed to are checked for uniqueness.
Maximum string lengths are checked as per the GEDCOM Specification.
Minimum string lengths are checked as per the value of the 'strict' option to new().
Validation is mandatory, even with the 'strict' option set to 0. 'strict' only affects the minimum string length acceptable.
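The two xref checks above can be sketched as two passes over the items. This is an illustration under assumptions, not the module's implementation: check_xrefs() is a hypothetical name, and each item is assumed to carry an extra 'pointer' flag, since once the '@' characters are stripped from data a pointer can no longer be recognised from the string alone.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Genealogy::Gedcom's code. Items use the documented
# keys 'xref', 'data' and 'line_count', plus an assumed 'pointer' flag which
# marks items whose data was an @xref@ pointer.
sub check_xrefs {
    my ($items) = @_;
    my (%defined, %reported, @warnings);

    # Pass 1: xrefs which can be pointed to must be unique.
    for my $item (@$items) {
        my $xref = $item->{xref};
        next if $xref eq '';
        push @warnings, "Duplicate xref '$xref' at line $item->{line_count}"
            if $defined{$xref}++;
    }

    # Pass 2: every pointer must refer to an existing xref.
    # A dangling xref is reported once, however often it is used.
    for my $item (@$items) {
        next unless $item->{pointer};
        my $target = $item->{data};
        next if $defined{$target} || $reported{$target}++;
        push @warnings, "Dangling xref '$target' at line $item->{line_count}";
    }

    return @warnings;
}

my @items = (
    { xref => 'I1', data => '',   line_count => 1, pointer => 0 },
    { xref => '',   data => 'F9', line_count => 2, pointer => 1 },
    { xref => '',   data => 'F9', line_count => 3, pointer => 1 },
    { xref => 'I1', data => '',   line_count => 4, pointer => 0 },
);

print "$_\n" for check_xrefs(\@items);
```

Collecting all definitions before checking pointers means a pointer to a record defined later in the file is not mistaken for a dangling one.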
Tag nesting is validated by the mechanism of nested method calls, with each method (called tag_*) knowing what tags it handles, and with each nested call handling its own tags.
This process starts with the call to tag_lineage(0, $line) in method "run()".
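The nested-call approach can be illustrated with one recursive handler driven by a table of permitted child tags. In this sketch, handle() and %children are hypothetical stand-ins for the tag_* methods, the table covers only a handful of tags, and the 'lineage' entry only loosely mirrors tag_lineage(0, $line). Each call consumes the items at its own level and delegates everything deeper to a nested call, so a misplaced tag is reported by its parent's handler.

```perl
use strict;
use warnings;

# Hypothetical table: which child tags each handler accepts.
# Only a tiny subset of GEDCOM is modelled.
my %children = (
    lineage => [qw(HEAD INDI TRLR)],    # level 0
    INDI    => [qw(NAME BIRT)],
    BIRT    => [qw(DATE PLAC)],
);

# $items: arrayref of { level, tag, line_count }; $i: index of the next item;
# $level: level expected for this handler's children; $parent: key into
# %children. Returns the index of the first item not consumed.
sub handle {
    my ($items, $i, $level, $parent, $warnings) = @_;
    my %allowed = map { $_ => 1 } @{ $children{$parent} || [] };

    while ($i < @$items && $items->[$i]{level} == $level) {
        my $item = $items->[$i];
        push @$warnings, "Unexpected tag $item->{tag} under $parent at line $item->{line_count}"
            unless $allowed{ $item->{tag} } || $item->{tag} =~ /^_/;

        # The nested call validates everything beneath this item.
        $i = handle($items, $i + 1, $level + 1, $item->{tag}, $warnings);
    }
    return $i;
}

my @items = (
    { level => 0, tag => 'HEAD', line_count => 1 },
    { level => 0, tag => 'INDI', line_count => 2 },
    { level => 1, tag => 'NAME', line_count => 3 },
    { level => 1, tag => 'BIRT', line_count => 4 },
    { level => 2, tag => 'DATE', line_count => 5 },
    { level => 2, tag => 'NAME', line_count => 6 },    # not valid under BIRT
    { level => 1, tag => '_UID', line_count => 7 },    # user-defined: accepted
    { level => 0, tag => 'TRLR', line_count => 8 },
);

my @warnings;
my $next = handle(\@items, 0, 0, 'lineage', \@warnings);
push @warnings, "Unexpected level at line $items[$next]{line_count}" if $next < @items;
print "$_\n" for @warnings;
```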
The lexer reports the first unexpected tag, i.e. the first tag which is neither a GEDCOM tag nor one starting with '_'.
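A minimal sketch of that rule, assuming a hypothetical %gedcom_tag lookup table which, for brevity, holds only a small subset of the tags defined in the GEDCOM Specification. It uses first() from the core module List::Util to stop at the first offending tag.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical, deliberately incomplete set of GEDCOM tags.
my %gedcom_tag = map { $_ => 1 }
    qw(HEAD TRLR INDI FAM NAME BIRT DATE PLAC SUBM CONC CONT);

# Return the first tag which is neither a known GEDCOM tag nor
# user-defined (leading '_'), or undef if every tag is acceptable.
sub first_unexpected_tag {
    my (@tags) = @_;
    return first { !$gedcom_tag{$_} && !/^_/ } @tags;
}

my $bad = first_unexpected_tag(qw(HEAD INDI NAME _UID BOGUS FOO TRLR));
print defined $bad ? "Unexpected tag: $bad\n" : "All tags expected\n";
```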
All validation failures are reported as log messages at level 'warning'.
Here are some suggestions from the mailing list:
Check that each tag has all its mandatory sub-tags.
http://www.tamurajones.net/GEDCOMValidation.xhtml.
Many such checks are possible. E.g. Attribute type (p 43 of GEDCOM Specification) must be one of: CAST | EDUC | NATI | OCCU | PROP | RELI | RESI | TITL | FACT.
A proposal re UUIDs.
When new() is called as new(maxlevel => 'debug'), each method entry is logged at level 'debug'.
This has the effect of tracing all code which processes tags.
Since the default value of 'maxlevel' is 'info', all this output is suppressed by default. Such output is mainly for the author's benefit.
Log levels are, from highest (i.e. most output) to lowest: 'debug', 'info', 'warning', 'error'. No lower levels are used. See Log::Handler::Levels.
'maxlevel' defaults to 'info' and 'minlevel' defaults to 'error'. In this way, levels 'info' and 'warning' are reported by default.
Currently, level 'error' is not used. Fatal errors cause 'die' to be called, since they are unrecoverable. See "Under what circumstances does the code call 'die'?".
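The filtering implied by 'maxlevel' and 'minlevel' can be sketched in plain Perl. The numeric ranks follow the syslog-style ordering used by Log::Handler (see Log::Handler::Levels), where a smaller number is more severe. is_logged() is a hypothetical helper which only mirrors the idea; it is not how Log::Handler or this module is implemented, and its defaults are this module's documented defaults ('info' and 'error'), not Log::Handler's own.

```perl
use strict;
use warnings;

# Syslog-style ranks: smaller numbers are more severe.
my %rank = (
    emergency => 0, alert  => 1, critical => 2, error => 3,
    warning   => 4, notice => 5, info     => 6, debug => 7,
);

# A message is logged when its rank lies between minlevel and maxlevel,
# inclusive. Defaults mirror this module: maxlevel 'info', minlevel 'error'.
sub is_logged {
    my ($level, %opt) = @_;
    my $max = $rank{ $opt{maxlevel} || 'info' };
    my $min = $rank{ $opt{minlevel} || 'error' };
    return $rank{$level} >= $min && $rank{$level} <= $max;
}

for my $level (qw(debug info warning error)) {
    printf "%-7s default: %s, with maxlevel => 'debug': %s\n",
        $level,
        is_logged($level)                      ? 'yes' : 'no',
        is_logged($level, maxlevel => 'debug') ? 'yes' : 'no';
}
```

With the defaults, 'debug' tracing is suppressed while 'info' and 'warning' pass, which is why new(maxlevel => 'debug') is needed to see the method-entry trace.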
When new() is called as new(report_items => 1), the items are logged at level 'info'.
These are reported at level 'warning'.
This is a programming error.
This is a user (run time) error.
This is a user (data preparation) error.
By sub-classing.
It's the basis of a long-term project to write a new interface to GEDCOM files.
This is a dummy module at the moment, which just occupies the namespace. It holds the FAQ though.
This employs the lexer to do the work. It may one day use the new (currently non-existent) parser too.
This does the real work for finding tokens within GEDCOM files.
Run: perl scripts/lex.pl -help
Helps me debug code.
Runs the lexer on a file and reports some statistics. Try lex.pl -h.
This reads data/sample.7.html and writes data/sample.7.ged.
Reads all files in data/ and checks that each date is valid.
https://github.com/ronsavage/Genealogy-Gedcom
Genealogy::Gedcom::Date.
Gedcom::Date.
The file Changes was converted into Changelog.ini by Module::Metadata::Changes.
Version numbers < 1.00 represent development versions. From 1.00 up, they are production versions.
Many thanks are due to the people who worked on Gedcom.
Email the author, or log a bug on RT:
https://rt.cpan.org/Public/Dist/Display.html?Name=Genealogy::Gedcom.
Genealogy::Gedcom was written by Ron Savage <ron@savage.net.au> in 2011.
Home page: http://savage.net.au/index.html.
Australian copyright (c) 2011, Ron Savage.
All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Perl License, a copy of which is available at: http://dev.perl.org/licenses/