NAME

Chess::PGN::Parse - reads and parses PGN (Portable Game Notation) Chess files

SYNOPSIS

    use Chess::PGN::Parse;
    use English qw( -no_match_vars );
    my $pgnfile = "kk_2001.pgn";
    my $pgn = new Chess::PGN::Parse $pgnfile 
        or die "can't open $pgnfile\n";
    while ($pgn->read_game()) {
        print $pgn->white, ", " , $pgn->black, ", ", 
            $pgn->result, ", ",
            $pgn->game, "\n";
    }


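    use Chess::PGN::Parse;
    use English qw( -no_match_vars );
    my $pgnfile = "kk_2001.pgn";
    my $text = "";
    {
        # slurp the whole file into $text
        local $INPUT_RECORD_SEPARATOR = undef;
        open PGN, "< $pgnfile" or die "can't open $pgnfile\n";
        $text = <PGN>;
        close PGN;
    }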
    # reads from string instead of a file
    my $pgn = new Chess::PGN::Parse undef, $text; 
    while ($pgn->read_game()) {
        print $pgn->white, ", " , $pgn->black, ", ", 
            $pgn->result, ", ",
            $pgn->game, "\n";
    }

    use Chess::PGN::Parse;
    my $pgnfile = "kk_2001.pgn";
    my $pgn = new Chess::PGN::Parse $pgnfile 
        or die "can't open $pgnfile\n";
    my $games_ref = $pgn->smart_read_all();

DESCRIPTION

Chess::PGN::Parse offers a range of methods to read and manipulate Portable Game Notation files. PGN files contain chess games, typically produced by chess programs, in a standard format (http://www.schachprobleme.de/chessml/faq/pgn/). PGN is among the preferred means of distributing chess games. Being a public, well-established standard, it is understood by many chess archive programs. Parsing simple PGN files is not difficult. Dealing with some of the intricacies of the standard, however, is far from trivial. This module offers a clean interface for reading and parsing complex PGN files.

A PGN file has several tags, which are key/value pairs at the header of each game, in the format [key "value"].
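The tag format can be illustrated with a few lines of plain Perl. This is only a sketch: extract_tags() is a hypothetical helper, not part of Chess::PGN::Parse, the sample header is invented, and escaped quotes inside values are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Chess::PGN::Parse): collect the
# [key "value"] pairs found in a block of header text.
sub extract_tags {
    my ($header) = @_;
    my %tags;
    # The key is a run of word characters; the value is everything
    # between the double quotes.
    while ($header =~ /\[\s*(\w+)\s+"([^"]*)"\s*\]/g) {
        $tags{$1} = $2;
    }
    return \%tags;
}

# Invented sample header, used only for illustration.
my $header = <<'END_HEADER';
[Event "Example Open"]
[Site "?"]
[White "Player One"]
[Black "Player Two"]
[Result "1/2-1/2"]
END_HEADER

my $tags = extract_tags($header);
print "$tags->{White} vs $tags->{Black}, result $tags->{Result}\n";
```

The module's own read_game() does considerably more than this, since it also copes with tags spanning several lines and with missing blank lines.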

After the header, the game follows: a string of numbered chess moves, optionally interrupted by braced comments and by recursive parenthesized variations, which may contain comments of their own. While dealing with simple braced comments is straightforward, parsing nested comments is considerably harder.

The most immediate methods of Chess::PGN::Parse are:

    read_game() reads one game, separating the tags and the game text.

    parse_game() parses the current game and stores the moves into an
        array, optionally saving the comments into a hash for further
        usage. It can deal with nested comments and recursive
        variations.

    quick_parse_game() Same as the above, but doesn't save the comments,
        which are just stripped from the text. It can't deal with nested
        comments. Should be the preferred method when we know that we are
        dealing with simple PGNs.

    smart_parse_game() Best of the above methods. A preliminary check
        will call parse_game() or quick_parse_game(), depending on the
        presence of nested comments in the game.

    read_all(), quick_read_all(), smart_read_all() will read all the records
        in the current PGN file and return a reference to an array of
        hashes with all the parsed details from the games.

Parsing games

Parsing PGN games is actually two actions: reading and parsing. The reading will only identify the two components of a game, i.e. the tags and the moves text. During this phase, the tags are decomposed and stored into an internal hash for future use, while the game text is left untouched.

Reading a game is accomplished through the read_game() method, which will identify not only the standard game format but also some unorthodox cases, such as games with no separating blank line between tags and moves, games with no blank lines at the end of the moves, leading blank lines, tags spanning over several lines and some minor quibbles. If you know that your games don't have any of these problems, you might choose the read_standard_game() method, which is a bit faster.

After reading, you can either use the game text as it is or ask for parsing. Parsing is the process of identifying and isolating the moves from the rest of the game text, such as comments and recursive variations. It can be done in two ways. With quick_parse_game(), the non-move elements are simply stripped off and discarded, leaving an array of bare moves. If the comments and the Recursive Annotated Variations (RAV) are valuable to you, use parse_game() instead, which also strips the excess text but can store it in an appropriate data structure. When the option {save_comments => 'yes'} is passed to parse_game(), game comments are stored in a hash whose key is the move number plus the color. Multiple comments for the same move are appended to the previous one. If this structure doesn't provide enough detail, the further option {comments_struct => 'array'} stores an array of comments for each move. Even more detail is available with {comments_struct => 'hol'}, which triggers the creation of a hash of lists (hol), where the key is the comment type (RAV, NAG, brace, semicolon, escaped) and the value is a list of homogeneous comments belonging to the same move.

A further option, {log_errors => 'yes'}, saves the errors into a structure similar to the one used for comments. There are no format options, though: all errors for a given move are stored as a single string. An error is anything that is not recognized as one of the previous elements: not a move, a move number, or a comment (either text or recursive). Anything that the parser cannot actively classify as 'known' is stored as an error.
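The options described above can be combined as in the following sketch. It assumes the module is installed and uses an invented sample game. The exact format of the hash keys is left to the module, so the keys are only printed, and the fallbacks to an empty hash are a precaution in case nothing was stored.

```perl
use strict;
use warnings;
use Chess::PGN::Parse;

# Invented sample game, used only for illustration.
my $text = <<'END_PGN';
[Event "Example Open"]
[Site "?"]
[Date "2002.01.01"]
[Round "1"]
[White "Player One"]
[Black "Player Two"]
[Result "1/2-1/2"]

1. e4 {a common first move} e5 2. Nf3 Nc6 3. Bb5 a6 1/2-1/2
END_PGN

# Passing undef as the file name makes the parser read from the string.
my $pgn = Chess::PGN::Parse->new(undef, $text)
    or die "can't create a parser\n";

while ($pgn->read_game()) {
    $pgn->parse_game({
        save_comments   => 'yes',
        comments_struct => 'array',    # one array of comments per move
        log_errors      => 'yes',
    });

    my $moves = $pgn->moves();         # array reference, no move numbers
    print "moves: @$moves\n";

    my $comments = $pgn->comments() || {};
    for my $key (sort keys %$comments) {
        my $value = $comments->{$key};
        my @list  = ref $value eq 'ARRAY' ? @$value : ($value);
        print "comment at $key: $_\n" for @list;
    }

    my $errors = $pgn->errors() || {};
    print "unrecognized at $_: $errors->{$_}\n" for sort keys %$errors;
}
```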

Getting the parsed values

At the end of the exercise, you can access the components through some standard methods. The standard tags have their own direct access methods (white, black, site, event, date, result, round). More methods give access to some commonly used elements: game() is the unparsed text; moves() returns a reference to an array of parsed moves, without move numbers; comments() and errors() return the respective structures after parsing. Note that quick_parse_game() strips all non-move elements from the game text returned by game(). This is an intended feature, which favors speed. If you need to preserve the original game text after parsing, either copy it before calling quick_parse_game() or use parse_game() instead.

Recursive Parsing

PGN games may include RAV (Recursive Annotated Variations), which are just game text inside parentheses. This module can recognize RAV sequences and store them as comments. One of the things you can do with these sequences is to parse them again and get bare moves that you can feed to a chess engine or a move analyzer (Chess::PGN::EPD by H.S.Myers is one of them). Chess::PGN::Parse does not directly support recursive parsing of games, but it makes it possible. Parse a game, saving the comments as a hash of lists (see above), and then check for comments that are of 'RAV' type. For each such entry, strip the surrounding parentheses and create a new Chess::PGN::Parse object with that text. This is easier to do than to describe. For an example of this technique, check the file examples/test_recursive.pl.
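A minimal sketch of the technique follows. It is not the code from examples/test_recursive.pl. It assumes the module is installed, that RAV comments are stored under the 'RAV' key of the hash of lists as described above, and it uses an invented sample game.

```perl
use strict;
use warnings;
use Chess::PGN::Parse;

# Invented sample game with one variation, used only for illustration.
my $text = <<'END_PGN';
[Event "Example Open"]
[Site "?"]
[Date "2002.01.01"]
[Round "1"]
[White "Player One"]
[Black "Player Two"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 (2... d6 3. d4) 3. Bb5 a6 1/2-1/2
END_PGN

my $pgn = Chess::PGN::Parse->new(undef, $text)
    or die "can't create a parser\n";

while ($pgn->read_game()) {
    $pgn->parse_game({ save_comments => 'yes', comments_struct => 'hol' });
    my $comments = $pgn->comments() || {};

    for my $key (sort keys %$comments) {
        # Assumption: variations are listed under the 'RAV' type.
        my $ravs = $comments->{$key}{RAV} or next;
        for my $rav (@$ravs) {
            # Strip the surrounding parentheses.
            (my $inner = $rav) =~ s/^\s*\(\s*|\s*\)\s*$//g;

            # Parse the variation as a game of its own (it has no tags).
            my $variation = Chess::PGN::Parse->new(undef, $inner) or next;
            $variation->read_game() or next;
            $variation->smart_parse_game();
            print "variation at $key: @{ $variation->moves() }\n";
        }
    }
}
```

Using smart_parse_game() on the inner text keeps the sketch usable when a variation itself contains nested variations.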

EXPORT

new, STR, read_game, tags, event, site, white, black, round, date, result, game, NAG, moves

DEPENDENCIES

IO::File

Class methods

new()

Creates a new Chess::PGN::Parse object. It requires a file name or, to read from a string, undef followed by the PGN text (see SYNOPSIS).

    my $pgn = Chess::PGN::Parse->new("filename.pgn")
        or die "no such file\n";

NAG()

returns the corresponding Numeric Annotation Glyph

STR()

returns the Seven Tag Roster array

    @array = $pgn->STR();
    @array = Chess::PGN::Parse::STR();

event()

returns the Event tag

site()

returns the Site tag

date()

returns the Date tag

white()

returns the White tag

black()

returns the Black tag

result()

returns the Result tag

round()

returns the Round tag

game()

returns the unparsed game moves

time()

returns the Time tag

eco()

returns the ECO tag

eventdate()

returns the EventDate tag

moves()

returns an array reference to the game moves (no numbers)

comments()

returns a hash reference to the game comments (the key is the move number and the value holds the comments for that move)

errors()

returns a hash reference to the game errors (the key is the move number and the value holds the errors for that move)

set_event()

returns or modifies the Event tag

set_site()

returns or modifies the Site tag

set_date()

returns or modifies the Date tag

set_white()

returns or modifies the White tag

set_black()

returns or modifies the Black tag

set_result()

returns or modifies the Result tag

set_round()

returns or modifies the Round tag

set_game()

returns or modifies the unparsed game moves

set_time()

returns or modifies the Time tag

set_eco()

returns or modifies the ECO tag

set_eventdate()

returns or modifies the EventDate tag

set_moves()

returns or modifies an array reference to the game moves (no numbers)

tags()

returns a hash reference to all the parsed tags

    $hash_ref = $pgn->tags();

read_all()

Will read and parse all the games in the current file and return a reference to an array of hashes. Each hash item contains both the raw data and the parsed moves and comments.

Same parameters as for parse_game(). Default: discard comments.

    my $games_ref = $pgn->read_all();

quick_read_all()

Will read and quick parse all the games in the current file and return a reference to an array of hashes. Each hash item contains both the raw data and the parsed moves. Comments are discarded. Same parameters as for quick_parse_game().

    my $games_ref = $pgn->quick_read_all();

smart_read_all()

Will read and parse all the games in the current file and return a reference to an array of hashes. Each hash item contains both the raw data and the parsed moves. Comments are discarded. Calls smart_parse_game() to decide which method is best to parse each given game.

    my $games_ref = $pgn->smart_read_all();

read_game()

Reads the next game from the given PGN file. Returns TRUE (1) if successful (a game was read) or FALSE (0) if no more games are available or an unexpected EOF occurred before the end of parsing.

    while ($pgn->read_game()) {
        do_something_smart;
    }
    

It can read standard and, in some cases, even non-standard PGN games. The following deviations from the standard are handled:

    1. no blank line between tags and moves;
    2. no blank line between games;
    3. blank line(s) before a game (start of file);
    4. multiple tags in the same line;
    5. tags spanning over more lines
       (cannot be combined with rule 4);
    6. no tags (only moves)
       (cannot be combined with rule 2);
    7. comments (starting with ";") outside the game text.
    
read_standard_game()

Reads the next game from the given PGN file. Returns TRUE (1) if successful (a game was read) or FALSE (0) if no more games are available or an unexpected EOF occurred before the end of parsing.

    while ($pgn->read_standard_game()) {
        do_something_smart;
    }

This method deals only with well-formed PGN games. Use the more forgiving read_game() for PGN files that don't fully respect the PGN standard.

_get_tags()

returns a list of tags depending on the parameters

_get_format()

returns a format to be used when printing tags

_get_formatted_tag()

returns a tag formatted according to the given template.

standard_PGN()

returns a string containing all current PGN tags, including the game. Parameters are passed through a hash reference. None is required.

 tags => [tag list], # default is the Seven Tag Roster.
                     # You may specify only the tags you want to 
                     # print 
                     # tags => [qw(White Black Result)]
 
 all_tags => 'no',   # default 'no'. If yes (or 1), it outputs all the tags
                     # if 'tags' and 'all_tags' are used, 'all_tags' 
                     # prevails

 nl => q{\n},        # default '\n'. Tag separator. Can be changed
                     # according to your needs.
                     # nl => '<br>\n' is a good candidate for HTML 
                     # output.
 
 brackets => q{[]},  # default '[]'. Output tags within brackets.
                     # Bracketing can be as creative as you want.
                     # If the left and right bracketing sequences are
                     # longer than one character, they must be separated
                     # by a pipe (|) symbol.
                     # '()', '(|)\t', '{|}\n' and '{}' are valid
                     # sequences.
                     # 
                     # '<h1>|</h1>' will output HTML header 1
                     # '<b>{</b>|<b>}</b>\n' will enclose each tag
                     # between bold braces.
 
 quotes => q{"},     # default '"'. Quote tag values.
                     # As for brackets, quotes can be specified in
                     # pairs: '<>' and '<|>' are equivalent.
                     # If the quoting sequence is more than one char,
                     # the pipe symbol is needed to separate the left
                     # quote from the right one.
                     # '<i>|</i>' will produce HTML italicized text.
                     
 game => 'yes',      # default 'yes'. Output the game text 
                     # If the game was parsed, returns a clean list
                     # of moves, else the unparsed text

 comments => 'no'    # Default 'no'. Output the game comments.
                     # Requires the 'game' option
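
As an illustration of the parameters above, the following sketch prints only three tags and leaves the game text out. It assumes the module is installed and reads an invented sample game from a string. Only the 'tags' and 'game' options documented above are used.

```perl
use strict;
use warnings;
use Chess::PGN::Parse;

# Invented sample game, used only for illustration.
my $text = <<'END_PGN';
[Event "Example Open"]
[Site "?"]
[Date "2002.01.01"]
[Round "1"]
[White "Player One"]
[Black "Player Two"]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1/2-1/2
END_PGN

my $pgn = Chess::PGN::Parse->new(undef, $text)
    or die "can't create a parser\n";

my @output;
while ($pgn->read_game()) {
    # Restrict the output to three tags and suppress the game text.
    push @output, $pgn->standard_PGN({
        tags => [qw(White Black Result)],
        game => 'no',
    });
}
print "$_\n" for @output;
```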
 
smart_parse_game()

Parses the current game, returning the moves only. Uses quick_parse_game() by default, unless recursive comments are found in the source game.

quick_parse_game()

Parses the current game, returning the moves only. Comments are discarded. This function fails on Recursive Annotated Variations and on nested comments. Parameters (passed as a hash reference): check_moves => 'yes'|'no'. Default: 'no'. If requested, each move is checked against a regular expression, to filter out possible unbraced comments.

parse_game()

Parses the current game (after read_game() was called). Accepts parameters as hash reference.

    $pgn->parse_game(); # default save_comments => 'no'

    $pgn->parse_game({
        save_comments => 'yes',
        comments_struct => 'string'});
    

{comments_struct => 'string'} is the default value. When 'comments_struct' is 'string', multiple comments for the same move are concatenated into one string.

{comments_struct => 'array'} If 'array', comments are stored as an anonymous array, one comment per element.

{comments_struct => 'hol'} If 'hol', comments are stored as a hash of lists, where there is a list of comments for each comment type (NAG, RAV, braced, semicolon, escaped).

    $pgn->parse_game({save_comments => 'yes', 
        log_errors => 'yes'});

parse_game() implements a finite state machine on two assumptions:

    1. No moves or move numbers are truncated at the end of a line;
    2. the possible states in a PGN game are:

        a. move number
        b. move
        c. braced comment
        d. EOL comment
        e. Numeric Annotation Glyph
        f. Recursive Annotated Variation
        g. Result
        h. unbraced comments (barewords, "!?+-=")

Items from "a" to "g" are actively parsed and recognized. Anything unrecognized goes into the "h" state and is discarded (or stored, if log_errors was requested).
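The list of states can be made concrete with a small token classifier. This is a deliberately simplified sketch, not the module's actual code: classify_tokens() is a hypothetical function, it does not handle nested comments or nested variations, and its move pattern is only an approximation of algebraic notation.

```perl
use strict;
use warnings;

# Hypothetical, simplified classifier (not the module's implementation).
# Returns a list of [state, text] pairs for a piece of game text.
sub classify_tokens {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G\s+/gc)          { next }
        elsif ($text =~ /\G(\d+)\.+/gc)     { push @tokens, [ number => $1 ] }
        elsif ($text =~ /\G(\{[^}]*\})/gc)  { push @tokens, [ brace  => $1 ] }
        elsif ($text =~ /\G(;[^\n]*)/gc)    { push @tokens, [ eol    => $1 ] }
        elsif ($text =~ /\G(\$\d+)/gc)      { push @tokens, [ NAG    => $1 ] }
        elsif ($text =~ /\G(\([^()]*\))/gc) { push @tokens, [ RAV    => $1 ] }
        elsif ($text =~ /\G(1-0|0-1|1\/2-1\/2|\*)/gc) {
            push @tokens, [ result => $1 ];
        }
        elsif ($text =~ /\G(O-O(?:-O)?[+#]?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)/gc) {
            push @tokens, [ move => $1 ];
        }
        elsif ($text =~ /\G(\S+)/gc)        { push @tokens, [ unknown => $1 ] }
    }
    return @tokens;
}

# Invented sample text covering most of the states listed above.
my $game = '1. e4 {good} e5 $1 (1... c5) 2. Nf3 !? 1-0';
printf "%-8s %s\n", @$_ for classify_tokens($game);
```

Each branch anchors its pattern at the current position with \G, and the /c modifier keeps that position when a pattern fails, so the next branch is tried at the same place. The last branch plays the role of state "h": whatever no other pattern claims is reported as unknown.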

add_comments()

Allows inserting comments for an already parsed game. It accepts comments passed as an anonymous hash. An optional second parameter sets the storage type. The values are the same as for parse_game():

    'string' (default) all comments for a given move are concatenated
             together
    'array'  each comment for a given move is stored as an array element
    'hol'    comments are stored in a hash of lists, one list for each
             comment type

shrink_epd()

Given an EPD (Extended Position Description) string, shrink_epd() will convert it into a bit string, which reduces the original by about 50%. It can be restored to the original string by expand_epd().

expand_epd()

Given an EPD bit string created by shrink_epd(), expand_epd() will restore the original text.

AUTHOR

Giuseppe Maxia, gmax@cpan.org

THANKS

Thanks to

    - Hugh S. Myers for advice, support, testing and brainstorming;
    - Damian Conway for the recursive Regular Expressions used to parse
      comments;
    - all people at PerlMonks (www.perlmonks.org) for advice and a good
      developing environment;
    - Nathan Neff for pointing out an insidious, hard-to-spot bug in my
      RegExes.

COPYRIGHT

The Chess::PGN::Parse module is Copyright (c) 2002 Giuseppe Maxia, Sardinia, Italy. All rights reserved.

You may distribute this software under the terms of either the GNU General Public License version 2 or the Artistic License, as specified in the Perl README file. The embedded and enclosed documentation is released under the GNU Free Documentation License (FDL) 1.1.