NAME

YAML - YAML Ain't Markup Language (tm)

SYNOPSIS

    use YAML;

    my ($hashref, $arrayref, $string) = Load(<<'--END');
    ---
    name: ingy
    age: old
    weight: heavy
    # I should comment that I also like pink, but don't tell anybody.
    favorite colors:
     - red
     - white
     - blue
    ---
    - clark
    - oren
    - ingy
    --- \
    You probably think YAML stands for "Yet Another Markup Language". It
    ain't! YAML is really a data serialization language. But if you want
    to think of it as a markup, that's OK with me. A lot of people try
    to use XML as a serialization format.
    
    "YAML" is catchy and fun to say. Try it. "YAML, YAML, YAML!!!"
    --END
    
    print Store($string, $arrayref, $hashref); 
    
    use Data::Dumper;
    print Dumper($string, $arrayref, $hashref); 

DESCRIPTION

The YAML.pm module implements a YAML Loader and Dumper based on the YAML 1.0 specification. http://www.yaml.org/spec/

YAML is a generic data serialization language that is optimized for human readability. It can be used to express the data structures of most modern programming languages. (Including Perl!!!)

For information on the YAML syntax, please refer to the YAML specification at http://www.yaml.org/spec/.

WHY YAML IS COOL

YAML is readable for people.

It makes clear sense out of complex data structures. You should find that YAML is an exceptional data dumping tool. Structure is shown through indentation, YAML supports recursive data, and hash keys are sorted by default. In addition, YAML supports several styles of scalar formatting for different types of data.

YAML is editable.

YAML was designed from the ground up to be the ultimate syntax for configuration files. Almost all programs need configuration files, so why invent a new syntax for each one? And why subject users to the complexities of XML or native Perl code?

YAML is multilingual.

Yes, YAML supports Unicode. But I'm actually referring to programming languages. YAML was designed to meet the serialization needs of Perl, Python, Tcl, PHP and Java. It was also designed to be interoperable between those languages. That means any YAML serialization produced by Perl can be processed by Python, and be guaranteed to return the data structure intact. (Even if it contained Perl-specific structures like GLOBs.)

YAML is taint safe.

Using modules like Data::Dumper for serialization is fine as long as you can be sure that nobody can tamper with your data files or transmissions. That's because you need to use Perl's eval() built-in to deserialize the data. Somebody could add a snippet of Perl to erase your files.

YAML's parser does not need to eval anything.
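
To make the risk concrete, here is a minimal sketch using only core Perl. The injected statement is hypothetical and harmless; it stands in for whatever somebody might slip into a tampered Data::Dumper file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Serialize a small structure the Data::Dumper way.
# The result is Perl source text of the form: $VAR1 = { ... };
my $dump = Dumper({ name => 'ingy' });

# Hypothetical tampering: one extra statement added to the stored text.
# Here it only sets a flag, but it could be any Perl at all.
our $side_effect = 0;
my $tampered = "\$main::side_effect = 1;\n" . $dump;

# Deserializing means eval(), which runs the injected statement too.
my $VAR1;
eval $tampered;
die $@ if $@;

print "name: $VAR1->{name}\n";
print "injected code ran: $side_effect\n";
```

The data comes back intact, and so does the extra statement. A loader that parses text instead of evaluating it never gives the injected line a chance to run.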

YAML is full featured.

YAML can accurately serialize all of the common Perl data structures and deserialize them again without losing data relationships. Although it is not 100% perfect (no serializer is or can be perfect), it fares as well as the popular current modules: Data::Dumper, Storable, XML::Dumper and Data::Denter.

YAML.pm also has the ability to handle code (subroutine) references and typeglobs. (Still experimental) These features are not found in Perl's other serialization modules.

YAML is extensible.

YAML has been designed to be flexible enough to solve its own problems. The markup itself has three basic constructs, which resemble Perl's hash, array and scalar. By default, these map to their Perl equivalents. But each YAML node also supports a type (or "transfer method") which can cause that node to be interpreted in a completely different manner. That's how YAML can support oddball structures like Perl's typeglob.

YAML.pm plays well with others.

YAML has been designed to interact well with other Perl Modules like POE, Date::ICal and Time::Object. (date support coming soon)

USAGE

Exported Functions

The following functions are exported by YAML.pm by default when you use YAML.pm like this:

    use YAML;

To prevent YAML.pm from exporting functions, say:

    use YAML ();

Store(list of Perl data structures)

Turn Perl data into YAML. This function works very much like Data::Dumper::Dumper(). It takes a list of Perl data structures and dumps them into a serialized form. It returns a string containing the YAML stream. The structures can be references or plain scalars.

Load(string containing a YAML stream)

Turn YAML into Perl data. This is the opposite of Data::Dumper; kind of like the eval() function. It parses a string containing a valid YAML stream into a list of Perl data structures. In list context YAML will return a structure for each YAML document in the stream.

Exportable Functions

StoreFile(filepath, list)

Writes the YAML stream to a file instead of just returning a string.

LoadFile(filepath)

Reads the YAML stream from a file instead of a string.

Dumper()

Alias to Store(). For Data::Dumper fans.

Eval()

Alias to Load(). For Data::Dumper fans.

Indent()

Alias to Store(). For Data::Denter fans.

Undent()

Alias to Load(). For Data::Denter fans.

Denter()

Alias to Store(). For Data::Denter fans.

freeze()

Alias to Store(). For Storable fans.

This will also allow YAML.pm to be plugged directly into POE.pm.

thaw()

Alias to Load(). For Storable fans.

This will also allow YAML.pm to be plugged directly into POE.pm.

Exportable Function Groups

This is a list of the various groups of exported functions that you can import using the following syntax:

    use YAML ':groupname';

all

Imports Store(), Load(), StoreFile() and LoadFile().

POE

Imports freeze() and thaw().

Dumper

Imports Dumper() and Eval().

Denter

Imports Denter(), Indent() and Undent().

Storable

Imports freeze() and thaw().

Class Methods

YAML can also be used in an object oriented manner. At this point it offers no real advantage.

new()

New returns a new YAML object. Options may be passed in as key/value pairs. For example:

    my $y = YAML->new(Separator => '--foo',
                      SortKeys => 0,
                     );
    $y->store($foo, $bar);

Object Methods

store()

OO version of Store().

load()

OO version of Load().

Options

YAML options are set using a group of global variables in the YAML namespace. This is similar to how Data::Dumper works.

For example, to change the separator string, do something like:

    $YAML::Separator = '--<$>';
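
Because the options are ordinary package variables, Perl's local can confine a change to one scope. The sketch below shows the idiom with Data::Dumper, used here as a stand-in since its options work the same way; dump_sorted is a hypothetical helper name. The same pattern should carry over to variables such as $YAML::Separator.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: change global options for one call only.
sub dump_sorted {
    my ($data) = @_;
    # local saves the old values and restores them when the sub returns.
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    return Dumper($data);
}

my $before = $Data::Dumper::Sortkeys;
my $out    = dump_sorted({ b => 2, a => 1 });
my $after  = $Data::Dumper::Sortkeys;

print "$out\n";
print "option unchanged outside the call: ",
      ($before == $after ? "yes" : "no"), "\n";
```

Setting a global outright, as in the example above, affects every later call in the program; local is the gentler choice inside library code.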

The current options are:

Separator

Default is '---'.

This is the MIME-like string that separates YAML documents in a stream. It must start with two dashes, and then be followed by one or more non-whitespace characters. ('---' is the canonical form.)

UseHeader

Default is 1. (true)

This tells YAML.pm whether to use a separator string for a Store operation.

NOTE: It is currently illegal to begin a YAML stream without a separator if there is more than one document in the stream.

UseVersion

Default is 1. (true)

Tells YAML.pm whether to include the YAML version on the separator/header.

The canonical form is:

    --- YAML:1.0

SortKeys

Default is 1. (true)

Tells YAML.pm whether or not to sort hash keys when storing a document.

PerlCode

Setting the PerlCode option is a shortcut to set both the StoreCode and LoadCode options at once. Setting PerlCode to '1' tells YAML.pm to dump Perl code references as Perl (using B::Deparse) and to load them back into memory using eval(). The reason this has to be an option is that using eval() to parse untrusted code is, well, untrustworthy. Safe deserialization is one of the core goals of YAML.

StoreCode

Determines if and how YAML.pm should serialize Perl code references. By default YAML.pm will store code references as dummy placeholders (much like Data::Dumper). If StoreCode is set to '1' or 'deparse', code references will be dumped as actual Perl code.

StoreCode can also be set to a subroutine reference so that you can write your own serializing routine. YAML.pm passes you the code ref. You pass back the serialization (as a string) and a format indicator. The format indicator is a simple string like: 'deparse' or 'bytecode'.

LoadCode

LoadCode is the opposite of StoreCode. It tells YAML if and how to deserialize code references. When set to '1' or 'deparse' it will use eval(). Since this is potentially risky, only use this option if you know where your YAML has been.

LoadCode can also be set to a subroutine reference so that you can write your own deserializing routine. YAML.pm passes the serialization (as a string) and a format indicator. You pass back the code reference.
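
The callback contract described for StoreCode and LoadCode can be sketched with B::Deparse, which ships with Perl. The names my_store_code and my_load_code are hypothetical, and the sketch does not call YAML.pm at all. It only shows the shape of a serializer that returns a string plus a format indicator, and a deserializer that takes them back.

```perl
use strict;
use warnings;
use B::Deparse;

# Hypothetical StoreCode-style callback: receives a code ref and returns
# the serialization (a string) and a format indicator.
sub my_store_code {
    my ($code_ref) = @_;
    my $text = B::Deparse->new->coderef2text($code_ref);
    return ($text, 'deparse');
}

# Hypothetical LoadCode-style callback: receives the serialization and
# the format indicator, and returns a code ref.
sub my_load_code {
    my ($text, $format) = @_;
    die "unsupported format: $format" unless $format eq 'deparse';
    my $code_ref = eval "sub $text";   # only reasonable for trusted input
    die $@ if $@;
    return $code_ref;
}

# Round-trip a tiny subroutine through both callbacks.
my ($text, $format) = my_store_code(sub { return $_[0] * 2 });
my $double = my_load_code($text, $format);
print $double->(21), "\n";
```

The eval() in the loader is exactly why these options are off by default: whoever controls the serialized text controls what runs.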

YAML TERMINOLOGY

YAML is a full featured data serialization language, and thus has its own terminology.

It is important to remember that although YAML is heavily influenced by Perl and Python, it is a language in its own right, not merely a representation of Perl structures.

YAML has three constructs that are conspicuously similar to Perl's hash, array, and scalar. They are called map, sequence, and scalar respectively. By default, they do what you would expect. But each instance may have an explicit or implicit type that makes it behave differently. In this manner, YAML can be extended to represent Perl's Glob or Python's tuple, or Ruby's Bigint.

stream

A YAML stream is the full sequence of bytes that a YAML parser would read or a YAML emitter would write. A stream may contain one or more YAML documents separated by YAML headers.

    ---
    a: map
    foo: bar
    ---
    - a
    - sequence

document

A YAML document is an independent data structure representation within a stream. It is a top level node.

    --- YAML:1.0
    This: top level map
    is:
     - a
     - YAML
     - document

node

A YAML node is the representation of a particular data structure. Nodes may contain other nodes. (In Perl terms, nodes are like scalars: strings, arrayrefs and hashrefs. But this refers to the serialized format, not the in-memory structure.)

transfer method

This is similar to a type. It indicates how a particular YAML node serialization should be transferred into or out of memory. For instance a Foo::Bar object would use the transfer method 'perl/map:Foo::Bar':

    - !perl/map:Foo::Bar
     foo: 42
     bar: stool

A transfer method can be used to 'cast' a YAML node to a different type of Perl structure. For instance, the following represents a Perl array with 43 entries (indices 0 through 42, most of them undefined):

    sparse array: !seq
     1: one
     2: two
     42: forty two
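
The count of 43 follows from how Perl arrays grow. A quick check in plain Perl, independent of YAML.pm:

```perl
use strict;
use warnings;

# Assign only the three indices used in the YAML example.
my @sparse;
$sparse[1]  = 'one';
$sparse[2]  = 'two';
$sparse[42] = 'forty two';

# Perl extends the array to the highest index, so it holds
# elements 0 through 42; the unassigned slots are undef.
my $entries = scalar @sparse;
my $defined = grep { defined } @sparse;
print "entries: $entries, defined: $defined\n";
```

Only three slots carry data, which is why the map-style notation is the more readable way to write such an array down.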

collection

A collection is a YAML data grouping. YAML has two types of collections: maps and sequences. (Similar to hashes and arrays)

map

A map is a YAML collection defined by key/value pairs. By default YAML maps are loaded into Perl hashes.

    a map:
     foo: bar
     two: times two is 4

sequence

A sequence is a YAML collection defined by an ordered list of elements. By default YAML sequences are loaded into Perl arrays.

    a sequence:
     - one bourbon
     - one scotch
     - one beer

scalar

A scalar is a YAML node that is a single value. By default YAML scalars are loaded into Perl scalars.

    a scalar key: a scalar value

YAML has six styles for representing scalars. This is important because varying data will have varying formatting requirements to retain the optimum human readability.

simple scalar

This is a single line of unquoted text. All simple scalars are automatic candidates for "implicit transferring". This means that their type is determined automatically by examination. Unless they match a set of predetermined YAML regex patterns, they will raise a parser exception. The typical uses for this are simple alpha strings, integers, real numbers, dates, times and currency.

    - a simple string
    - -42
    - 3.1415
    - 12:34
    - 123 this is an error

single quoted scalar

This is similar to Perl's use of single quotes. It means no escaping and no implicit transfer. It must be used on a single line.

    - 'When I say ''\n'' I mean "backslash en"'

double quoted scalar

This is similar to Perl's use of double quotes. Character escaping can be used. There is no implicit transfer and it must still be single line.

    - "This scalar\nhas two lines"

plain scalar

This is a multiline scalar which begins on the next line. It is indicated by a single backslash. It is unescaped like the single quoted scalar. Line folding is also performed.

    - \
     This is a multiline scalar which begins
     on the next line. It is indicated by a
     single backslash. It is unescaped like
     the single quoted scalar. Line folding is
     also performed.

escaped scalar

This is a multiline scalar which begins on the next line. It is indicated by a double backslash. It is escaped like the double quoted scalar and subject to line folding.

    - \\ 
     This is a multiline scalar which
     begins on the next line.\nIt is
     indicated by a double backslash.\nIt
     is escaped like the double quoted
     scalar and subject to line folding.

block scalar

This final multiline form is akin to Perl's here-document except that (as in all YAML data) scope is indicated by indentation. Therefore, no ending marker is required. The data is verbatim. No escaping, no line folding.

    - |
        QTY  DESC          PRICE  TOTAL
        ---  ----          -----  -----
          1  Foo Fighters  $19.95 $19.95
          2  Bar Belles    $29.95 $59.90

parser

A YAML processor has four stages: parse, load, dump, emit.

A parser parses a YAML stream. YAML.pm's Load() function contains a parser.

loader

The other half of the Load() function is a loader. This takes the information from the parser and loads it into a Perl data structure.

dumper

The Store() function consists of a dumper and an emitter. The dumper walks through each Perl data structure and gives info to the emitter.

emitter

The emitter takes info from the dumper and turns it into a YAML stream.

NOTE: In YAML.pm the parser/loader and the dumper/emitter code are currently very closely tied together. When libyaml is written (in C) there will be a definite separation. libyaml will contain a parser and emitter, and YAML.pm (and YAML.py etc) will supply the loader and dumper.

For more information please refer to the immensely helpful YAML specification available at http://www.yaml.org/spec/.

ysh - The YAML Shell

The YAML distribution ships with a script called 'ysh', the YAML shell. ysh provides a simple, interactive way to play with YAML. If you type in Perl code, it displays the result in YAML. If you type in YAML it turns it into Perl code.

To run ysh, (assuming you installed it along with YAML.pm) simply type:

    ysh [options]

ysh has a few handy command line options:

-r

Test round-tripping. When you enter Perl code, ysh will Store it, Load it and Store it again. If the two Stores don't match, it will bark.

-R

Same as -r except that it also prints a confirmation when the data does round-trip.

-c

Sets $YAML::PerlCode to '1' automatically. This allows you to dump Perl code references.

-l

Turns on logging. Will write the log to './ysh.log'. Concatenates to the log file if the file already exists.

-L

Same as -l, but overwrites the log file if it exists.

-v

Print the versions for ysh and YAML.pm and exit.

-V

Same as -v, but prints the version of perl and YAML related modules as well.

-h

Print a help screen and exit.

BUGS & DEFICIENCIES

If you find a bug in YAML, please try to recreate it in the YAML Shell (ysh) with logging turned on. When you have successfully reproduced the bug, please mail the LOG file to the author (ingy@cpan.org).

WARNING: This is *ALPHA* code. It is brand spanking new. It probably has lots of bugs and speling mistakes.

BIGGER WARNING: This is *TRIAL1* of the YAML 1.0 specification. The YAML syntax may change before it is finalized. Based on past experience, it probably will change. The authors of this spec have worked for the past seven months putting together YAML 1.0, and we have flipped it on its syntactical head almost every week. We're a fickle lot, we are. So use this at your own risk!!!

(OK, don't get too scared. We *are* pretty happy with the state of things right now. And we *have* agreed to freeze this TRIAL1 for the next couple of months to get user feedback. At the end of the trial period, the syntax may end up changing slightly, but the spirit should remain the same.)

Circular Leaves

YAML is quite capable of serializing circular references. And for the most part it can deserialize them correctly too. One notable exception is a reference to a leaf node containing itself. This is hard to do from pure Perl in any elegant way. The "canonical" example is:

    $foo = \$foo;

This serializes fine, but I can't parse it correctly yet. Unfortunately, every wiseguy programmer in the world seems to try this first when you ask them to test your serialization module. Even though it is of almost no real world value. So please don't report this bug unless you have a pure Perl patch to fix it for me.

By the way, similar non-leaf structures Store and Load just fine:

    $foo->[0] = $foo;
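
The difference between the two cases is easier to see side by side. This sketch uses plain Perl only and just confirms that each structure really does point back at itself:

```perl
use strict;
use warnings;

# Leaf case: a scalar that holds a reference to itself.
my $foo;
$foo = \$foo;
my $leaf_cycle = ($$foo == $foo) ? 1 : 0;

# Non-leaf case: an array whose first element refers to the array.
my $bar = [];
$bar->[0] = $bar;
my $array_cycle = ($bar->[0] == $bar) ? 1 : 0;

print "leaf cycle: $leaf_cycle, array cycle: $array_cycle\n";
```

In the array case there is a container to create first and fill in afterwards. In the leaf case the scalar is both the container and the value, which is what makes it awkward to rebuild.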

Unicode

Unicode is not yet supported. The YAML specification dictates that all strings be Unicode, but this early implementation just uses ASCII.

Structured Keys

Python, Java and perhaps others support using any data type as the key to a hash. YAML also supports this. Perl5 only uses strings as hash keys.

YAML.pm can currently parse structured keys, but their meaning gets lost when they are loaded into a Perl hash. Consider this example using the YAML Shell:

    ysh > ---
    yaml> ?
    yaml>  foo: bar
    yaml> : baz
    yaml> ...
    $VAR1 = {
              'HASH(0x1f1d20)' => 'baz'
            };
    ysh >

YAML.pm will need to be fixed to preserve these keys somehow. Why? Because if YAML.pm gets a YAML document from YAML.py it must be able to return it with the Python data intact.
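
The loss happens in Perl itself, before YAML.pm gets any say. A minimal sketch in plain Perl:

```perl
use strict;
use warnings;

# Use a hash reference as a hash key.
my $key = { foo => 'bar' };
my %hash;
$hash{$key} = 'baz';      # the reference is stringified here

# What comes back is a plain string, not the original structure.
my ($stored) = keys %hash;
my $is_ref = ref($stored) ? 1 : 0;

print "key: $stored\n";   # something like HASH(0x...)
print "still a reference: $is_ref\n";
```

The stored key only records the address the reference once had, so the foo/bar pair inside it cannot be recovered from the hash alone.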

Tuples and Others

YAML.pm will also support other non-Perl data structures like Python's tuples. In this case, and many others, Perl will still be able to make partial use of the foreign critters, because YAML.pm will attempt to map them into something close. Since a Python tuple is close to a Perl array, that's what YAML.pm will map it into. It will then either tie or bless the array into the special class 'org.yaml.tuple', so it can keep track of what the structure originally was. In this way, a Perl program can still make intuitive use of the structure.

'org.yaml.tuple' and other special types have not yet been implemented.
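
Since none of this exists yet, the following is only a guess at how the blessed variant might look from the Perl side. It uses plain Perl, takes the class name from the text above, and assumes nothing about how YAML.pm will eventually do it.

```perl
use strict;
use warnings;

# Hypothetical: a loaded Python tuple, mapped to an array that is
# blessed into the special class named in the text.
my $tuple = bless [ 1, 'two', 3 ], 'org.yaml.tuple';

# Ordinary array operations still work on the blessed reference...
my $count  = scalar @$tuple;
my $second = $tuple->[1];

# ...and the class name records what the structure originally was.
my $origin = ref $tuple;

print "count: $count, second: $second, origin: $origin\n";
```

A dumper could then look at the class name and emit a tuple again instead of a plain sequence.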

Globs, Subroutines, Regexes and Tied Data

As far as I know, other Perl serialization modules are not capable of serializing and deserializing typeglobs, subroutines (code refs), regexes and tied data structures. This release adds preliminary support for serializing code refs. (Using B::Deparse and eval()). It also adds Load support (to the existing Store support) for globs.

NOTE: For a (huge) dump of Perl's global guts, try:

    perl -MYAML -e '$YAML::PerlCode=1; print Store \%main::'

To limit this to a single namespace try:

    perl -MCGI -MYAML -e '$YAML::PerlCode=1; print Store \%CGI::'

Speed

This is a pure Perl implementation that has been optimized for programmer readability, not for computational speed. The hope is that others will be able to quickly convert this module into other languages like Python, Tcl, PHP, Ruby, JavaScript and Java.

Eventually there will be a core library, libyaml, written in C. Most implementations, including this one, should be migrated to use that core library.

Please join us on the YAML mailing list if you are interested in implementing something.

https://lists.sourceforge.net/lists/listinfo/yaml-core

RESOURCES

http://www.yaml.org is the official YAML website.

http://www.yaml.org/spec/ is the YAML 1.0 specification.

YAML has been registered as a SourceForge project (http://www.sourceforge.net). Currently we are only using the mailing list facilities there.

IMPLEMENTATIONS

This is the first implementation of YAML functionality based on the 1.0 specification.

The following people have shown an interest in doing implementations. Please contact them if you are also interested in writing an implementation.

    -
     name: ingy
     project: YAML.pm
     email: ingy@ttul.org
    -
     name: Clark Evans
     project: libyaml
     email: cce@clarkevans.com
    -
     name: Oren Ben-Kiki
     project: Java Loader/Dumper
     email: orenbk@richfx.com
    -
     name: Jon Prettyman
     project: libyaml
     email: jon@prettyman.org
    -
     name: Paul Prescod
     project: YAML Antagonist/Anarchist
     email: paul@prescod.net
    -
     name: Patrick Leboutillier
     project: Java Loader/Dumper
     email: patrick_leboutillier@hotmail.com
    -
     name: Shane Caraveo
     project: PHP Loader/Dumper
     email: shanec@activestate.com
    -
     name: Neil Kandalgoankar
     project: Python Loader/Dumper
     email: neil_j_k@yahoo.com
    -
     name: Brian Quinlan
     project: Python Loader/Dumper
     email: brian@sweetapp.com
    -
     name: Jeff Hobbs
     project: Tcl Loader/Dumper
     email: jeff@hobbs.org
    -
     name: Claes Jacobsson
     project: JavaScript Loader/Dumper
     email: claes@contiller.se
    -
     name: Neil Watkiss
     project: YAML mode for the vim editor
     email: nwatkiss@ttul.org

AUTHOR

Brian Ingerson <INGY@cpan.org> is responsible for YAML.pm.

The YAML language is the result of a ton of collaboration between Oren Ben-Kiki, Clark Evans and Brian Ingerson. Several others have added help along the way.

COPYRIGHT

Copyright (c) 2001, 2002. Brian Ingerson. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html