
NAME

YAML::PP - YAML 1.2 processor

SYNOPSIS

WARNING: This is not yet stable.

Here are a few examples of the basic load and dump methods:

    use YAML::PP;
    my $ypp = YAML::PP->new;
    my $yaml = <<'EOM';
    --- # Document one is a mapping
    name: Tina
    age: 29
    favourite language: Perl

    --- # Document two is a sequence
    - plain string
    - 'in single quotes'
    - "in double quotes we have escapes! like \t and \n"
    - | # a literal block scalar
      line1
      line2
    - > # a folded block scalar
      this is all one
      single line because the
      linebreaks will be folded
    EOM

    my @documents = $ypp->load_string($yaml);
    my @documents = $ypp->load_file($filename);

    my $yaml = $ypp->dump_string($data1, $data2);
    $ypp->dump_file($filename, $data1, $data2);

    # The loader offers JSON::PP::Boolean, boolean.pm or
    # perl 1/'' (currently default) for booleans
    my $ypp = YAML::PP->new(boolean => 'JSON::PP');
    my $ypp = YAML::PP->new(boolean => 'boolean');
    my $ypp = YAML::PP->new(boolean => 'perl');

    # Legacy interface
    use YAML::PP qw/ Load Dump LoadFile DumpFile /;
    my @documents = Load($yaml);
    my @documents = LoadFile($filename);
    my @documents = LoadFile($filehandle);
    my $yaml = Dump(@documents);
    DumpFile($filename, @documents);
    DumpFile($filehandle, @documents);

    my $ypp = YAML::PP->new(schema => [qw/ JSON Perl /]);
    my $yaml = $ypp->dump_string($data_with_perl_objects);

Some utility scripts, mostly useful for debugging:

    # Load YAML into a data structure and dump with Data::Dumper
    yamlpp5-load < file.yaml

    # Load and Dump
    yamlpp5-load-dump < file.yaml

    # Print the events from the parser in yaml-test-suite format
    yamlpp5-events < file.yaml

    # Parse and emit events directly without loading
    yamlpp5-parse-emit < file.yaml

    # Create ANSI colored YAML. Can also be useful for invalid YAML, showing
    # you the exact location of the error
    yamlpp5-highlight < file.yaml

DESCRIPTION

YAML::PP is a modern, modular YAML processor.

It aims to support YAML 1.2 and YAML 1.1. See http://yaml.org/.

YAML is a serialization language. The YAML input is called "YAML Stream". A stream consists of one or more "Documents", separated by a line with the document start marker "---". A document optionally ends with the document end marker "...".

This allows processing continuous streams in addition to a fixed input file or string.

The YAML::PP frontend currently loads all documents and returns only the last one if called in scalar context.
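A minimal sketch of the two calling contexts, assuming YAML::PP is installed; the three-document stream is made up for illustration:

```perl
use strict;
use warnings;
use YAML::PP;

my $ypp  = YAML::PP->new;
my $yaml = "--- first\n--- second\n--- third\n";

# List context: one element per document in the stream
my @docs = $ypp->load_string($yaml);

# Scalar context: every document is loaded, but only the last is returned
my $last = $ypp->load_string($yaml);

print scalar(@docs), " documents, last one is '$last'\n";
```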

The YAML backend is implemented in a modular way that allows adding custom handling of YAML tags, Perl objects and data types. The inner API is not yet stable. Suggestions are welcome.

You can check out all current parse and load results from the yaml-test-suite here: https://perlpunk.github.io/YAML-PP-p5/test-suite.html

PLUGINS

You can alter the behaviour of YAML::PP by using the following schema classes:

YAML::PP::Schema::Failsafe

One of the three YAML 1.2 official schemas

YAML::PP::Schema::JSON

One of the three YAML 1.2 official schemas. Default

YAML::PP::Schema::Core

One of the three YAML 1.2 official schemas

YAML::PP::Schema::YAML1_1

Schema implementing the most common YAML 1.1 types

YAML::PP::Schema::Perl

Serializing Perl objects and types

YAML::PP::Schema::Binary

Serializing binary data

YAML::PP::Schema::Tie::IxHash

In progress. Keeping hash key order.

YAML::PP::Schema::Merge

YAML 1.1 merge keys for mappings

YAML::PP::Schema::Include

Include other YAML files via !include tags
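Schema classes are combined by listing their short names in the schema option. The following sketch assumes YAML::PP is installed and that the name Merge selects YAML::PP::Schema::Merge on top of the JSON schema. It loads a mapping that uses a YAML 1.1 merge key; keys written explicitly in the mapping take precedence over merged ones:

```perl
use strict;
use warnings;
use YAML::PP;

# JSON schema plus YAML 1.1 merge keys
my $ypp = YAML::PP->new( schema => [qw/ JSON Merge /] );

my $data = $ypp->load_string(<<'EOM');
base: &base
  a: 1
  b: 2
derived:
  <<: *base
  b: 3
EOM

# derived gets 'a' from base, but keeps its own value for 'b'
print "a=$data->{derived}{a} b=$data->{derived}{b}\n";
```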

To make the parsing process faster, you can plug in the libyaml parser with YAML::PP::LibYAML.

IMPLEMENTATION

The process of loading and dumping is split into the following steps:

    Load:

    YAML Stream        Tokens        Event List        Data Structure
              --------->    --------->        --------->
                lex           parse           construct


    Dump:

    Data Structure       Event List        YAML Stream
                --------->        --------->
                represent           emit
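To make the three load stages concrete, here is a toy sketch in plain Perl. It is not YAML::PP code: the functions lex, parse and construct are hypothetical, only handle a flat "key: value" mapping, and the real event list also contains stream and document start/end events:

```perl
use strict;
use warnings;

# lex: YAML stream -> tokens (toy: one "key: value" pair per line)
sub lex {
    my ($yaml) = @_;
    my @tokens;
    for my $line (split /\n/, $yaml) {
        next if $line =~ /^\s*(?:#.*)?$/;    # skip blank and comment lines
        my ($key, $value) = $line =~ /^(\w[^:]*):\s+(.*)$/
            or die "Cannot lex line: $line\n";
        push @tokens, [ PLAIN => $key ], [ COLON => ':' ], [ PLAIN => $value ];
    }
    return @tokens;
}

# parse: tokens -> event list
sub parse {
    my @tokens = @_;
    my @events = ('mapping_start');
    while (@tokens) {
        my ($key, $colon, $value) = splice @tokens, 0, 3;
        die "Expected ':'\n" unless $colon->[0] eq 'COLON';
        push @events, [ scalar => $key->[1] ], [ scalar => $value->[1] ];
    }
    push @events, 'mapping_end';
    return @events;
}

# construct: event list -> data structure
sub construct {
    my @events = @_;
    my %hash;
    shift @events;                      # drop 'mapping_start'
    while (ref $events[0]) {            # stop at 'mapping_end'
        my ($k, $v) = splice @events, 0, 2;
        $hash{ $k->[1] } = $v->[1];
    }
    return \%hash;
}

my $data = construct( parse( lex("name: Tina\nage: 29\n") ) );
print "$_ => $data->{$_}\n" for sort keys %$data;
```

Keeping the stages separate is what lets tools like yamlpp5-events stop after parsing, or a highlighter work on tokens alone.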

You can dump basic perl types like hashes, arrays, scalars (strings, numbers). For dumping blessed objects and things like coderefs have a look at YAML::PP::Perl/YAML::PP::Schema::Perl.
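A minimal sketch of dumping and reloading a blessed object with the Perl schema, assuming YAML::PP and YAML::PP::Schema::Perl are installed; the class My::Point is hypothetical and only exists to have an object to dump:

```perl
use strict;
use warnings;
use YAML::PP;

# A tiny hypothetical class, only here to have a blessed object to dump
package My::Point {
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}

my $ypp  = YAML::PP->new( schema => [qw/ JSON Perl /] );
my $yaml = $ypp->dump_string( My::Point->new( x => 1, y => 2 ) );
print $yaml;    # the mapping carries a !perl/hash tag with the class name

# Loading with the same schema blesses the mapping again
my $copy = $ypp->load_string($yaml);
```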

For keeping your ordered Tie::IxHash hashes, try out YAML::PP::Schema::Tie::IxHash.

YAML::PP::Lexer

The Lexer reads the YAML stream into tokens. This makes it possible to generate syntax highlighted YAML output.

Note that the API to retrieve the tokens will change.

YAML::PP::Parser

The Parser retrieves the tokens from the Lexer. The main YAML content is then parsed with the Grammar.

YAML::PP::Grammar
YAML::PP::Constructor

The Constructor creates a data structure from the Parser events.

YAML::PP::Loader

The Loader combines the constructor and parser.

YAML::PP::Dumper

The Dumper will delegate to the Representer

YAML::PP::Representer

The Representer will create Emitter events from the given data structure.

YAML::PP::Emitter

The Emitter creates a YAML stream.

YAML::PP::Parser

Still TODO:

Implicit collection keys
    ---
    [ a, b, c ]: value
Implicit mapping in flow style sequences
    ---
    [ a, b, c: d ]
    # equals
    [ a, b, { c: d } ]
Plain mapping keys ending with colons
    ---
    key ends with two colons::: value
Supported Characters

If you have valid YAML that's not parsed, or the other way round, please create an issue.

Line and Column Numbers

You will see line and column numbers in the error message. The column numbers might still be wrong in some cases.

Error Messages

The error messages need to be improved.

Unicode Surrogate Pairs

Currently loaded as single characters without validating

Possibly more

YAML::PP::Constructor

The Constructor now supports all three YAML 1.2 Schemas: Failsafe, JSON and Core. Additionally, you can choose the schema for YAML 1.1 as YAML1_1.

To see which strings are resolved as booleans, numbers, null etc., look at t/31.schema.t.

You can choose the Schema, however, the API for that is not yet fixed. Currently it looks like this:

    my $ypp = YAML::PP->new(schema => ['Core']); # default is 'JSON'

The Tags !!seq and !!map are still ignored for now.
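As a sketch of why the schema choice matters, assuming YAML::PP is installed: under the YAML 1.2 JSON schema only null is a null value, while the Core schema also accepts "~" (and Null, NULL and the empty value):

```perl
use strict;
use warnings;
use YAML::PP;

my $yaml = "tilde: ~\nword: null\n";

# JSON schema: 'null' resolves to undef, '~' is just a string
my $json = YAML::PP->new( schema => ['JSON'] )->load_string($yaml);

# Core schema: '~' resolves to undef as well
my $core = YAML::PP->new( schema => ['Core'] )->load_string($yaml);

for my $name (qw/ JSON Core /) {
    my $data = $name eq 'JSON' ? $json : $core;
    printf "%s: tilde is %s\n", $name,
        defined $data->{tilde} ? "'$data->{tilde}'" : 'undef';
}
```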

It supports:

Handling of Anchors/Aliases

As in modules like YAML, the Constructor uses references for mappings and sequences, but obviously not for scalars.
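A short sketch of what this means in practice, assuming YAML::PP is installed: an alias to a sequence yields the very same array reference, while an alias to a scalar yields an independent copy:

```perl
use strict;
use warnings;
use YAML::PP;

my $data = YAML::PP->new->load_string(<<'EOM');
list: &shared [1, 2]
copy: *shared
name: &n Tina
alias_name: *n
EOM

# The alias of a sequence is the same array reference,
# so pushing through one key is visible through the other
push @{ $data->{copy} }, 3;
print "list now has ", scalar @{ $data->{list} }, " elements\n";

# The alias of a scalar is a copy; changing it leaves the original alone
$data->{alias_name} = 'Tom';
print "name is still $data->{name}\n";
```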

Boolean Handling

You can choose between 'perl' (1/'', currently default), 'JSON::PP' and 'boolean' (boolean.pm) for handling boolean types. That allows you to dump the data structure with one of the JSON modules without losing information about booleans.
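A minimal sketch of the JSON::PP option, assuming YAML::PP is installed (JSON::PP ships with Perl): the loaded booleans are JSON::PP::Boolean objects, so re-encoding to JSON keeps them as true/false instead of 1 and an empty string:

```perl
use strict;
use warnings;
use YAML::PP;
use JSON::PP;

# Load YAML booleans as JSON::PP::Boolean objects
my $ypp  = YAML::PP->new( boolean => 'JSON::PP' );
my $data = $ypp->load_string("enabled: true\ndebug: false\n");

# JSON::PP encodes those objects as JSON true/false
my $json = JSON::PP->new->canonical->encode($data);
print "$json\n";    # {"debug":false,"enabled":true}
```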

Numbers

Numbers are created as real numbers instead of strings, so that they are dumped correctly by modules like JSON::PP or JSON::XS, for example.

See "NUMBERS" for an example.

Complex Keys

Mapping Keys in YAML can be more than just scalars. Of course, you can't load that into a native perl structure. The Constructor will stringify those keys with Data::Dumper instead of just returning something like HASH(0x55dc1b5d0178).

Example:

    use strict;
    use warnings;
    use feature 'say';
    use YAML::PP;
    use JSON::PP;
    my $ypp = YAML::PP->new;
    my $coder = JSON::PP->new->ascii->pretty->allow_nonref->canonical;
    my $yaml = <<'EOM';
    complex:
        ?
            ?
                a: 1
                c: 2
            : 23
        : 42
    EOM
    my $data = $ypp->load_string($yaml);
    say $coder->encode($data);
    __END__
    {
       "complex" : {
          "{'{a => 1,c => 2}' => 23}" : 42
       }
    }

TODO:

Parse Tree

I would like to generate a complete parse tree that allows you to manipulate the data structure and also dump it, including all whitespace and comments. The spec says that this is throwaway content, but I read that many people wish to be able to keep the comments.

YAML::PP::Dumper, YAML::PP::Emitter

The Dumper should be able to dump strings correctly, adding quotes whenever a plain scalar would look like a special string, like true, or when it contains or starts with characters that are not allowed.

Most strings will be dumped as plain scalars without quotes. If they contain special characters or have a special meaning, they will be dumped with single quotes. If they contain control characters, including "\n", they will be dumped with double quotes.

It will recognize JSON::PP::Boolean and boolean.pm objects and dump them correctly.
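A small sketch of these quoting rules, assuming YAML::PP is installed. Rather than asserting the exact quote style, it checks that a string that looks like a boolean and a string with a newline both survive a dump/load round trip:

```perl
use strict;
use warnings;
use YAML::PP;

my $ypp  = YAML::PP->new;
my $yaml = $ypp->dump_string({
    plain   => 'hello',        # nothing special: stays a plain scalar
    special => 'true',         # would load as a boolean if left unquoted
    control => "two\nlines",   # contains a control character
});
print $yaml;

# Whatever quoting was chosen, the strings must survive a round trip
my $back = $ypp->load_string($yaml);
```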

Numbers which also have a PV flag will be recognized as numbers and not as strings:

    my $int = 23;
    say "int: $int"; # $int will now also have a PV flag

That means that if you accidentally use a string in numeric context, it will also be recognized as a number:

    my $ypp = YAML::PP->new;
    my $string = "23";
    my $something = $string + 0;    # numeric use of $string
    print $ypp->dump_string($string);
    # will be emitted as an integer without quotes!
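The flag change described above can be observed with the core B module. This sketch only inspects the private integer flag (SVp_IOK, the same kind of flag JSON::PP looks at); it does not show how YAML::PP itself decides, and the helper has_iok is hypothetical:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true if the scalar has the private integer flag set.
# $_[0] is an alias to the caller's variable, so we inspect that very SV.
sub has_iok { return B::svref_2object(\$_[0])->FLAGS & B::SVp_IOK() ? 1 : 0 }

my $string = "23";
my $before = has_iok($string);    # a string literal: no integer flag yet
my $sum    = $string + 0;         # numeric use caches the integer value
my $after  = has_iok($string);    # now the integer flag is set as well

print "integer flag before: $before, after: $after\n";
```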

The layout is like libyaml output:

    key:
    - a
    - b
    - c
    ---
    - key1: 1
      key2: 2
      key3: 3
    ---
    - - a1
      - a2
    - - b1
      - b2
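A sketch that dumps three documents corresponding to the layout above, assuming YAML::PP is installed. The test only checks the two parts of the layout that differ most from other emitters: sequences inside mappings are not indented, and nested sequences are written compactly:

```perl
use strict;
use warnings;
use YAML::PP;

my $ypp  = YAML::PP->new;
# Three documents: mapping of a sequence, sequence of a mapping,
# and a sequence of sequences
my $yaml = $ypp->dump_string(
    { key => [qw/ a b c /] },
    [ { key1 => 1, key2 => 2, key3 => 3 } ],
    [ [qw/ a1 a2 /], [qw/ b1 b2 /] ],
);
print $yaml;
```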

METHODS

new
    my $ypp = YAML::PP->new;
    # load booleans via boolean.pm
    my $ypp = YAML::PP->new( boolean => 'boolean' );
    # load booleans via JSON::PP::true/false
    my $ypp = YAML::PP->new( boolean => 'JSON::PP' );
    
    # use YAML 1.2 Failsafe Schema
    my $ypp = YAML::PP->new( schema => ['Failsafe'] );
    # use YAML 1.2 JSON Schema
    my $ypp = YAML::PP->new( schema => ['JSON'] );
    # use YAML 1.2 Core Schema
    my $ypp = YAML::PP->new( schema => ['Core'] );
    
    # Die when detecting cyclic references
    my $ypp = YAML::PP->new( cyclic_refs => 'fatal' );
    # Other values:
    # warn   - Just warn about them and replace with undef
    # ignore - replace with undef
    # allow  - Default
    
    my $ypp = YAML::PP->new(
        boolean => 'JSON::PP',
        schema => ['JSON'],
        cyclic_refs => 'fatal',
        indent => 4, # use 4 spaces for dumping indentation
        header => 1, # default 1; print document header ---
        footer => 1, # default 0; print document footer ...
    );
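A sketch of the cyclic_refs option, assuming YAML::PP is installed. The YAML below contains an alias that points back to the mapping that contains it; with 'fatal' loading dies, with 'allow' the result contains a real reference cycle:

```perl
use strict;
use warnings;
use YAML::PP;

# The alias *x refers to the mapping that contains it
my $yaml = "a: &x\n  b: *x\n";

# cyclic_refs => 'fatal' makes loading die
my $fatal = YAML::PP->new( cyclic_refs => 'fatal' );
my $ok    = eval { $fatal->load_string($yaml); 1 };
print $ok ? "loaded\n" : "died: $@";

# cyclic_refs => 'allow' builds the cycle in the data structure
my $allow = YAML::PP->new( cyclic_refs => 'allow' );
my $data  = $allow->load_string($yaml);
print "cycle found\n" if $data->{a}{b} == $data->{a};
```

Note that a structure with a reference cycle is not freed automatically by Perl's reference counting, which is one reason to prefer 'fatal' for untrusted input.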
load_string
    my $doc = $ypp->load_string("foo: bar");
    my @docs = $ypp->load_string("foo: bar\n---\n- a");

Input should be Unicode characters (decoded).

load_file
    my $doc = $ypp->load_file("file.yaml");
    my @docs = $ypp->load_file("file.yaml");

Strings will be loaded as unicode characters (decoded).
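A sketch of the difference between the two entry points, assuming YAML::PP is installed (Encode and File::Temp ship with Perl): load_string wants characters, so UTF-8 bytes are decoded first, while load_file reads and decodes the file itself:

```perl
use strict;
use warnings;
use Encode qw/ encode decode /;
use File::Temp qw/ tempfile /;
use YAML::PP;

my $ypp   = YAML::PP->new;
# UTF-8 encoded bytes, e.g. as read from a socket or a :raw handle
my $bytes = encode('UTF-8', "name: M\x{fc}ller\n");

# load_string expects characters, so decode the bytes first
my $doc = $ypp->load_string( decode('UTF-8', $bytes) );

# load_file reads and decodes the file itself
my ($fh, $file) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} $bytes;
close $fh;
my $from_file = $ypp->load_file($file);
```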

dump_string
    my $yaml = $ypp->dump_string($doc);
    my $yaml = $ypp->dump_string($doc1, $doc2);
    my $yaml = $ypp->dump_string(@docs);

Input strings should be Unicode characters. If not, they will be upgraded with utf8::upgrade.

Output will return Unicode characters (decoded).

dump_file
    $ypp->dump_file("file.yaml", $doc);
    $ypp->dump_file("file.yaml", $doc1, $doc2);
    $ypp->dump_file("file.yaml", @docs);

Input data should be UTF-8 decoded. If not, it will be upgraded with utf8::upgrade.
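A sketch of the output side, assuming YAML::PP is installed: dump_string returns characters, so the handle you print to needs an encoding layer, whereas dump_file writes encoded output on its own. The round trip at the end reads the file back as raw bytes and decodes it explicitly:

```perl
use strict;
use warnings;
use Encode qw/ decode /;
use File::Temp qw/ tempfile /;
use YAML::PP;

my $ypp  = YAML::PP->new;
my $data = { name => "M\x{fc}ller" };    # U+00FC is a u with diaeresis

# dump_string returns characters, so give STDOUT an encoding layer
my $yaml = $ypp->dump_string($data);
binmode STDOUT, ':encoding(UTF-8)';
print $yaml;

# dump_file takes care of encoding; read the raw bytes back to check
my (undef, $file) = tempfile( UNLINK => 1 );
$ypp->dump_file($file, $data);
open my $in, '<:raw', $file or die "Cannot open $file: $!";
my $raw = do { local $/; <$in> };
close $in;
my $back = $ypp->load_string( decode('UTF-8', $raw) );
```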

dump

This will dump to a predefined writer. By default it will just use the YAML::PP::Writer and output a string.

    my $writer = MyWriter->new(\my $output);
    my $yp = YAML::PP->new(
        writer => $writer,
    );
    $yp->dump($data);
loader

Returns or sets the loader object, by default YAML::PP::Loader

dumper

Returns or sets the dumper object, by default YAML::PP::Dumper

schema

Returns or sets the schema object

default_schema

Creates and returns the default schema

FUNCTIONS

The functions Load, LoadFile, Dump and DumpFile are provided as a drop-in replacement for other existing YAML processors. No function is exported by default.

Load
    use YAML::PP qw/ Load /;
    my $doc = Load($yaml);
    my @docs = Load($yaml);

Works like load_string.

LoadFile
    use YAML::PP qw/ LoadFile /;
    my $doc = LoadFile($file);
    my @docs = LoadFile($file);
    my @docs = LoadFile($filehandle);

Works like load_file.

Dump
    use YAML::PP qw/ Dump /;
    my $yaml = Dump($doc);
    my $yaml = Dump(@docs);

Works like dump_string.

DumpFile
    use YAML::PP qw/ DumpFile /;
    DumpFile($file, $doc);
    DumpFile($file, @docs);
    DumpFile($filehandle, @docs);

Works like dump_file.

NUMBERS

Compare the output of the following YAML Loaders and JSON::PP dump:

    use feature 'say';
    use JSON::PP;
    use Devel::Peek;

    use YAML::XS ();
    use YAML ();
        $YAML::Numify = 1; # since version 1.23
    use YAML::Syck ();
        $YAML::Syck::ImplicitTyping = 1;
    use YAML::Tiny ();
    use YAML::PP;

    my $yaml = "foo: 23";

    my $d1 = YAML::XS::Load($yaml);
    my $d2 = YAML::Load($yaml);
    my $d3 = YAML::Syck::Load($yaml);
    my $d4 = YAML::Tiny->read_string($yaml)->[0];
    my $d5 = YAML::PP->new->load_string($yaml);

    Dump $d1->{foo};
    Dump $d2->{foo};
    Dump $d3->{foo};
    Dump $d4->{foo};
    Dump $d5->{foo};

    say encode_json($d1);
    say encode_json($d2);
    say encode_json($d3);
    say encode_json($d4);
    say encode_json($d5);

    SV = PVIV(0x55bbaff2bae0) at 0x55bbaff26518
      REFCNT = 1
      FLAGS = (IOK,POK,pIOK,pPOK)
      IV = 23
      PV = 0x55bbb06e67a0 "23"\0
      CUR = 2
      LEN = 10
    SV = PVMG(0x55bbb08959b0) at 0x55bbb08fc6e8
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 23
      NV = 0
      PV = 0
    SV = IV(0x55bbaffcb3b0) at 0x55bbaffcb3c0
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 23
    SV = PVMG(0x55bbaff2f1f0) at 0x55bbb08fc8c8
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x55bbb0909d00 "23"\0 [UTF8 "23"]
      CUR = 2
      LEN = 10
    SV = PVMG(0x55bbaff2f6d0) at 0x55bbb08b2c10
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 23
      NV = 0
      PV = 0

    {"foo":"23"}
    {"foo":23}
    {"foo":23}
    {"foo":"23"}
    {"foo":23}

WHY

All the available parsers and loaders for Perl behave differently and, more importantly, don't conform to the spec. YAML::XS is doing pretty well, but libyaml only handles YAML 1.1 and diverges a bit from the spec. The pure Perl loaders lack support for a number of features.

I was going over YAML.pm issues at the end of 2016, integrating old patches from rt.cpan.org and creating some pull requests myself. I realized that it would be difficult to patch YAML.pm to parse YAML 1.1 or even 1.2, and it would also break existing usages relying on the current behaviour.

In 2016 Ingy döt Net initiated two really cool projects:

"YAML TEST SUITE"
"YAML EDITOR"

These projects are a big help for any developer. So I got the idea to write my own parser and started on New Year's Day 2017. Without the test suite and the editor I would have never started this.

I also started another YAML test project, which gives a quick overview of which frameworks support which YAML features:

"YAML TEST MATRIX"

YAML TEST SUITE

https://github.com/yaml/yaml-test-suite

It contains about 230 test cases with expected parsing events and more. More tests are coming. This test suite allows writing parsers without turning the examples from the Specification into tests yourself. Also, the examples don't cover all cases; the test suite aims to do that.

The suite contains .tml files, and in a separate 'data' branch you will find the content in separate files, if you can't or don't want to use TestML.

Thanks also to Felix Krause, who is writing a YAML parser in Nim. He turned all the spec examples into test cases.

YAML EDITOR

This is a tool to play around with several YAML parsers and loaders in vim.

https://github.com/yaml/yaml-editor

The project contains the code to build the frameworks (16 as of this writing) and put it into one big Docker image.

It also contains the yaml-editor itself, which starts vim in the Docker container. It uses a lot of funky vimscript that makes playing with it easy and useful. You can choose which frameworks you want to test and see the output in a grid of vim windows.

Especially when writing a parser it is extremely helpful to have all the test cases and be able to play around with your own examples to see how they are handled.

YAML TEST MATRIX

I was curious to see how the different frameworks handle the test cases, so, using the test suite and the docker image, I wrote some code that runs the tests, manipulates the output to compare it with the expected output, and created a matrix view.

https://github.com/perlpunk/yaml-test-matrix

You can find the latest build at http://matrix.yaml.io

As of this writing, the test matrix only contains valid test cases. Invalid ones will be added.

CONTRIBUTORS

Ingy döt Net

Ingy is one of the creators of YAML. In 2016 he started the YAML Test Suite and the YAML Editor. He also made useful suggestions on the class hierarchy of YAML::PP.

Felix "flyx" Krause

Felix answered countless questions about the YAML Specification.

SEE ALSO

YAML
YAML::XS
YAML::Syck
YAML::Tiny

SPONSORS

The Perl Foundation https://www.perlfoundation.org/ sponsored this project (and the YAML Test Suite) with a grant of 2500 USD in 2017-2018.

COPYRIGHT AND LICENSE

Copyright 2018 by Tina Müller

This library is free software and may be distributed under the same terms as perl itself.