
NAME

JSON::Parse - Read JSON into a Perl variable

SYNOPSIS

    use JSON::Parse 'parse_json';
    my $json = '["golden", "fleece"]';
    my $perl = parse_json ($json);
    # Same effect as $perl = ['golden', 'fleece'];
    

Convert JSON into Perl.

DESCRIPTION

JSON means "JavaScript Object Notation" and it is specified in "RFC 4627".

JSON::Parse converts JSON into the nearest equivalent Perl. The function "parse_json" takes one argument, a string containing JSON, and returns a Perl reference. The input to parse_json must be a complete JSON structure.

The module differs from the JSON module by simplifying the handling of Unicode. If its input is marked as Unicode characters, the strings in its output are also marked as Unicode characters.

JSON::Parse also provides two high speed validation functions, "valid_json" and "assert_valid_json", and a function to read JSON from a file, "json_file_to_perl".

FUNCTIONS

parse_json

    use JSON::Parse 'parse_json';
    my $perl = parse_json ('{"x":1, "y":2}');

This function converts JSON into a Perl structure, either an array reference or a hash reference.

If the first argument does not contain a complete valid JSON text, parse_json throws a fatal error ("dies"). If the first argument is the undefined value, an empty string, or a string containing only whitespace, parse_json returns the undefined value.

If the argument contains valid JSON, the return value is either a hash or an array reference. If the input JSON text is a serialized object, a hash reference is returned:

    use JSON::Parse ':all';
    my $perl = parse_json ('{"a":1, "b":2}');
    print ref $perl, "\n";
    # Prints "HASH".
    

If the input JSON text is a serialized array, an array reference is returned:

    use JSON::Parse ':all';
    my $perl = parse_json ('["a", "b", "c"]');
    print ref $perl, "\n";
    # Prints "ARRAY".
    

json_file_to_perl

    use JSON::Parse 'json_file_to_perl';
    my $p = json_file_to_perl ('filename');

This is exactly the same as "parse_json" except that it reads the JSON from the specified file rather than a scalar. The file must be in the UTF-8 encoding, and is opened as a character file using :encoding(UTF-8) (see PerlIO::encoding and perluniintro for details). The output is marked as character strings.
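
As a rough round-trip sketch, the following writes a small UTF-8 file and reads it back. The temporary file (made with the core File::Temp module) and the key names are invented for illustration.

```perl
use strict;
use warnings;
use utf8;
use File::Temp 'tempfile';
use JSON::Parse 'json_file_to_perl';

# Write a small UTF-8 encoded JSON file. The contents are illustrative only.
my ($fh, $filename) = tempfile (UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} '{"name":"蠍","legs":8}';
close $fh or die "Cannot close $filename: $!";

# Read it back; the top-level JSON object becomes a hash reference.
my $p = json_file_to_perl ($filename);
print ref $p, "\n";
# The name is one character, not three bytes, if it was decoded as characters.
print length ($p->{name}), "\n";
```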

valid_json

    use JSON::Parse 'valid_json';
    if (valid_json ($json)) {
        # do something
    }

valid_json returns 1 if its argument is valid JSON and 0 if not. It also returns 0 if the input is undefined or the empty string.

This is a high-speed validator which runs between roughly two and eight times faster than "parse_json".

valid_json does not report the errors which made the input invalid. Use "assert_valid_json" to get error messages when the JSON is invalid.
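
One possible way to combine the two validators is to check with valid_json first and call assert_valid_json only on failure, to recover the message. This is a sketch, and the helper name json_problem is hypothetical, not part of the module.

```perl
use strict;
use warnings;
use JSON::Parse qw(valid_json assert_valid_json);

# Hypothetical helper: returns undef for valid JSON, else the error text.
sub json_problem
{
    my ($json) = @_;
    return if valid_json ($json);
    # Re-run the validation only to collect the diagnostic.
    my $ok = eval { assert_valid_json ($json); 1 };
    return $ok ? undef : "$@";
}

my $good = json_problem ('{"xyz":"b"}');
my $bad = json_problem ('["xyz":"b"]');
print "first input is valid\n" unless defined $good;
print "second input is invalid: $bad\n" if defined $bad;
```

The result of valid_json is only tested for truth here, so the sketch does not depend on the exact false value returned.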

assert_valid_json

    use JSON::Parse 'assert_valid_json';
    eval {
        assert_valid_json ('["xyz":"b"]');
    };
    if ($@) {
        print "Your JSON was invalid: $@\n";
    }
    # Prints a message containing "Unexpected character ':' parsing array"

This is the underlying function for "valid_json". It runs at the same high speed, but throws an error if the JSON is wrong, rather than returning 1 or 0. See "DIAGNOSTICS" for the error format, which is identical to "parse_json".

OLD INTERFACE

The following alternative function names are accepted. These are the names used for the functions in old versions of this module. These names are not deprecated and will never be removed from the module.

json_to_perl

This is exactly the same function as "parse_json".

validate_json

This is exactly the same function as "assert_valid_json".
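
A short sketch of the old and new names side by side. Note that, like the new names, the old ones have to be requested in the import list. The sample inputs are illustrative.

```perl
use strict;
use warnings;
use JSON::Parse qw(parse_json json_to_perl assert_valid_json validate_json);

my $json = '{"golden":"fleece"}';
my $new = parse_json ($json);
my $old = json_to_perl ($json);
print "same result\n" if $new->{golden} eq $old->{golden};

# Both validators should reject the same malformed input.
my $new_ok = eval { assert_valid_json ('["a" "b"]'); 1 };
my $old_ok = eval { validate_json ('["a" "b"]'); 1 };
print "both reject\n" if ! $new_ok && ! $old_ok;
```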

Mapping from JSON to Perl

JSON elements are mapped to Perl as follows:

JSON numbers

JSON numbers become Perl numbers, either integers or double-precision floating point numbers, or possibly strings containing the number if parsing of a number by the usual methods fails somehow.

JSON does not allow leading zeros, or leading plus signs, so numbers like +100 or 0123 cause an "Unexpected character" error. JSON also does not allow numbers of the form 1. but it does allow things like 0e0 or 1E999999. As far as possible these are accepted by JSON::Parse.
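
The number rules above can be tried out with "valid_json". This is only a sketch; the numbers are wrapped in arrays so that each input is a complete JSON structure, and -1.5 is an extra sample not mentioned above.

```perl
use strict;
use warnings;
use JSON::Parse 'valid_json';

# Number forms which should be accepted.
my @accepted = ('[100]', '[-1.5]', '[0e0]');
# Leading plus sign, leading zero, and a dot with no fraction.
my @rejected = ('[+100]', '[0123]', '[1.]');

for my $json (@accepted, @rejected) {
    printf "%-8s %s\n", $json, valid_json ($json) ? 'valid' : 'invalid';
}
```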

JSON strings

JSON strings become Perl strings. The JSON escape characters such as \t for the tab character (see section 2.5 of "RFC 4627") are mapped to the equivalent ASCII character.
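
A small sketch of the escape mapping. The sample text is invented; the point is that the JSON escapes end up as real tab and linefeed characters in the Perl string.

```perl
use strict;
use warnings;
use JSON::Parse 'parse_json';

# Single quotes keep the backslashes literal, so JSON::Parse sees the
# two-character JSON escapes \t and \n rather than Perl's own escapes.
my $p = parse_json ('["col1\tcol2\nrow2"]');

# Split on the real linefeed and tab characters produced by the parser.
my @lines = split /\n/, $p->[0];
my @cols = split /\t/, $lines[0];
print scalar (@lines), " lines, ", scalar (@cols), " columns in the first line\n";
```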

Handling of Unicode

If the input to "parse_json" is marked as Unicode characters, the output strings will be marked as Unicode characters. If the input is not marked as Unicode characters, the output strings will not be marked as Unicode characters. Thus,

    use JSON::Parse ':all';
    # The scalar $sasori looks like Unicode to Perl
    use utf8;
    my $sasori = '["蠍"]';
    my $p = parse_json ($sasori);
    print utf8::is_utf8 ($p->[0]);
    # Prints 1.
    

but

    use JSON::Parse ':all';
    # The scalar $ebi does not look like Unicode to Perl
    no utf8;
    my $ebi = '["海老"]';
    my $p = parse_json ($ebi);
    print utf8::is_utf8 ($p->[0]);
    # Prints nothing.
    

Escapes of the form \uXXXX (see page three of "RFC 4627") are mapped to ASCII if XXXX is less than 0x80, or to UTF-8 if XXXX is greater than or equal to 0x80.

Strings containing \uXXXX escapes of 0x80 or greater are also upgraded to character strings, regardless of whether the input is a character string or a byte string. Thus, whatever Perl thinks of the input string, escapes like \u87f9 are converted into the equivalent UTF-8 bytes, and the particular string in which they occur is marked as a character string:

    use JSON::Parse ':all';
    no utf8;
    # 蟹
    my $kani = '["\u87f9"]';
    my $p = parse_json ($kani);
    print "It's marked as a character string" if utf8::is_utf8 ($p->[0]);
    # Prints "It's marked as a character string" because it's upgraded
    # regardless of the input string's flags.

This is modelled on the behaviour of Perl's chr:

    no utf8;
    my $kani = '87f9';
    print "hex is character string\n" if utf8::is_utf8 ($kani);
    # prints nothing
    $kani = chr (hex ($kani));
    print "chr makes it a character string\n" if utf8::is_utf8 ($kani);
    # prints "chr makes it a character string"

Since every byte of input is validated as UTF-8 (see "UTF-8 only"), this hopefully will not upgrade invalid strings.

Surrogate pairs in the form \uD834\uDD1E are also handled. If the second half of the surrogate pair is missing, an "Unexpected character" or "Unexpected end of input" error is thrown. If the second half of the surrogate pair is present but contains an impossible value, a "Not surrogate pair" error is thrown.
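
As an illustration of the surrogate pair handling, the following sketch parses the pair mentioned above and then checks that a first half on its own is rejected.

```perl
use strict;
use warnings;
use JSON::Parse qw(parse_json valid_json);

# \uD834\uDD1E is the surrogate pair for U+1D11E (MUSICAL SYMBOL G CLEF).
my $p = parse_json ('["\uD834\uDD1E"]');
printf "%d character, code point U+%X\n", length ($p->[0]), ord ($p->[0]);

# A first half with nothing after it should be rejected.
print "lone surrogate rejected\n" unless valid_json ('["\uD834"]');
```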

JSON arrays

JSON arrays become Perl array references. The elements of the Perl array are in the same order as they appear in the JSON.

Thus

    my $p = parse_json ('["monday", "tuesday", "wednesday"]');

has the same result as a Perl declaration of the form

    my $p = [ 'monday', 'tuesday', 'wednesday' ];

JSON objects

JSON objects become Perl hashes. The members of the JSON object become key and value pairs in the Perl hash. The string part of each object member becomes the key of the Perl hash. The value part of each member is mapped to the value of the Perl hash.

Thus

    my $j = <<EOF;
    {"monday":["blue", "black"],
     "tuesday":["grey", "heart attack"],
     "friday":"Gotta get down on Friday"}
    EOF

    my $p = parse_json ($j);

has the same result as a Perl declaration of the form

    my $p = {
        monday => ['blue', 'black'],
        tuesday => ['grey', 'heart attack'],
        friday => 'Gotta get down on Friday',
    };
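
Once parsed, the result can be walked like any other Perl data structure. A small sketch, using a shortened version of the input above:

```perl
use strict;
use warnings;
use JSON::Parse 'parse_json';

my $p = parse_json ('{"monday":["blue","black"],"friday":"Gotta get down on Friday"}');

# Perl hashes are unordered, so sort the keys for stable output.
for my $day (sort keys %$p) {
    my $value = $p->{$day};
    # Nested JSON arrays arrive as array references.
    my $text = ref $value eq 'ARRAY' ? join (', ', @$value) : $value;
    print "$day: $text\n";
}
```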

null

The JSON null literal is mapped to a readonly scalar $JSON::Parse::null containing the undefined value.

true

The JSON true literal is mapped to a readonly scalar $JSON::Parse::true containing the value 1.

false

The JSON false literal is mapped to a readonly scalar $JSON::Parse::false containing the value 0.
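
In practice the three literals are usually tested for truth or definedness rather than compared with particular values. A sketch, with invented key names:

```perl
use strict;
use warnings;
use JSON::Parse 'parse_json';

my $p = parse_json ('{"active":true,"deleted":false,"nickname":null}');

print "active\n" if $p->{active};
print "not deleted\n" unless $p->{deleted};
# exists and defined give different answers for a JSON null.
print "nickname key is present\n" if exists $p->{nickname};
print "nickname has no value\n" unless defined $p->{nickname};
```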

RESTRICTIONS

This module imposes the following restrictions on its input.

JSON only

JSON::Parse is a strict parser. It only accepts input which exactly meets the criteria of "RFC 4627". That means, for example, JSON::Parse does not accept single quotes (') instead of double quotes ("), or numbers with leading zeros, like 0123. JSON::Parse does not accept control characters (0x00 - 0x1F) in strings, missing commas between array or hash elements like ["a" "b"], or trailing commas like ["a","b","c",]. It also does not accept trailing non-whitespace, like the second "]" in ["a"]].
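
The following sketch feeds one example of each kind of rejected input to "valid_json". The labels in the hash are descriptive only.

```perl
use strict;
use warnings;
use JSON::Parse 'valid_json';

# Each input breaks one of the rules listed above.
my %rejected = (
    'single quotes' => q(['a']),
    'missing comma' => q(["a" "b"]),
    'trailing comma' => q(["a","b","c",]),
    'trailing bracket' => q(["a"]]),
    'control character' => qq(["a\x01b"]),
);

for my $reason (sort keys %rejected) {
    my $verdict = valid_json ($rejected{$reason}) ? 'accepted' : 'rejected';
    print "$reason: $verdict\n";
}
```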

No incremental parsing

JSON::Parse does not do incremental parsing. JSON::Parse only parses fully-formed JSON strings which include all opening and closing brackets.

UTF-8 only

Although JSON may come in various encodings of Unicode, JSON::Parse only parses the UTF-8 format. If input is in a different Unicode encoding than UTF-8, convert the input before handing it to this module. For example, for the UTF-16 format,

    use Encode 'decode';
    my $input_utf8 = decode ('UTF-16', $input);
    my $perl = parse_json ($input_utf8);

or, for a file, use :encoding (see PerlIO::encoding and perluniintro):

    open my $input, "<:encoding(UTF-16)", 'some-json-file'
        or die "Cannot open some-json-file: $!";

JSON::Parse does not determine the nature of the octet stream, as described in part 3 of "RFC 4627".

This restriction to UTF-8 applies regardless of whether Perl thinks that the input string is a character string or a byte string. Non-UTF-8 input will cause an "Unexpected character" error to be thrown.
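
A self-contained sketch of the conversion, which simulates UTF-16 input by encoding a literal with the Encode module. The sample text is the one used in "Handling of Unicode".

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);
use JSON::Parse qw(parse_json valid_json);

# Simulate UTF-16 input arriving as raw bytes.
my $utf16_bytes = encode ('UTF-16', '["海老"]');

# Handed over as-is, the bytes are not UTF-8 and should fail validation.
print "raw UTF-16 rejected\n" unless valid_json ($utf16_bytes);

# Decoded into Perl characters first, the same text parses.
my $p = parse_json (decode ('UTF-16', $utf16_bytes));
print length ($p->[0]), " characters\n";
```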

DIAGNOSTICS

"valid_json" does not produce error messages. "parse_json" and "assert_valid_json" die on encountering invalid input.

Where appropriate, error messages include the line number and the byte number of the input which caused the problem. The line number is formed simply by counting the number of "\n" (linefeed, ASCII 0x0A) characters in the whitespace part of the JSON.

Parsing errors are fatal, so to continue after an error occurs, put the parsing into an eval block:

    my $p;                       
    eval {                       
        $p = parse_json ($j);  
    };                           
    if ($@) {                    
        # handle error           
    }
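
The eval block can be wrapped up once and reused. The following is a sketch only: the helper name try_parse is hypothetical, and the regular expression assumes the "JSON error at line N" prefix seen in the messages quoted in this document.

```perl
use strict;
use warnings;
use JSON::Parse 'parse_json';

# Hypothetical helper: returns (data, undef) on success or (undef, message)
# on failure, so that callers need not write their own eval block.
sub try_parse
{
    my ($json) = @_;
    my $data;
    my $ok = eval { $data = parse_json ($json); 1 };
    return $ok ? ($data, undef) : (undef, "$@");
}

# The plus sign on the third line is not valid JSON.
my ($data, $error) = try_parse ("[\n1,\n+2]");
if (defined $error) {
    # Pull the line number out of the "JSON error at line N" prefix.
    my ($line) = $error =~ /JSON error at line (\d+)/;
    print "parse failed on line $line\n" if defined $line;
}
```

Copying $@ into a string straight after the eval keeps the message safe from being overwritten by later code.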

The following error messages are produced:

Unexpected character

An unexpected character (byte) was encountered in the input. For example, when looking at the beginning of a string supposedly containing JSON, there are six possible characters, the four JSON whitespace characters plus "[" and "{". If the module encounters a plus sign, it will give an error like this:

    assert_valid_json ('+');

gives output

    Undefined subroutine &main::validate_json called 

The message always includes a list of what characters are allowed.

If there is some recognizable structure being parsed, the error message will include its starting point in the form "starting from byte n":

    assert_valid_json ('{"this":"\a"}');

gives output

    Undefined subroutine &main::validate_json called 

A feature of JSON is that parsing it requires only one byte to be examined at a time. Thus almost all parsing problems can be handled using the "Unexpected character" error type, including spelling errors in literals:

    assert_valid_json ('[true,folse]');

gives output

    Undefined subroutine &main::validate_json called 

and the missing second half of a surrogate pair:

    assert_valid_json ('["\udc00? <-- should be a second half here"]');

gives output

    Undefined subroutine &main::validate_json called 

All kinds of errors can occur parsing numbers, for example a missing fraction,

    assert_valid_json ('[1.e9]');

gives output

    Undefined subroutine &main::validate_json called 

and a leading zero,

    assert_valid_json ('[0123]');

gives output

    Undefined subroutine &main::validate_json called 

The error message is this complicated because all of the following are valid here: whitespace: [0 ], comma: [0,1], end of array: [0], dot: [0.1], or exponential: [0e0].

These are all handled by this error. Thus the error messages are a little confusing as diagnostics.

Versions of this module prior to 0.29 gave more informative messages like "leading zero in number". (The messages weren't documented.) The reason for changing over to the single message was that it makes the parsing code simpler, and that the testing code described in "TESTING" makes use of the internals of this error to check that the error messages produced actually do correspond to the invalid and valid bytes allowed by the parser, at the exact byte given.

This is a bytewise error, thus for example if a miscoded UTF-8 appears in the input, an error message saying what bytes would be valid at that point will be printed.

    no utf8;
    use JSON::Parse 'assert_valid_json';
    
    # Error in first byte:
    
    my $bad_utf8_1 = chr (hex ("81"));
    eval { assert_valid_json ("[\"$bad_utf8_1\"]"); };
    print "$@\n";
    
    # Error in third byte:
    
    my $bad_utf8_2 = chr (hex ('e2')) . chr (hex ('9C')) . 'b';
    eval { assert_valid_json ("[\"$bad_utf8_2\"]"); };
    print "$@\n";

prints

    JSON error at line 1, byte 3/5: Unexpected character 0x81 parsing string starting from byte 2: expecting printable ASCII or first byte of UTF-8: '\x20-\x7f', '\xC2-\xF4' at examples/bad-utf8.pl line 10.
    
    JSON error at line 1, byte 5/7: Unexpected character 'b' parsing string starting from byte 2: expecting bytes in range 80-bf: '\x80-\xbf' at examples/bad-utf8.pl line 16.
    

Unexpected end of input

The end of the string was encountered before the end of whatever was being parsed. For example, if a quote is missing from the end of the string, it will give an error like this:

    assert_valid_json ('{"first":"Suzuki","second":"Murakami","third":"Asada}');

gives output

    Undefined subroutine &main::validate_json called 

Not surrogate pair

While parsing a string, a surrogate pair was encountered. While trying to turn this into UTF-8, the second half of the surrogate pair turned out to be an invalid value.

    assert_valid_json ('["\uDC00\uABCD"]');

gives output

    Undefined subroutine &main::validate_json called 

Empty input

This error occurs for "assert_valid_json" when it's given an empty or undefined value. Given empty input, "parse_json" returns an undefined value rather than throwing an error.

SPEED

On the author's computer, the module's speed of parsing is approximately the same as JSON::XS, with small variations depending on the type of input. For validation, "valid_json" is faster than any other module known to the author, and up to ten times faster than JSON::XS.

Some special types of input, such as floating point numbers containing an exponential part, like "1e09", seem to be about two or three times faster to parse with this module than with JSON::XS. In JSON::Parse, parsing of exponentials is done by the system's strtod function, but JSON::XS contains its own parser for exponentials, so these results may be system-dependent.

On the other hand, JSON::XS makes better use of Perl's inbuilt string handling than JSON::Parse and so it's faster for some types of strings. The main focus of the version 0.29 release is increased accuracy and better handling of edge cases. I'm planning to attend to the speed issues in future versions.

There is some benchmarking code in the github repository under the directory "benchmarks" for those wishing to test these claims. The script benchmarks/bench is an adaptation of the similar script in the JSON::XS distribution.
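
For a quick comparison of validation and parsing without the repository, something like the following sketch, which uses the core Benchmark module, may be enough. The input string and the repetition count are arbitrary choices, not the files or settings used for the figures in this section.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use JSON::Parse qw(parse_json valid_json);

# Illustrative input only; the distribution's benchmark files are not used here.
my $json = '{"numbers":[1,2.5,1e09],"words":["golden","fleece"],"ok":true}';
my $count = 20_000;

# Time the same number of repetitions of each function.
my $t_valid = timeit ($count, sub { valid_json ($json) });
my $t_parse = timeit ($count, sub { parse_json ($json) });

print "valid_json: ", timestr ($t_valid), "\n";
print "parse_json: ", timestr ($t_parse), "\n";
```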

The following benchmark tests used version 0.29 of JSON::Parse and version 3.01 of JSON::XS on the files in the "benchmarks" directory of JSON::Parse. "short.json" and "long.json" are the benchmarks used by JSON::XS.

short.json
    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     | 358487.521 |  0.0000279 |
    JSON::Parse   | 179243.761 |  0.0000558 |
    JSON::XS      | 156503.881 |  0.0000639 |
    --------------+------------+------------+
long.json
    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     |   6385.968 |  0.0015659 |
    JSON::Parse   |   2803.492 |  0.0035670 |
    JSON::XS      |   3506.357 |  0.0028520 |
    --------------+------------+------------+
words-array.json
    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     | 164482.510 |  0.0000608 |
    JSON::Parse   |  22622.999 |  0.0004420 |
    JSON::XS      |  21936.736 |  0.0004559 |
    --------------+------------+------------+
exp.json
    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     |  88487.426 |  0.0001130 |
    JSON::Parse   |  35726.610 |  0.0002799 |
    JSON::XS      |  13662.228 |  0.0007319 |
    --------------+------------+------------+
literals.json
    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     | 204600.195 |  0.0000489 |
    JSON::Parse   |  31230.856 |  0.0003202 |
    JSON::XS      |  17578.810 |  0.0005689 |
    --------------+------------+------------+
cpantesters.json
    Repetitions: 10 x 100 = 1000
    --------------+------------+------------+
    module        |      1/min |        min |
    --------------|------------|------------|
    JP::valid     |    631.187 |  0.0158432 |
    JSON::Parse   |    132.401 |  0.0755279 |
    JSON::XS      |    131.020 |  0.0763240 |
    --------------+------------+------------+

SEE ALSO

RFC 4627

JSON is specified in RFC 4627 "The application/json Media Type for JavaScript Object Notation (JSON)".

json.org

http://json.org is the website for JSON, authored by Douglas Crockford.

JSON, JSON::XS, and friends

These modules allow both reading and writing of JSON. JSON::Parse originated as a response to the overcomplex interface of JSON, in particular its exasperating handling of Unicode.

There are also a lot of other modules for parsing and producing JSON on CPAN. I have found the following ones: JSON::DWIW, JSON::Any, JSON::YAJL, JSON::Util, JSON::Tiny, Pegex::JSON, JSON::Streaming::Reader, JSON::Syck, Mojo::JSON. Please let me know of any others I've missed.

A fork of JSON::XS also exists as Cpanel::JSON::XS. This is related to a disagreement about how to report bugs. Please see the module for details. Another module, JSON::XS::VersionOneAndTwo, supports two different interfaces of JSON::XS. However, JSON::XS is now onto version 3. JSON::SL also seems to be a fork of JSON::XS. I was unable to run its example code.

TEST RESULTS

The CPAN testers results are at the usual place. At the time of release of this 0.29 version of the module, apart from pre-5.8.9 versions of Perl, there is only one CPAN testers testing machine on which JSON::Parse fails its tests, a Windows 5.16.3 multithreaded Perl. So far I have been unable to work out why these tests are failing on that machine. If JSON::Parse does not install on your machine, let me know.

The ActiveState test results are at http://code.activestate.com/ppm/JSON-Parse/.

EXPORTS

The module exports nothing by default. All of the functions, "parse_json", "json_file_to_perl", "valid_json" and "assert_valid_json", as well as the old function names "validate_json" and "json_to_perl", can be exported on request.

All of the functions can be exported using the tag ':all':

    use JSON::Parse ':all';

SUPPORT

There is a mailing list at <json-parse@googlegroups.com> for announcements and discussions about the module. You can read it on the web at https://groups.google.com/forum/#!forum/json-parse. Membership is open to anyone.

TESTING

The module incorporates extensive testing related to the production of error messages and validation of input. Some of the testing code is supplied with the module in the /t/ subdirectory of the distribution.

More extensive testing code is in the git repository. This is not supplied in the CPAN distribution. A script, randomjson.pl, generates a set number of bytes of random JSON and checks that the module's bytewise validation of input is correct. This setup relies on a C file Json3-random-test.c which isn't in the CPAN distribution, and it also requires Json3.xs to be edited to make the macro TESTRANDOM true (uncomment line 7 of the file). The testing code uses C setjmp/longjmp, so it's not guaranteed to work on all operating systems and is commented out for CPAN releases.

A pure C version called "random-test.c" also exists. This applies exactly the same tests, and requires no Perl at all.

AUTHOR

Ben Bullock, <bkb@cpan.org>

LICENSE

JSON::Parse can be used, copied, modified and redistributed under the same terms as Perl itself.