NAME
JSON::Schema::Validate - Lean, recursion-safe JSON Schema validator (Draft 2020-12)
SYNOPSIS
use JSON::Schema::Validate;
use JSON ();
my $schema = {
    '$schema' => 'https://json-schema.org/draft/2020-12/schema',
    '$id' => 'https://example.org/s/root.json',
    type => 'object',
    required => [ 'name' ],
    properties => {
        name => { type => 'string', minLength => 1 },
        next => { '$dynamicRef' => '#Node' },
    },
    '$dynamicAnchor' => 'Node',
    additionalProperties => JSON::false,
};
my $js = JSON::Schema::Validate->new( $schema )
    ->compile
    ->content_checks
    ->ignore_unknown_required_vocab
    ->prune_unknown
    ->register_builtin_formats
    ->trace
    ->trace_limit(200); # 0 means unlimited
my $ok = $js->validate({ name => 'head', next => { name => 'tail' } })
    or die( $js->error );
print "ok\n";
VERSION
v0.4.1
DESCRIPTION
JSON::Schema::Validate is a compact, dependency-light validator for JSON Schema draft 2020-12. It focuses on:
Correctness and recursion safety (supports $ref, $dynamicRef, $anchor, $dynamicAnchor).
Draft 2020-12 evaluation semantics, including unevaluatedItems and unevaluatedProperties with annotation tracking.
A practical Perl API (constructor takes the schema; call validate with your data; inspect error / errors on failure).
Built-in validators for common formats (date, time, email, hostname, ip, uri, uuid, JSON Pointer, etc.), with the option to register or override custom format handlers.
This module is intentionally minimal compared to large reference implementations, but it implements the parts most people rely on in production.
Supported Keywords (2020-12)
Types
type (string or array of strings), including union types. Unions may also include inline schemas (e.g. type => [ 'integer', { minimum => 0 } ]).
Constant / Enumerations
const, enum.
Numbers
multipleOf, minimum, maximum, exclusiveMinimum, exclusiveMaximum.
Strings
minLength, maxLength, pattern, format.
Arrays
prefixItems, items, contains, minContains, maxContains, uniqueItems, unevaluatedItems.
Objects
properties, patternProperties, additionalProperties, propertyNames, required, dependentRequired, dependentSchemas, unevaluatedProperties.
Combinators
allOf, anyOf, oneOf, not.
Conditionals
if, then, else.
Referencing
$id, $anchor, $ref, $dynamicAnchor, $dynamicRef.
Formats
Call register_builtin_formats to install default validators for the following format names:
date-time, date, time, duration
Leverages DateTime and DateTime::Format::ISO8601 when available (falls back to strict regex checks). Duration uses DateTime::Duration.
email, idn-email
Uses the comprehensive regular expression from Regexp::Common::Email::Address, embedded in this module so that Regexp::Common::Email::Address itself is not required.
hostname, idn-hostname
idn-hostname uses Net::IDN::Encode if available; otherwise, it applies a permissive Unicode label check and then hostname rules.
ipv4, ipv6
Strict regex-based validation.
uri, uri-reference, iri
Reasonable regex checks for scheme and reference forms (heuristic, not a full RFC parser).
uuid
Hyphenated 8-4-4-4-12 hex.
json-pointer, relative-json-pointer
Conformant to RFC 6901 and the relative variant used by JSON Schema.
regex
Checks that the pattern compiles in Perl.
Custom formats can be registered, or built-ins overridden, via register_format or the format => { ... } constructor option (see "METHODS").
CONSTRUCTOR
new
my $js = JSON::Schema::Validate->new( $schema, %opts );
Builds a validator from a decoded JSON Schema (Perl hash/array structure) and returns the newly instantiated object.
Options (all optional):
compile => 1|0
Defaults to 0.
Enable or disable the compiled-validator fast path.
When enabled and the root has not been compiled yet, this triggers an initial compilation.
content_assert => 1|0
Defaults to 0.
Enable or disable the content assertions for the contentEncoding, contentMediaType and contentSchema trio.
When enabling, built-in media validators are registered (e.g. application/json).
format => \%callbacks
Hash of format_name => sub{ ... } validators. Each sub receives the string to validate and must return true/false. Entries here take precedence when you later call register_builtin_formats (i.e. your callbacks remain in place).
ignore_unknown_required_vocab => 1|0
Defaults to 0.
If enabled, required vocabularies declared in $vocabulary that are not advertised as supported by the caller will be ignored instead of causing the validator to die.
You can also use ignore_req_vocab for short.
max_errors
Defaults to 200.
Sets the maximum number of errors to be recorded.
normalize_instance => 1|0
Defaults to 1.
When true, the instance is round-tripped through JSON before validation, which enforces strict JSON typing (strings remain strings; numbers remain numbers). This matches Python jsonschema’s type behaviour. Set to 0 if you prefer Perl’s permissive numeric/string duality.
prune_unknown => 1|0
Defaults to 0.
When set to a true value, unknown object properties in the instance are pruned (removed) prior to validation, based on the schema’s structural keywords.
Pruning currently takes into account:
properties
patternProperties
additionalProperties (item value or subschema, including within allOf)
allOf (for merging additional object or array constraints)
For objects:
Any property explicitly declared under properties is kept, and its value is recursively pruned according to its subschema (if it is itself an object or array).
Any property whose name matches one of the patternProperties regular expressions is kept, and pruned recursively according to the associated subschema.
If additionalProperties is false, any object property not covered by properties or patternProperties is removed.
If additionalProperties is a subschema, any such additional property is kept, and its value is pruned recursively following that subschema.
For arrays:
Items covered by prefixItems (by index) or items (for remaining elements) are kept, and if they are objects or arrays, they are pruned recursively. Existing positions are never removed; pruning only affects the nested contents.
The pruner intentionally does not interpret anyOf, oneOf or not when deciding which properties to keep or drop, because doing so would require running full validation logic and could remove legitimate data incorrectly. In those cases, pruning errs on the side of keeping more data rather than over-pruning.
When prune_unknown is disabled (the default), the instance is not modified for validation purposes, and no pruning is performed.
trace
Defaults to 0.
Enable or disable tracing. When enabled, the validator records lightweight, bounded trace events according to "trace_limit" and "trace_sample".
trace_limit
Defaults to 0.
Set a hard cap on the number of trace entries recorded during a single validate call (0 = unlimited).
trace_sample => $percent
Enable probabilistic sampling of trace events. $percent is an integer percentage in [0,100]. 0 disables sampling. Sampling occurs per-event, and still respects "trace_limit".
vocab_support => {}
A hash reference of supported vocabularies.
METHODS
compile
$js->compile; # enable compilation
$js->compile(1); # enable
$js->compile(0); # disable
Enable or disable the compiled-validator fast path.
When enabled and the root hasn’t been compiled yet, this triggers an initial compilation.
Returns the current object to enable chaining.
content_checks
$js->content_checks; # enable
$js->content_checks(1); # enable
$js->content_checks(0); # disable
Turn on/off content assertions for the contentEncoding, contentMediaType and contentSchema trio.
When enabling, built-in media validators are registered (e.g. application/json).
Returns the current object to enable chaining.
error
my $msg = $js->error;
Returns the first JSON::Schema::Validate::Error object among the errors found (see "errors"), if any.
When stringified, the object provides a short, human-oriented message for the first failure.
errors
my $array_ref = $js->errors;
Returns an array reference of all collected error objects (up to the max_errors cap).
get_trace
my $trace = $js->get_trace; # arrayref of trace entries (copy)
Return a copy of the last validation trace (array reference of hash references) so callers cannot mutate internal state. Each entry contains:
{
inst_path => '#/path/in/instance',
keyword => 'node' | 'minimum' | ...,
note => 'short string',
outcome => 'pass' | 'fail' | 'visit' | 'start',
schema_ptr => '#/path/in/schema',
}
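As an illustration, a trace in the shape documented above can be summarised with ordinary list operations. The following is only a minimal sketch: the helper name failed_steps and the sample entries are hypothetical, and only the entry keys are taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the schema locations of failed trace entries.
# Only the keys (keyword, outcome, schema_ptr, ...) come from the documented shape.
sub failed_steps
{
    my( $trace ) = @_;
    return( [
        map  { sprintf( '%s (%s)', $_->{schema_ptr}, $_->{keyword} ) }
        grep { ( $_->{outcome} // '' ) eq 'fail' } @$trace
    ] );
}

# Made-up sample data standing in for the result of $js->get_trace
my $sample_trace = [
    { inst_path => '#', keyword => 'node', note => '', outcome => 'start', schema_ptr => '#' },
    { inst_path => '#/name', keyword => 'minLength', note => '', outcome => 'fail',
      schema_ptr => '#/properties/name/minLength' },
];
print( "$_\n" ) for( @{ failed_steps( $sample_trace ) } );
```

Because get_trace returns a copy, a helper of this kind can filter or reorder the entries freely without affecting the validator.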
get_trace_limit
my $n = $js->get_trace_limit;
Accessor that returns the numeric trace limit currently in effect. See "trace_limit" to set it.
ignore_unknown_required_vocab
$js->ignore_unknown_required_vocab; # enable
$js->ignore_unknown_required_vocab(1); # enable
$js->ignore_unknown_required_vocab(0); # disable
If enabled, required vocabularies declared in $vocabulary that are not advertised as supported by the caller will be ignored instead of causing the validator to die.
Returns the current object to enable chaining.
is_compile_enabled
my $bool = $js->is_compile_enabled;
Read-only accessor.
Returns true if compilation mode is enabled, false otherwise.
is_content_checks_enabled
my $bool = $js->is_content_checks_enabled;
Read-only accessor.
Returns true if content assertions are enabled, false otherwise.
is_trace_on
my $bool = $js->is_trace_on;
Read-only accessor.
Returns true if tracing is enabled, false otherwise.
is_unknown_required_vocab_ignored
my $bool = $js->is_unknown_required_vocab_ignored;
Read-only accessor.
Returns true if unknown required vocabularies are being ignored, false otherwise.
prune_instance
my $pruned = $js->prune_instance( $instance );
Returns a pruned copy of $instance according to the schema that was passed to new. The original data structure is not modified.
The pruning rules are the same as those used when the constructor option prune_unknown is enabled (see "prune_unknown"), namely:
For objects, only properties allowed by properties, patternProperties and additionalProperties (including those brought in via allOf) are kept. Their values are recursively pruned when they are objects or arrays.
If additionalProperties is false, properties not matched by properties or patternProperties are removed.
If additionalProperties is a subschema, additional properties are kept and pruned recursively according to that subschema.
For arrays, items are never removed by index. However, for elements covered by prefixItems or items, their nested content is pruned recursively when it is an object or array.
anyOf, oneOf and not are not used to decide which properties to drop, to avoid over-pruning valid data without performing full validation.
This method is useful when you want to clean incoming data structures before further processing, without necessarily performing a full schema validation at the same time.
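To make the rules above concrete, here is a simplified, stand-alone sketch of such a pruner. It is not the module's implementation: the function name prune_sketch is hypothetical, and the sketch deliberately ignores allOf, $ref and boolean subschemas.

```perl
use strict;
use warnings;

# Simplified illustration of the pruning rules; NOT the module's implementation.
# It ignores allOf, $ref and boolean subschemas, and shares unpruned values
# with the input instead of deep-copying them.
sub prune_sketch
{
    my( $schema, $data ) = @_;
    return( $data ) unless( ref( $schema ) eq 'HASH' );
    if( ref( $data ) eq 'HASH' )
    {
        my $props    = $schema->{properties} || {};
        my $patterns = $schema->{patternProperties} || {};
        my $addl     = $schema->{additionalProperties};
        # True when additionalProperties is a false value (0 or a JSON false object)
        my $closed   = ( defined( $addl ) && ref( $addl ) ne 'HASH' && !$addl );
        my %out;
        foreach my $key ( keys( %$data ) )
        {
            # First patternProperties regex (in sorted order) matching this key, if any
            my( $re ) = grep { $key =~ /$_/ } sort( keys( %$patterns ) );
            if( exists( $props->{ $key } ) )
            {
                $out{ $key } = prune_sketch( $props->{ $key }, $data->{ $key } );
            }
            elsif( defined( $re ) )
            {
                $out{ $key } = prune_sketch( $patterns->{ $re }, $data->{ $key } );
            }
            elsif( !$closed )
            {
                # Kept; pruned further only when additionalProperties is a subschema
                $out{ $key } = prune_sketch( $addl, $data->{ $key } );
            }
        }
        return( \%out );
    }
    elsif( ref( $data ) eq 'ARRAY' )
    {
        my $prefix = ref( $schema->{prefixItems} ) eq 'ARRAY' ? $schema->{prefixItems} : [];
        my @out;
        for my $i ( 0 .. $#$data )
        {
            # Positions are never removed; only nested content is pruned
            my $sub = $i < @$prefix ? $prefix->[ $i ] : $schema->{items};
            push( @out, prune_sketch( $sub, $data->[ $i ] ) );
        }
        return( \@out );
    }
    return( $data );
}
```

With a schema that declares name under properties, '^x-' under patternProperties and a false additionalProperties, this sketch keeps name and any x- key and drops everything else, while leaving the input hash untouched.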
register_builtin_formats
$js->register_builtin_formats;
Registers the built-in validators listed in "Formats". User-supplied callbacks already registered under the same name, including those passed via format => { ... }, are preserved and take precedence.
register_content_decoder
$js->register_content_decoder( $name => sub{ ... } );
or
$js->register_content_decoder(rot13 => sub
{
    my( $bytes ) = @_; # the raw data to decode
    $bytes =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    return( $bytes ); # now treated as (1, undef, $decoded)
});
Register a content decoder for contentEncoding. The callback receives a single argument: the raw data, and should return one of:
a decoded scalar (success);
undef (failure);
or the triplet ( $ok, $msg, $out ) where $ok is truthy on success, $msg is an optional error string, and $out is the decoded value.
The $name is lower-cased internally. Returns the current object.
Throws an exception if the second argument is not a code reference.
register_format
$js->register_format( $name, sub { ... } );
Register or override a format validator at runtime. The sub receives a single scalar (the candidate string) and must return true/false.
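A format callback is an ordinary sub returning true or false. The sketch below is an assumption-laden example: the format name x-semver and the helper is_semver are hypothetical, and the check covers only the plain MAJOR.MINOR.PATCH subset of semantic versions.

```perl
use strict;
use warnings;

# Hypothetical format check: plain MAJOR.MINOR.PATCH without leading zeros.
# Pre-release and build suffixes are intentionally not handled.
sub is_semver
{
    my( $str ) = @_;
    return( 0 ) unless( defined( $str ) );
    my $num = qr/(?:0|[1-9][0-9]*)/;
    return( $str =~ /\A$num\.$num\.$num\z/ ? 1 : 0 );
}

# Registration, following the signature shown above ('x-semver' is a made-up name):
# $js->register_format( 'x-semver', \&is_semver );
```

Anchoring with \A and \z rather than ^ and $ avoids accepting a value with a trailing newline.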
register_media_validator
$js->register_media_validator( 'application/json' => sub{ ... } );
Register a media validator/decoder for contentMediaType. The callback receives 2 arguments:
$bytes
The data to validate.
\%params
A hash reference of media-type parameters (e.g. charset).
It may return one of:
( $ok, $msg, $decoded ): the canonical form. On success $ok is true, $msg is optional, and $decoded can be either a Perl structure or a new octet/string value.
a reference: treated as success with that reference as $decoded.
a defined scalar: treated as success with that scalar as $decoded.
undef or empty list: treated as failure.
The media type key is lower-cased internally.
It returns the current object.
It throws an exception if the second argument is not a code reference.
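As a sketch of the canonical three-value return form, the callback below decodes JSON with the core JSON::PP module. The sub name json_media_validator is hypothetical, and the \%params argument is accepted but not used here.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical media callback returning the canonical ( $ok, $msg, $decoded ) form.
sub json_media_validator
{
    my( $bytes, $params ) = @_;
    my $decoded;
    my $ok = eval
    {
        # utf8: input is octets; allow_nonref: accept scalars such as "42" or "null"
        $decoded = JSON::PP->new->utf8->allow_nonref->decode( $bytes );
        1;
    };
    return( 0, 'invalid JSON: ' . ( $@ || 'unknown error' ), undef ) unless( $ok );
    return( 1, undef, $decoded );
}

# Registration, following the signature shown above:
# $js->register_media_validator( 'application/json' => \&json_media_validator );
```

Returning the full triplet rather than the bare decoded value keeps a successfully decoded JSON null distinguishable from a failure.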
set_comment_handler
$js->set_comment_handler(sub
{
my( $schema_ptr, $text ) = @_;
warn "Comment at $schema_ptr: $text\n";
});
Install an optional callback for the Draft 2020-12 $comment keyword.
$comment is annotation-only (never affects validation). When provided, the callback is invoked once per encountered $comment string with the schema pointer and the comment text. Callback errors are ignored.
If a value is provided, and is not a code reference, a warning will be emitted.
This returns the current object.
set_resolver
$js->set_resolver( sub { my( $absolute_uri ) = @_; ...; return $schema_hashref } );
Install a resolver for external documents. It is called with an absolute URI (formed from the current base $id and the $ref) and must return a Perl hash reference representation of a JSON Schema. If the returned hash contains '$id', it will become the new base for that document; otherwise, the absolute URI is used as its base.
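A resolver can be as simple as a lookup in an in-memory table. The sketch below makes several assumptions: the store %SCHEMAS, its contents and the sub name resolve_from_store are hypothetical, and dying on an unknown URI is a choice made for this example rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical in-memory schema store keyed by absolute URI
my %SCHEMAS = (
    'https://example.org/s/address.json' => {
        '$id'      => 'https://example.org/s/address.json',
        type       => 'object',
        required   => [ 'city' ],
        properties => { city => { type => 'string' } },
    },
);

sub resolve_from_store
{
    my( $absolute_uri ) = @_;
    # Assumption: an unknown URI is reported by throwing an exception
    my $schema = $SCHEMAS{ $absolute_uri }
        or die( "No schema registered for $absolute_uri\n" );
    return( $schema );
}

# Installation, following the signature shown above:
# $js->set_resolver( \&resolve_from_store );
```

A resolver backed by files or HTTP would have the same shape: take the absolute URI, return a decoded schema hash reference.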
set_vocabulary_support
$js->set_vocabulary_support( \%support );
Declare which vocabularies the host supports, as a hash reference:
{
'https://example/vocab/core' => 1,
...
}
Resets internal vocabulary-checked state so the declaration is enforced on next validate.
It returns the current object.
trace
$js->trace; # enable
$js->trace(1); # enable
$js->trace(0); # disable
Enable or disable tracing. When enabled, the validator records lightweight, bounded trace events according to "trace_limit" and "trace_sample".
It returns the current object for chaining.
trace_limit
$js->trace_limit( $n );
Set a hard cap on the number of trace entries recorded during a single validate call (0 = unlimited).
It returns the current object for chaining.
trace_sample
$js->trace_sample( $percent );
Enable probabilistic sampling of trace events. $percent is an integer percentage in [0,100]. 0 disables sampling. Sampling occurs per-event, and still respects "trace_limit".
It returns the current object for chaining.
validate
my $ok = $js->validate( $data );
Validates a decoded JSON instance against the schema and returns a boolean. On failure, $js->error returns the first error object, which stringifies to a concise message, and $js->errors returns an array reference of all error objects. For example:
my $err = $js->error;
say $err->path; # #/properties~1name
say $err->message; # string shorter than minLength 1
say "$err"; # error object will stringify
BEHAVIOUR NOTES
Recursion & Cycles
The validator guards on the pair (schema_pointer, instance_address), so self-referential schemas and cyclic instance graphs won’t infinite-loop.
Union Types with Inline Schemas
type may be an array mixing string type names and inline schemas. Any inline schema that validates the instance makes the type check succeed.
Booleans
For practicality in Perl, type => 'boolean' accepts JSON-like booleans (e.g. true/false, 1/0 as strings) as well as Perl boolean objects (if you use a boolean class). If you need stricter behaviour, you can adapt _match_type or introduce a constructor flag and branch there.
Unevaluated*
Both unevaluatedItems and unevaluatedProperties are enforced using annotations produced by earlier keyword evaluations within the same schema object, matching draft 2020-12 semantics.
RFC rigor and media types
URI/IRI and media-type parsing is intentionally pragmatic rather than fully RFC-complete. For example, uri, iri, and uri-reference use strict but heuristic regexes; contentMediaType validates UTF-8 for text/*; charset=utf-8 and supports pluggable validators/decoders, but is not a general MIME toolkit.
Compilation vs. Interpretation
Both code paths are correct by design. The interpreter is simpler and great while developing a schema; toggle ->compile when moving to production or after the schema stabilises. You may enable compilation lazily (call compile any time) or eagerly via the constructor (compile => 1).
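The recursion guard described under "Recursion & Cycles" can be pictured with a toy walker. This is only a sketch of the general technique, assuming the guard key is built from a schema pointer string and the instance's reference address; the function count_visits is hypothetical and performs no validation.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

# Toy walker over a linked list whose "next" member is described by the same
# schema location, as a recursive reference would be. Returns the number of
# distinct (schema pointer, instance address) pairs visited.
sub count_visits
{
    my( $schema_ptr, $inst, $seen ) = @_;
    return( 0 ) unless( ref( $inst ) eq 'HASH' );
    my $key = $schema_ptr . '|' . refaddr( $inst );
    # Already visited this instance at this schema location: stop here
    return( 0 ) if( $seen->{ $key }++ );
    return( 1 + count_visits( $schema_ptr, $inst->{next}, $seen ) );
}

# A cyclic instance graph: head -> tail -> head
my $head = { name => 'head' };
my $tail = { name => 'tail', next => $head };
$head->{next} = $tail;
print( count_visits( '#', $head, {} ), "\n" ); # prints 2
```

Keying on the pair rather than on the instance address alone still lets the same instance be checked against different schema locations.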
WHY ENABLE COMPILE?
When compile is ON, the validator precompiles a tiny Perl closure for each schema node. At runtime, those closures:
avoid repeated hash lookups for keyword presence/values;
skip dispatch on absent keywords (branchless fast paths);
reuse precompiled child validators (arrays/objects/combinators);
reduce allocator churn by returning small, fixed-shape result hashes.
In practice this improves steady-state throughput (especially for large/branchy schemas, or hot validation loops) and lowers tail latency by minimising per-instance work. The trade-offs are:
a one-time compile cost per node (usually amortised quickly);
a small memory footprint for closures (one per visited node).
If you only validate once or twice against a tiny schema, compilation will not matter; for services, batch jobs, or streaming pipelines it typically yields a noticeable speedup. Always benchmark with your own schema+data.
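The general idea of precompiling a schema node into a closure can be sketched as follows. This illustrates the technique only and is not the module's generated code: the function compile_number_checks is hypothetical and covers just minimum and maximum.

```perl
use strict;
use warnings;

# Build one closure per schema node. Keyword presence is examined once, at
# compile time, so absent keywords cost nothing when validating.
sub compile_number_checks
{
    my( $schema ) = @_;
    my @checks;
    if( defined( my $min = $schema->{minimum} ) )
    {
        push( @checks, sub { $_[0] >= $min } );
    }
    if( defined( my $max = $schema->{maximum} ) )
    {
        push( @checks, sub { $_[0] <= $max } );
    }
    return sub
    {
        my( $n ) = @_;
        foreach my $check ( @checks )
        {
            return( 0 ) unless( $check->( $n ) );
        }
        return( 1 );
    };
}

my $in_range = compile_number_checks( { minimum => 0, maximum => 10 } );
print( $in_range->( 5 ) ? "ok\n" : "not ok\n" ); # prints "ok"
```

The one-time cost is paid in compile_number_checks; each later call only runs the checks that were actually present in the schema.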
CREDITS
Albert from OpenAI for his invaluable help.
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
perl, DateTime, DateTime::Format::ISO8601, DateTime::Duration, Regexp::Common, Net::IDN::Encode, JSON::PP
python-jsonschema, fastjsonschema, Pydantic, RapidJSON Schema
https://json-schema.org/specification
COPYRIGHT & LICENSE
Copyright(c) 2025 DEGUEST Pte. Ltd.
All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.