The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

XML::Invisible - transform "invisible XML" documents into XML using a grammar

SYNOPSIS

  use XML::Invisible qw(make_parser);
  my $transformer = make_parser(from_file($ixmlspec));
  my $ast = $transformer->(from_file($ixml_input));

  use XML::Invisible qw(make_parser ast2xml);
  my $transformer = make_parser(from_file($ixmlspec));
  my $xmldoc = ast2xml($transformer->(from_file($ixml_input)));
  to_file($outputfile, $xmldoc->toStringC14N(1));

  # or, with conventional pre-compiled Pegex grammar
  my $parser = Pegex::Parser->new(
    grammar => My::Thing::Grammar->new,
    receiver => XML::Invisible::Receiver->new,
  );
  my $got = $parser->parse($input);

  # from command line
  cpanm XML::Invisible XML::Twig
  perl -MXML::Invisible=make_parser,ast2xml -e \
    'print ast2xml(make_parser(join "", <>)->("(a+b)"))->toStringC14N(1)' \
    examples/arith-grammar.ixml | xml_pp

  # canonicalise a document
  use XML::Invisible qw(make_parser make_canonicaliser);
  my $ixml_grammar = from_file('examples/arith-grammar.ixml');
  my $transformer = make_parser($ixml_grammar);
  my $ast = $transformer->(from_file($ixml_input));
  my $canonicaliser = make_canonicaliser($ixml_grammar);
  my $canonical = $canonicaliser->($ast);

DESCRIPTION

An implementation of Steven Pemberton's Invisible XML concept, using Pegex. Supply it with your grammar, in Pegex format (slightly different from Steven's specification due to differences between his notation and Pegex's), it returns you a function, which you can call to transform "invisible XML" documents into actual XML.

This is largely a Pegex "receiver" class that exploits the + and - syntax in rules in slightly unintended ways, and a wrapper to make this operate.

GRAMMAR SYNTAX

See Pegex::Syntax. Generally, all rules will result in an XML element. All terminals will need to capture with () (see example below).

However, if you specify a dependent token with + it will instead become an attribute (equivalent of Steven's @). If -, this will "flatten" (equivalent of Steven's -) - the children will be included without making an element of that node. Since in Pegex any element can be skipped entirely with ., you can use that instead of - to omit terminals.

E.g.

  expr: +open -arith +close
  open: /( LPAREN )/
  close: /( RPAREN )/
  arith: left -op right
  left: +name
  right: -name
  name: /(a)/ | /(b)/
  op: +sign
  sign: /( PLUS )/

When given (a+b) yields:

  <expr open="(" sign="+" close=")">
    <left name="a"/>
    <right>b</right>
  </expr>

FUNCTIONS

make_parser

Exportable. Returns a function that when called with an "invisible XML" document, it will return an abstract syntax tree (AST), of the general form:

  {
    nodename => 'expr',
    attributes => { open => '(', sign => '+', close => ')' },
    children => [
      { nodename => 'left', attributes => { name => 'a' } },
      { nodename => 'right', children => [ 'b' ] },
    ],
  }

Arguments:

an "invisible XML" Pegex grammar specification, OR a Pegex::Grammar object

See XML::Invisible::Receiver for more.

ast2xml

Exportable. Turns an AST, as output by "make_parser", from XML::Invisible::Receiver into an object of class XML::LibXML::Document. Needs XML::LibXML installed, which as of version 0.05 of this module is only a suggested dependency, not required.

Arguments:

an AST from XML::Invisible::Receiver

make_canonicaliser

Exportable. Returns a function that when called with an AST as produced from a document by a "make_parser", returns a canonical version of the original document, or undef if it failed.

Arguments:

an XML::Invisible grammar

It uses a few heuristics:

literals that are 0-1 (?) or any number (*) will be omitted
literals that are at least one (+) will be inserted once
if an "any" group is given, the first one that matches will be selected

This last one means that if you want a canonical representation that is not the bare minimum, provide that as a literal first choice (see the assign rule below - while it will accept any or no whitespace, the "canonical" version is given):

  expr: target .assign source
  target: +name
  assign: ' = ' | (- EQUAL -)
  source: -name
  name: /( ALPHA (: ALPHA | DIGIT )* )/

DEBUGGING

To debug, set environment variable XML_INVISIBLE_DEBUG to a true value.

SEE ALSO

Pegex

https://homepages.cwi.nl/~steven/ixml/ - Steven Pemberton's Invisible XML page

AUTHOR

Ed J, <etj at cpan.org>

BUGS

Please report any bugs or feature requests on https://github.com/mohawk2/xml-invisible/issues.

Or, if you prefer email and/or RT: to bug-xml-invisible at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=XML-Invisible. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

LICENSE AND COPYRIGHT

Copyright 2018 Ed J.

This program is free software; you can redistribute it and/or modify it under the terms of the the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0