The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

YATT::Lite::LRXML::Syntax - Loose but Recursive XML (LRXML) format.

SYNOPSIS

  require YATT::Lite::LRXML;
  my $container = YATT::Lite::LRXML->load_from(string => <<'END');
  <!yatt:args x y>
  <h2>&yatt:x;</h2>
  &yatt:y;

  <!yatt:widget foo id x>
  <div id="&yatt:id;">
    &yatt:x;
  </div>
  END

DESCRIPTION

Loose but Recursive XML (LRXML), which I'm defining here, is an XML-like template format. LRXML is first used in my template engine YATT and then extended in my latest template engine YATT::Lite.

LRXML format consists of 3 layers of syntax definitions which are "LRXML multipart container" (or simply container), "LRXML template" (template) and "LRXML entity reference" (entref). A container can carry multiple parts. Each part can have a boundary (header) and it can carry meta information (usually used as a declaration) for the body of the part. Each part can be a template or other type of text payload. Entref can appear in templates and other text payload.

LRXML format only defines syntax and doesn't touch semantics, like S-expression in Lisp. Actually, the current implementation of LRXML parser determines the types of each part by (predefined) declaration keywords (such as "widget", "page", "action"...), but the declaration keywords are not part of this LRXML format specification. It is opened for each user of LRXML format.

XXX: Brief introduction of LRXML

FORMAT SPECIFICATION

Syntax Notation (ABNF with negative-match)

In this document, I (roughly) use ABNF, with some modifications/extensions.

[..] means a character set, like regexp in perl5.

In original ABNF, [..] means optional element.

The operator "?" is equivalent of *1 and indicates optional element.

For optional element, I chose ?<elem> instead of [<elem>].

The operator " ¬ " preceding an element indicates negative-match.

If an element is written like:

   ¬ elem

then this pattern matches longest possible character sequence which do not match elem. This operator helps defining customizable namespace.

Rule can take parameters.

If left-hand-side of a rule definition consists of two or more words, it is a parametric rule. Parametric rule is used like <rule Param>.

   group C          =  *term C

   ...other rule... =   <group ")">

Customizable namespace qualifier

In LRXML, every top-level constructs are marked by namespace qualifier (or simply namespace). Namespace can be customized to arbitrary set of words. For simplicity, in this document, I put a "sample" definition of customizable namespace rule CNS like:

  CNS             = ("yatt")

But every implementation of LRXML parser should allow overriding this rule like following instead:

  CNS             = ("yatt" / "js" / "perl")

BNF of LRXML multipart container

  lrxml-container = ?(lrxml-payload) *( lrxml-boundary lrxml-payload
                                      / lrxml-comment )

  lrxml-boundary  = "<!" CNS ":" NSNAME decl-attlist ">" EOL

  lrxml-comment   = "<!--#" CNS *comment-payload "-->"

  lrxml-payload   = ¬("<!" (CNS ":" / "#" CNS))

  decl-attlist    = *(1*WS / inline-comment / att-pair / decl-macro)

  inline-comment  = "--" comment-payload "--"

  comment-payload = *([^-] / "-" [^-])

  decl-macro      = "%" NAME *[0-9A-Za-z_:\.\-=\[\]\{\}\(,\)] ";"

  att-pair        = ?(NSNAME "=") att-value

  att-value       = squoted-att / dquoted-att / nested-att / bare-att

  squoted-att     = ['] *[^'] [']

  dquoted-att     = ["] *[^"] ["]

  nested-att      = '[' decl-attlist ']'

  bare-att        = 1*[^'"\[\]\ \t\n<>/=]

  NSNAME          = NAME *(":" NAME)

  NAME            = 1*[0-9A-Za-z_]

  WS              = [\ \t\n]

  EOL             = ?[\r] [\n]

Some notes on current spec and future changes:

NAME may be allowed to contain unicode word.

In current YATT::Lite, NAME can cotain \w in perl unicode semantics.

BNF of LRXML template syntax .

  lrxml-template   = ?(template-payload) *( (template-tag / lrxml-entref )
                                           ?(template-payload) )

  template-payload = ¬( tag-leader / ent-leader )

  tag-leader       = "<" ( CNS ":"
                         / "?" CNS
                         )

  ent-leader       = "&" ( CNS (":" / lcmsg )
                         / special-entity
                         )

  template-tag     = element / pi

  element          = "<" (single-elem / open-tag / close-tag) ">"

  pi               = "<?" CNS ?NSNAME pi-payload "?>"

  single-elem      = CNS NSNAME elem-attlist "/"

  open-tag         = CNS NSNAME elem-attlist

  close-tag        =  "/" CNS NSNAME *WS

  elem-attlist     = *(1*WS / inline-comment / att-pair)

  pi-payload       = *([^?] / "?" [^>])

BNF of LRXML entity reference syntax

  lrxml-entref     = "&" ( CNS (pipeline / lcmsg)
                         / special-entity "(" <group ")">
                         )
                     ";"

  pipeline         = 1*( ":" NAME ?( "(" <group ")">)
                       / "[" <group "]">
                       / "{" <group "}">
                       )

  group CLO        = *ent-term CLO

  ent-term         = ( ","
                     / ( etext / pipeline ) ?[,:]
                     )

  etext            = etext-head *etext-body

  etext-head       = ( ETEXT *( ETEXT / ":" )
                     / paren-quote
                     )

  etext-body       = ( ETEXT *( ETEXT / ":" )
                     / paren-quote
                     / etext-any-group
                     )

  etext-any-group  = ( "(" <etext-group ")">
                     / "{" <etext-group "}">
                     / "[" <etext-group "]">
                     )

  etext-group CLO  = *( ETEXT / [:,] ) *etext-any-group CLO

  paren-quote      = "(" *( [^()] / paren-quote ) ")"

  lcmsg            = lcmsg-open / lcmsg-sep / lcmsg-close

  lcmsg-open       = ?("#" NAME) 2*"["

  lcmsg-sep        = 2*"|"

  lcmsg-close      = 2*"]"

  special-entity   = SPECIAL_ENTNAME

  ETEXT            = [^\ \t\n,;:(){}\[\]]

Special entity name

Special entity is another customizable syntax element. For example, it is usually defined like:

  SPECIAL_ENTNAME  = ("HTML")

And then you can write &HTML(:var);.

But every implementation of LRXML parser should allow overriding this rule like following instead:

  SPECIAL_ENTNAME  = ("HTML" / "JSON" / "DUMP")

AUTHOR

"KOBAYASI, Hiroaki" <hkoba@cpan.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.