
NAME

Rodney::XML::QuickStruct - Quick Perl data structures from XML.

VERSION

1.1

Please note, this API is currently beta software. I've tested it against my design, but I haven't had input from any other users yet. If you have comments or other input, please send them to me: Rodney Broom <perl@rbroom.com>.

SYNOPSIS

    # Setup:
    use Rodney::XML::QuickStruct;

    %tag_map = (
        person   => 'list',
        hobby    => 'list',
        name     => 'scalar',
        age      => 'scalar',
    );


    # Then:
    $parser = Rodney::XML::QuickStruct->new;
    $data_struct = $parser->parse_file($xml_file, %tag_map);

    # Or:
    $data_struct = Rodney::XML::QuickStruct::parse_file($xml_file, %tag_map);

Be sure to see the EXAMPLES section.

DESCRIPTION

This API provides a quick and easy way to get XML-like markup into a Perl data structure. This API isn't intended to be terribly powerful, or at all extendable, but it is easy. Also, it's pure Perl so it doesn't require installing any of the usual Perl XML modules.

This API provides both an OO interface and a function interface. My preference is the OO interface, but the function interface is probably what you'll want since it's a touch simpler to get the job done. You may use whichever interface you like, but the two do not share information. For example, if you have an error recorded in your $parser object, it will only be available through that object and not via the function interface. So this snippet will always die() without an error message:

    $data_struct = $parser->parse_file($bad_file_name);
    die Rodney::XML::QuickStruct::error() unless $data_struct;

FUNCTIONS

These are the functions that are intended for public use.

new()

This function is the intended starting place for most folks. If you don't create an object to work with, then we'll quietly create one underneath for our own use.

    $parser = Rodney::XML::QuickStruct->new($tag_map);
    $parser = Rodney::XML::QuickStruct->new(
        tag_map => \%tag_map,
        debug   => 1
    );

    unless ($parser) {
        die Rodney::XML::QuickStruct::error()."\n";
    }

All parameters are optional and case insensitive.

    These are the arguments supported by new().

    tag_map

    Loads the given tag map into the object. This needs to be a ref to a hash and will get loaded via the tag_map() method. (Don't worry about your ref, we won't make changes inside of it.) This loaded tag map will serve as the default map for any calls by this object that don't receive an overloaded map.

    debug

    Turns on debugging messages printed to standard error. This is really intended for debugging the API, but you may find it useful. This is implemented via the debug() method and accepts the same value.

parse_file()

Loads a file and processes it through parse_content(). Returns a hash or hashref on success, undef on failure. The hashref is preferred as it's a bit cheaper for you.

OO interface:
    $data_struct = $parser->parse_file($file_name);
    # Or:
    $data_struct = $parser->parse_file($file_name, %tag_map);
    # Or:
    $data_struct = $parser->parse_file($file_name, \%tag_map);
Function interface:
    $data_struct = Rodney::XML::QuickStruct::parse_file($file_name, %tag_map);
    # Or:
    $data_struct = Rodney::XML::QuickStruct::parse_file($file_name, \%tag_map);

$file_name may be a file name or a file handle. See also the error() routine.

parse_content()

This is where the work gets done. This routine parses the XML-like content that gets passed to it. On success, returns a hash or hashref data structure representing that content. On failure, returns undef. Also see the error() routine.

OO interface:

In all cases, we'll try to use the arguments that are passed instead of what's currently in the object. However, if you pass new data, we will NOT load it into your established object. The reason for this is to allow you to use the object as a source of default information. Currently this only applies to the tag map.

    # If you've already loaded a tag map.
    $data_struct = $parser->parse_content($content);
    # Or, with a new tag map:
    $data_struct = $parser->parse_content($content, %new_tag_map);
    # Or:
    $data_struct = $parser->parse_content($content, \%new_tag_map);
Function interface:
    Rodney::XML::QuickStruct::parse_content($content, %tag_map);
    # Or:
    Rodney::XML::QuickStruct::parse_content($content, \%tag_map);
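
The per-call override rule described for the OO interface can be pictured with a small stand-alone sketch. This is not the module's code: the class My::Sketch and its effective_map() method are hypothetical names invented for illustration. It only shows the idea that a map passed to a call is used for that call and never stored in the object.

```perl
use strict;
use warnings;

package My::Sketch;    # hypothetical class, not Rodney::XML::QuickStruct

sub new {
    my ($class, %arg) = @_;
    # Copy the caller's map so later changes to it do not affect us.
    my %default = %{ $arg{tag_map} || {} };
    return bless { tag_map => \%default }, $class;
}

# Returns the map a call would use: the per-call map if one was passed
# (as a hash or a hashref), otherwise the object's default map.
# The default map is never modified.
sub effective_map {
    my $self = shift;
    return $self->{tag_map} unless @_;
    my %map = (@_ == 1 && ref $_[0] eq 'HASH') ? %{ $_[0] } : @_;
    return \%map;
}

package main;

my $parser = My::Sketch->new(tag_map => { name => 'scalar' });
my $once   = $parser->effective_map(person => 'list');   # override for one call
my $again  = $parser->effective_map;                      # default is unchanged

print join(',', sort keys %$once),  "\n";   # person
print join(',', sort keys %$again), "\n";   # name
```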

tag_map()

Accessor/mutator for the object's tag map. This data controls what the parser thinks of a given tag.

    # Assign/reassign new tag map.
    $self->tag_map(%tag_info);
    # Or:
    $self->tag_map(\%tag_info);

    # Read current tag map;
    %curr_map = $self->tag_map;
    # Or:
    $curr_map = $self->tag_map;     # No cheaper, this ref is a copy.


    # Clear the tag map
    $self->tag_map(undef);

If you are assigning a new tag map, the keys will be cast to lower case, and the values will be checked against the known data types. See TAG_MAP.
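
As a rough illustration of that normalization, here is a minimal stand-alone sketch. It is not the module's implementation: the helper name normalize_tag_map() is hypothetical, and returning nothing on an unknown type is an assumption. It accepts a hash or a hashref, lower-cases the keys, and rejects any value other than the three data types named in TAG_MAP.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Rodney::XML::QuickStruct.
my %KNOWN_TYPE = map { $_ => 1 } qw(scalar list hash);

sub normalize_tag_map {
    # Accept either a flat hash or a single hashref.
    my %in = (@_ == 1 && ref $_[0] eq 'HASH') ? %{ $_[0] } : @_;

    my %out;
    for my $tag (keys %in) {
        my $type = $in{$tag};
        # Assumption: an unknown type makes the whole map invalid.
        return unless defined $type && $KNOWN_TYPE{$type};
        $out{ lc $tag } = $type;
    }
    return \%out;    # a fresh copy; the caller's hash is untouched
}

my $map = normalize_tag_map(Person => 'list', NAME => 'scalar');
print join(', ', map { "$_ => $map->{$_}" } sort keys %$map), "\n";
# prints: name => scalar, person => list
```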

error()

Accesses the error stack. If called in scalar context, returns the most recent error. If called in list context, returns all errors in reverse order of occurrence. The latter isn't very useful yet, since virtually everything that sets an error also fails its return.

OO interface:
    $last_error = $parser->error;
    @all_errors = $parser->error;
Function interface:
    $last_error = Rodney::XML::QuickStruct::error;
    @all_errors = Rodney::XML::QuickStruct::error;
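
The context-sensitive return described above can be sketched with Perl's wantarray. This is a stand-alone illustration under assumptions, not the module's code: the @errors array and the push_error() helper are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an error stack; names are illustrative only.
my @errors;

sub push_error { push @errors, @_; return }

sub error {
    return unless @errors;
    # List context: every error, newest first. Scalar context: newest only.
    return wantarray ? reverse(@errors) : $errors[-1];
}

push_error('could not open file');
push_error('no content to parse');

my $last_error = error();
my @all_errors = error();

print "$last_error\n";                  # no content to parse
print join(' | ', @all_errors), "\n";   # no content to parse | could not open file
```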

debug()

Gets/sets the debug level for the calling object.

If you don't pass an argument, we'll simply return the current debug level. If you pass a real integer argument, we'll set the debug level to that and return the previous debug level. If you pass a non-integer argument, we'll set the debug level to 1 (one) or 0 (zero), depending on Perl's idea of TRUE in relation to your argument, and then return the previous debug level.

    $old_debug  = $self->debug($integer);
    $curr_debug = $self->debug;
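
The rules above can be written out as a short stand-alone sketch. It is a plain function rather than the module's method, the variable $debug_level is a hypothetical name, and treating "integer" as an optional minus sign followed by digits is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the rules described above.
my $debug_level = 0;

sub debug {
    my $previous = $debug_level;
    return $previous unless @_;           # no argument: plain read

    my $arg = shift;
    if (defined $arg && $arg =~ /\A-?\d+\z/) {
        $debug_level = $arg;              # integer: use it as the level
    }
    else {
        $debug_level = $arg ? 1 : 0;      # anything else: Perl truthiness
    }
    return $previous;
}

debug(3);                                 # level is now 3
my $old = debug('verbose');               # true non-integer: level becomes 1
print "$old ", debug(), "\n";             # 3 1
```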

TAG_MAP

A tag map is a description of your data. You use a tag map to tell the parser the names of the tags that you want the parser to recognize and to define what data type each tag should be treated as. There are three data types that a tag can fall under: scalar, list, and hash. These data types are loosely analogous to the Perl data types of the same names and refer to the storage used for the content found inside of a tag.

There is a pseudo data type available for list type tags, that being casthash. This is actually a directive telling the parser to store blocks of list tag content in hashrefs instead of in scalars. The casthash directive gets used inside the tag definition as a bareword, not in the tag map.

Don't worry about any of this too much if it sounds confusing; the EXAMPLES section will explain it all. But see the CONTENT_STRUCTURE section first.

CONTENT_STRUCTURE

If you already know how to write XML, then you can probably just skip to the EXAMPLES section.

This document has referred to the content as being "XML-like". That's because we've used tokens that start with a less-than character, end in a greater-than character, can take named parameters, and can have a forward slash to say that the token is alone and without a closing token. We've called these tokens "tags", and this makes the whole thing "like" XML. It should be understood that no effort has been made to follow anybody else's protocol. Instead, the focus has been to make a quick and easy to use markup handler for simple data interchange.

All tag names are case insensitive and tags can be singular or paired:

  <single/>
  <pair>some content</pair>

Paired tags, "containers", may not have a trailing slash or they will get misinterpreted as a single tag. If you use an opening tag, you need to use a closing tag. Not doing so will result in an error.

Parameters

All tags may use optional parameters in the form of param_name="param value". Parameters get handled like scalar tags, but don't require a definition in your tag map. So these three chunks would be equivalent. (The third would require the age tag to be defined as a scalar data type in your tag map.)

    <myContainer age="15"></myContainer>

    <myContainer age="15"/>

    <myContainer>
      <age>15</age>
    </myContainer>

Parameters are handled in a fairly forgiving manner. They are defined as a parameter name, an equal symbol, and a value. You may have space between the parameter name, the equal symbol, and the value, and the value may optionally be quoted with single or double quotes. Whichever quote type you use to start a value, if any, must be used to complete that same value. If you need quotes inside of your value, either use the other type of quote mark to encapsulate your string, or just escape them with a backslash (\). Note, escaping only works for quotes and we consider a backslash to be a literal character any time that it isn't needed for escaping. All parameter names are cast to lower case.

Parameter Examples

    <tag parm1=one>
    <tag parm1='one'>
    <tag parm1="one">

    <tag parm1=one parm2 = two parm3 = three>

    <tag parm1 = "This is a string with an escaped \" quote mark">
    <tag parm1 = 'This is a string with an embedded " quote mark'>
    <tag parm1 = "This is a string with a backslash and a single quote \' quote mark">

    <tag parm1 = one two>
    # This example will yield parm1="one" and a bareword of "two"
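
To make the parameter rules concrete, here is a minimal regex-based sketch that handles the text between a tag's name and its closing bracket. It is not the module's parser: the function name parse_params() and its return shape (a hashref of parameters plus an arrayref of barewords) are assumptions made for illustration, and it only unescapes the quote character that delimits the value.

```perl
use strict;
use warnings;

# Hypothetical parser for the inside of a tag, not the module's own code.
my $PARAM = qr{
    \G \s*
    (\w+)                              # parameter name or bareword
    (?: \s* = \s*
        (?: "((?:\\"|[^"])*)"          # double-quoted value
          | '((?:\\'|[^'])*)'          # single-quoted value
          | ([^\s"']+)                 # unquoted value
        )
    )?
}x;

sub parse_params {
    my ($text) = @_;
    my (%param, @bareword);

    while ($text =~ /$PARAM/gc) {
        my ($name, $dq, $sq, $bare) = ($1, $2, $3, $4);
        # Names are cast to lower case; only the enclosing quote is unescaped.
        if    (defined $dq)   { ($param{ lc $name } = $dq) =~ s/\\"/"/g }
        elsif (defined $sq)   { ($param{ lc $name } = $sq) =~ s/\\'/'/g }
        elsif (defined $bare) { $param{ lc $name } = $bare }
        else                  { push @bareword, $name }
    }
    return (\%param, \@bareword);
}

my ($param, $bareword) =
    parse_params(q{Parm1 = "say \"hi\"" parm2='it\'s' parm3 = three casthash});

print "$_=$param->{$_}\n" for sort keys %$param;
print "barewords: @$bareword\n";
```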

Sloppy

If you are using the object interface, you may construct your object with the 'sloppy' option. If you use this option, we'll allow for somewhat less strict parsing of certain parts of your markup. This fledgling feature is an early attempt at heuristic handling of human data. Note, this isn't really recommended, but it's available. Here are the documented effects of 'sloppy' mode.

"value" parameter

We will consider a tag to be single if it includes the special parameter of "value", meaning that the trailing slash isn't required. So these two examples would be equivalent:

  <single value="my value"/>

  <single value="my value">

Be careful, this means any tag that includes the "value" parameter will get treated as a single, not just the ones that look like they should be a single tag anyway. So this markup would give you "21" in normal mode and "18" in 'sloppy' mode.

    <age value="18">
        21
    </age>

The reason for this is that in 'sloppy' mode the opening tag will get treated as a single with a value of "18". Then 'sloppy' mode will prevent the failed return that would have usually resulted from the invalid closing tag. However, the error message will still be in the object's error stack:

    Rodney::XML::QuickStruct::_process_hash(): Invalid tag type (endtag), with tag name "age".

Missing closure

If you fail to close a tag or to end it with a slash, and it looks like this markup was really supposed to mean a single tag that is just missing the trailing slash, then we'll try to fix this problem. So this:

    <person casthash>
        <age>
        <name>Jack</name>
    </person>

Would result in this:

  {
    'person' => [
        {
            'name' => 'Jack',
            'age' => undef
        }
    ]
  }

EXAMPLES

Here are some examples. They include the tag map used, the markup content used, and the resulting data structure as represented by the Data::Dumper package. Remember, the data structure is always a hashref.

Also, check your distribution for these same examples along with a script that runs them.

Basic

This example is a single tag used to define some keyed data.

Tag map
    groceries => 'hash'
Markup
    <groceries crackers=1 soup="2" milk='1'/>
Data structure
  {
    'groceries' => {
        'milk' => 1,
        'crackers' => 1,
        'soup' => 2
    }
  }

Contained tags

This example adds some tags to get the data from.

Tag map
    groceries  => 'hash'
    soup       => 'scalar'
    milk       => 'scalar'
    vegitables => 'list'
Markup
    <groceries crackers=1>
        <soup>2</soup>
        <milk value="1"/>
        <vegitables value="brocoli"/>
        <vegitables value="corn"/>
        <vegitables value="peas"/>
    </groceries>
Data structure
  {
    'groceries' => {
        'milk' => 1,
        'vegitables' => [
            'brocoli',
            'corn',
            'peas'
        ],
        'crackers' => 1,
        'soup' => 2
    }
  }

Small error

This example adds an intuitive, but incorrect, usage of a list type tag. (See the "corn" line.)

Tag map
    groceries  => 'hash'
    soup       => 'scalar'
    milk       => 'scalar'
    vegitables => 'list'
Markup
    <groceries crackers=1>
        <soup>2</soup>
        <milk value="1"/>
        <vegitables value="brocoli"/>
        <vegitables>corn</vegitables>
        <vegitables value="peas"/>
    </groceries>
Data structure
  {
    'groceries' => {
        'milk' => 1,
        'vegitables' => [
            'brocoli',
            'peas'
        ],
        'crackers' => 1,
        'soup' => 2
    }
  }

You'll notice that "corn" didn't make it into the vegitables list. That's because hash and list type tags are intended strictly as containers for other data tags. This means that loose text will get ignored. The exception to this loose text rule is unknown tags, which will cause errors and a failed return. Always use parameters or a scalar tag to encapsulate actual data. The next example is a solution.

Solution

Here we've added the "lit" tag to our map as a scalar in order to encapsulate literal pieces of text. Remember, "lit" could have been any name we liked; it doesn't actually mean "literal".

Tag map
    groceries  => 'hash'
    soup       => 'scalar'
    milk       => 'scalar'
    vegitables => 'list'
    lit        => 'scalar'
Markup
    <groceries crackers=1>
        <soup>2</soup>
        <milk value="1"/>
        <vegitables value="brocoli"/>
        <vegitables>
            <lit>corn</lit>
            <lit>carrots</lit>
            <lit>okra</lit>
        </vegitables>
        <vegitables value="peas"/>
    </groceries>
Data structure
  {
    'groceries' => {
        'milk' => 1,
        'vegitables' => [
            'brocoli',
            'corn',
            'carrots',
            'okra',
            'peas'
        ],
        'crackers' => 1,
        'soup' => 2
    }
  }

CASTHASH

This example shows the use of the casthash directive. We'll use it here to allow us to build a list of data structures. This example starts to demonstrate the value of the general data structures that this API generates.

Tag map
    person   => 'list'
    name     => 'scalar'
    age      => 'scalar'
    hobby    => 'list'
    lit      => 'scalar'
Markup
    <person casthash>
        <name>Jack</name>
        <age value="25"/>

        <hobby value="Climbing trees"/>
        <hobby>
            <lit>Climbing rocks</lit>
            <lit>Flying kites</lit>
        </hobby>
    </person>

    <person name="Jill" casthash>
        <age value="Are you kidding?"/>
        <hobby value="Scrap booking"/>
        <hobby value="Plantting"/>
    </person>
Data structure
  {
    'person' => [
        {
            'name' => 'Jack',
            'hobby' => [
                'Climbing trees',
                'Climbing rocks',
                'Flying kites'
            ],
            'age' => 25
        },
        {
            'name' => 'Jill',
            'hobby' => [
                'Scrap booking',
                'Plantting'
            ],
            'age' => 'Are you kidding?'
        }
    ]
  }
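
Once you have a structure like this, ordinary Perl reference syntax is all you need to walk it. The short sketch below hard-codes the structure shown above so that it runs on its own; in real use the hashref would come from parse_file() or parse_content().

```perl
use strict;
use warnings;

# Copied from the example output above so the sketch is self-contained.
my $data_struct = {
    person => [
        {
            name  => 'Jack',
            age   => 25,
            hobby => [ 'Climbing trees', 'Climbing rocks', 'Flying kites' ],
        },
        {
            name  => 'Jill',
            age   => 'Are you kidding?',
            hobby => [ 'Scrap booking', 'Plantting' ],
        },
    ],
};

# Each element of the person list is a hashref because of casthash.
for my $person (@{ $data_struct->{person} }) {
    my @hobbies = @{ $person->{hobby} || [] };
    printf "%s (%s): %s\n",
        $person->{name}, $person->{age}, join(', ', @hobbies);
}
```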

Tag vs. params

Here we will show another example of casthash, demonstrate the fact that tag names and parameter names are not related, and show that tags may span lines.

Tag map
    company  => 'list'
    person   => 'list'
    name     => 'scalar'
    age      => 'scalar'
    hobby    => 'list'
    lit      => 'scalar'
Markup
    <person company="ACME" casthash>
        <name>Jack</name>
        <age value="25"/>

        <hobby value="Climbing trees"/>
        <hobby>
            <lit>Climbing rocks</lit>
            <lit>Flying kites</lit>
        </hobby>
    </person>

    <person company="ACME - Perfume Division" name="Jill" casthash>
        <age value="Are you kidding?"/>
        <hobby value="Scrap booking"/>
        <hobby value="Plantting"/>
    </person>

    <company addr1="123 Road Runner blvd." casthash>
        <name value="ACME"/>
    </company>

    <company
        name  = "ACME - Perfume Division"
        addr1 = "1313 Mockingbird Lane"
        addr2 = "Room # 5"
        phone = "PA-65000"
        casthash
    />
Data structure
  {
    'company' => [
        {
            'name' => 'ACME',
            'addr1' => '123 Road Runner blvd.'
        },
        {
            'name' => 'ACME - Perfume Division',
            'phone' => 'PA-65000',
            'addr1' => '1313 Mockingbird Lane',
            'addr2' => 'Room # 5'
        }
    ],
    'person' => [
        {
            'name' => 'Jack',
            'hobby' => [
                'Climbing trees',
                'Climbing rocks',
                'Flying kites'
            ],
            'age' => 25,
            'company' => 'ACME'
        },
        {
            'name' => 'Jill',
            'hobby' => [
                'Scrap booking',
                'Plantting'
            ],
            'age' => 'Are you kidding?',
            'company' => 'ACME - Perfume Division'
        }
    ]
  }

Be careful with casthash. If you use it with one tag, you'll probably want to use it in the rest of the tags of the same name. Not doing so will work, but will likely give you results other than what you intended.

You can see in the person and company definitions that we've used parameters that don't correlate with any known tags. That's because they don't have to. This functionality allows the content writer to prepare data without having the full specification of the tag map it will be read with.

AUTHOR

Rodney Broom <perl@rbroom.com>

R.Broom Consulting, http://www.rbroom.com/consulting/
