Oliver M. Kellogg


CORBA::IDLtree - OMG IDL to symbol tree translator


Version 2.01


Subroutine Parse_File is the universal entry point (to be called by the main program). It takes an IDL file name as the input parameter and parses that file, constructing one or more symbol trees for the outermost declarations encountered. It returns a reference to an array containing references to those trees. In case of errors during parsing, Parse_File returns 0.


    use CORBA::IDLtree;

    my $ref_to_array_of_outermost_declarations = CORBA::IDLtree::Parse_File("myfile.idl");

    $ref_to_array_of_outermost_declarations or die "File had syntax errors\n";
    foreach my $node (@$ref_to_array_of_outermost_declarations) {
        # Query $node->[TYPE] to find out what each node is;
        # use $node->[SUBORDINATES] according to the $node->[TYPE].
        # For example:
        if ($node->[CORBA::IDLtree::TYPE] == CORBA::IDLtree::MODULE) {
            foreach my $subnode (@{$node->[CORBA::IDLtree::SUBORDINATES]}) {
                # Assuming your "sub process" codes your business logic:
                process($subnode);
            }
        } elsif ($node->[CORBA::IDLtree::TYPE] == CORBA::IDLtree::INTERFACE) {
            # And so on, decode and process all the types you need ...
            # For further details see the demo application in subdir demoapp.
        }
    }


A "thing" in the symbol tree can be either a reference to a node, or a reference to an array of references to nodes.

Each node is a six-element array with the elements

      [0] => TYPE
      [1] => NAME
      [2] => SUBORDINATES
      [3] => ANNOTATIONS
      [4] => COMMENT
      [5] => SCOPEREF

The TYPE element, instead of holding a type ID number (see the following list under SUBORDINATES), can also be a reference to the node defining the type. When the TYPE element can contain either a type ID or a reference to the defining node, we will call it a type descriptor. Which of the two alternatives is in effect can be determined via the isnode function.
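To illustrate the two forms a type descriptor can take, here is a minimal sketch that assumes nodes are plain Perl array references, as the element list above suggests. The helper describe_type_descriptor is hypothetical (in real code, use the module's isnode function), and the constants are redefined locally with the values from the constants list so the sketch runs without CORBA::IDLtree installed.

```perl
use strict;
use warnings;

# Node indices and type IDs copied from the constants lists (assumed values).
use constant { TYPE => 0, NAME => 1 };
use constant { LONG => 6, STRUCT => 26 };

# Hypothetical helper: treats a descriptor as a node when it is an ARRAY
# reference (nodes are arrays); otherwise it is a plain type ID number.
sub describe_type_descriptor {
    my ($td) = @_;
    return ref($td) eq 'ARRAY'
        ? 'node of type ' . $td->[TYPE] . ' named ' . $td->[NAME]
        : 'type ID ' . $td;
}

# A hand-built STRUCT node standing in for a user-defined type.
my $point = [ STRUCT, 'Point', [], 0, 0, 0 ];
print describe_type_descriptor(LONG), "\n";     # type ID 6
print describe_type_descriptor($point), "\n";   # node of type 26 named Point
```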

The NAME element, unless specified otherwise, simply holds the name string of the respective IDL syntactic item.

The SUBORDINATES element depends on the type ID:


MODULE, INTERFACE

Reference to an array of nodes (symbols) which are defined within the module or interface. In the case of INTERFACE, element [0] in this array contains a reference to a further array which in turn contains references to the parent interface(s) if inheritance is used, or the null value if the current interface is not derived by inheritance. Element [1] is the "local/abstract" flag, which is ABSTRACT for abstract interfaces or LOCAL for interfaces declared local.
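As a sketch of reading the inheritance information of an INTERFACE node, the hypothetical helper below returns the names of the direct parents from element [0] of SUBORDINATES. It assumes the "null value" is something false (undef or 0); the nodes are built by hand and the constants are copied locally from the constants list.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { INTERFACE => 33 };

# Hypothetical helper: names of the direct parent interfaces, taken from
# element [0] of an INTERFACE node's SUBORDINATES.
sub parent_interface_names {
    my ($iface) = @_;
    my $parents = $iface->[SUBORDINATES][0];
    return () unless $parents;    # null value: no inheritance
    return map { $_->[NAME] } @$parents;
}

# Hand-built nodes for "interface Base {}; interface Derived : Base {};"
my $base    = [ INTERFACE, 'Base',    [ undef,   0 ], 0, 0, 0 ];
my $derived = [ INTERFACE, 'Derived', [ [$base], 0 ], 0, 0, 0 ];
print join(', ', parent_interface_names($derived)), "\n";   # Base
```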


INTERFACE_FWD

Reference to the node of the full interface declaration.


STRUCT, EXCEPTION

Reference to an array of node references representing the member components of the struct or exception. Each member representative node is a quintuplet consisting of (TYPE, NAME, <dimref>, ANNOTATIONS, COMMENT). The <dimref> is a reference to a list of dimension numbers, or is 0 if no dimensions were given.
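The member quintuplets can be walked as sketched below. Because the <dimref> occupies index 2, the same index as SUBORDINATES, the sketch reads it through that constant. The helper member_declarators is hypothetical and the struct node is built by hand.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, DOUBLE => 12, STRUCT => 26 };

# Hypothetical helper: returns "name[dim]..." for each member of a STRUCT or
# EXCEPTION node; a dimref of 0 means the member has no dimensions.
sub member_declarators {
    my ($struct) = @_;
    my @result;
    foreach my $member (@{ $struct->[SUBORDINATES] }) {
        my $dimref = $member->[SUBORDINATES];    # <dimref> sits at index 2
        my $dims = $dimref ? join('', map { "[$_]" } @$dimref) : '';
        push @result, $member->[NAME] . $dims;
    }
    return @result;
}

# struct S { long id; double m[3][4]; };
my $s = [ STRUCT, 'S', [ [ LONG,   'id', 0,      0, 0 ],
                         [ DOUBLE, 'm',  [3, 4], 0, 0 ] ], 0, 0, 0 ];
print join(' ', member_declarators($s)), "\n";   # id m[3][4]
```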


UNION

Similar to STRUCT/EXCEPTION, reference to an array of nodes. For union members, the member node has the same structure as for STRUCT/EXCEPTION. However, the first entry contains a type descriptor for the discriminant type. This switch entry does not follow the usual quintuplet structure of members; it is a single item. The TYPE of a member node may also be CASE or DEFAULT, meaning that the following member node is the union branch controlled by that CASE or DEFAULT. For CASE, the NAME is unused, and the SUBORDINATES contains a reference to a list of the case values for the following member node. For DEFAULT, both the NAME and the SUBORDINATES are unused.
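The CASE/DEFAULT convention pairs each label node with the member that follows it. The hypothetical helper below sketches that pairing; it skips the leading discriminant descriptor and accumulates labels until it reaches a regular member node. The union is built by hand with locally copied constants.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { SHORT => 5, LONG => 6, DOUBLE => 12, UNION => 27,
               CASE => 28, DEFAULT => 29 };

# Hypothetical helper: returns one "labels: member" string per union branch.
sub union_branches {
    my ($union) = @_;
    my @subs = @{ $union->[SUBORDINATES] };
    shift @subs;                          # skip the discriminant descriptor
    my (@branches, @labels);
    foreach my $node (@subs) {
        if ($node->[TYPE] == CASE) {
            push @labels, map { "case $_" } @{ $node->[SUBORDINATES] };
        } elsif ($node->[TYPE] == DEFAULT) {
            push @labels, 'default';
        } else {                          # member controlled by @labels
            push @branches, join(', ', @labels) . ': ' . $node->[NAME];
            @labels = ();
        }
    }
    return @branches;
}

# union U switch (short) { case 1: case 2: long a; default: double b; };
my $u = [ UNION, 'U', [ SHORT,
                        [ CASE,    '', [1, 2], 0, 0, 0 ],
                        [ LONG,    'a', 0, 0, 0 ],
                        [ DEFAULT, '',  0, 0, 0, 0 ],
                        [ DOUBLE,  'b', 0, 0, 0 ] ], 0, 0, 0 ];
print "$_\n" for union_branches($u);
```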


ENUM

Reference to an array describing the enum value literals. Each element in the array is a reference to a triplet (three-element array): The first element in the triplet is the enum literal value. The second element is a reference to an array of annotations as described in the ANNOTATIONS documentation (see below). The third element is a reference to the trailing comment list.
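Extracting only the literal names means taking element [0] of each triplet. This is a sketch with a hypothetical helper and a hand-built node; the module also provides its own convenience subroutine for this, described further on.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { ENUM => 23 };

# Hypothetical helper: element [0] of each (literal, annotations, comment)
# triplet is the literal itself.
sub enum_literal_names {
    my ($enum) = @_;
    return map { $_->[0] } @{ $enum->[SUBORDINATES] };
}

# enum Color { RED, GREEN, BLUE };  with a trailing comment on BLUE
my $color = [ ENUM, 'Color',
              [ [ 'RED',   [], 0 ],
                [ 'GREEN', [], 0 ],
                [ 'BLUE',  [], [ 12, [ 'the last one' ] ] ] ], 0, 0, 0 ];
print join(',', enum_literal_names($color)), "\n";   # RED,GREEN,BLUE
```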


TYPEDEF

Reference to a two-element array: element 0 contains a reference to the type descriptor of the original type; element 1 contains a reference to an array of dimension expressions, or the null value if no dimensions are given. When given, the dimension expressions are plain strings.


SEQUENCE

As a special case, the NAME element of a SEQUENCE node does not contain a name (as sequences are anonymous types), but instead is used to hold the bound number. If the bound number is 0 then it is an unbounded sequence. The SUBORDINATES element contains the type descriptor of the base type of the sequence. This descriptor could itself be a reference to a SEQUENCE defining node (that is, a nested sequence definition).


BOUNDED_STRING, BOUNDED_WSTRING

Bounded strings are treated as a special case of sequence. They are represented as references to a node that has BOUNDED_STRING or BOUNDED_WSTRING as the type ID, the bound number in the NAME, and the SUBORDINATES element is unused.
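Since SEQUENCE and BOUNDED_STRING nodes keep their bound in NAME and a sequence's element type may itself be a SEQUENCE node, rendering them is naturally recursive. The sketch below uses a hypothetical helper and a deliberately tiny table of elementary type names; it is not the module's own type-to-string routine.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, BOUNDED_STRING => 20, SEQUENCE => 22 };

# Only the elementary names this example needs (assumption for brevity).
my %elementary = (LONG, 'long');

# Hypothetical helper: bound lives in NAME; a bound of 0 means unbounded.
sub seq_type_string {
    my ($td) = @_;
    return $elementary{$td} // "type$td" unless ref $td;
    my ($type, $bound) = @{$td}[TYPE, NAME];
    return "string<$bound>" if $type == BOUNDED_STRING;
    die "unsupported node type $type\n" unless $type == SEQUENCE;
    my $base = seq_type_string($td->[SUBORDINATES]);   # may recurse again
    return $bound ? "sequence<$base, $bound>" : "sequence<$base>";
}

# sequence<sequence<long, 10>>, built by hand
my $inner = [ SEQUENCE, 10, LONG,   0, 0, 0 ];
my $outer = [ SEQUENCE, 0,  $inner, 0, 0, 0 ];
print seq_type_string($outer), "\n";   # sequence<sequence<long, 10>>
```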


CONST

Reference to a two-element array. Element 0 is a type descriptor of the const's type; element 1 is a reference to an array containing the RHS expression symbols.


FIXED

Reference to a two-element array. Element 0 contains the number of digits and element 1 contains the scale factor. The NAME component in a FIXED node is unused.


VALUETYPE

The SUBORDINATES element uses the following structure:

      [0] => $is_abstract (boolean)
      [1] => reference to a tuple (two-element list) containing
             inheritance related information:
             [0] => $is_truncatable (boolean)
             [1] => \@ancestors (reference to array containing
                    references to ancestor nodes)
      [2] => \@members: reference to array containing references
             to tuples (two-element lists) of the form:
             [0] => 0|PRIVATE|PUBLIC
                    A zero for this value means the element [1]
                    contains a reference to a declaration, such
                    as a METHOD or ATTRIBUTE.
                    In case of METHOD, the first element in the
                    method node subordinates (i.e., the return
                    type) may be FACTORY.
                    However, unlike interface methods, the last
                    element is _not_ a reference to the 'raises'
                    list.  Support for 'raises' of valuetype
                    methods may be added in a future version.
             [1] => reference to the defining node.
                    In case of PRIVATE or PUBLIC state member,
                    the SUBORDINATES of the defining node
                    contains a dimref (reference to dimensions
                    list, see STRUCT.)

VALUETYPE_BOX

Reference to the defining type node.


VALUETYPE_FWD

Reference to the node of the full valuetype declaration.


NATIVE

Subordinates unused.


ATTRIBUTE

Reference to a two-element array; element 0 is the read-only flag (0 for read/write attributes), element 1 is a type descriptor of the attribute's type.


METHOD

Reference to a variable-length array; element 0 is a type descriptor for the return type. Elements 1 and following are references to parameter descriptor nodes with the following structure:

      elem. 0 => parameter type descriptor
      elem. 1 => parameter name
      elem. 2 => parameter mode (IN, OUT, or INOUT)

The last element in the variable-length array is a reference to the "raises" list. This list contains references to the declaration nodes of exceptions raised, or is empty if there is no "raises" clause.
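Putting the METHOD layout together (return type first, parameter nodes in the middle, "raises" list last), a signature can be reconstructed as sketched here. The helper method_signature and the small name tables are hypothetical; only the element positions come from the description above.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { IN => 1, OUT => 2, INOUT => 3 };
use constant { LONG => 6, STRING => 14, EXCEPTION => 30, VOID => 40, METHOD => 42 };

my %type_name = (LONG, 'long', STRING, 'string', VOID, 'void');
my %mode_name = (IN, 'in', OUT, 'out', INOUT, 'inout');

# Hypothetical helper: formats a METHOD node as IDL-like text.
sub method_signature {
    my ($m) = @_;
    my @subs   = @{ $m->[SUBORDINATES] };
    my $ret    = shift @subs;             # element 0: return type
    my $raises = pop @subs;               # last element: "raises" list
    my @params = map {                    # elem. 0 type, 1 name, 2 mode
        join(' ', $mode_name{ $_->[2] }, $type_name{ $_->[0] }, $_->[1])
    } @subs;
    my $sig = $type_name{$ret} . ' ' . $m->[NAME] . '(' . join(', ', @params) . ')';
    $sig .= ' raises (' . join(', ', map { $_->[NAME] } @$raises) . ')'
        if @$raises;
    return $sig;
}

my $ex = [ EXCEPTION, 'NotFound', [], 0, 0, 0 ];
my $m  = [ METHOD, 'lookup',
           [ LONG, [ STRING, 'key', IN ], [ LONG, 'hint', INOUT ], [$ex] ],
           0, 0, 0 ];
print method_signature($m), "\n";
```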


INCFILE

Reference to an array of nodes (symbols) which are defined within the include file. The NAME element of this node contains the include file name.


PRAGMA_PREFIX

Subordinates unused.


PRAGMA_VERSION

Version string.


PRAGMA_ID

ID string.


PRAGMA

This is for the general case of pragmas that are none of the above, i.e. pragmas unknown to IDLtree. The NAME holds the pragma name, and SUBORDINATES holds all further text appearing after the pragma name.


REMARK

The NAME of the node contains the starting line number of the comment text. The SUBORDINATES component contains a reference to a list of comment lines. The comment lines are not newline terminated. The source line number of each comment line can be computed by adding the starting line number and the array index of the comment line. By default, REMARK nodes will not be generated; generation of REMARK nodes can be enabled by setting the $enable_comments global variable to nonzero.
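The line-number rule (starting line plus array index) is shown in this sketch with a hypothetical helper and a hand-built REMARK node; with the real module, such nodes only appear when $enable_comments is set.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { REMARK => 48 };

# Hypothetical helper: returns [line number, text] pairs for a REMARK node.
sub remark_lines {
    my ($remark) = @_;
    my $start = $remark->[NAME];             # starting line number
    my $lines = $remark->[SUBORDINATES];     # comment lines, no newlines
    return map { [ $start + $_, $lines->[$_] ] } 0 .. $#$lines;
}

my $rem = [ REMARK, 7, [ 'Account management', 'interfaces' ], 0, 0, 0 ];
printf "%d: %s\n", @$_ for remark_lines($rem);
# 7: Account management
# 8: interfaces
```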

The ANNOTATIONS element holds the reference to an array of annotation nodes if IDL4 style annotations are present (if no annotations are present then the ANNOTATIONS element holds 0). Each entry in this array is an array reference. The first element in the array referenced is a reference to an entry in @annoDefs (see comments at declaration of @annoDefs). The following elements contain the concrete values for the parameters, in the order as defined by the entry in @annoDefs. If the user omitted the value of the parameter then the default as specified by the entry in @annoDefs is filled in.

The COMMENT element holds the comment text that follows the IDL declaration on the same line. Usually this is just a single line. However, if a multi-line comment is started on the same line after a declaration, the multi-line comment may extend to further lines; therefore a list of lines is used. The lines in this list are not newline terminated. The COMMENT field is a reference to a tuple of starting line number and reference to the line list, or contains 0 if no trailing comment is present at the IDL item.
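Reading the COMMENT tuple is sketched below for a struct member quintuplet, where COMMENT is also at index 4. The helper trailing_comment_text is hypothetical; it simply joins the comment lines with spaces.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2, COMMENT => 4 };
use constant { LONG => 6 };

# Hypothetical helper: "line N: text" for a trailing comment, or '' if none.
sub trailing_comment_text {
    my ($node) = @_;
    my $comment = $node->[COMMENT];
    return '' unless $comment;               # 0 means no trailing comment
    my ($start_line, $lines) = @$comment;    # (starting line, \@lines)
    return "line $start_line: " . join(' ', @$lines);
}

# Member quintuplets: (TYPE, NAME, <dimref>, ANNOTATIONS, COMMENT)
my $with    = [ LONG, 'retries', 0, 0, [ 12, [ 'number of retries' ] ] ];
my $without = [ LONG, 'limit',   0, 0, 0 ];
print trailing_comment_text($with), "\n";    # line 12: number of retries
```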

The SCOPEREF element is a reference back to the node of the module or interface enclosing the current node. If the current node is already at the global scope level then the SCOPEREF is 0. Special case: For a reopened module, the SCOPEREF points to the previous opening of the same module. In case of multiple reopenings, each reopening points to the previous opening. The SCOPEREF of the initial module finally points to the enclosing scope. All nodes have this element except for the parameter nodes of methods and the component nodes of structs/unions/exceptions.
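A "::"-qualified name can be assembled by following SCOPEREF links, as in this sketch. Because a reopened module points to its previous opening, the hypothetical helper skips a MODULE-to-MODULE link when both carry the same name. That heuristic is an assumption: it would also collapse a genuinely nested module that reuses its parent's name, a case the module's own scoped-name subroutine may handle differently.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2, SCOPEREF => 5 };
use constant { STRUCT => 26, MODULE => 32 };

# Hypothetical helper: walks SCOPEREF up to global scope (SCOPEREF == 0).
sub qualified_name {
    my ($node) = @_;
    my @parts = ($node->[NAME]);
    my ($prev, $scope) = ($node, $node->[SCOPEREF]);
    while ($scope) {
        # Skip the link from a module reopening to its previous opening.
        unless ($prev->[TYPE] == MODULE && $scope->[TYPE] == MODULE
                && $scope->[NAME] eq $prev->[NAME]) {
            unshift @parts, $scope->[NAME];
        }
        ($prev, $scope) = ($scope, $scope->[SCOPEREF]);
    }
    return join('::', @parts);
}

# module M { ... }; module M { struct T { ... }; };
my $m1 = [ MODULE, 'M', [], 0, 0, 0 ];      # first opening, global scope
my $m2 = [ MODULE, 'M', [], 0, 0, $m1 ];    # reopening points back to $m1
my $t  = [ STRUCT, 'T', [], 0, 0, $m2 ];
print qualified_name($t), "\n";             # M::T
```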


Variables that can be set by client code


Paths where to look for included IDL files.


Symbol definitions for preprocessor.


Values 0 or 1, default 0. By default, do not cache trees of #included files.


Values 0 or 1, default 0. By default, do not generate REMARK nodes.


Values 0 or 1, default 0. Change struct into equivalent valuetype.


Values 0 or 1, default 0. Change valuetype into equivalent struct.


Values 0 or 1, default 0. Print cache statistics.


Values 0 or 1, default 0. Switch on support for IDL long double.


Values 0 or 1, default 1. Setting this to 0 switches off the permission for a union's default branch to be empty.


Value 1 will remove the leading underscore. Value 2 will preserve the leading underscore.


Values 0 or 1, default 0. By default, misuse of IDL keywords as identifiers is a hard error.

Variables written by CORBA::IDLtree

These are to be considered read-only from outside:


Cumulative number of errors for a Parse_File call.


Copy of the filename passed into the most recent call of sub Parse_File.


Constants for accessing the elements of a node

Constants for indexing the elements of a node

As explained in STRUCTURE OF THE SYMBOL TREE, each node is represented as a six element array. These constants are intended for indexing the array:

     sub TYPE ()         { 0 }
     sub NAME ()         { 1 }
     sub SUBORDINATES () { 2 }
     sub MODE ()         { 2 }
     sub ANNOTATIONS ()  { 3 }
     sub COMMENT ()      { 4 }
     sub SCOPEREF ()     { 5 }

The constant MODE is an alias of SUBORDINATES for method parameter nodes.

Method parameter modes

      sub IN ()    { 1 }
      sub OUT ()   { 2 }
      sub INOUT () { 3 }

Meanings of the TYPE entry in the symbol node

      sub NONE ()            { 0 }   # error/illegality value
      sub BOOLEAN ()         { 1 }
      sub OCTET ()           { 2 }
      sub CHAR ()            { 3 }
      sub WCHAR ()           { 4 }
      sub SHORT ()           { 5 }
      sub LONG ()            { 6 }
      sub LONGLONG ()        { 7 }
      sub USHORT ()          { 8 }
      sub ULONG ()           { 9 }
      sub ULONGLONG ()       { 10 }
      sub FLOAT ()           { 11 }
      sub DOUBLE ()          { 12 }
      sub LONGDOUBLE ()      { 13 }
      sub STRING ()          { 14 }
      sub WSTRING ()         { 15 }
      sub OBJECT ()          { 16 }
      sub TYPECODE ()        { 17 }
      sub ANY ()             { 18 }
      sub FIXED ()           { 19 }  # node
      sub BOUNDED_STRING ()  { 20 }  # node
      sub BOUNDED_WSTRING () { 21 }  # node
      sub SEQUENCE ()        { 22 }  # node
      sub ENUM ()            { 23 }  # node
      sub TYPEDEF ()         { 24 }  # node
      sub NATIVE ()          { 25 }  # node
      sub STRUCT ()          { 26 }  # node
      sub UNION ()           { 27 }  # node
      sub CASE ()            { 28 }
      sub DEFAULT ()         { 29 }
      sub EXCEPTION ()       { 30 }  # node
      sub CONST ()           { 31 }  # node
      sub MODULE ()          { 32 }  # node
      sub INTERFACE ()       { 33 }  # node
      sub INTERFACE_FWD ()   { 34 }  # node
      sub VALUETYPE ()       { 35 }  # node
      sub VALUETYPE_FWD ()   { 36 }  # node
      sub VALUETYPE_BOX ()   { 37 }  # node
      sub ATTRIBUTE ()       { 38 }  # node
      sub ONEWAY ()          { 39 }  # implies "void" as the return type
      sub VOID ()            { 40 }
      sub FACTORY ()         { 41 }
      sub METHOD ()          { 42 }  # node
      sub INCFILE ()         { 43 }  # node
      sub PRAGMA_PREFIX ()   { 44 }  # node
      sub PRAGMA_VERSION ()  { 45 }  # node
      sub PRAGMA_ID ()       { 46 }  # node
      sub PRAGMA ()          { 47 }  # node
      sub REMARK ()          { 48 }  # node
      sub NUMBER_OF_TYPES () { 49 }

The constant FACTORY can only occur as the return type of a method in a valuetype.

Interface/valuetype flag values

      sub ABSTRACT      { 1 }
      sub LOCAL         { 2 }
      sub TRUNCATABLE   { 2 }
      sub CUSTOM        { 3 }

Valuetype member flags

      sub PRIVATE       { 1 }
      sub PUBLIC        { 2 }



Parses the file name given as argument. Returns reference to array of nodes representing the top level (global) declarations in the file. Returns 0 if the file had syntax errors. Parse_File writes the error messages to STDERR.


Symbol tree dumper (for debugging etc.) reconstructs the IDL source notation from the parsed symbol tree. Parameters:

  1. Reference to a symbol array (return value from a previous call to Parse_File).

  2. Optional parameter controlling the output:

    • If given as string then it is the name of a file into which to dump the IDL source.

    • If given as array reference then the IDL source will be placed in the referenced array, one line per element, where each line is not newline terminated.

    • If the optional parameter is not given or is given as undef then the IDL source will be dumped to STDOUT.


Given a node reference, returns the type constant if the node represents an elementary type. Returns 0 if the type is not elementary.


Given a type name (as string), returns the type constant if the type name is that of an elementary type. Returns 0 if the type is not elementary.


Given a "thing", returns 1 if it is a reference to a node, 0 otherwise.


Given a "thing", returns 1 if it's a ref to a MODULE, INTERFACE, or INCFILE node.


Looks up a name in the symbol tree(s) constructed so far. Returns the node ref if found, else 0.


Given a type descriptor, returns the type as a string in IDL syntax.


Call this to make the parser tell us what it's doing.


Determine if typeid is of given type, recursing through TYPEDEFs.


Get the original type of a TYPEDEF, i.e. recurse through all non-array TYPEDEFs until the original type is reached.
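The recursion through non-array TYPEDEFs can be sketched as a loop that stops at the first descriptor that is not a TYPEDEF, or at a TYPEDEF that carries dimensions. The helper resolve_typedef is hypothetical and assumes element 0 of the TYPEDEF's SUBORDINATES holds the original type descriptor directly.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { LONG => 6, TYPEDEF => 24 };

# Hypothetical helper: follows TYPEDEF chains, stopping at array TYPEDEFs.
sub resolve_typedef {
    my ($td) = @_;
    while (ref $td && $td->[TYPE] == TYPEDEF) {
        my ($orig, $dims) = @{ $td->[SUBORDINATES] };
        last if $dims && @$dims;    # array typedef: stop here
        $td = $orig;
    }
    return $td;
}

# typedef long Count; typedef Count Total; typedef Total Grid[4];
my $count = [ TYPEDEF, 'Count', [ LONG,   undef ], 0, 0, 0 ];
my $total = [ TYPEDEF, 'Total', [ $count, undef ], 0, 0, 0 ];
my $grid  = [ TYPEDEF, 'Grid',  [ $total, ['4'] ], 0, 0, 0 ];
print resolve_typedef($total), "\n";          # 6 (LONG)
print resolve_typedef($grid)->[NAME], "\n";   # Grid
```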


Return 1 if the given type constant or node is a pragma.


Returns an array with the names of files #included.


Get default value for type. Uses comment directives object if available.


Splits a given IDL expression into its individual tokens. Returns the tokens as a list. Example: The call

    idlsplit("(m_a::myconst+1.0) / scale")

returns the list

    "(", "m_a::myconst", "+", "1.0", ")", "/", "scale"

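To show the kind of tokenization described above, here is a regex-based sketch that reproduces the documented example. It is a hypothetical stand-in, not the module's idlsplit: it recognizes (scoped) identifiers, integer and floating literals, and otherwise single characters, and it does not handle string or character literals or multi-character operators such as << and >>.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: list-context m//g returns every match in order,
# skipping whitespace because no alternative matches it.
sub split_idl_expression {
    my ($expr) = @_;
    return $expr =~ m{
        (?:::)?[A-Za-z_]\w*(?:::[A-Za-z_]\w*)*   # (scoped) identifier
      | \d+(?:\.\d*)?(?:[eE][+-]?\d+)?          # integer or floating literal
      | \S                                      # any other single character
    }gx;
}

my @tokens = split_idl_expression("(m_a::myconst+1.0) / scale");
print join(' | ', @tokens), "\n";
```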

Returns 1 if the argument is a valid IDL identifier.


Expects a symbol node as the input argument and returns its fully qualified name in IDL syntax.


Utility for collecting #included files. Parameters:

  1. Reference to node list to analyze.

  2. Reference to hash in which to add the includefile names encountered. The includefile names are added as key fields of the hash. The value fields are not used.
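The include collection can be pictured as a recursive walk that records INCFILE names as hash keys. This sketch uses a hypothetical helper that descends only into INCFILE and MODULE contents; which node types the real utility descends into is not specified here, so treat that choice as an assumption.

```perl
use strict;
use warnings;

use constant { TYPE => 0, NAME => 1, SUBORDINATES => 2 };
use constant { STRUCT => 26, MODULE => 32, INCFILE => 43 };

# Hypothetical helper: adds include file names as keys of %$seen.
sub gather_includes {
    my ($nodes, $seen) = @_;
    foreach my $node (@$nodes) {
        next unless ref $node eq 'ARRAY';
        my $type = $node->[TYPE];
        $seen->{ $node->[NAME] } = 1 if $type == INCFILE;   # value unused
        gather_includes($node->[SUBORDINATES], $seen)
            if $type == INCFILE || $type == MODULE;
    }
}

# common.idl #includes base.idl; the main file also declares module App.
my $inner = [ INCFILE, 'base.idl',   [ [ STRUCT, 'B', [], 0, 0, 0 ] ], 0, 0, 0 ];
my $outer = [ INCFILE, 'common.idl', [ $inner ], 0, 0, 0 ];
my %files;
gather_includes([ $outer, [ MODULE, 'App', [], 0, 0, 0 ] ], \%files);
print join(' ', sort keys %files), "\n";   # base.idl common.idl
```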


Computes numeric value of expression.


The SUBORDINATES of ENUM contains more than just the actual enum literal values (the additional data are: annotations, trailing comments). This is a convenience subroutine which returns the net literals of the given $enumnode[SUBORDINATES].


Oliver M. Kellogg, <okellogg at users.sourceforge.net>


Please report any bugs or feature requests to bug-corba-idltree at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=CORBA-IDLtree. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.


You can find documentation for this module with the perldoc command.

    perldoc CORBA::IDLtree

You can also look for information at:


Thanks to Heiko Schroeder for contributing.


Copyright (C) 1998-2018, Oliver M. Kellogg

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.