
NAME

Geo::E00 - Perl extension for reading Esri-E00 formats

SYNOPSIS

  use Geo::E00;

  $e00 = Geo::E00->new;

  $e00->open($file);

  $e00data = $e00->parse();

  $arcdata = $e00data->{arc};

  print "Arcpoint\n";
  foreach $arc (@$arcdata) {
        print $arc->{npoints},"\n";
  }

STRUCTURES

ARC
                        $arc = {
                                'cov-num'       => SCALAR,
                                'cov-id'        => SCALAR,
                                'node-from'     => SCALAR,
                                'node-to'       => SCALAR,
                                'poly-left'     => SCALAR,
                                'poly-right'    => SCALAR,
                                'npoints'       => SCALAR,
                                'points'        => ARRAY,
                                'LENGTH'        => SCALAR,      <- From AAT
                                ....            => SCALAR,      <- From AAT
                        # containing x,y pairs
                                'coord' => ARRAY,
                        # pointing to 
                                        ->{x} and ->{y}
                        };
CNT
                        my $cnt = {
                                'cnt-id'        => SCALAR,
                                'x'             => SCALAR,
                                'y'             => SCALAR,
                        }
LAB
                        my $lab = {
                                'cov-id'        => SCALAR,
                                'poly-id'       => SCALAR,
                                'x'             => SCALAR,
                                'y'             => SCALAR,
                        };
LOG
                 $log = {
                        'year'          => SCALAR,
                        'month'         => SCALAR,
                        'day'           => SCALAR,
                        'hour'          => SCALAR,
                        'minute'        => SCALAR,
                        'connecttime'   => SCALAR,
                        'cputime'       => SCALAR,
                        'iotime'        => SCALAR,
                        'commandline'   => SCALAR,
                };
PAL
                        my $pal = {
                                'npoints'       => SCALAR,
                                'xmin'          => SCALAR,
                                'ymin'          => SCALAR,
                                'xmax'          => SCALAR,
                                'ymax'          => SCALAR,
                                'points'        => ARRAY,
                                # pointing to 
                                        {'arc-number'}  = SCALAR;
                                        {'node-number'} = SCALAR;
                                        {'polygon-number'} = SCALAR;
                        };

DESCRIPTION

An introduction to the E00 export format follows.

INTRODUCTION

Note: ESRI considers the export/import file format to be proprietary. As a consequence, the identified format can only constitute a "best guess" and must always be considered as tentative and subject to revision, as more is learned.

It appears that all ARC/INFO files except user-created lookup tables are exported, including .ACODE and .PCODE.

OVERALL ORGANIZATION

The export file begins with a line with three fields.

        1-      an initial 'EXP'
        2-      what appears to be a constant of '0'
        3-      the pathname for the creation of the export file

The export file ends with a line beginning 'EOS'.
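As a minimal sketch of the opening line described above, the following splits it into its three fields. It is not taken from Geo::E00 itself; the helper name parse_exp_header and the sample pathname are invented for illustration, and it assumes the fields are separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: split the first line of an export file into
# 'EXP', the apparently constant '0', and the creation pathname.
sub parse_exp_header {
    my ($line) = @_;
    my ($tag, $flag, $path) = $line =~ /^(EXP)\s+(\S+)\s+(.*?)\s*$/
        or return;
    return { tag => $tag, flag => $flag, path => $path };
}

# Invented sample line; a real export file carries its own pathname.
my $header = parse_exp_header('EXP  0 /HOME/DATA/STDFIG24C.E00');
print "$header->{path}\n" if $header;
```

A line that does not start with 'EXP' makes the helper return nothing, which is a cheap way to reject files that are not in export format.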

The ARC files are included first, in alphabetical order except for the SIN, LOG, and PRJ files which occur last. Then the INFO files are included in alphabetical order.

The beginning of each ARC file is indicated by the file name (a three-character identifier) followed by ' 2' for single-precision or ' 3' for double-precision. Single-precision carries 8 digits, and double-precision carries 15 digits.

Each ARC file ends with a line of seven numbers beginning with a -1 and followed by six zeros, except the SIN, LOG, and PRJ files which end in 'EOX', 'EOL', and 'EOP', respectively. The LAB file uses a slight variation of this -1 ending line (see below). The format for each ARC file is specific to that type of file. These formats are covered below.
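The start and end markers described above can be recognised with two small tests. This is only a sketch under the stated description: the helper names section_start and is_section_end are hypothetical, and the regular expressions assume whitespace-separated fields.

```perl
use strict;
use warnings;

# Precision codes as described above: ' 2' single, ' 3' double.
my %PRECISION = (2 => 'single', 3 => 'double');

# Hypothetical helper: recognise a line such as 'ARC  2' that opens an ARC file.
sub section_start {
    my ($line) = @_;
    return unless $line =~ /^([A-Z]{3})\s+([23])\s*$/;
    return { name => $1, precision => $PRECISION{$2} };
}

# Hypothetical helper: recognise a -1 followed by six zeros.
sub is_section_end {
    my ($line) = @_;
    return $line =~ /^\s*-1(?:\s+0){6}\s*$/ ? 1 : 0;
}

for my $line ('ARC  2',
              '        -1         0         0         0         0         0         0') {
    if (my $s = section_start($line)) {
        print "start of $s->{name} ($s->{precision} precision)\n";
    }
    elsif (is_section_end($line)) {
        print "end of section\n";
    }
}
```

Note that 'IFO  2' has the same shape as an ARC file marker, so a caller would need to treat it separately, and that the SIN, LOG, PRJ and LAB endings are not covered by is_section_end.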

The beginning of the INFO file section is indicated by 'IFO 2', and its end is indicated by 'EOI'. The INFO files each begin with the file name. For example, the polygon attribute table would be 'STDFIG24C.PAT' on a line by itself. The format is the same for every INFO file. This format is given below.

ARC FILE FORMATS

Formats will be given for the most common ARC files:

        -       ARC
        -       CNT
        -       LAB
        -       LOG
        -       PAL
        -       PAR
        -       PRJ
        -       SIN
        -       TOL

1 ARC

The ARC (arc coordinates and topology) file consists of repeating sets of arc information. The first line of each set has seven numbers:

        1.      coverage#
        2.      coverage-ID
        3.      from node
        4.      to node
        5.      left polygon
        6.      right polygon
        7.      number of coordinates

The subsequent lines of a set are the coordinates, with two x-y pairs per line if the coverage is single-precision. If there is an odd number of coordinates, the last line will have only one x-y pair. Double-precision puts one coordinate pair on each line.
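The layout of one arc set can be sketched as follows. This is an illustration, not the parser used by Geo::E00: the helper read_arc_set and the sample lines are invented, and the hash keys simply mirror the ARC structure listed under STRUCTURES. Numbers are picked out with a regular expression rather than by splitting on whitespace, on the assumption that a negative value may directly follow the previous field.

```perl
use strict;
use warnings;

# Matches integers and reals such as -1, 12 or 3.4029994E+05.
my $NUM = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?/;

# Hypothetical helper: read one arc set from an array reference of lines,
# consuming the lines it uses.
sub read_arc_set {
    my ($lines) = @_;
    my $header = shift @$lines;
    return unless defined $header;
    my @h = $header =~ /($NUM)/g;
    return unless @h == 7;

    my %arc;
    @arc{qw(cov-num cov-id node-from node-to poly-left poly-right npoints)} = @h;

    # One or two x-y pairs per line until npoints pairs have been read.
    my @coord;
    while (@coord < $arc{npoints} && @$lines) {
        my @n = (shift @$lines) =~ /($NUM)/g;
        while (@n >= 2) {
            my ($x, $y) = splice @n, 0, 2;
            push @coord, { x => $x + 0, y => $y + 0 };
        }
    }
    $arc{coord} = \@coord;
    return \%arc;
}

# Invented single-precision sample with an odd number of coordinates.
my @lines = (
    '         1         1         1         2         0         0         3',
    ' 3.4029994E+05 4.1001000E+06 3.4009988E+05 4.1002000E+06',
    ' 3.4000000E+05-1.5000000E+02',
);
my $arc = read_arc_set(\@lines);
printf "%d coordinate pairs read\n", scalar @{ $arc->{coord} };
```

Because the loop counts pairs rather than lines, the same sketch reads double-precision sets, which carry one pair per line. A caller should check for the closing -1 line before calling it.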

2 CNT

The CNT (Polygon Centroid Coordinates) file contains the centroid of each polygon in the coverage. It has sets of centroid information with an initial coordinate line and, if there are labels, one line per label giving the number for the label. The coordinate line has three fields:

        1-      number of labels in polygon
        2-      centroid x
        3-      centroid y
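A centroid set as described above could be read along these lines. The helper read_cnt_set, its returned keys and the sample lines are assumptions made for illustration; they are not part of the Geo::E00 interface.

```perl
use strict;
use warnings;

# Matches integers and reals such as -1, 12 or 3.4029994E+05.
my $NUM = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?/;

# Hypothetical helper: read one centroid set from an array reference of lines.
sub read_cnt_set {
    my ($lines) = @_;
    my $first = shift @$lines;
    return unless defined $first;
    my ($nlabels, $x, $y) = $first =~ /($NUM)/g;
    return unless defined $y;

    # One following line per label, each carrying the label number.
    my @labels;
    for (1 .. $nlabels) {
        my $line = shift @$lines;
        last unless defined $line;
        my ($id) = $line =~ /($NUM)/;
        push @labels, $id;
    }
    return { nlabels => $nlabels, x => $x + 0, y => $y + 0, labels => \@labels };
}

# Invented sample: one label in the polygon, followed by its number.
my @lines = (
    '         1 3.4015000E+05 4.1001500E+06',
    '         2',
);
my $cnt = read_cnt_set(\@lines);
print "centroid at $cnt->{x}, $cnt->{y} with $cnt->{nlabels} label(s)\n";
```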

3 LAB

The LAB (label point coordinates and topology) file consists of repeating sets of label point information. The first line of each set has four numbers:

        1.      coverage-ID
        2.      polygon which encloses it
        3.      x coordinate
        4.      y coordinate

The second and final line of the set gives the label box window. This information is marked as obsolete in the SDL documentation. It currently contains repetitions of the x and y coordinates.

4 LOG

The LOG (Coverage History) file contains a free-form set of lines of indeterminate number, separated by lines that begin with a tilde, "~".

ARC records many commands and their resource impacts in this file. The standard ARC format for writing in the LOG has nine fields:

        -       Year (I4)
        -       Month (I2)
        -       Day (I2)
        -       Hours (I2)
        -       Minutes (I2)
        -       Connect Time in minutes (I4)
        -       CPU Time in seconds (I6)
        -       I/O Time in seconds (I6)
        -       Command line (A100)

However, any information can be added to the LOG file in free-form format.
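The nine fixed-width fields listed above map naturally onto Perl's unpack. The sketch below assumes the fields follow one another with no separator, which the description does not confirm; the helper parse_log_line and the sample line are invented, and the hash keys mirror the LOG structure listed under STRUCTURES.

```perl
use strict;
use warnings;

my @LOG_KEYS = qw(year month day hour minute connecttime cputime iotime commandline);

# Hypothetical helper: split one standard-format LOG line by column widths
# (I4 I2 I2 I2 I2 I4 I6 I6 A100).
sub parse_log_line {
    my ($line) = @_;
    my @values = unpack 'A4 A2 A2 A2 A2 A4 A6 A6 A100', $line;
    s/^\s+// for @values;    # integers are right-justified in their columns
    my %log;
    @log{@LOG_KEYS} = @values;
    return \%log;
}

# Invented sample line laid out with the same widths.
my $sample = sprintf '%4d%2d%2d%2d%2d%4d%6d%6d%-100s',
    1993, 4, 7, 15, 32, 0, 12, 3, 'BUILD STDFIG24C POLY';
my $log = parse_log_line($sample);
print "$log->{year}-$log->{month}-$log->{day}: $log->{commandline}\n";
```

Since free-form text may also appear in the LOG, a caller would want to check that the leading fields are numeric before trusting the result.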

5 PAL

The PAL (Polygon Topology) file consists of repeating sets of polygon information. The first line of each set has five numbers:

        1.      number of arcs in polygon
        2.      x min of polygon
        3.      y min of polygon
        4.      x max of polygon
        5.      y max of polygon

The subsequent lines of a set give information on the arcs which comprise the polygon. There are three numbers per arc with information for two arcs per line.

        1.      the arc number (negative if reversed)
        2.      the node number
        3.      the polygon number

The first polygon given is the universal polygon.
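A polygon set as described above can be sketched in the same way as an arc set. The helper read_pal_set and the sample lines are invented; the hash keys mirror the PAL structure listed under STRUCTURES. The sketch assumes all five header numbers sit on one line, which may not hold for double-precision coverages.

```perl
use strict;
use warnings;

# Matches integers and reals such as -1, 12 or 3.4029994E+05.
my $NUM = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?/;

# Hypothetical helper: read one polygon set from an array reference of lines.
sub read_pal_set {
    my ($lines) = @_;
    my $header = shift @$lines;
    return unless defined $header;
    my @h = $header =~ /($NUM)/g;
    return unless @h == 5;

    my %pal;
    @pal{qw(npoints xmin ymin xmax ymax)} = @h;

    # Three numbers per arc, up to two arcs per line.
    my @points;
    while (@points < $pal{npoints} && @$lines) {
        my @n = (shift @$lines) =~ /($NUM)/g;
        while (@n >= 3) {
            my ($arc, $node, $poly) = splice @n, 0, 3;
            push @points, {
                'arc-number'     => $arc,
                'node-number'    => $node,
                'polygon-number' => $poly,
            };
        }
    }
    $pal{points} = \@points;
    return \%pal;
}

# Invented sample: a polygon bounded by three arcs, the first one reversed.
my @lines = (
    '         3 3.4000000E+05 4.1000000E+06 3.4100000E+05 4.1010000E+06',
    '        -2         1         3         4         2         1',
    '         5         4         1',
);
my $pal = read_pal_set(\@lines);
print "arc $_->{'arc-number'} borders polygon $_->{'polygon-number'}\n"
    for @{ $pal->{points} };
```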

"The PAL file contains the polygon topology for a coverage and min-max boxes for the polygons. For each polygon in a coverage the PAL file has a (usually) clockwise list of the arcs, nodes that comprise the polygons, as well as the adjacent polygons, and a min-max box. To keep a continuous list, 'virtual' arcs with arc# of 0 are used to connect to holes (thus forming donuts), which are connected in counter-clockwise order. The PAL file is a random access, variable record length file, with the length dependent on the number of arcs surrounding the polygon (1 to 10000).

The arc# in the PAL file is the record number of that arc within the coverage's ARC file, the node# is the same as the node# in the arc file at the appropriate end, and the polygon# is the record number of that polygon within the coverage's PAL file. The PAL file record number for a polygon is the same as the PAT file record number and the CNT file record number." SDL documentation, July 1989, p. 24.

6 PRJ

The PRJ (Projection Parameters) file consists of a set of projection keywords and values including a set of parameters following the keyword "Parameters".

This file needs further research for specific keywords and parameters for the projections supported by ADS and MOSS.

7 SIN

The SIN (Spatial Index) file usually consists of a single line with the value "EOX".

8 TOL

The TOL (tolerance) file consists of ten lines with a tolerance type, a tolerance status, and a tolerance value on each line. The tolerance types are:

        1.      fuzzy
        2.      generalize (unused)
        3.      node match (unused)
        4.      dangle
        5.      tic match
        6.      undefined
        7.      undefined
        8.      undefined
        9.      undefined
        10.     undefined

The tolerance status "is set to 1 if the tolerance is verified (been applied to operations of the coverage) and to 2 if the tolerance is not verified (been set by the TOLERANCE command, but not yet used in processing)."
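One TOL line could be decoded as sketched below. The helper parse_tol_line, its returned keys and the sample line are assumptions for illustration; the type names and the meaning of status 1 and 2 come from the description above.

```perl
use strict;
use warnings;

# Matches integers and reals such as -1, 12 or 3.4029994E+05.
my $NUM = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[Ee][-+]?\d+)?/;

# Tolerance types 1 to 5 as listed above; 6 to 10 are undefined.
my %TOL_NAME = (
    1 => 'fuzzy',
    2 => 'generalize',
    3 => 'node match',
    4 => 'dangle',
    5 => 'tic match',
);

# Hypothetical helper: decode one line of type, status and value.
sub parse_tol_line {
    my ($line) = @_;
    my ($type, $status, $value) = $line =~ /($NUM)/g;
    return unless defined $value;
    return {
        type     => $type,
        name     => $TOL_NAME{$type} // 'undefined',
        verified => ($status == 1 ? 1 : 0),
        value    => $value + 0,
    };
}

# Invented sample: fuzzy tolerance, set but not yet verified.
my $tol = parse_tol_line('         1         2 2.0000000E-03');
printf "%s tolerance %s (%s)\n",
    $tol->{name}, $tol->{value}, $tol->{verified} ? 'verified' : 'not verified';
```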

INFO FILE FORMATS

INFO files follow the same format:

        -       name of the info file and summary information
        -       definitions for each of the items
        -       actual data values

The name line consists of six fields:

        1.      name of the INFO file
        2.      appears to be flag for ARC/INFO table ('XX') or other INFO table ('  ')
        3.      number of items
        4.      appears to repeat number of items 
        5.      length of data record
        6.      number of data records
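The name line could be split as in the following sketch. It assumes whitespace-separated fields and treats a five-token line as one whose flag field is blank; the helper parse_info_name_line, its key names and both sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split the name line of an INFO file into six fields.
sub parse_info_name_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    # A blank flag ('  ') leaves only five whitespace-separated tokens.
    splice @f, 1, 0, '' if @f == 5;
    return unless @f == 6;
    my %info;
    @info{qw(name flag nitems nitems2 reclen nrecords)} = @f;
    return \%info;
}

# Invented sample lines: an ARC/INFO table and another INFO table.
my $pat = parse_info_name_line('STDFIG24C.PAT                   XX   4   4  16         5');
my $tab = parse_info_name_line('LOOKUP.TAB                           3   3  12         7');
print "$pat->{name}: $pat->{nitems} items, $pat->{nrecords} records\n";
```

The number of data records tells a reader how many data lines to consume after the item definitions.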

The definitions for each item consist of eight fields:

        1-      name of item
        2-      width of item followed by a constant of '-1'
        3-      start position of item followed by a constant of '4-1'
        4-      output format of item (see below for discussion)
        5-      type of item (see below for discussion)
        6-      appears to be constant of '-1'
        7-      appears to be constant of '-1-1'
        8-      appears to be sequential identifier

The output format field is handled differently for numeric and character items. Numeric items give the output width followed by a space then the number of decimal positions. Character items give the output width followed by a constant of '-1'.

The type of the item is specified by the following codes:

        -       20-1 indicates character
        -       50-1 indicates binary integer
        -       60-1 indicates real number

The other item types have not yet been identified.
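The three identified type codes fit in a small lookup table, sketched here. The helper item_type_name is hypothetical, and '40-1' is used only as an example of a code that has not been identified.

```perl
use strict;
use warnings;

# Item type codes identified so far.
my %ITEM_TYPE = (
    '20-1' => 'character',
    '50-1' => 'binary integer',
    '60-1' => 'real number',
);

# Hypothetical helper: name an item type, falling back for unknown codes.
sub item_type_name {
    my ($code) = @_;
    return $ITEM_TYPE{$code} // 'unidentified';
}

print "$_ => ", item_type_name($_), "\n" for qw(20-1 60-1 40-1);
```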

Formats will be given for the most common INFO files:

        -       .AAT
        -       .ACODE
        -       .BND
        -       .PAT
        -       .PCODE
        -       .TIC

.AAT

The .AAT (Arc Attribute Table) contains seven fields whose item names are self-explanatory. However, additional items may be added as desired, after the -ID item.

.ACODE

The .ACODE (Arc Lookup Table) contains seven fields whose item names are the same (except the -ID) as those in the ADS files documentation. However, additional items should be able to be added as desired, after the LABEL item.

.BND

The .BND (Coverage Min/Max Coordinates) table contains four fields whose item names are self-explanatory.

.PAT

The .PAT (Polygon or Point Attribute Table) contains four fields whose item names are self-explanatory. However, additional items may be added as desired, after the -ID item.

.PCODE

The .PCODE (Polygon Lookup Table) contains eight fields whose item names are the same (except the -ID) as those in the ADS files documentation. However, additional items should be able to be added as desired, after the LABEL item.

.TIC

The .TIC (Tic Coordinates) table contains three fields whose item names are self-explanatory.

CONCLUSION

The content and format of the ARC EXPORT file seem to be straightforward in most cases. The remaining areas of uncertainty include:

        -       confirmation of the meaning of the second field in the identification line for INFO files ('XX' or '  ')

        -       the meaning of the 'SIN 2' section

        -       the precise format of the PRJ file for different projections

        -       possible variation in the '-1' suffixes of INFO definitions

        -       INFO codes for item types other than character, integer, and real.

However, none of these appears to be that serious, and the indicated formats should be used to identify any errors or limitations.

Because this information was derived from limited experimentation, it should be considered as tentative and subject to revision at any time.

CAUTION

Note: ESRI considers the export/import file format to be proprietary. As a consequence, the identified format can only constitute a "best guess" and must always be considered as tentative and subject to revision, as more is learned.

EXAMPLES

   use Geo::E00;

   my $file = shift;

   my $io = Geo::E00->new;

   $io->open($file);

   my $e00data = $io->parse();
   my $arcdata = $e00data->{arc};
   my $cntdata = $e00data->{cnt};

   # Number of points and LENGTH (from the AAT) for each arc
   print "Arcpoint\n";
   foreach my $arc (@$arcdata) {
        print $arc->{npoints},"\n";
        print $arc->{LENGTH},"\n";
   }

   # Every coordinate pair of every arc
   print "Arcpoint longitude (x) - latitude (y)\n";
   foreach my $arc (@$arcdata) {
        foreach my $ll (@{$arc->{coord}}) {
                print "Longitude $ll->{x}\n";
                print "Latitude  $ll->{y}\n";
        }
   }

TODO

Suggestions are welcome. ;)

HISTORY

0.01 2002/10/24 initial release.
0.02 2002/10/30 Added support for LAB section
0.03 2003/05/01 Added support for PAL, IFO, PRJ, log section, + documentation
0.04 2003/05/24 Bugs removed for negative x-y pairs and PAL section
0.05 2003/05/25 Some minor style changes and fixes

AUTHORS

 Alessandro Zummo <azummo dash e00perl at towertech dot it>
 Bert Tijhuis <B dot Tijhuis at inter dot nl dot net>

SEE ALSO

 Geo::E00 is released under the GPL and its development
 is funded by Tower Technologies (http://www.towertech.it). 

COPYRIGHT AND LICENCE

 Copyright (C) 2002-2003 Tower Technologies s.r.l. 
 Copyright (C) 2002-2003 Alessandro Zummo
 Copyright (C) 2003 Bert Tijhuis

 This package is free software and is provided "as is"
 without express or implied warranty. It may be used, modified,
 and redistributed under the same terms as Perl itself.