NAME

SAS::Parser - Parse a SAS program file

SYNOPSIS

 use SAS::Parser;
 $p = new SAS::Parser;
 $p->parse_file('mysas.sas');         # returns a SAS::Parser object

or

 $file = shift @ARGV;
 $p->parse_file($file, \%options);

After parsing, you can access the information stored in the SAS::Parser object as follows:

 @procs = $p->procs();               # get list of procs called
 @datasets = $p->datasets();         # get list of datasets created
 $macros = $p->macros();             # get string of macros called

DESCRIPTION

SAS::Parser is a base tool for use in writing applications which deal with .sas programs. It can be used as a documentation tool, e.g., to extract lists of procedures used, data sets created, macros used, etc., and produce a nicely formatted header in a consistent format, or to produce standard documentation headers for SAS macros. It can also be used as a pre-processor to a SAS code formatter, to produce WWW documents, etc. It is not likely to be useful as a SAS syntax checker without a good deal of additional work. It does as reasonable a job on SAS macros as can be expected without being an actual macro processor.

I had written a large number of specialized scripts for some of these tasks, and found that I was re-doing similar stuff each time. SAS::Parser is an attempt to bring this to the next level, where the basic statement parsing can be assumed, and your application can just work with the info extracted.

It's just a beginning, and all the rest depends on writing Perl code making use of SAS::Parser to accomplish such tasks. See SAS::Header for one such extension.

So, what does it actually do?

Any parser works by segmenting text into 'interesting units' for the purpose at hand.

SAS::Parser parses a SAS program into statements when the parse() or parse_file() methods are called. Each statement is classified as a statement type, and further parsed depending on that statement type. Information about libnames, filenames, data sets created, procs called, macros called, and macros defined is stored in the SAS::Parser object.

In addition, the parsed description of each statement selected by the stored option (its type, the statement name, and statement text) may be stored in an array for further processing.
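
As a rough illustration of the segmentation step (not the module's actual code), the following sketch splits SAS source at each ';' that falls outside a quoted string and gives each statement a coarse type. The helper names split_statements() and classify() are hypothetical, and comments, macros, and cards data are deliberately ignored.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SAS::Parser: split SAS source into
# statements at each ';' that lies outside a quoted string.
sub split_statements {
    my ($text) = @_;
    my @stmts;
    # Each match is a run of quoted strings and ordinary characters,
    # ending at the first unquoted ';'.
    while ($text =~ /\G\s*((?:'[^']*'|"[^"]*"|[^;'"])*;)/g) {
        push @stmts, $1;
    }
    return @stmts;
}

# Hypothetical classifier: looks only at the leading keyword.
sub classify {
    my ($stmt) = @_;
    return (lc $1, lc $2) if $stmt =~ /^(data|proc)\s+(\w+)/i;
    return ('stmt', '');
}

my $sas = "data test;\n  x = 'a;b';\nproc reg data=test;\n";
for my $stmt (split_statements($sas)) {
    my ($type, $name) = classify($stmt);
    printf "%-5s %-8s %s\n", $type, $name, $stmt;
}
```

Note that the ';' inside the quoted string 'a;b' does not end the assignment statement. An unmatched quote stops this sketch at that point.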

Presently, we just collect the information from the SAS program. To do more interesting things, one should define sub-classes for more specialized tasks. See, for example, SAS::Header. These can add items to the object structure, which, like Topsy, just grows.

USAGE

The external interface to SAS::Parser is:

$p = new SAS::Parser;

Create a new, but empty SAS::Parser object. The object constructor takes no arguments.

$p->parse( $string, \%options );

Parse the $string as a SAS program. The $string argument is typically a series of lines (separated by \n) read from a file. The parse() method may be called several times with different chunks of a large file, or with lines read from different files. The parse() method does most of the work, but most applications directly use the parse_file() method, which in turn calls parse() with the text of a file. The return value is a reference to the parser object.

$p->parse_file( $file, \%options );

This method can be called to parse text from a file. The argument can be a filename or an already opened file handle. The return value from parse_file() is a reference to the parser object.

On Unix systems, parse_file() also attempts to locate and parse the autoexec.sas file, in order to locate pre-defined libname and filename statements which may be referenced in the SAS program.

OPTIONS

The parse() and parse_file() methods take the following options as an optional second argument. All options are included as a hash of (option_name, option_value) pairs.

doincludes

Setting doincludes=>1 (non-zero) causes the parser to insert the text of included files (%include statements) in the input stream at that point, if the included file can be read. In this case, line numbers refer to the total stream, not individual files.

trim

Setting trim=>1 (non-zero) causes each statement to be trimmed of leading/trailing whitespace, and all internal C-style comments (/* ... */) to be removed before the statement is stored or printed.
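
The effect of trimming can be approximated as in the sketch below. The function trim_statement() is a hypothetical stand-in, not the module's own routine, and it assumes that no /* ... */ sequence occurs inside a quoted string.

```perl
use strict;
use warnings;

# Hypothetical helper showing the effect described for trim=>1.
sub trim_statement {
    my ($stmt) = @_;
    $stmt =~ s{/\*.*?\*/}{}gs;    # remove each /* ... */, even across lines
    $stmt =~ s/^\s+|\s+$//g;      # strip leading and trailing whitespace
    return $stmt;
}

print trim_statement("  proc reg /* fit model */ data=test;  \n"), "\n";
```

In this sketch the whitespace on either side of a removed comment is left in place, so two blanks remain where the comment used to be.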

store

The store option specifies either 'ALL', or 'NONE', or a list of statement types whose contents and descriptors are stored in the SAS::Parser object. The default is store = qw(data proc).

For example, to store all data and proc statements, use

 $p->parse_file($file, {store=>qw(data proc)});

For each stored statement, the SAS::Parser object stores a list of the following 5 elements:

 ($lineno, $step, $type, $stmt, $statement)

The parse_file() method uses the following call to parse the autoexec.sas file silently, storing no statements (but recording filename and libname information):

  $self -> parse($auto, {silent=>1, store=>qw(none)}) if $auto;

print

The print option specifies either 'ALL', or 'NONE', or a list of statement types whose contents and descriptors are printed as they are parsed. The default here, print = qw(data proc), prints information about each data and proc step. This option is mainly used for debugging or testing.

silent

Setting silent=>1 (non-zero) suppresses the printout of statements as they are parsed. This is equivalent to setting the print option to 'NONE'.

Methods

The following methods are available in the SAS::Parser class. Except for the output() method, they all work as both constructors and accessors. If called with an argument, that argument is added to the corresponding entry in the SAS::Parser object. If called with no argument, they return that entry.

As a convenience, the accessors which ordinarily return lists (e.g., procs(), macros(), datasets(), etc.) will return a blank-separated string if called in a scalar context, or an array if called in a list context. (But note that "print $p->procs();" supplies a list context.)

The items for all these lists are stored and returned in the order found in the file(s) parsed. To use or print these in a sorted order, use the sort() function (which also supplies a list context).
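
The combined constructor/accessor pattern and the context-sensitive return value can be sketched with Perl's wantarray. The class My::Collector below is hypothetical and is not how SAS::Parser is necessarily implemented; it only shows the behaviour described above.

```perl
use strict;
use warnings;

package My::Collector;    # hypothetical class, for illustration only

sub new { return bless { procs => [] }, shift }

# With an argument: append it to the list unless already present.
# Without: return the list, or a blank-separated string in scalar context.
sub procs {
    my ($self, $name) = @_;
    if (defined $name) {
        push @{ $self->{procs} }, $name
            unless grep { $_ eq $name } @{ $self->{procs} };
        return $self;
    }
    return wantarray ? @{ $self->{procs} } : join(' ', @{ $self->{procs} });
}

package main;

my $c = My::Collector->new;
$c->procs($_) for qw(reg means reg);

my @list   = $c->procs();    # list context: order found
my $string = $c->procs();    # scalar context: blank-separated string
my @sorted = sort @list;     # sorted copy

print "list:   @list\n";
print "string: $string\n";
print "sorted: @sorted\n";
```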

$p->procs('means')

Appends the named procedure to the list of procedures called. The constructor use of these methods is used internally during parsing.

$p->procs();

Returns a list of the unique names of procedures called in PROC statements or a blank-separated string in a scalar context. The list accessor functions such as this are used as follows:

   my @procs = $p->procs();             # list context
   print "procs called: ", join(', ', @procs), "\n" if scalar @procs;

or

   my $procs = $p->procs();             # scalar context
   print "procs called: $procs\n" if $procs;

$p->macros();

Returns a list of the unique names of macros invoked explicitly in the form %macname [(args);] or a blank-separated string in a scalar context. This does include macros invoked as part of %let statements, e.g., %let nv = %nvar(&vars);, but not other macro statements.

$p->macdefs();

Returns a list of the unique names of macros defined or a blank-separated string in a scalar context.

$p->datasets();

Returns a list of the unique names of datasets created in DATA statements or a blank-separated string in a scalar context. Output datasets created by procedures are not tracked.

$p->includes();

Returns a list of the unique names of included files from %include statements or a blank-separated string in a scalar context.

$p->modules();

Returns a list of the unique names of IML modules defined or a blank-separated string in a scalar context.

$p->libnames();

Returns a hash of the names of SAS libraries defined. The key for each element of the hash is the libref, and the corresponding value is a string containing the folder or directory name.

The libnames and corresponding directory names (if any) may be printed as follows:

 my %libnames = $p->libnames();
 while (($libref,$value) = each %libnames) {
        print "  libname: $libref=$value\n";
 }

$p->filenames();

Returns a hash of the names of SAS filenames defined. Non-disk filenames (pipe, printer, tape, etc) are ignored. The key for each element of the hash is the fileref, and the corresponding value is a string containing the filename, or a folder or directory name, or a blank-separated list of folder/directory names (for a filename aggregate).

$p->stored();

Returns a list-of-lists of the SAS statements stored, which consists of all statements whose type matches the store option.
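
A list-of-lists of this kind might be processed as in the sketch below. It assumes that each element is an array reference holding the five documented fields; the sample records are written by hand to mirror that layout and are not actual parser output.

```perl
use strict;
use warnings;

# Hand-written sample records in the documented five-element layout
# ($lineno, $step, $type, $stmt, $statement). In real use these would
# presumably come from:  my @stored = $p->stored();
my @stored = (
    [ 1, 'data', 'data', 'test',  'data test;' ],
    [ 6, 'proc', 'proc', 'reg',   'proc reg data=test;' ],
    [ 8, 'proc', 'proc', 'means', 'proc means data=test;' ],
);

# Print each record and group the statement names by statement type.
my %by_type;
for my $rec (@stored) {
    my ($lineno, $step, $type, $stmt, $statement) = @$rec;
    push @{ $by_type{$type} }, $stmt;
    printf "%4d  %-5s %-8s %s\n", $lineno, $type, $stmt, $statement;
}

print "procs in order: @{ $by_type{proc} }\n";
```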

$p->eof(1)

Sets an end-of-file condition which terminates parsing after the current statement has been processed. The eof() method may be used by a sub-class of SAS::Parser to end the parsing after the required information has been extracted.

$p->output($lineno, $step, $type, $stmt, $statement)

This method is used to produce output from the parser as each statement is parsed. The default method provided in SAS::Parser simply prints the values of $step, $type, $stmt, and $statement. It uses a negative value of $lineno as a flag for initial processing. Sub-classes of SAS::Parser may override this method for other purposes.

For example, the following lines define a short SAS program as a here document, and parse it with SAS::Parser.

  use SAS::Parser;
  my $sascode = <<END;
  data test;
          do x=1 to 20;
                  y=x + normal(0);
                  output;
                  end;
  proc reg data=test;
          model y=x;
  proc means data=test;
          var y x;
  END
  ;

  my $p = new SAS::Parser;
  $p -> parse($sascode);

When run, this produces the following printed output:

 data data     test     data test;
 proc proc     reg      proc reg data=test;
 proc proc     means    proc means data=test;
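
A sub-class that overrides output() might look like the following sketch. The class name ProcLister and the hash key first_proc are hypothetical. It assumes that the parser calls output() as a method with the five arguments listed above, and that the inherited constructor blesses the object into the sub-class. It records the first procedure seen and then calls eof(1) to stop parsing.

```perl
use strict;
use warnings;

package ProcLister;    # hypothetical sub-class name

# -norequire: assumes the calling script has already loaded the parent
# class with "use SAS::Parser;".
use parent -norequire, 'SAS::Parser';

# Replace the default printout: remember the first proc, then stop.
sub output {
    my ($self, $lineno, $step, $type, $stmt, $statement) = @_;
    return if $lineno < 0;           # initial-processing flag, nothing to record
    return unless $type eq 'proc';
    $self->{first_proc} = $stmt;     # hypothetical key added to the object
    $self->eof(1);                   # end parsing after this statement
    return;
}

package main;

# Intended use (requires SAS::Parser to be installed):
#   use SAS::Parser;
#   my $p = ProcLister->new;
#   $p->parse_file('mysas.sas');
#   print "first proc: $p->{first_proc}\n" if defined $p->{first_proc};
```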

Statement types

The parsing of each statement returns the variables $lineno, $step, $type, $stmt, and $statement, which may be printed by the parser and/or stored in the SAS::Parser object (depending on the options: silent, print, store).

$lineno is the source line number of the first line of the statement. $step is one of 'data', 'proc', or '' (for global statements outside of PROC or DATA steps). $type is a general statement type, $stmt sometimes gives a further keyword or name associated with the statement, and $statement is the actual text of the statement (possibly trimmed of whitespace and embedded /* comments */, depending on the trim option).

The statement types ($type values) currently used are:

?

The parser could not classify this statement.

assign

an assignment statement. $stmt contains the name of the variable assigned.

cards

cards;, datalines;, etc.

ccomment

a C-style comment: /* ... */

data

a DATA statement. $stmt contains the name of the first data set mentioned.

global

a SAS global statement: options, title, run, axis, etc. $stmt contains the statement keyword.

include

%include statement. The parser handles the forms %include 'path/filename';, %include fileref;, and %include fileref(file); where fileref was defined in a filename statement, possibly in the autoexec.sas file. If the fileref was defined, the name of the actual file is found, if the file exists.

lines

actual data lines following cards;

mcall

a macro call statement. $stmt contains the macro name.

mcomment

a macro comment statement: %* ... ;

mdef

a macro definition statement: %macro(). $stmt contains the macro name. $statement contains the text of the macro definition statement, including all arguments and default values.

mend

%mend statement

mstmt

some other macro statement: %display, %do, %else, %end, etc. $stmt contains the statement keyword.

null

null statement

proc

a PROC statement. $stmt contains the name of the procedure called.

scomment

a statement comment: * ... ;

stmt

some other SAS statement: all DATA step statements, and PROC step statements. $stmt contains the statement keyword.

Specialized parser methods

The following methods are available in the SAS::Parser class for specialized parsing of particular statement types, to extract or operate on additional information in a statement. They are designed so that they may be overridden for particular applications.

Those listed as NOOP do nothing here, except reserve a place for such additional processing. For example, you can override parse_mdef() to do further parsing of macro arguments.

$self->parse_assign($statement);

NOOP

$self->parse_ccomment($statement);

NOOP

$self->parse_data($statement);

Parse a data statement, finding all dataset names created, and storing these in $self->{datasets}. We don't bother distinguishing between permanent and temporary datasets, or storing information about the SAS libraries referred to. We handle (implicit) _data_, as in data;, but don't resolve these to DATA1, DATA2, etc.

$self->parse_filename($statement);

Parse a filename statement to determine fileref and corresponding folder(s).

$self->parse_global($statement);

NOOP

$file = $self->parse_include($statement);

Parse a %include statement to determine pathname of included file(s). For this to work, we must have seen and parsed the filename statements for any %include fileref; or %include fileref(file);. We don't actually include the file, but leave that to the higher-ups.

Returns: the resolved pathname of the included file, if it exists.
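
The three %include forms could be resolved along the lines of the sketch below. The function resolve_include() and the sample fileref table are hypothetical. Unlike the method described above, this sketch does not check that the resulting file exists, and it joins directory and file with '/' for simplicity.

```perl
use strict;
use warnings;

# Hypothetical resolver for the three %include forms. $filerefs maps
# lower-cased filerefs to a file or directory name, as would be collected
# from earlier filename statements.
sub resolve_include {
    my ($stmt, $filerefs) = @_;

    # %include 'path/filename';  (or double-quoted)
    return $2 if $stmt =~ /^\s*%include\s+(['"])(.+?)\1\s*;/i;

    # %include fileref(file);
    if ($stmt =~ /^\s*%include\s+(\w+)\s*\(\s*([^)\s]+)\s*\)\s*;/i) {
        my $dir = $filerefs->{ lc $1 } or return;
        return "$dir/$2";
    }

    # %include fileref;
    return $filerefs->{ lc $1 } if $stmt =~ /^\s*%include\s+(\w+)\s*;/i;

    return;
}

# Sample fileref table; these paths are made up for the example.
my %filerefs = (
    macros => '/home/user/sasmac',
    setup  => '/home/user/setup.sas',
);

for my $stmt ("%include 'lib/init.sas';",
              "%include macros(words.sas);",
              "%include setup;") {
    my $path = resolve_include($stmt, \%filerefs);
    print "$stmt  ->  ", (defined $path ? $path : '(unresolved)'), "\n";
}
```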

$self->parse_libname($statement);

Parse a libname statement to determine libref and corresponding folder.

$self->parse_mcall($statement);

NOOP

$self->parse_mdef($statement);

NOOP

$self->parse_mend($statement);

NOOP

$self->parse_module($statement);

NOOP

$self->parse_mstmt($statement);

Parse a macro statement. As implemented here, this just looks for user-defined macro functions invoked in a %let statement, e.g.,

  %let nv = %words(&vars);

This will add %words to the list of macros called.
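
One way to pick out such macro function calls is sketched below. The function macros_in_let() is hypothetical, the list of built-in SAS macro functions it skips is only a partial one chosen for the example, and it returns names without the leading '%'.

```perl
use strict;
use warnings;

# A partial list of built-in SAS macro functions to skip (an assumption
# made for this example; the real list is longer).
my %builtin = map { $_ => 1 }
    qw(str nrstr eval sysevalf scan substr upcase length index quote bquote);

# Hypothetical helper: return the names of user-defined macro functions
# invoked in a %let statement, in the order found.
sub macros_in_let {
    my ($stmt) = @_;
    return unless $stmt =~ /^\s*%let\b/i;
    my @found;
    while ($stmt =~ /%(\w+)\s*\(/g) {    # %name followed by '('
        my $name = lc $1;
        push @found, $name unless $builtin{$name};
    }
    return @found;
}

print join(' ', macros_in_let('%let nv = %words(&vars);')), "\n";
```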

$self->parse_proc($statement);

NOOP

$self->parse_stmt($statement);

NOOP

Other routines

The following subroutines are exported by default.

&find_autoexec()

Find the autoexec.sas file, and return its pathname if found, else return undef. If the environment variable SAS_OPTIONS defines -autoexec, we look there first. Otherwise, we search the current directory, the user's HOME directory, or a directory specified by the environment variable SASROOT, in that order.
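
The search order just described can be sketched as follows. The function locate_autoexec() is a hypothetical re-implementation, not the exported find_autoexec(), and the way it extracts the -autoexec path from SAS_OPTIONS is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical re-implementation of the documented search order.
sub locate_autoexec {
    my @candidates;

    # 1. An explicit "-autoexec path" in SAS_OPTIONS, if present
    #    (the option syntax matched here is an assumption).
    if (defined $ENV{SAS_OPTIONS}
        && $ENV{SAS_OPTIONS} =~ /-autoexec\s+(?:'([^']+)'|"([^"]+)"|(\S+))/i) {
        push @candidates, $1 // $2 // $3;
    }

    # 2. The current directory, then HOME, then SASROOT.
    for my $dir (File::Spec->curdir, $ENV{HOME}, $ENV{SASROOT}) {
        next unless defined $dir && length $dir;
        push @candidates, File::Spec->catfile($dir, 'autoexec.sas');
    }

    # Return the first candidate that is a readable plain file.
    for my $path (@candidates) {
        return $path if -f $path && -r _;
    }
    return;    # undef in scalar context when nothing is found
}

my $auto = locate_autoexec();
print defined $auto ? "autoexec: $auto\n" : "no autoexec.sas found\n";
```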

$new = &protect_special($text, ['char'] ['replace']);

Protect special characters from the parser by remapping them into some other string.

$text = &readfile($file)

Read a file, given filename (complete path) or filehandle (assumed open). Returns the file contents or undef if not found.

ENVIRONMENT

Uses SAS_OPTIONS and SASROOT to locate autoexec.sas.

BUGS and LIMITATIONS

  • parse() does not handle certain types of complex macros particularly well. When %do...; stuff %end; is used inside another statement to generate conditional code, that text, up to the next ';' is appended appropriately to the current statement. In other cases, it may fail, returning '?' as the statement type, because it's a static parser, not a true macro interpreter. In these cases, the parser swallows text up to the next ';' as the current statement, and soldiers on. Following statements are parsed correctly.

  • The logic used to handle ';' inside quoted strings is fooled by unmatched quotes, even those inside comments. For example,

            *--don't expect this comment to parse correctly;

  • There are still some problems with parsing line labels that look like statement types or keywords. For example, the macro statement

     %done: options notes;

    gets classified as a %do statement.

SEE ALSO

SAS::Header, SAS::Index

AUTHOR

Michael Friendly, friendly@yorku.ca

COPYRIGHT

Copyright 1999- Michael Friendly. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
