NAME

c2ast - C source analysis

VERSION

version 0.37

SYNOPSIS

 c2ast.pl [options] [file ...]

 Options:
   --help               Brief help message
   --cpp <argument>     cpp executable. Default is 'cpp'.
   --cppfile <filename> The name of the file being preprocessed.
   --cppdup <filename>  Save the preprocessed output to this filename.
   --lexeme <lexeme>    Lexemes of interest.
   --progress           Progress bar with ETA information.
   --check <checkName>  Perform some hardcoded checks on the code.
   --dump               Dump parse tree value on STDOUT.
   --dumpfile <file>    Dump parse tree value to this named file.
   --allowAmbiguity     Allow more than a single parse tree value.
   --loglevel <level>   A level that has to be meaningful for Log::Log4perl, typically DEBUG, INFO, WARN, ERROR, FATAL or TRACE.
   --logstderr          Logs to stderr or not.

  Aliased options:
   --debug              Alias to --loglevel DEBUG
   --info               Alias to --loglevel INFO
   --warn               Alias to --loglevel WARN
   --error              Alias to --loglevel ERROR
   --fatal              Alias to --loglevel FATAL
   --trace              Alias to --loglevel TRACE

  Advanced options:
   --lazy               Instruct the parser to try all alternatives on typedef/enum/identifier
   --typedef <typedef>  Comma separated list of known typedefs
   --enum <enums>       Comma separated list of known enums
   --start <startRule>  Start rule in the grammar.
   --nocpp              Do not preprocess input file, but take it as is.

If file is '-', it is assumed to refer to the STDIN handle.

DESCRIPTION

This script will use Marpa::R2 to analyse the file given in argument.

A first phase calls the preprocessor (unless --nocpp is given), so you need to have one on your machine. The default is 'cpp', and it can be overridden on the command-line.
Then the output of the preprocessor goes through a lexing phase, using a 2011 ISO ANSI C compliant grammar.
Finally, if you ask via the command-line to have a dump of the parse tree value(s), or to perform some checks on your code, the parse tree is evaluated.

Say --help on the command-line to have the full list of options, and examples.

NAME

c2ast.pl - C source code transformation to AST and eventual check of C Programming Best Practices

OPTIONS

--help

This help

--cpp argument

cpp executable. Default is 'cpp'.

If your setup requires additional options, then you should repeat this option. For example, if your cpp setup is "cl -E", then you say:

 --cpp cl --cpp -E

Take care: it has been observed that "cpp" output can differ from "compiler -E" output. If c2ast complains and the output manifestly reports something that has not been preprocessed correctly, then retry with: --cpp your_compiler --cpp your_compiler_option

This has been observed on Darwin for instance, where one has to say:

 --cpp gcc --cpp -E

--cppfile filename

The name of the file being preprocessed. Usually this option is not necessary. By default this is the main file being preprocessed, as indicated by the first preprocessor line directive. (Preprocessor line directives are lines starting with '#line'.) For the --lexeme tracing phase or the --check phase, c2ast includes only the information that is relevant to the lexemes contained in the "cppfile".

One circumstance where this option is necessary is when the C files are not the source files, but were generated by another program. For example, the input file might be named 'generated.c', but the actual source, from which 'generated.c' was generated, might be in a file named 'source.c'. If convention is being followed, 'generated.c' will contain lines of the following form, to indicate which portions of the code originally came from 'source.c':

  # line xxx "source.c"

You can tell c2ast.pl to analyze the code originally from "source.c", as indicated by the preprocessor line directives, with the option

 --cppfile "source.c"

--cppdup filename

Save the preprocessed output to this filename. Only useful for debugging c2ast.

--lexeme lexeme

Lexemes of interest. See the grammar for the exhaustive list. In practice, only IDENTIFIER, TYPEDEF_NAME, ENUMERATION_CONSTANT and STRING_LITERAL_UNIT are useful. An internal lexeme, not generated by Marpa itself, also exists: PREPROCESSOR_LINE_DIRECTIVE. This option must be repeated for every lexeme of interest. Giving the value __ALL__ makes all lexemes candidates for logging. The output goes to STDOUT.

--progress

Progress bar with ETA information. The "name" associated with the progress bar will be the last of the arguments unknown to c2ast. So it is strongly suggested to always end your command-line with the file you want to analyse.

--check checkName

Perform some hardcoded checks on the code. Supported values for checkName are:

reservedNames

Check IDENTIFIER lexemes against the GNU recommended list of Reserved Names [1].

Any check that fails is reported on STDERR.

--dump

Dump parse tree value on STDOUT.

--dumpfile file

Dump parse tree value to this named file.

Take care: dumping the parse tree value can hog your memory and CPU. This is not the fault of c2ast, but of the module used to do the dump (currently, Data::Dumper).

--allowAmbiguity

Default is to allow a single parse tree value. Nevertheless, if the grammar in use by c2ast has a hole, use this option to allow multiple parse tree values. In case of multiple parse tree values, only the first one will be used in the check phase (option --check).

--loglevel level

A level that has to be meaningful for Log::Log4perl, typically DEBUG, INFO, WARN, ERROR, FATAL or TRACE. Default is WARN.

Note that tracing the Marpa library itself is possible, but only by setting the environment variable MARPA_TRACE and also saying --loglevel TRACE.

In case of trouble, the typical debugging phases of c2ast are: --loglevel INFO, then --loglevel DEBUG, then --loglevel TRACE.

--debug

Shortcut for --loglevel DEBUG

--info

Shortcut for --loglevel INFO

--warn

Shortcut for --loglevel WARN

--error

Shortcut for --loglevel ERROR

--fatal

Shortcut for --loglevel FATAL

--trace

Shortcut for --loglevel TRACE

--logstderr

Logs to stderr or not. Default is $logstderr.

--lazy

Instruct the parser to try all alternatives on typedef/enum/identifier. Please refer to MarpaX::Languages::C::AST documentation for its new() method. Default is a false value.

--typedef typedefs

Comma separated list of known typedefs. Please refer to MarpaX::Languages::C::AST documentation for its new() method. Default is an empty list.

--enum enums

Comma separated list of known enums. Please refer to MarpaX::Languages::C::AST documentation for its new() method. Default is an empty list.

--start startRule

Start rule in the grammar. This requires knowledge of the C grammar itself. Default is an empty string.

--nocpp

Do not preprocess input file, but take it as is. When this option is used, --lazy is highly recommended, and input file must be the last argument. It is highly probable that the input will not parse nevertheless, as soon as it contains constructs that deviate too much from the C grammar. Default is a false value.

For example:

  #include <sys/types.h>
  #include <sys/stat.h>
  #include <unistd.h>
  int func1(size_t size) {
  }

will never be parsed without cpp, i.e.:

 c2ast --nocpp /tmp/test.c

because of size_t. But the lazy option will make it work, because size_t will be injected as an acceptable alternative for TYPEDEF_NAME and IDENTIFIER:

 c2ast --nocpp --lazy /tmp/test.c

If you run with the DEBUG loglevel, you will see an explanation of the successful parsing:

 c2ast --nocpp --lazy --loglevel DEBUG /tmp/test.c
 ./..
 DEBUG  13370 [parseIsTypedef] "size_t" at scope 1 is a typedef? no
 DEBUG  13370 [parseIsEnum] "size_t" is an enum at scope 1? no
 DEBUG  13256 [_doPauseBeforeLexeme] Pushed alternative TYPEDEF_NAME "size_t"
 DEBUG  13256 [_doPauseBeforeLexeme] Failed alternative ENUMERATION_CONSTANT "size_t"
 DEBUG  13256 [_doPauseBeforeLexeme] Pushed alternative IDENTIFIER "size_t"

Here you see clearly that the lazy option tried TYPEDEF_NAME, ENUMERATION_CONSTANT and IDENTIFIER. The grammar natively rejected ENUMERATION_CONSTANT because it is not expected at this stage. A hint on typedef, useless here, would nevertheless have prevented lazy mode from trying to push the ENUMERATION_CONSTANT alternative:

 c2ast --nocpp --lazy --loglevel DEBUG --typedef size_t /tmp/test.c
 ./..
 DEBUG  13378 [parseIsTypedef] "size_t" at scope 1 is a typedef? yes
 DEBUG  13378 [_doPauseBeforeLexeme] Pushed alternative TYPEDEF_NAME "size_t"
 DEBUG  13378 [_doPauseBeforeLexeme] Pushed alternative IDENTIFIER "size_t"

But giving a wrong hint, saying that size_t is an enum, will cause a parse failure, because ENUMERATION_CONSTANT is not expected at this stage, and even if IDENTIFIER is possible, the rest of the input source invalidates it:

 c2ast --nocpp --lazy --loglevel DEBUG --enum size_t /tmp/test.c
 ./..
 DEBUG  13384 [parseIsTypedef] "size_t" at scope 1 is a typedef? no
 DEBUG  13384 [parseIsEnum] "size_t" is an enum at scope 1? yes
 DEBUG  13384 [_doPauseBeforeLexeme] Failed alternative ENUMERATION_CONSTANT "size_t"
 DEBUG  13384 [_doPauseBeforeLexeme] Pushed alternative IDENTIFIER "size_t"
  ./..
 FATAL  13384 Error in SLIF parse: No lexemes accepted at line 5, column 18
   Rejected lexeme #0: Lexer "L0"; ENUMERATION_CONSTANT; value="size"; length = 4
   Rejected lexeme #1: Lexer "L0"; TYPEDEF_NAME; value="size"; length = 4
   Rejected lexeme #2: Lexer "L0"; IDENTIFIER; value="size"; length = 4
   Rejected lexeme #3: Lexer "L0"; IDENTIFIER_UNAMBIGUOUS; value="size"; length = 4
 * String before error: /stat.h>\n#include <unistd.h>\n\nint func1(size_t\s
 * The error was at line 5, column 18, and at character 0x0073 's', ...
 * here: size) {\n}\n\n
 Marpa::R2 exception at lib/MarpaX/Languages/C/AST/Impl.pm line 107.
 Last position:
 line:column 5:11 (Unicode newline count) 5:11 (\n count)
 int func1(size_t size) {
 ----------^

In conclusion, the options --nocpp and --lazy, even with --typedef or --enum hints, should rarely be used, unless your engine is prepared to handle failure. The cpretty program, for instance, does so.

Any option not documented above will be considered a cpp option, and sent to the underlying cpp program. A restriction is that the filename must be the last argument.

EXAMPLES

Examples:

 c2ast.pl                   -D MYDEFINE1 -D MYDEFINE2 -I       /tmp/myIncludeDir            /tmp/myfile.c

 c2ast.pl                   -D MYDEFINE1 -D MYDEFINE2 -I       /tmp/myIncludeDir            /tmp/myfile.c --lexeme IDENTIFIER --lexeme TYPEDEF_NAME

 c2ast.pl --cpp cl --cpp -E -D MYDEFINE1 -D MYDEFINE2 -I C:/Windows/myIncludeDir C:/Windows/Temp/myfile.c

 c2ast.pl                   -D MYDEFINE1 -D MYDEFINE2 -I       /tmp/myIncludeDir            /tmp/myfile.c --progress --check reservedNames

Less typical usage:

 c2ast.pl -I libmarpa_build --cpp gcc --cpp -E --cppfile ./marpa.w  --progress --check reservedNames libmarpa_build/marpa.c

SEE ALSO

Reserved Names - The GNU C Library

MarpaX::Languages::C::AST

AUTHOR

Jean-Damien Durand <jeandamiendurand@free.fr>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Jean-Damien Durand.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.