Andrius Merkys and 1 contributor


DBIx::MyParsePP::Lexer - Pure-perl SQL lexer based on MySQL's source


        use DBIx::MyParsePP::Lexer;
        use Data::Dumper;

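        # $string holds the SQL text to be tokenized
        my $lexer = DBIx::MyParsePP::Lexer->new(
                string => $string
        );

        # Fetch tokens one at a time until the end of the input is reached
        while ( my $token = $lexer->yylex() ) {

                print Dumper $token;
                last if $token->type() eq 'END_OF_INPUT';
                print $lexer->pos();
                print $lexer->line();
        }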


DBIx::MyParsePP::Lexer is a translation of the lexer function from MySQL into pure Perl.

The goal of the translation was to follow the method of operation of the original lexer as closely as possible, so compatibility comes at the expense of performance. For example, the original character set definitions are used, rather than determining which letter is uppercase or lowercase using a Perl regular expression.
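As a rough illustration of the table-driven approach described above, the following sketch classifies characters by looking them up in a 256-entry table instead of matching them against a regular expression. The flag names (CT_UPPER and so on), their values, and the helper functions are hypothetical and exist only for this example; they are not taken from the module or from MySQL's character set files.

```perl
use strict;
use warnings;

# Hypothetical class flags for this sketch; not the module's own values.
use constant {
    CT_UPPER => 1,
    CT_LOWER => 2,
    CT_DIGIT => 4,
    CT_SPACE => 8,
};

# Build a 256-entry classification table covering plain ASCII.
my @ctype = (0) x 256;
$ctype[$_] |= CT_UPPER for ord('A') .. ord('Z');
$ctype[$_] |= CT_LOWER for ord('a') .. ord('z');
$ctype[$_] |= CT_DIGIT for ord('0') .. ord('9');
$ctype[ ord($_) ] |= CT_SPACE for " ", "\t", "\n", "\r", "\f", "\x0b";

# Look up the first character of $char in the table and test one class bit.
# Characters outside the table are treated as belonging to no class.
sub has_class {
    my ( $char, $class ) = @_;
    return ( ( $ctype[ ord($char) ] // 0 ) & $class ) ? 1 : 0;
}

sub is_upper { return has_class( $_[0], CT_UPPER ) }
sub is_lower { return has_class( $_[0], CT_LOWER ) }
sub is_digit { return has_class( $_[0], CT_DIGIT ) }
sub is_space { return has_class( $_[0], CT_SPACE ) }
```

A lookup of this kind keeps the classification identical to whatever the table says, which is the point of reusing the original definitions, but it costs a function call and an array access per character.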


The following arguments are available for the constructor. They are passed from DBIx::MyParsePP:

string is the string being parsed.

charset is the character set of the string. This is important when determining what is a number and what is a separator in the string. The default value is 'ascii', which is the only charset bundled with DBIx::MyParsePP by default. Please contact the author if you need support for other character sets.

version is the MySQL version to be emulated. This only affects the processing of /*!##### sql_clause */ comments, where ##### is the minimum version required to process sql_clause. The grammar itself is taken from MySQL 5.0.45, which is the default value of version.

sql_mode contains flags that influence the behavior of the parser. Valid constants are MODE_PIPES_AS_CONCAT, MODE_ANSI_QUOTES, MODE_IGNORE_SPACE, MODE_NO_BACKSLASH_ESCAPES and MODE_HIGH_NOT_PRECEDENCE. The flags can be combined with the | operator. By default no flags are set.

client_capabilities is a set of flags reflecting the capabilities of the client that issued the query. Currently the only flag accepted is CLIENT_MULTI_STATEMENTS, which controls whether several SQL statements can be parsed at once. By default no flags are set.

stmt_prepare_mode controls whether the statement being parsed is a prepared statement. The default is 0. If this flag is set to 1, multiple SQL statements cannot be parsed at once.
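The sketch below illustrates two of the ideas in the argument list above: combining flags with the | operator, and deciding whether the body of a /*!##### sql_clause */ comment applies to the emulated version. It is a standalone illustration and does not call the module. The DEMO_* constants and their bit values are placeholders, and has_flag(), version_number() and versioned_clause() are hypothetical helpers; the real constants and the exact form in which the constructor expects version should be checked against the module itself.

```perl
use strict;
use warnings;

# Placeholder bit values for illustration only; the module defines the real
# MODE_* constants, and their values may differ from these.
use constant {
    DEMO_PIPES_AS_CONCAT => 1 << 0,
    DEMO_ANSI_QUOTES     => 1 << 1,
    DEMO_IGNORE_SPACE    => 1 << 2,
};

# Flags are combined with | and tested individually with &.
my $sql_mode = DEMO_PIPES_AS_CONCAT | DEMO_ANSI_QUOTES;

sub has_flag {
    my ( $mode, $flag ) = @_;
    return ( $mode & $flag ) ? 1 : 0;
}

# Convert a dotted version such as '5.0.45' into the five-digit form used
# inside version comments (major * 10000 + minor * 100 + patch).
sub version_number {
    my ($dotted) = @_;
    my ( $major, $minor, $patch ) = $dotted =~ /\A(\d+)\.(\d+)\.(\d+)/
        or die "unrecognised version string: $dotted";
    return $major * 10000 + $minor * 100 + $patch;
}

# Return the clause inside a /*!NNNNN clause */ comment if the emulated
# version is at least NNNNN; return nothing if the clause would be skipped
# or the text is not a version comment.
sub versioned_clause {
    my ( $comment, $emulated ) = @_;
    return unless $comment =~ m{\A/\*!(\d{5})\s*(.*?)\s*\*/\z}s;
    my ( $required, $clause ) = ( $1, $2 );
    return $required <= version_number($emulated) ? $clause : ();
}
```

Under these assumptions, a comment requiring 50000 is kept when emulating 5.0.45, while one requiring 50100 is dropped.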


pos() and getPos() return the current character position, counted from the start of the string.

getLine() and line() return the current line number.

getTokens() returns a reference to an array containing all tokens parsed so far.


This file contains code derived from code Copyright (C) 2000-2006 MySQL AB

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License in the file named LICENCE for more details.