Often times I find myself having to handle a particular portion
of a SQL statement via a script. I've written a lot of
specialized regular expressions over time to handle these tasks,
but the one that I've had to write the most is a basic
'tokenizer' which understands the quoting semantics of a MySQL
statement. That is, it understands `BackTick`, 'Single Quote' and
"Double Quoted" strings.
#PHP Heredoc syntax $regex = <<< END_OF_REGEX / [^ \"'`(),]*\([^)]*\) #match functions like concat(x,"y",`a`.`z`) or sum(`xyz`); |\([^)]*?\) #match grouped items |"[^"]*?" #match double quoted items |'[^']*?' #match single quoted items |`[A-Za-z0-9_ .'\"()+\\-&^%\$+?%\\/\\\\!`]+` #match backtick mysql names |[^ ,]+ #match keywords, operators and aliases |, /xi END_OF_REGEX;
EDIT: After some comments and further testing, I've
modified the regex to look like the following:
$regex = <<<'END_OF_REGEX' / [A-Za-z_.]+\(.*?\)+ # Match FUNCTION(...) |\(.*?\)+ # Match grouped items |"(?:[^"]|\"|"")*"+ # Match double quotes |'[^'](?:|\'|'')*'+ # and single quotes |`(?:[^`]|``)*`+ # and backticks |[^ ,]+ |, /x END_OF_REGEX;
This will properly match escaped values within quoted statements.
Functions with nested function calls or subselects may not end up
being fully grouped together. If anybody has any idea how I might
fix that, well, feel free to chime in :D
EDIT END
When used with the preg_match_all() function, an array is
produced which represents the tokenized string. Functions calls,
quoted strings and grouped expressions are returned together, and
may need additional processing, depending on your needs.
For example:
-- select sum(a * b) sweety,'abc' as a1, -- "abc",concat(`def`.`abc`,'hello', x.y, z) as `blah`, -- null + 1 -- from `abc`.`def` as def1 -- join xyz.zzz z1 -- on (z1.a = def1.a) Array ( [0] => select [1] => sum(a * b) [2] => sweety [3] => , [4] => 'abc' [5] => as [6] => a1 [7] => , [8] => "abc" [9] => , [10] => concat(`def`.`abc`,'hello', x.y, z) [11] => as [12] => `blah` [13] => , [14] => null [15] => + [16] => 1 [17] => from [18] => `abc`.`def` [19] => as [20] => def1 [21] => join [22] => xyz.zzz [23] => z1 [24] => on [25] => (z1.a = def1.a) )
I'm using this to process SQL statements and automatically create
the Flexviews calls for the query.
The following output is generated by my program for the above SQL
statement (only the SELECT clause is currently
represented):
call flexviews.add_expr(@mvid, 'SUM', 'a * b', 'sweety'); call flexviews.add_expr(@mvid, 'GROUP', 'abc', '1'); call flexviews.add_expr(@mvid, 'GROUP', "abc", '_abc_'); call flexviews.add_expr(@mvid, 'GROUP', 'concat(`def`.`abc`,\'hello\', x.y, z)', 'blah'); call flexviews.add_expr(@mvid, 'GROUP', 'null + 1', 'null_+_1');