For decades MySQL has considered BLOB
a sink for
data junk that didn’t fit the strictly typed SQL bill. As the
outside world began voting for JavaScript, JSON and
key-value/key-document stores,
TEXT
/BLOB
was rediscovered as the only
way to store JSON inside MySQL. But having no SQL support for the
JSON serialization format, JSON remained junk for MySQL. Community developed
SQL functions could not cure the issue. That said, MySQL 5.7
introduces SQL functions to work on JSON documents! MySQL slowly
takes a different position on JSON.
From the labyrinth: labs.mysql.com
In the recent past, MySQL has developed a talent hiding feature previews at http://labs.mysql.com/. This is where you find MySQL Fabric, and this is where you find the JSON UDFs (user defined functions, pluggable SQL functions). Please keep in mind, all labyrinth treasures are of pre-production quality.
Visit Labs, click to download and install the JSON functions. If
there is no binary for you, try a source build. The README guides
you through the installation. Basically, it means copying the
library that contains the pluggable SQL functions into MySQL’s
plugin directory and loading the functions into MySQL.
CREATE FUNCTION json_valid RETURNS integer SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_search RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_extract RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_replace RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_append RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_remove RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_set RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_merge RETURNS string SONAME 'libmy_json_udf.so'; CREATE FUNCTION json_contains_key RETURNS integer SONAME 'libmy_json_udf.so';
(Note: With my source build the resulting .so file got a name –
libmy_json_udf.so – different from the one – libmy_json.so –
assumed in the README.)
Putting expectations: writing about pre-production materials…
Let’s put expectations straight first. Whoever claims all downloads from labs.mysql.com are of highest, near production quality has probably never worked himself with the materials. Labs is for early feature previews. Quality can be accordingly. I think, this is fair. Any developer pushing work out as early as possible, given the constraints of a big US-based corporate company, deserves a “hurray!”. I can only ask you for providing user feedback as early as possible. Even if a quick test for one function shows the below result. Someone has been brave enough to push out code before the maximum amount of internal QA has been applied.
JSON | JSON_VALID() | JSLint Validator | Comment |
---|---|---|---|
"a"
|
Invalid | Invalid | Invalid. Must be object or array. |
[1]
|
Invalid | Valid | Valid. Array can be top level element. |
["a"]
|
Invalid | Valid | Valid. Follow-up error. |
{"a":true}
|
Valid | Valid | Object with pair. |
{"a":false,}
|
Valid | Invalid | Pair may only be followed by comma, if another pair follows. |
{"a":null}
|
Valid | Valid | Object with pair. |
{"a":nuLL}
|
Valid | Invalid | There is exactly one way to spell null: ‘null’. |
{"a":True}
|
Valid | Invalid | Follow-up error. |
{"a":FaLSE}
|
Valid | Invalid | Follow-up error. |
{"a":12345678901234567890}
|
Valid | Valid | Any number of digits allowed. |
{"a":-0.12}
|
Valid | Valid | Negative number. |
{"a":-0.1234E+39}
|
Valid | Valid | US national finance… |
{"a":1.23e0001}
|
Valid | Valid | |
[1,{"a":"bc"}]
|
Invalid | Valid | Follow-up error. |
{"1":"d","2":{"a":"bc"}}
|
Valid | Valid | Follow-up error. |
{"a":"\z"}
|
Valid | Invalid | Requires special handling. |
{"a":"\u01"}
|
Valid | Invalid | Must be four hexadecimal digits |
{"a":[1,]}
|
Invalid | Invalid | Unlike with objects, the wrong syntax is detected. |
If you glance over the results, you may come to the conclusion:
- This is pre-production with glitches
- This is pre-production with smelly array handling
Putting expectations: TEXT/BLOB handling
As hinted in the introduction, MySQL’s support for
TEXT
/BLOB
columns is not the best
possible, but those are exactly the SQL column types one would
use for storing JSON inside MySQL. Whoever says MySQL in 2013
likely speaks InnoDB when it comes to storage. InnoDB, for
example, stores only a 768 bytes prefix of a variable-length column in
the index. The rest goes off-page causing extra I/O. Please
note, I/O could be pretty cheap once the working set has been
loaded into memory.
If your JSON document requires more than 768 bytes, storage is
not optimal. 768 bytes do not mean 768 characters. Possibly, you
will be using CHARSET=utf8
, means a character takes
1 to 4 bytes to store. However, you may also go for
latin1
and have all unicode encoded upfront as \xxxx
. In
the worst case, 768 bytes means less than 200 characters before
off-page storage happens.
Indexing is limited to the prefix. There is no function based index in MySQL to speed up searches on individual JSON document keys.
The PBXT storage engine from MySQL 5.1 times had pretty fancy BLOB streaming capabilities. If you wanted to do something like MapReduce in MySQL, you would probably love to see streaming supported to avoid materialization of intermediate results and to keep processing buffers small. Another use case for streaming, which is unrelated to JSON, are media files (video, audio).
All this does not mean MySQL becomes useless! Performance could still be pretty solid… – give it a try. I rarely show any figures because I have only one respectively two core notebooks available for testing.
Introduction to the MySQL JSON functions
The JSON UDF download contains SQL functions to search JSON documents and functions to modify JSON document. If a function wants you to define on which subset of the entire document it shall work, it is listed under “by key” in the below table. Otherwise, it is listed as driven “by value” (not very precise, however, I had no better idea).
Search | |
---|---|
by value |
JSON_SEARCH()
|
by key |
JSON_CONTAINS_KEY() , JSON_EXTRACT()
|
Modify | |
by value |
JSON_MERGE()
|
by key |
JSON_APPEND() , JSON_REMOVE() ,
JSON_REPLACE() , JSON_SET()
|
A function that works “by key” on a subset of a JSON document,
usually has the prototype: JSON_SOMETHING(string
json_document, string key [, string key...],
other_parameter)
. The first argument passed to the
functions is always the JSON document to work on. It follows a
variable number of arguments that describe on which subpart of
the entire document the function shall work. Having this
string key [, string key...]
arguments in the middle
of the function signature is a bit odd. Some programming
languages may forbid this for style considerations. However,
depending on the function, further arguments may follow, such as
the value to search for or to add.
The key syntax is the key to understanding…
Let’s use JSON_CONTAINS_KEY(string document, string key[,
string key...])
to explore the syntax at the example of a
most basic JSON document. The JSON document consists of an object
with one member called “key”.
{
"key":"value"
}
To check whether the document has a member called “key” you call
JSON_CONTAINS_KEY('{"key":"value"}', "key")
. Easy!
Albeit: not beautiful.
The deeper your JSON document is nested, the longer the variable
length string key [, string key...]
part in your
function call gets. To get down to the nested object from the
document below you write:
{
"key":"value",
"other_key_level_1":{
"another_key_level_2":1
}
}
mysql> select json_contains_key(
'{"key":"value","other_key_level_1":{"another_key_level_2":1}}',
-> "other_key_level_1", "another_key_level_2")\G
*************************** 1. row ***************************
json_contains_key(
'{"key":"value","other_key_level_1":{"another_key_level_2":1}}',
"other_key_level_1", "another_key_level_2"): 1
From my short experience with the functions, nesting of arbitrary depth is supported. Means, the limit should be in the tens of tousands or the available amount of memory.
Keys and arrays
Keys and arrays work as expected. Array keys are zero-based. The first element in an array is accessed through the key “0”, the second element through key “1” and so forth. Note, however, that you have to use strings.
mysql> select json_contains_key('{"key":[1]}', "key", "0")\G
*************************** 1. row ***************************
json_contains_key('{"key":[1]}', "key", "0"): 1
No secret key to candies…
It may sound a bit though but this is a case where you see the
difference between a MeeTooSQL and a system that has had the time
and resources to add syntactic sugar. With a bit of syntactic
sugar, say “SELECT document.member[offset]” this could look much
more appealing. On the contrary, I am talking syntaxtic sugar
only! Syntactic sugar is really hard to add with todays MySQL.
The MySQL parser is not modular, and at the hearth of the entire
system, which forbids icing on the cake in the making. Not to
speak of ignoring the SQL standard and the option to support such
a notation in UDFs at the cost of parsing string arguments (over
processing a **va_args
list). Still, as a developer,
… See also,
Searching document stores in 2013: from 1983 to
SQL:2003 in a blink?
Searching for values
Searching JSON documents is most limited to exact match. There is
no support for wildcards. Neither are regular expressions
supported not fuzzy matching as with SQL LIKE
. I am
not aware of a straight forward way to do a case insensitive
search. Converting the entire JSON document to upper or lower
case and comparing with a correspondingly modified search value
is no option as the conversion would also affect JSON object
member names (your “keys”).
Here’s what the README, the only documentation available apart
from the *.c[omment]/*.h[elp] files, says about
JSON_SEARCH()
: Searches for specified value in the
document. Returns key path of the element which contains the
value in reverse order or NULL
if parsing failed.. I
assume that there is no way to search for “keys” with a specific
name.
Whatever, here’s my attempt to find “value” in the most basic document…
{
"key":"value"
}
mysql> select json_search('{"key":"value"}', "value" )\G
*************************** 1. row ***************************
json_search('{"key":"value"}', "value" ): NULL
I don’t quite buy that there is no “value” in my writing. Maybe I got the syntax wrong?
mysql> select json_search('{"a":{"b":"c"}}', "c" )\G
*************************** 1. row ***************************
json_search('{"a":{"b":"c"}}', "c" ): NULL
Unfortunately, the result does not look any better. I must be missing something fundamental not only documentation, because:
Search | ||
---|---|---|
Document | JSON_SEARCH(what) | Comment |
{"key":"value"}
|
value
|
Not found |
{"a":{"b":"c"}}
|
c
|
Not found |
{"a":"b", "c": "d"}
|
d
|
Not found |
{"a":"b","c":["2", "3"]}
|
2
|
Not found |
{"a":[1]}
|
1
|
Found |
Features | ||
Exact string match | Supported | |
Regular expression | Unsupported | |
Fuzzy wildcard | Unsupported | |
User defined comparsion expression, e.g. for case-insensitive search | Unsupported |
Fast forward to a successful search. Upon success, the function
returns a […] key path of the element which contains the value in
reverse order […].
mysql> select json_search('{"key":[1]}', "1" )\G *************************** 1. row *************************** json_search('{"key":[1]}', "1" ): 0:key::
Note the key path returned: 0:key::
. The interesting
bit here is the double double dot ::
.
JSON_CONTAINS_KEY
unveils that no magic key “::”
exists. But, if it does not exist, why does
JSON_SEARCH
report it? There is no single root
element in JSON documents, because JSON documents do not form
trees, and there seems to be no need for a prefix.
mysql> select json_contains_key('{"a":[1]}', "::" )\G *************************** 1. row *************************** json_contains_key('{"a":[1]}', "::" ): 0 mysql> select json_contains_key('{"a":[1]}', ":" )\G *************************** 1. row *************************** json_contains_key('{"a":[1]}', ":" ): 0
Modifying data
It may be entertaining to point out bugs and quirks of a pre-production version, but what really matters is the feature set and direction. Search works, albeit limited to most basic exact string match. What about modifying data? Again, there are functions that work on parts of a document and one that does not require a key path.
JSON_MERGE
:
Merges two or more documents into one. Returns first document with following documents appended.
But what exactly is a document? A document is anything
that JSON_VALID
considers valid. As argued above,
JSON_VALID
should consider both an object and an
array as a valid document. That is
{"key":"value"}
respectively [1]
.
mysql> select json_merge('{"a":"b"}', '{"c":"d"}' )\G *************************** 1. row *************************** json_merge('{"a":"b"}', '{"c":"d"}' ): {"a":"b", "c":"d"}
Simple cases work flawless. But then, there are the more tricky
ones such as merging and object with an array. This may be
actually undefined and there may be no solution: under which
member name should the array appear? Or, what if two objects are
merged that have members of the same name at the same level (see
table below).
The README continues stating about JSON_MERGE
: If
one of following documents does not contain an opening curly
bracket first documents merged are returned and warning is
generated. Reasonable, but…
mysql> select json_merge('{"a":"b"}', '"c":"d"}' )\G *************************** 1. row *************************** json_merge('{"a":"b"}', '"c":"d"}' ): {"a":"b"} 1 row in set (0,00 sec) mysql> show warnings; Empty set (0,00 sec)
And the last sentence from the README is: [returns] NULL if first
document does not contain an opening curly bracket. Which tells
me that “document” is a quite fuzzy description for the functions
parameters. Assuming the function would work on documents
(objects or arrays), a missing opening curly bracket (applies to
object only) would be just one of many possible ways of passing
invalid “documents” as input values. Whatever, test and
fail:
mysql> select json_merge('"a":"b"}', '{"c":"d"}' )\G *************************** 1. row *************************** json_merge('"a":"b"}', '{"c":"d"}' ): "a":"b", "c":"d"}
Merging: JSON_MERGE | |||
---|---|---|---|
JSON 1 | JSON 2 | Result | Comment |
{"a":"b"}
|
{"c":"d"}
|
{"a":"b","c":"d"}
|
OK |
{"a":"b"}
|
{"c":{"d":"e"}}
|
{"a":"b","c":{"d":"e"}}
|
OK |
{"a":"b"}
|
{}
|
{"a":"b", }
|
Input: two valid objects, Result: invalid object |
{"a":"b"}
|
{"a":"c"}
|
{"a":"b", "a":"c" }
|
OK, but for serialization into native JavaScript variable questionable |
{"a":"b"}
|
{"a":"b"}
|
{"a":"b", "a":"b" }
|
OK, but again seems a bit questionable considering serialization into any native programming language object |
{"a":"b"}
|
[1]
|
{"a":"b"}
|
Somewhat undefinable result, error/warning missing |
[1,2]
|
[3,4,5]
|
[1,2]
|
Questionable, error/warning missing |
{"a":"b"}
|
"c":"d"}
|
{"a":"b"}
|
OK, but error/warning missing |
{"a":"b"}
|
{true: "c"}
|
{"a":"b", true:"c"}
|
Wrong, returns invalid JSON, error/warning missing |
"a":"b"}
|
{"c":"d"}
|
"a":"b","c":"d"}
|
Wrong, README says NULL shall be returned |
The remainder
I’ll iterate over the remaining functions only very briefly. The story pretty much continues as it has begun.
It seems that JSON_APPEND
can be used inject new
data into an existing document: Inserts new element into JSON
document. Returns document with appended element or NULL if
parsing failed.. The syntax documented is json_append(doc,
keypart1, keypart2, ..., new_element)
. When I first time
read this, I looked pretty puzzled when I tried to append an
element to the end of an array:
mysql> select json_append('{"a":[1]}', "a", "2" )\G *************************** 1. row *************************** json_append('{"a":[1]}', "a", "2" ): {"a":[1]}
The trick is that here new_value
should read
new_key, new_value
. The correct syntax seems to
be:
mysql> select json_append('{"a":[1]}', "a", "99", "3" )\G *************************** 1. row *************************** json_append('{"a":[1]}', "a", "99", "3" ): {"a":[1, 3]}
Note, that I used offset “99”. Any offset greater or equal to “1”
worked for me. Next try: appending pairs to a nested
object:
mysql> select json_append('{"a":{"b":"c"}}', "a", "d", "e" )\G *************************** 1. row *************************** json_append('{"a":{"b":"c"}}', "a", "d", "e" ): {"a":{"b":"c", "d": e}}
Have you noticed? Invalid JSON returned…
Modifying: JSON_APPEND | |||||
---|---|---|---|---|---|
JSON | key part | new_key | new_value | Result | Comment |
{"a":"b"}
|
"a"
|
"c"
|
"d"
|
{“a”:”b”} | Nothing inserted, no error/warning |
{"a":"b"}
|
"a"
|
"c"
|
{“a”:”b”} | Nothing inserted, no error/warning, unclear syntax description. Same with arrays. | |
{"a":[1]}
|
"a"
|
99
|
2
|
{“a”:[1, 2]} | OK, nice offset handling |
"a":{"b":"c"}}
|
"a"
|
"d"
|
"e"
|
{“a”:{“b”:”c”, “d”: e}} | Invalid JSON returned |
All in all, it smells as if it was too early to blog about it.
Instead of demoing how to use JSON with MySQL, I ended up in a
life debug session for PlanetMySQL. It is not really hard to do
the testing. JSON has a pretty simple gramma. Go and play with
the productions of the gramma, add a bit of “first user, no
docs”, and its easy to find yet another “gotcha”. Invalid JSON
returned:
mysql> select json_replace('{"a":[1]}', "a", "b")\G *************************** 1. row *************************** json_replace('{"a":[1]}', "a", "b"): {"a":b
Taken from the README. For most functions manipulating or generating JSON documents is true: Warning! This version does not check whole document for validity. Hey, it is a labs.mysql.com pre-production release :-).
-
json_append(doc, keypart1, keypart2, ..., new_element)
Inserts new element into JSON document. Returns document with appended element orNULL
if parsing failed.
Note:json_append(doc, keypart1, keypart2, ..., new_pair_key, new_pair_value)
might be a better description, see above. -
json_contains_key(doc, keypart1, keypart2, ...)
Checks if documents contains specified key. ReturnsTRUE
if key exists,FALSE
if not exists,NULL
if parsing failed. -
json_extract(doc, keypart1, keypart2, ...)
Extracts value of the specified key. Returns value of the key specified,NULL
if parsing failed. -
json_merge(doc1, doc2, ...)
Merges two or more documents into one. Returns first document with following documents appended. If one of following documents does not contain an opening curly bracket first documents merged are returned and warning is generatedNULL
if first document does not contain an opening curly bracket. -
json_remove(doc, keypart1, keypart2, ...)
Removes element specified by the key. Returns document without the element orNULL
if parsing failed. -
json_replace(doc, keypart1, keypart2, ..., new_value)
Updates value of specified key. Returns document with replaced key or original document if no such an element found,NULL
if parsing failed. -
json_search(doc, value)
Searches for specified value in the document. Returns key path of the element which contains the value in reverse order orNULL
if parsing failed. -
json_set(doc, keypart1, keypart2, ..., new_value)
Performs kind ofINSERT ... ON DUPLICATE KEY UPDATE
operation.
Returns document with updated or inserted element orNULL
if parsing failed. -
json_test_parser(doc)
Returns text representation of parse tree of the JSON document, empty string if document is invalid. This function is supposed to use for tests only and should not be used in production. -
json_valid(doc)
Checks if doc is valid JSON document. ReturnsTRUE
if document is valid,FALSE
is document is invalid.
My initial take
The new set of SQL functions to work on JSON is a most welcome addition. BUT there are issues with the very first pre-production version published on MySQL Labs.
The list of functions available covers only most basic search and modification. Check the PHP manual array functions to get an idea of what developers can envision to do with hashes and have readily available as a built-in function! For example, there is no way to recursively search a document/hash using a user-supplied comparison function. This, however, does not bother me much: once there is a place for dumping new functions, it is only a matter of demand until someone adds them.
TEXT
/BLOB
columns holding JSON
documents remain second class citizen to MySQL. Think off page
storage, think indexing. Given MySQLs’ current pace, are you
pessimistic about the future…?
Happy hacking!