Showing entries 1 to 9
Displaying posts with tag: Justin Swanhart (reset)
Advanced JSON for MySQL

What is JSON

JSON is an text based, human readable format for transmitting data between systems, for serializing objects and for storing document store data for documents that have different attributes/schema for each document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL

It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with. Up until recently, JSON is just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDF by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDF are up to version 0.0.4. While these new UDF are a welcome edition to the MySQL database, they don’t solve the really tough …

[Read more]
‘Indexing’ JSON documents for efficient MySQL queries over JSON data

MySQL meets NoSQL with JSON UDF

I recently got back from FOSDEM, in Brussels, Belgium. While I was there I got to see a great talk by Sveta Smirnova, about her MySQL 5.7 Labs release JSON UDF functions. It is important to note that while the UDF come in a 5.7 release it is absolutely possible to compile and use the UDF with earlier versions of MySQL because the UDF interface has not changed for a long time. However, the UDF should still be considered alpha/preview level of quality and should not be used in production yet! For this example I am using Percona Server 5.6 with the UDF.

That being said, the proof-of-concept that I’m about to present here uses only one JSON function (JSON_EXTRACT) and it has worked well enough in my testing to present my idea here. The JSON functions will probably be GA sometime soon anyway, and this is a useful test of the JSON_EXTRACT function. …

[Read more]
Generating test data from the mysql> prompt

There are a lot of tools that generate test data.  Many of them have complex XML scripts or GUI interfaces that let you identify characteristics about the data. For testing query performance and many other applications, however, a simple quick and dirty data generator which can be constructed at the MySQL command line is useful.

First, let’s talk about what kind of data you can easily create with MySQL function calls:

You can generate a decimal number between zero and another number using the MySQL RAND() function like the following query (here between 0 and 10000):

SELECT RAND() * 10000;

Similarly, you can generate a random integer by adding the FLOOR() function:

SELECT FLOOR(RAND() * 10000)

You can generate a random string of 32 characters using MD5():

SELECT MD5(RAND() * 10000)

You can return a random integer between 500 and 1000 with the following:

[Read more]
Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only in github for now) supports FlexCDC plugins, which allow you to send the updates to a remote data source, or to any other place of your liking.  You can find out more about FlexCDC basics in a previous blog post.

Please note that FlexCDC still needs to have source and destination instances defined in the configuration, even when using plugins.  This is because the FlexCDC state (how much into …

[Read more]
Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only in github for now) supports FlexCDC plugins, which allow you to send the updates to a remote data source, or to any other place of your liking.  You can find out more about FlexCDC basics in a previous blog post.

Please note that FlexCDC still needs to have source and destination instances defined in the configuration, even when using plugins.  This is because the FlexCDC state (how much into which …

[Read more]
Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartioning) on 5.6 but it is limited to RANGE/LIST partitioning methods on early versions.

The output from Shard-Query is from the commandline client, but you can use MySQL proxy to communicate with Shard-Query too.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and …

[Read more]
MySQL webinar: ‘Introduction to open source column stores’

Join me Wednesday, September 18 at 10 a.m. PDT for an hour-long webinar where I will introduce the basic concepts behind column store technology. The webinar’s title is: “Introduction to open source column stores.”

What will be discussed?

This webinar will talk about Infobright, LucidDB, MonetDB, Hadoop (Impala) and other column stores

  • I will compare features between major column stores (both open and closed source).
  • Some benchmarks will be used to demonstrate the basic performance characteristics of the open source column stores.
  • There will be a question and answer session to ask me anything you like about column stores (you can also ask in the …
[Read more]
MySQL and the SSB – Part 2 – MyISAM vs InnoDB low concurrency

This blog post is part two in what is now a continuing series on the Star Schema Benchmark.

In my previous blog post I compared MySQL 5.5.30 to MySQL 5.6.10, both with default settings using only the InnoDB storage engine.  In my testing I discovered that innodb_old_blocks_time had an effect on performance of the benchmark.  There was some discussion in the comments and I promised to follow up with more SSB tests at a later date.

I also promised more low concurrency SSB tests when Peter blogged about the importance of performance at low concurrency.

The SSB
The SSB tests a database’s ability to optimize queries for a star schema. A star …

[Read more]
MySQL 5.6 vs MySQL 5.5 and the Star Schema Benchmark

MySQL 5.6 vs MySQL 5.5 & the Star Schema Benchmark

So far most of the benchmarks posted about MySQL 5.6 use the sysbench OLTP workload.  I wanted to test a set of queries which, unlike sysbench, utilize joins.  I also wanted an easily reproducible set of data which is more rich than the simple sysbench table.  The Star Schema Benchmark (SSB) seems ideal for this.

I wasn’t going to focus on the performance of individual queries in this post, but instead intended to focus only on the overall response time for answering all of the queries in the benchmark. I got some strange results, however, which showed MySQL 5.6.10 to be much slower than MySQL 5.5.30 even with only a single connection. I felt these results warranted deeper investigation, so I did some research and …

[Read more]
Showing entries 1 to 9