Showing entries 1 to 6
Displaying posts with tag: aggregation (reset)
Advanced JSON for MySQL

What is JSON

JSON is an text based, human readable format for transmitting data between systems, for serializing objects and for storing document store data for documents that have different attributes/schema for each document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL

It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with. Up until recently, JSON is just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDF by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDF are up to version 0.0.4. While these new UDF are a welcome edition to the MySQL database, they don’t solve the really tough …

[Read more]
Advanced JSON for MySQL: indexing and aggregation for highly complex JSON documents

What is JSON
JSON is an text based, human readable format for transmitting data between systems, for serializing objects and for storing document store data for documents that have different attributes/schema for each document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL
It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with. Up until recently, JSON is just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDF by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDF are up to version 0.0.4. While these new UDF are a welcome edition to the MySQL database, they don't solve the really tough …

[Read more]
Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartioning) on 5.6 but it is limited to RANGE/LIST partitioning methods on early versions.

The output from Shard-Query is from the commandline client, but you can use MySQL proxy to communicate with Shard-Query too.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and …

[Read more]
Advantages of weighted lists in RDBMS processing

A list is simply a list of things. The list has no structure, except in some cases, the length of the list may be known. The list may contain duplicate items. In the following example the number 1 is included twice.

Example list:

1
2
3
1


A set is similar to a list, but has the following differences:

  1. The size of the set is always known
  2. A set may not contain duplicates

You can convert a list to a set by creating a 'weighted list'. The weighted list includes a count column so that you can determine when an item in the list appears more than once:

1,2
2,1
3,1

Notice that there are two number 1 values in the weighted list. In order to make insertions into such a list scalable, consider using partitioning to avoid large indexes.

[Read more]
Advantages of weighted lists in RDBMS processing

A list is simply a list of things. The list has no structure, except in some cases, the length of the list may be known. The list may contain duplicate items. In the following example the number 1 is included twice.

Example list:

1
2
3
1


A set is similar to a list, but has the following differences:

  1. The size of the set is always known
  2. A set may not contain duplicates

You can convert a list to a set by creating a 'weighted list'. The weighted list includes a count column so that you can determine when an item in the list appears more than once:

1,2
2,1
3,1

Notice that there are two number 1 values in the weighted list. In order to make insertions into such a list scalable, consider using partitioning to avoid large indexes.

[Read more]
I wrote a new tool that runs aggregation queries over MySQL sharded databases using Gearman.

I created a new tool this week:
http://code.google.com/p/shard-query

As the name Shard-Query suggests, the goal of the tool is to run a query over multiple shards, and to return the combined results together as a unified query. It uses Gearman to ask each server for a set of rows and then runs the query over the combined set. This isn't a new idea, however, Shard-Query is different than other Gearman examples I've seen, because it supports aggregation.

It does this by doing some basic query rewriting based on the input query.

Take this query for example:

select c2, 
       sum(s0.c1), 
       max(c1) 
 from t1 as s0 
 join t1 using (c1,c2) 
 where c2 = 98818 
 group by c2;



The tool will split this up into two queries.

This first query will be sent to each shard. Notice that …

[Read more]
Showing entries 1 to 6