Showing entries 1 to 10 of 17
7 Older Entries »
Displaying posts with tag: parallelism
Semi-Sync replication performance in MySQL 5.7.4 DMR

I was interested to hear about semi-sync replication improvements in MySQL’s 5.7.4 DMR release and decided to check them out. I previously blogged about poor semi-sync performance and was pretty disappointed by semi-sync’s performance across WAN distances back then, particularly with many client threads.

The Test

The basic environment of these tests was:

  • AWS EC2 m3.medium instances
  • Master in us-east-1, slave in us-west-1 (~78ms ping RTT)
  • CentOS 6.5
  • innodb_flush_log_at_trx_commit=1
  • sync_binlog=1
  • Semi-sync replication plugin installed and enabled
  • GTIDs enabled (except on 5.5)
  • sysbench 0.5 update_index.lua test, 60 seconds, 250k table size.
  • MySQL 5.7 was …
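Why WAN distance hurts semi-sync so much: the master does not return a commit to the client until a semi-sync slave acknowledges receiving the transaction (or the timeout expires), so every commit costs each client thread at least one network round trip. A minimal Python sketch of the resulting throughput ceiling, assuming the ~78 ms RTT from the setup above and ignoring every other commit cost; the function name is hypothetical.

```python
# Ceiling on semi-sync commit throughput when master and slave are a WAN apart.
# Assumption: each commit blocks its client thread for at least one round trip
# (the slave ACK). Disk flushes, CPU time, etc. are ignored, so real throughput
# can only be lower than this bound.

RTT_SECONDS = 0.078  # ~78 ms ping RTT, us-east-1 <-> us-west-1 (from the test setup)


def max_commits_per_second(client_threads: int, rtt: float = RTT_SECONDS) -> float:
    """Upper bound on total commits/s: each thread commits at most once per RTT."""
    if client_threads < 1:
        raise ValueError("need at least one client thread")
    return client_threads / rtt


if __name__ == "__main__":
    for threads in (1, 8, 32):
        bound = max_commits_per_second(threads)
        print(f"{threads:>3} client threads: at most {bound:7.1f} commits/s")
```

With a single client thread the ceiling is roughly 12.8 commits/s (1 / 0.078 s). Adding threads raises the ceiling only if the master can keep that many commits waiting on ACKs at the same time, which is exactly where the many-thread behaviour matters.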
Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries that use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.
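The core idea, splitting one aggregate query into independent per-partition queries and combining the partial results, can be sketched in Python. This is not Shard-Query's implementation: the partition names and values are hypothetical, the `lineorder`/`lo_revenue` names are borrowed from the Star Schema Benchmark, and threads stand in for the separate MySQL connections that would each run one partition's query.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a partitioned SSB lineorder table:
# partition name -> lo_revenue values stored in that partition.
PARTITIONS = {
    "p1992": [100, 250],
    "p1993": [300],
    "p1994": [50, 75, 125],
}


def partition_sql(partition: str) -> str:
    """Per-partition query using MySQL 5.6 explicit partition selection."""
    return f"SELECT SUM(lo_revenue), COUNT(*) FROM lineorder PARTITION ({partition})"


def scan_partition(partition: str) -> tuple[int, int]:
    """Partial aggregate (sum, count) for one partition; a real worker would run partition_sql()."""
    values = PARTITIONS[partition]
    return sum(values), len(values)


def parallel_avg_revenue() -> float:
    # Scatter: one task per partition, executed concurrently.
    with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
        partials = list(pool.map(scan_partition, PARTITIONS))
    # Gather: an AVG of per-partition AVGs is wrong when partitions differ
    # in size, so combine the partial SUMs and COUNTs instead.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    for name in PARTITIONS:
        print(partition_sql(name))
    print("AVG(lo_revenue) =", parallel_avg_revenue())
```

The gather step is where most of the care goes: SUM and COUNT combine by simple addition, while AVG has to be rebuilt from them rather than averaged directly.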

The Shard-Query output in this post comes from the command-line client, but you can also use MySQL Proxy to communicate with Shard-Query.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and …

Shard-Query 2.0 performance on the SSB with InnoDB on Tokutek’s MariaDB distribution

Scaling up a workload to many cores on a single host

Here are results for Shard-Query 2.0 Beta 1* on the Star Schema Benchmark at scale factor 10. In the comparison below, the “single threaded” response times for InnoDB are those reported in my previous test, which did not use Shard-Query.

Shard-Query configuration

Shard-Query has been configured to use a single host.  The Shard-Query configuration repository is stored on the host.  Gearman is also running on the host, as are the Gearman workers.  In short, only one host is involved in the testing.

The …

Shard-Query 2.0 Beta 1 released

It is finally here. After three years of development, the new version of Shard-Query is available for broad testing.

This new version of Shard-Query is vastly improved over previous versions in many ways, largely because the previous version of Shard-Query (version 1.1) went into production at a large company. Their feedback during implementation was invaluable in building the new Shard-Query features. The great thing is that many of the new 2.0 features have therefore already been tested in at least one production environment.

This post is intended to highlight the new features in Shard-Query 2.0.  I will be making posts about individual features as well as posting benchmark results.

So now Hadoop's days are numbered?

Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"
I know GigaOM likes to provoke controversy sometimes (we all remember some other unforgettable piece), but there is something behind it...

Hadoop today (after SOA not so long ago) is one of the worst cases of an abused buzzword ever known to man. It's everything, everywhere, can cure illnesses and do "big-data" at the same time! Wow! Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.
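For readers who have only met Hadoop as a buzzword, the MapReduce model it implements splits a job into a map step, a framework-managed shuffle that groups emitted values by key, and a reduce step. A single-process Python sketch of the classic word count; it illustrates the programming model only, the function names are hypothetical, and none of it is Hadoop's actual API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # In a real cluster each split is mapped on a different node.
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big hype"]))
```

The low-level feel is the point: even counting words means expressing the job as map, shuffle and reduce, which is why higher-level layers are usually built on top.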

My take from the article is this: Hadoop is a foundation, low-level platform. I used the word …

ARM based data center. Inspiring.

In a previous post I wrote about ARM-based servers. Since then, thanks to all the comments and responses I got, I looked more into this ARM thing and it's absolutely fascinating...

Look at this beauty (taken from the site of Calxeda, the manufacturer):

What is it? A chip? A server? No, it's a cluster of 4 servers...

And this is the HP Redstone Server: 288 chips, 1,152 cores (Calxeda quad-core SoCs) in a 4U server, “Dramatically reducing the cost and complexity of cabling and …

The catch-22 of read/write splitting

In my previous post I covered the shared-disk paradigm's pros and cons and concluded that it cannot really qualify as a scale-out solution when it comes to massive OLTP, big data, high session counts and a mixture of reads and writes.

Read/write splitting is achieved by using numerous replicated database servers for reads. This way the system can scale to cope with increases in concurrent load. It qualifies as a scale-out solution because it allows expansion beyond the boundaries of one DB: the DB machines are shared-nothing, and a new one can be added as a slave to the replication "group" when required.
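The routing half of read/write splitting fits in a few lines. A minimal Python sketch, assuming statements arrive as plain SQL strings and that the server names are placeholders; the class and function names are hypothetical. It does nothing about replication lag, so a read issued right after a write may land on a slave that has not applied that write yet.

```python
from itertools import cycle


def is_read(sql: str) -> bool:
    """Treat plain SELECTs as reads; locking reads must go to the master."""
    text = sql.strip().upper()
    return (
        text.startswith("SELECT")
        and "FOR UPDATE" not in text
        and "LOCK IN SHARE MODE" not in text
    )


class ReadWriteRouter:
    """Send writes to the master and spread reads round-robin over the slaves."""

    def __init__(self, master: str, slaves: list[str]) -> None:
        self.master = master
        # next() on cycle([]) raises StopIteration, so use None when there are no slaves.
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        if self._slaves is None or not is_read(sql):
            return self.master
        return next(self._slaves)


if __name__ == "__main__":
    router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
    for stmt in ("SELECT * FROM t", "UPDATE t SET a = 1", "SELECT 1"):
        print(f"{stmt!r} -> {router.route(stmt)}")
```

Anything that is not clearly a plain SELECT defaults to the master, which errs on the side of correctness at the cost of some read capacity.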

And, in fact, read/write …

Scale differences between OLTP and Analytics

In my previous post, http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I reviewed the differences between OLTP and Analytics databases.

Scale challenges differ between these two worlds of databases.

In the Analytics world, the scale challenges come from growing amounts of data. Most solutions have leveraged three main aspects: columnar storage, RAM and parallelism.
Columnar storage makes scans and data filtering more precise and focused. After that, it all comes down to I/O: the faster the I/O, the faster the query finishes and returns results. Faster disks, and SSDs too, can play a good role, but above all: RAM! …
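A toy Python model of why columnar storage narrows scans: it counts how many stored values each layout must read to answer `SUM(revenue) WHERE region = 'EU'`. The table, its column names and the "values touched" metric are illustrative assumptions; real engines read whole pages and usually add compression on top.

```python
# Hypothetical 4-column fact table stored two ways.
ROWS = [
    {"order_id": 1, "customer": "acme", "region": "EU", "revenue": 120},
    {"order_id": 2, "customer": "zen", "region": "US", "revenue": 80},
    {"order_id": 3, "customer": "acme", "region": "EU", "revenue": 40},
    {"order_id": 4, "customer": "bolt", "region": "APAC", "revenue": 60},
]
# Columnar layout: one list per column, aligned by row position.
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}


def row_store_scan(rows):
    """SUM(revenue) WHERE region = 'EU', reading whole rows."""
    total = touched = 0
    for row in rows:
        touched += len(row)  # a row store fetches every column of the row
        if row["region"] == "EU":
            total += row["revenue"]
    return total, touched


def column_store_scan(columns):
    """Same query, reading only the two referenced columns."""
    region, revenue = columns["region"], columns["revenue"]
    touched = len(region) + len(revenue)
    total = sum(rev for reg, rev in zip(region, revenue) if reg == "EU")
    return total, touched


if __name__ == "__main__":
    print("row store:   ", row_store_scan(ROWS))
    print("column store:", column_store_scan(COLUMNS))
```

Both layouts return a revenue of 160, but the row layout reads 16 stored values against 8 for the columnar one; the gap widens as tables get wider, and the smaller working set is also easier to keep in RAM.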

Loading Air Traffic Control Data with TokuDB 4.1.1

TokuDB has a big advantage over B-trees when trickle loading data into existing tables. However, it is possible to preprocess the data when bulk loading into empty tables or when new indexes are created. TokuDB release 4 now uses a parallel algorithm to speed up these types of bulk insertions. How does the parallel loader performance compare with the serial loader? We use the Air Traffic Control (ATC) data and queries described in a Percona blog and also used in an experiment with TokuDB 2.1.0 to gain some insight.
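One common way to parallelize a bulk load into an empty table or new index is sort-then-merge: sort independent chunks of the input concurrently, then k-way merge the sorted runs so the index can be built in key order. A generic Python sketch of that idea; it is not TokuDB's loader, whose algorithm the excerpt does not describe, and the function names, chunk size and worker count are arbitrary assumptions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sort_run(chunk):
    """Sort one chunk of (key, payload) rows; each chunk is independent work."""
    return sorted(chunk)


def parallel_bulk_sort(rows, chunk_size=100_000, workers=4):
    """Key-order rows for an in-order index build: parallel run sort, then k-way merge."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Phase 1: sort the chunks concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sort_run, chunks))
    # Phase 2: merge the sorted runs into one key-ordered stream.
    return list(heapq.merge(*runs))


if __name__ == "__main__":
    rows = [(key, f"row-{key}") for key in (9, 3, 7, 1, 8, 2, 6, 5, 4, 0)]
    print(parallel_bulk_sort(rows, chunk_size=3)[:3])
```

In CPython the threads only show the structure, since the GIL keeps `sorted` from running in parallel; a `ProcessPoolExecutor` would give real CPU parallelism at the cost of pickling the chunks.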

Our ATC data is about 122M rows in size, is stored in a 40GiB CSV file, and can be found in our Amazon S3 public bucket. See the end of this blog for details. We …

Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria covered) and how MySQL works (which I researched afterward to figure out the differences, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are concepts to think about rather than hard-and-fast advice. In some places I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from …
