Showing entries 1 to 15

Displaying posts with tag: parallelism

Shard-Query 2.0 performance on the SSB with InnoDB on Tokutek’s MariaDB distribution
Scaling up a workload to many cores on a single host

Here are results for Shard-Query 2.0 Beta 1* on the Star Schema Benchmark at scale factor 10.  In the comparison below, the “single threaded” response times for InnoDB are those reported in my previous test, which did not use Shard-Query.

Shard-Query configuration

Shard-Query has been configured to use a single host.  The Shard-Query configuration repository is stored on the host.  Gearman is also running on the host, as are the Gearman workers.  In short, only one host is involved in the testing.

Shard-Query 2.0 Beta 1 released

It is finally here.  After three years of development, the new version of Shard-Query is finally available for broad testing.

This new version of Shard-Query is vastly improved over previous versions in many ways, in large part because the previous version (Shard-Query 1.1) entered production at a large company.  Their feedback during implementation was invaluable in building the new Shard-Query features.  It also means that many of the new 2.0 features have already been tested in at least one production environment.

This post is intended to highlight the new features in Shard-Query 2.0.  I will be making posts about individual features as well as posting benchmark results.

So now Hadoop's days are numbered?
Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"
I know GigaOM likes to provoke scandals sometimes (we all remember some other unforgettable piece), but there is something behind it...

Hadoop today (after SOA not so long ago) is one of the worst cases of an abused buzzword known to man. It's everything, everywhere, can cure illnesses and do "big-data" at the same time! Wow! Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.

My take from the article is




ARM based data center. Inspiring.
In a previous post I wrote about ARM-based servers. Since then, thanks to all the comments and responses I got, I have looked further into this ARM thing, and it's absolutely fascinating...

Look at this beauty (taken from the site of Calxeda, the manufacturer):

What is it? A chip? A server? No, it's a cluster of 4 servers...

And this:







The catch-22 of read/write splitting
In my previous post I covered the pros and cons of the shared-disk paradigm and concluded that it cannot really qualify as a scale-out solution for massive OLTP, big data, large session counts, and a mixture of reads and writes.

Read/write splitting is achieved when numerous replicated database servers are used for reads. This way the system can scale to cope with increases in concurrent load. It qualifies as a scale-out solution because it allows expansion beyond the boundaries of one DB: the DB machines are shared-nothing, and each can be added as a slave to the replication "group" when required.
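
The routing idea described above can be sketched roughly as follows. This is only an illustration, not code from the post: the `ReadWriteRouter` class, the server names, and the "starts with SELECT" rule for detecting reads are all assumptions made for the example.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas (round-robin)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Hypothetical policy: cycle through the replicas in order.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        # Naive classification (assumption): only statements whose first
        # word is SELECT are treated as reads; everything else is a write.
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary", ["replica1", "replica2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("UPDATE orders SET paid = 1 WHERE id = 7"),
    router.route("select count(*) from orders"),
    router.route("SELECT 1"),
]
print(targets)
```

Adding a replica in this sketch only means passing a longer list to the router, which is the sense in which reads scale out; all writes still land on the single primary.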



Scale differences between OLTP and Analytics

In my previous post, http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I reviewed the differences between OLTP and Analytics databases.

Scale challenges differ between these two worlds of databases.

Scale challenges in the Analytics world come from growing amounts of data. Most solutions rely on three main techniques: columnar storage, RAM, and parallelism.
Columnar storage makes scans and data filtering more precise and focused. After that – it all goes down to







Loading Air Traffic Control Data with TokuDB 4.1.1

TokuDB has a big advantage over B-trees when trickle loading data into existing tables. However, it is possible to preprocess the data when bulk loading into empty tables or when new indexes are created. TokuDB release 4 now uses a parallel algorithm to speed up these types of bulk insertions. How does the parallel loader performance compare with the serial loader? We use the Air Traffic Control (ATC) data and queries described in a Percona blog and also used in an experiment with TokuDB 2.1.0 to gain some insight.

Our ATC data is about 122M rows in size, is stored in a 40GiB CSV file, and can be found in our Amazon S3 public

Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than

Intra-query parallelism for MySQL queries without an appliance or closed source database
*edit* I want to point out that this test was done on a single database server which used MySQL partitioning. This is a demonstration of how Shard-Query can improve performance in non-sharded databases too.*edit*.

Over the weekend I spent a lot of time improving my new Shard-Query tool (code.google.com/p/shard-query) and the improvements can equate to big performance gains on partitioned data sets versus executing the query directly on MySQL.
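
As a rough illustration of the general idea of intra-query parallelism over partitions, here is a minimal sketch. It is not Shard-Query's implementation: the in-memory `partitions` lists stand in for table partitions, and the function names are made up for this example. Each worker computes a partial aggregate for one partition, and the partial results are then combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for one table partition.
partitions = [
    [10, 20, 30],
    [5, 15],
    [40, 60, 80, 100],
]


def partial_aggregate(rows):
    """Per-partition work: return (sum, count) so an average can be combined later."""
    return sum(rows), len(rows)


def parallel_avg(parts, workers=4):
    """Aggregate each partition in its own task, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


result = parallel_avg(partitions)
print(result)
```

Note that the average is rebuilt from per-partition sums and counts rather than by averaging per-partition averages, which would give the wrong answer when partitions differ in size. In a real deployment the workers would be waiting on separate database connections rather than summing Python lists.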


I'll explain this graph below, but lower is better (response time) and Shard-Query is the red line.

MySQL understands that queries which access data in only certain partitions don't have to read the rest of the table. This partition






Scaling Memcached: 500,000+ Operations/Second with a Single-Socket UltraSPARC T2

A software-based distributed caching system such as memcached is an important piece of today's largest Internet sites that support millions of concurrent users and deliver user-friendly response times. The distributed nature of memcached design transforms 1000s of servers into one large caching pool with gigabytes of memory per node. This blog entry explores single-instance memcached scalability for a few usage patterns.
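
One common way a client can treat many cache servers as a single pool is to hash each key to a node. The sketch below shows that idea only; the node names are hypothetical, the `pick_node` helper is invented for this example, and real memcached clients may use different hash functions or consistent hashing.

```python
import hashlib


def pick_node(key, nodes):
    """Map a cache key to one node by hashing the key (simple modulo scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


# Hypothetical pool of cache servers.
nodes = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]

first = pick_node("user:42:profile", nodes)
second = pick_node("user:42:profile", nodes)
print(first, first == second)
```

Because the mapping depends only on the key and the node list, every client that shares the same list sends a given key to the same server without any coordination. The drawback of the plain modulo scheme is that changing the number of nodes remaps most keys.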

The table below shows out-of-the-box (no custom OS rewrites or networking tuning required) performance with 10G networking hardware and one single-socket UltraSPARC T2-based server with 8 cores and 8 threads per core (64 threads on a chip). All runs are done with a single memcached instance and 40 worker threads so that about 3 cores (24 threads) are used for the critical networking stack that is also heavily

Sequential Web Frontends/Browsers are the Killer

Response times of any web application are very critical for the end-user experience. Steve Souders takes a detailed look at several large Web sites and concludes that 80-90% of the end-user response time is spent on the frontend, i.e., program code that is running inside your Web browser.

Traditional parallelization techniques and caching are without a doubt very effective in the design of scalable Web servers, databases, operating systems and other mission-critical software and hardware components. Yet even if all these components were perfectly parallel and optimized, Amdahl's law still suggests that response time improvements would be very modest, or barely measurable, because most of the time is spent in the sequential frontend.
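
The arithmetic behind that claim can be checked with a few lines. This sketch assumes, following the 80-90% figure above, that the backend accounts for 10-20% of end-user response time and that only the backend is sped up; the function name is chosen for the example.

```python
def amdahl_speedup(improved_fraction, factor):
    """Overall speedup when `improved_fraction` of the time runs `factor` times faster."""
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / factor)


# Backend share of response time if the frontend takes 90% or 80%.
for backend_share in (0.10, 0.20):
    # Best case: the backend becomes infinitely fast.
    limit = amdahl_speedup(backend_share, float("inf"))
    print(f"backend share {backend_share:.0%}: at most {limit:.2f}x faster overall")
```

Even with an infinitely fast backend, the overall response time improves by a factor of at most about 1.11 to 1.25 under these assumptions.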

Real-World Concurrency

Here is an interesting and useful paper on real-world concurrency by Bryan Cantrill and Jeff Bonwick.

Abstract: In this look at how concurrency affects practitioners in the real world, Cantrill and Bonwick argue that much of the anxiety over concurrency is unwarranted. Most developers who build typical MVC systems can leverage parallelism by combining pieces of already concurrent software such as database and operating systems (i.e., concurrency through architecture), rather than by writing multithreaded code themselves. And for those who actually must deal with threads and locks, the authors include a helpful list of best practices to help minimize the pain.


Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.