Showing entries 31 to 60 of 66

Displaying posts with tag: sharding

Being successful like Pinterest without its DB adventures...
I just came across this: "Scaling Pinterest and adventures in database sharding"  (http://gigaom.com/data/scaling-pinterest-and-adventures-in-database-sharding/)
"Pinterest has learned about scaling the way most popular sites do — the architecture works until one day it doesn’t"
Pinterest found out that "the architecture" does not scale, and turned to developing a scale-out mechanism, also called sharding.

I find it amazing that sharding, in other words the idea of "scale out by splitting and parallelizing data across shared-nothing commodity hardware", is not supplied "out of the box" by "the architecture" (the database, the load balancer or any other IT stuff). I'm wondering who was the one that
  [Read more...]
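To make "splitting data across shared-nothing servers" concrete, here is a minimal sketch of hash-based shard routing. It is not Pinterest's scheme: the shard names, the choice of a numeric user id as the sharding key, and the `shard_for` helper are all hypothetical.

```python
import hashlib

# Hypothetical shard list: each name stands for an independent, shared-nothing
# database server that owns a disjoint slice of the rows.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id: int, shards: list[str] = SHARDS) -> str:
    """Map a sharding key to exactly one shard with a stable hash.

    MD5 is used only to spread keys evenly and deterministically across
    processes and machines, not for security.
    """
    digest = hashlib.md5(str(user_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


if __name__ == "__main__":
    for uid in (1, 42, 1001):
        print(uid, "->", shard_for(uid))
```

With plain modulo placement, changing the number of shards remaps most keys, which is why directory (lookup-table) placement or consistent hashing are common alternatives when shards must be added later.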
Facebook makes big data look... big!
Oh, I love these things: http://techcrunch.com/2012/08/22/how-big-is-facebooks-data-2-5-billion-pieces-of-content-and-500-terabytes-ingested-every-day/

Every day there are 2.5B content items shared and 2.7B "Like"s. I care less about the GIGO content itself; what matters here is that the metadata, connections and relations are kept transactionally in a relational database. These two use cases generate 5.2B transactions a day on the database, and since there are only 86,400 seconds in a day, that is over 60,000 write transactions per second from these two use cases alone, not to mention all the others, such as new profiles, emails, queries...

And what's the
  [Read more...]
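The back-of-the-envelope arithmetic above, written out as a short sketch. It assumes, as the post does, one write transaction per share or "Like" and an even spread over the day; real traffic is bursty, so the peak rate would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_writes_per_second(daily_counts: list[float]) -> float:
    """Average write rate if each event is one transaction, spread evenly over a day."""
    return sum(daily_counts) / SECONDS_PER_DAY


shares_per_day = 2.5e9  # content items shared per day
likes_per_day = 2.7e9   # "Like"s per day

rate = average_writes_per_second([shares_per_day, likes_per_day])
print(f"{rate:,.0f} write transactions/second")  # → 60,185 write transactions/second
```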
Scale Up, Partitioning, Scale Out
On 8/16 I conducted a webinar titled "Scale Up vs. Scale Out" (http://www.slideshare.net/ScaleBase/scalebase-webinar-816-scaleup-vs-scaleout).

The webinar was successful: we had many attendees and great participation in questions and answers, both during the session and at the end. Only after the webinar did it occur to me that one specific graphic was missing from the deck. It occurred to me after answering
  [Read more...]
ARM based data center. Inspiring.
In a previous post I wrote about ARM-based servers. Since then, and thanks to all the comments and responses I got, I looked more into this ARM thing, and it's absolutely fascinating...

Look at this beauty (taken from the site of Calxeda, the manufacturer):

What is it? A chip? A server? No, it's a cluster of 4 servers...

And this:
  [Read more...]
Why shared-storage DB clusters don't scale
Yesterday a customer asked me why he had failed to achieve scale with a state-of-the-art "shared-storage" cluster. "It's a scale-out to 4 servers, but with a shared disk. After tons of work and effort, I got 130% throughput, not even close to the expected 400%," he said.

Well, scale-out cannot be achieved with shared storage, and the word "shared" is the key. Scale-out is done with absolutely nothing shared: a "shared-nothing" architecture. This is what makes it linear and unlimited. Any shared resource creates a tremendous burden on each and every database server in the cluster.

In a previous post, I identified database engine activities such as buffer management, locking, thread locks/semaphores,
  [Read more...]
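One way to read the customer's numbers (about 1.3x the throughput of one server on 4 servers, instead of 4x) is through Amdahl's law. This is a modelling assumption of ours, not the author's analysis: treat all work that waits on the shared resource as serialized, and solve for how much of the workload must behave that way.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Amdahl's law: speedup on `nodes` servers when only `parallel_fraction`
    of the work can proceed independently and the rest is serialized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def implied_parallel_fraction(speedup: float, nodes: int) -> float:
    """Invert Amdahl's law for an observed speedup.

    1/S = (1 - p) + p/N  =>  p = (1 - 1/S) / (1 - 1/N)
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / nodes)


p = implied_parallel_fraction(speedup=1.3, nodes=4)
print(f"parallel fraction ~ {p:.2f}")                       # → parallel fraction ~ 0.31
print(f"ceiling with unlimited nodes ~ {1 / (1 - p):.2f}x")  # → ceiling with unlimited nodes ~ 1.44x
```

Under this model only about 31% of the work scales out; the remaining 69% behaves as if serialized on the shared resource, which caps the cluster at roughly 1.44x no matter how many nodes are added.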
Impressions from Amazon's AWS Summit in NYC
Yesterday (4/19) I attended the AWS Summit in NYC (http://aws.amazon.com/aws-summit-2012/nyc).

I'm a big fan and also a heavy user of AWS, especially S3, EC2 and, naturally, RDS. At any point in time I have several dozen AWS machines running for me out there in the East region, and when we run special benchmarks and tests, the number of EC2 and RDS machines can easily reach three digits. As I said, I'm a fan...

A few quotes I was able to catch and jot down on my laptop, on my lap...:
"When you develop an app for Facebook, you must be prepared (and be afraid) not that no one will show up to your party, but that everybody will!"
So true! Simple and true. We all want to succeed, for our app to be a success. We have to think about scaling
  [Read more...]
So how can we scale databases?
There are ways to scale databases; unfortunately, some are limited, some introduce complexities, and some do not fit the cloud...

By a scaling solution I mean a solution that helps me scale my existing environment, my existing RDBMS: some magic or technology that will take my existing Oracle or MySQL, for example, to the next level, without porting to a new DB engine/vendor and without completely recoding my app.

Let's try to organize things a bit in this very summarized table, just to get a feel for it. I can't imagine covering it all in one table or even 100 pages, but it should be the start of a meaningful discussion to continue in the next posts:

Solution | Scales reads? | Scales writes? | Scales data? | Scales sessions? | Cloud? | Bottom line
Scale-Up: faster HW, CPU, memory,
  [Read more...]
Applications come and go. Databases are here to scale.
In my heart, I'm a DBA; always was and always will be. People say they can tell I'm a database guy by the way I think, keep my car, and file my music and even my bank statements... However, I did a great deal of development, design and architecture on the apps side, so I (hope to) have some perspective.

Applications come and go. The second programming language I ever learned and worked in was COBOL; some still say most of the world's lines of code are written in it. Maybe so, but since then I have known and written in dozens of programming languages: from Assembly to Force.com, from Pascal to Delphi, from procedural C to object-oriented Smalltalk, C++, Java and , from compiled C/CGI to interpreted Perl, ASP and Ruby, and back to compiled node.js... My first applications ran on a mainframe with a green screen; later I created beautiful graphic

  [Read more...]
DbCharmer 1.7.0 Release: Rails 3.0 Support and Forced Slave Reads
This week, after 3 months in the works, we’ve finally released version 1.7.0 of the DbCharmer Ruby gem, a Rails plugin that significantly extends ActiveRecord’s ability to work with multiple databases and/or database servers by adding features like multiple-database support, master/slave topology support, sharding, etc.

New features in this release:

  • Rails 3.0 support. We’ve worked really hard to bring all the features we supported in Rails 2.x to the new version of Rails, and I’m proud to say that we’ve implemented them all and the implementation looks much cleaner and more universal (all kinds of relations in Rails 3 work in exactly the same way, so we do not need to implement connection switching for all kinds of weird corner cases in ActiveRecord).
  • Forced
  [Read more...]
Proper handling of insert-mostly, select-recently datasets
Some kinds of large tables, such as chat messages and blog entries, have the following characteristics:

* huge number of records, huge data and index size
* insert and select mostly
* select from only recent data
* select by secondary index (e.g. user_id)
* secondary index entries are inserted in random order

What are optimal ways to handle these tables? The single large table below does not perform well.
CREATE TABLE message (
  message_id BIGINT UNSIGNED PRIMARY KEY,
  user_id INT UNSIGNED,
  body VARCHAR(255),
  -- ... (other columns elided)
  created DATETIME,
  INDEX(user_id)
) ENGINE=InnoDB;

The cause of the poor performance is the secondary index on user_id. user_id values are inserted in random order, so the index keeps growing and sooner or later it will exceed RAM size. Once the index on user_id exceeds RAM size,
  [Read more...]
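A back-of-the-envelope model of why random-order user_id inserts hurt, assuming user_id values are uniformly distributed and that each leaf page of the index covers a fixed slice of the key range. The function names and sizes are hypothetical, and the model ignores page splits and InnoDB's insert buffering.

```python
import math


def expected_pages_random(index_pages: int, inserts: int) -> float:
    """Expected distinct leaf pages touched when each insert lands on a uniformly
    random page, as with a secondary index on a randomly arriving user_id."""
    return index_pages * (1.0 - (1.0 - 1.0 / index_pages) ** inserts)


def pages_sequential(entries_per_page: int, inserts: int) -> int:
    """Leaf pages touched when keys only grow (e.g. an increasing message_id):
    inserts fill the rightmost page, then the next one, and so on."""
    return math.ceil(inserts / entries_per_page)


# Hypothetical sizes: a 100,000-page index, 100 entries per page, 1,000 new rows.
print(round(expected_pages_random(100_000, 1_000)))  # → 995
print(pages_sequential(100, 1_000))                  # → 10
```

So 1,000 random-order inserts touch roughly 995 pages scattered over the whole index, while the same number of rows arriving in key order touch only 10 adjacent pages; once the index no longer fits in RAM, many of those scattered touches become disk reads.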
Shard-Query turbo charges Infobright community edition (ICE)
Shard-Query is an open source toolkit that helps improve the performance of queries against a MySQL database by distributing the work over multiple machines and/or multiple cores. This is similar to the divide-and-conquer approach that Hive takes in combination with Hadoop. Shard-Query applies a clever approach to parallelism that lets it significantly improve query performance by spreading the work over all available compute resources. In this test, Shard-Query averages a nearly 6x improvement over the baseline (over 10x at best), as shown in the following graph:

One
  [Read more...]
Intra-query parallelism for MySQL queries without an appliance or closed source database
*edit* I want to point out that this test was done on a single database server that used MySQL partitioning. It demonstrates how Shard-Query can improve performance in non-sharded databases too. *edit*

Over the weekend I spent a lot of time improving my new Shard-Query tool (code.google.com/p/shard-query), and the improvements can translate into big performance gains on partitioned data sets compared with executing the query directly on MySQL.

I'll explain this graph below, but in short: it shows response time, so lower is better, and Shard-Query is the red line.

MySQL understands that queries that access data in only certain partitions don't have to read the rest of the table. This partition
  [Read more...]
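A toy model of the two ideas at play here: partition pruning, then running the surviving per-partition pieces concurrently and combining their results. This is not Shard-Query's code; partitions are in-memory lists rather than MySQL RANGE partitions, and `Partition`, `prune` and `parallel_sum` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Partition:
    """Stand-in for a RANGE partition holding rows with low <= key < high."""
    low: int
    high: int
    rows: list[tuple[int, float]]  # (key, amount)


def prune(partitions: list[Partition], lo: int, hi: int) -> list[Partition]:
    """Keep only partitions whose key range overlaps the predicate lo <= key < hi."""
    return [p for p in partitions if p.low < hi and lo < p.high]


def partial_sum(p: Partition, lo: int, hi: int) -> float:
    """One partition's share of SUM(amount) WHERE lo <= key < hi."""
    return sum(amount for key, amount in p.rows if lo <= key < hi)


def parallel_sum(partitions: list[Partition], lo: int, hi: int) -> float:
    """Scan surviving partitions concurrently and add up their partial sums."""
    survivors = prune(partitions, lo, hi)
    with ThreadPoolExecutor(max_workers=max(1, len(survivors))) as pool:
        return sum(pool.map(lambda p: partial_sum(p, lo, hi), survivors))


parts = [Partition(i * 100, (i + 1) * 100, [(k, 1.0) for k in range(i * 100, (i + 1) * 100)])
         for i in range(4)]
print(len(prune(parts, 150, 250)))    # → 2
print(parallel_sum(parts, 150, 250))  # → 100.0
```

In CPython, threads only pay off when each worker waits on I/O (for example, on a query sent to MySQL); pure-Python scanning like this toy is serialized by the GIL, so in practice the database servers or a process pool do the real parallel work.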
I wrote a new tool that runs aggregation queries over MySQL sharded databases using Gearman.
I created a new tool this week:
http://code.google.com/p/shard-query

As the name Shard-Query suggests, the goal of the tool is to run a query over multiple shards and return the combined results as if they came from a single query. It uses Gearman to ask each server for a set of rows and then runs the query over the combined set. This isn't a new idea; however, Shard-Query is different from other Gearman examples I've seen because it supports aggregation.

It does this by doing some basic query rewriting based on the input query.

Take this query for example:
select c2, 
       sum(s0.c1), 
       max(c1) 
 from t1 as s0 
 join t1 using (c1,c2) 
 where c2 = 98818 
 group by c2;


The tool will split this up into two queries.

This first query will be sent to each shard. Notice
  [Read more...]
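The excerpt stops before showing the rewritten queries, so the following is only a sketch of the general two-phase idea, not Shard-Query's actual rewrite: each shard computes partial aggregates per group, and a coordinator combines them (SUM of sums, MAX of maxes, COUNT as a sum of counts). It works on in-memory (c2, c1) rows instead of Gearman workers and ignores the self-join in the example query; `Partial`, `shard_partials` and `combine` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregates of c1 for one c2 group on one shard."""
    total: float    # SUM(c1)
    maximum: float  # MAX(c1)
    count: int      # COUNT(c1)


def merge_into(acc: dict[int, Partial], key: int, p: Partial) -> None:
    """Fold one partial into an accumulator (SUM adds, MAX takes the max, COUNT adds)."""
    cur = acc.get(key)
    if cur is None:
        acc[key] = Partial(p.total, p.maximum, p.count)
    else:
        cur.total += p.total
        cur.maximum = max(cur.maximum, p.maximum)
        cur.count += p.count


def shard_partials(rows: list[tuple[int, float]]) -> dict[int, Partial]:
    """Phase 1, run on every shard: GROUP BY c2 computing SUM, MAX and COUNT of c1."""
    acc: dict[int, Partial] = {}
    for c2, c1 in rows:
        merge_into(acc, c2, Partial(c1, c1, 1))
    return acc


def combine(per_shard: list[dict[int, Partial]]) -> dict[int, tuple[float, float, float]]:
    """Phase 2, run once on the coordinator: merge partials per group and return
    (SUM, MAX, AVG). AVG is rebuilt from the merged SUM and COUNT, because an
    average of per-shard averages is wrong when shards hold different row counts."""
    acc: dict[int, Partial] = {}
    for partials in per_shard:
        for key, p in partials.items():
            merge_into(acc, key, p)
    return {k: (p.total, p.maximum, p.total / p.count) for k, p in acc.items()}


shard_a = [(98818, 1.0), (98818, 5.0), (7, 2.0)]
shard_b = [(98818, 3.0)]
result = combine([shard_partials(shard_a), shard_partials(shard_b)])
print(result[98818])  # → (9.0, 5.0, 3.0)
```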
DbCharmer – Rails Can Scale!
Back in November 2009 I was working on a project to port the Scribd.com code base to Rails 2.2 and noticed that some old plugins we were using in 2.1 had been abandoned by their authors. Some of them were simply removed from the code base, but one needed a replacement: an old plugin called acts_as_readonlyable that helped us distribute our queries among a cluster of MySQL slaves. There were some alternatives, but we didn’t like them for one reason or another, so we decided to create our own ActiveRecord plugin that would help us scale our databases out. That’s the story behind the first release of DbCharmer.

Today, six months after the first release of

  [Read more...]
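DbCharmer itself is a Ruby gem, and its API isn't shown in this excerpt. The following is only a Python sketch of the general idea of distributing queries among a cluster of MySQL slaves: writes go to the master, reads rotate round-robin over the slaves. `ReadWriteRouter` and the connection names are hypothetical.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the master and rotate plain reads over the slaves.

    Connection names are strings standing in for real database connections.
    """

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._slaves is not None:
            return next(self._slaves)
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
print([router.route(q) for q in ("SELECT 1", "SELECT 2", "UPDATE t SET x = 1", "SELECT 3")])
# → ['slave1', 'slave2', 'master', 'slave1']
```

A real router also has to keep SELECT ... FOR UPDATE, and reads that must see a just-committed write, on the master, because slaves can lag behind; that is why an application usually needs a way to pin particular queries to a specific server.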
Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine
History tells us that a single RDBMS node cannot handle the huge amount of traffic that web systems receive from all over the world, no matter how well the database is tuned. MySQL has had built-in master/slave replication for a long time, and it has enabled web applications to handle traffic using a scale-out strategy. Having many slaves suits web sites where most of the traffic is reads. Thus, MySQL's master/slave replication has been used on many web sites, and is still in use.

However, when a site grows large, the amount of traffic may exceed what replication can handle. In such a case, people may use memcached. It's a well-known, very fast in-memory KVS (key-value store), and its read throughput is far better than MySQL's. It's been used as a cache for web applications to store 'hot' data, with MySQL as the back-end storage, as it can reduce

  [Read more...]
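The memcached-in-front-of-MySQL setup described above is usually called the cache-aside pattern. A minimal sketch, assuming a plain dict in place of memcached and a callable in place of the MySQL query; `CacheAside` is a hypothetical name, and a real deployment would also set an expiration time on cached entries.

```python
from typing import Callable


class CacheAside:
    """Check the cache first; on a miss, read from the database and fill the cache."""

    def __init__(self, load_from_db: Callable[[str], str]):
        self._cache: dict[str, str] = {}  # stand-in for memcached
        self._load = load_from_db         # stand-in for a MySQL lookup
        self.db_reads = 0                 # how often the back end was actually hit

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry after its row changes in the database."""
        self._cache.pop(key, None)


cache = CacheAside(lambda key: f"row for {key}")
for _ in range(3):
    cache.get("user:42")
print(cache.db_reads)  # → 1
```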
MySQL University: The Spider Storage Engine
This Thursday (November 26th, 14:00 UTC), Giuseppe Maxia will present the Spider Storage Engine. This session was originally scheduled for October 15th but had to be postponed for technical reasons.

Here's an excerpt from the abstract: Everybody needs sharding, which is not easy to maintain. Being tied to the application layer, sharding is hard to export and to interact with. The Spider storage engine, a plugin for MySQL 5.1 and later, solves the problem in a transparent way. It is an extension of partitioning. Using this engine, the user can deal transparently with multiple
  [Read more...]
“Shard early, shard often”
A while back I wrote a post about why you don't want to shard. In it I tried to explain that hardware advances, such as 128GB of RAM being so cheap, are changing the point at which you need to shard, and that the (often overlooked) operational issues created by sharding can be painful.

What I didn't address was this: if you've established that you will eventually need to shard, is it better to just get it out of the way early? My answer is almost always no. That is to say, I disagree with a statement I've been hearing recently: "shard early, shard often". Here's why:

  • There's an order of magnitude more performance to be gained by focusing on query/index/schema optimization. The gains from sharding are usually much
  [Read more...]
Spider and vertical partition engines with new goodies
The Spider storage engine should already be known to the community. Version 2.5 has recently been released with new features, the most important of which is that you can execute remote SQL statements on the backend servers. The method is quite simple: together with Spider, you also get a UDF that executes SQL code on a remote server. You send a query with parameters saying how to connect to the server, and check the result (1 for success, 0 for failure). If the SQL involves a SELECT, the result can be sent to a temporary table. Simple and effective.

In addition to the Spider engine, Kentoku
  [Read more...]
Video: The ScaleDB shared-disk clustering Storage Engine for MySQL
Mike Hogan, CEO of ScaleDB spoke at the Boston MySQL User Group in September 2009:

ScaleDB is a storage engine for MySQL that delivers shared-disk clustering. It has been described as the Oracle RAC of MySQL. Using ScaleDB, you can scale your cluster by simply adding nodes, without partitioning your data. Each node has full read/write capability, eliminating the need for slaves, while delivering cluster-level load balancing. ScaleDB is looking for additional beta testers; there is a sign-up at http://www.scaledb.com.

Slides are online (and downloadable) at http://www.slideshare.net/Sheeri/scale-db-preso-for-boston-my-sql-meetup-92009

Watch the video online at http://www.youtube.com/watch?v=emu2WfNx4KA.

How to generate per-database traffic statistics using mk-query-digest
We often encounter customers who have partitioned their applications among a number of databases within the same instance of MySQL (think application service providers who have a separate database per customer organization ... or wordpress-mu type of apps). For example, take the following single MySQL instance with multiple (identical) databases:

SHOW DATABASES;
+----------+
| Database |
+----------+
| db1      |
| db2      |
| db3      |
| db4      |
| mysql    |
+----------+

Separating the data in this manner is a great setup for being able to scale by simply migrating a subset of the databases to a different physical host when the existing host begins to get overloaded. But MySQL doesn't allow us to examine statistics on a per-database basis.

Enter Maatkit.

There is an often-ignored gem in

  [Read more...]
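The mk-query-digest details are cut off in the excerpt, so rather than guess at its options, here is a tool-agnostic sketch of the same idea: group already-parsed query events by database and rank the databases by total query time. The `QueryEvent` record, its fields and the sample events are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryEvent:
    """One already-parsed query-log entry (hypothetical fields)."""
    db: str
    query_time: float  # seconds


def per_database_stats(events: list[QueryEvent]) -> list[tuple[str, int, float]]:
    """Return (db, query_count, total_seconds), busiest database first."""
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, float] = defaultdict(float)
    for e in events:
        counts[e.db] += 1
        totals[e.db] += e.query_time
    return sorted(((db, counts[db], totals[db]) for db in counts),
                  key=lambda row: row[2], reverse=True)


events = [QueryEvent("db1", 0.5), QueryEvent("db2", 2.0),
          QueryEvent("db1", 0.25), QueryEvent("db3", 0.1)]
for db, n, secs in per_database_stats(events):
    print(f"{db}: {n} queries, {secs:.2f}s")
```

Ranking databases this way points at the ones worth migrating to another physical host first when the current one gets overloaded.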
Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon)
These are notes on "Sharding for the masses: Introducing the SPIDER storage engine", a talk by Giuseppe Maxia given at OpenSQLCamp, at FrOSCon, in August 2009. The notes are somewhat live, and the slides are available too.

Why sharding? Scaling, of course. The MySQL way to solve this is replication (even Yahoo! and Google use it).

When the master doesn’t have enough resources to cope with what you do (i.e. large data sets), replication

  [Read more...]
OpenSQLCamp 2009 presentation videos are online and free!
In record time, less than a week after the conference (thanks to the free Pinnacle Video Spin and YouTube), all 11 videos that were taken at OpenSQLCamp Europe are online.

For those who missed the sessions, or just want to relive the fun!

Almost all the sessions were filmed; regrettably, Darren Cassar’s "Securich – MySQL user administration and security made easy!" and Stephane Combaudon’s "Minimizing data access with covering indexes" were not.

The YouTube videos have the descriptions and resources from the official conference pages, and links to pages. If there is more information to add (for example, the slides from

  [Read more...]
MySQL Sandbox and Spider at FrOSCon and OpenSQLCamp
FrOSCon and the OpenSQLCamp are about to start.
I am packing for Sankt Augustin, where I will attend the fourth edition of FrOSCon and the second OpenSQLCamp. I will have two sessions: Sharding for the masses, about the Spider storage engine, and MySQL Sandbox 3, about one of my favorite tools.

The program is very rich. There will be several tracks in the main event and in the associated conferences. If you have any involvement or simply some
  [Read more...]
Why you don’t want to shard.
Note: This blog post is part 1 of 4 on building our training workshop.

The Percona training workshop will not cover sharding. If you follow our blog, you'll notice we don't talk much about the subject; in some cases it makes sense, but in many we've seen that it causes architectures to be prematurely complicated.

So let me state it: You don't want to shard.

Optimize everything else first, and then if performance still isn't good enough, it's time to take a very bitter medicine. The reason you need to shard basically

  [Read more...]
Sharding for the masses: the spider storage engine
In my previous article about the Spider storage engine, I ran some tests and saw that the engine has potential. I also identified some problems, which were promptly addressed by the author. I have looked at the latest version (0.12), and the results are very encouraging.

Installing the Spider storage engine is not trivial, but not extremely difficult either. My previous article has a
  [Read more...]
I’m looking for sharding problems
Do you want a SPOCK tee shirt?  Read on:

I’m going to give a talk on Spockproxy (a sharding and connection-pooling-only version of MySQL Proxy) at the MySQL conference, and as I prepare I’m looking to give my talk broad appeal and address all kinds of problems folks might have sharding their databases.

So I’m throwing this question out to the MySQL community: have you looked into sharding your database(s)? Did you come up against problems that were difficult to solve? Please take a moment and let me know about them. I’d like to address how to fix them with Spockproxy. Even if you’ve solved these issues already or have no intention of using Spockproxy, your problems could be interesting to others; add your sharding problem(s) in the comments below and look for me   [Read more...]
Database Sharding at Netlog, with MySQL and PHP
This article accompanies the slides from a presentation on database sharding. Sharding is a technique for horizontal scaling of databases that we are using at Netlog. If you’re interested in high performance, scalability, MySQL, PHP, caching, partitioning, Sphinx, federation or Netlog, read on …

This presentation was given on the second day of FOSDEM 2009 in Brussels. FOSDEM is an annual conference on open source software with about 5,000 hackers. I was invited by Kris Buytaert and Lenz Grimmer to give a talk in the MySQL Dev

  [Read more...]
Database sharding at Netlog (FOSDEM talk slides)
Here are the slides from yesterday’s presentation about horizontal database scaling through sharding in the MySQL dev room at FOSDEM 2009.

I’ve got a ton of notes and remarks to these slides, which will become available here soon.


Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.