Showing entries 1 to 16

Displaying posts with tag: parallel

Parallel replication: off by one

One of the most common errors in development is where a loop or a retrieval by index falls short or long by one unit, usually because of an oversight or a logic error in the code.

Of the following snippets, which one will run 10 times?

/* #1 */ for (N = 0 ; N < 10; N++) printf("%d\n", N);

/* #2 */ for (N = 0 ; N <= 10; N++) printf("%d\n", N);

/* #3 */ for (N = 1 ; N <= 10; N++) printf("%d\n", N);

/* #4 */ for (N = 1 ; N < 10; N++) printf("%d\n", N);

The question is deceptive, as there are two snippets that will run 10 times (#1 and #3), but they print different numbers: #1 prints 0 to 9, while #3 prints 1 to 10. If you were aiming for the numbers from 1 to 10, only #3 is good.

After many years of programming, off-by-one errors are rare in my code, and I have been able to spot








  [Read more...]
One billion
As always, I am a little late, but I want to jump on the bandwagon and mention the recent MySQL Cluster milestone of passing 1 billion queries per minute. Apart from echoing the arbitrarily large ransom demand of Dr Evil, what does this mean?

Obviously 1 billion is only of interest to us humans as we generally happen to have 10 fingers, and seem to name multiples in steps of 10^3 for some reason. Each processor involved in this benchmark is clocked at several billion cycles per second, so a single billion is not so vast or fast.

Measuring over a minute also feels unnatural for a computer performance benchmark - we are used to lots of things happening every second! A minute is a long time in silicon.

What's





  [Read more...]
Eventual Consistency in MySQL Cluster - using epochs
Before getting to the details of how eventual consistency is implemented, we need to look at epochs. Ndb Cluster maintains an internal distributed logical clock known as the epoch, represented as a 64 bit number. This epoch serves a number of internal functions, and is atomically advanced across all data nodes.

Epochs and consistent distributed state

Ndb is a parallel database, with multiple internal transaction coordinator components starting, executing and committing transactions against rows stored in different data nodes. Concurrent transactions only interact where they attempt to lock the same row. This






  [Read more...]
Some MySQL projects I think are cool - Shard-Query
I've already described Justin Swanhart's Flexviews project as something I think is cool. Since then Justin appears to have been working more on Shard-Query which I also think is cool, perhaps even more so than Flexviews.

On the page linked above, Shard-Query is described using the following statements :

"Shard-Query is a distributed parallel query engine for MySQL"
"ShardQuery is a PHP class which is intended to make working with a partitioned dataset easier"
"ParallelPipelining - MPP distributed query engines runs fragments of queries in parallel, combining the results at the end. Like map/reduce except it speaks SQL





  [Read more...]
Some MySQL projects I think are cool - HandlerSocket Plugin
The HandlerSocket project is described in Yoshinori Matsunobu's blog entry under the title 'Using MySQL as a NoSQL - A story for exceeding 750,000 qps on a commodity server'. It's a great headline and has generated a lot of buzz. Quite a few early commentators were a little confused about what it was - a new NoSQL system using InnoDB? A cache? In memory only? Where does Memcached come in? Does it support the Memcached protocol? If not, why not? Why is it called HandlerSocket?

Inspirations from Memcached may include the focus on simplicity, performance and a simple human readable protocol. As Yoshinori says, Kazuho Oku has already implemented a MySQLD-embedded Memcached server, no

  [Read more...]
Four short links: 7 June 2011

  • OMG Text -- a plugin for CSS framework Compass for directional text shadows. (via David Kaneda)
  • Build a Cheap Bitcoin Mine -- some day it will be revealed that the act of generating a bitcoin token is helping the Russian mafia to crack nuclear missile launch codes and Afghan druglords built the Bitcoin system to destabilize the US dollar.
  • Polycode -- a free, open-source, cross-platform framework for creative code. You can use it as a C++ API or as a standalone scripting language to get easy and simple access to accelerated 2D and 3D graphics, hardware shaders, sound and network programming, physics engines and
  [Read more...]
    Memory tuning fast paced ETL

    Dear Kettle friends,

    on occasion we need to support environments where not only a lot of data needs to be processed but also in frequent batches.  For example, a new data file with hundreds of thousands of rows arrives in a folder every few seconds.

    In this setting we want to use clustering to use “commodity” computing resources in parallel.  In this blog post I’ll detail what the general architecture looks like and how to tune memory usage in this environment.

    Clustering was first created around the end of 2006.  Back then it looked like this.

    The master

    This is the most important part of our cluster.  It takes care of administrating network configuration and topology.  It also keeps track of the state of dynamically added slave servers.

    The master is started

      [Read more...]
    Journey upriver to the dark heart of ha_ndbcluster
    Unlike most other MySQL storage engines, Ndb does not perform all of its work in the MySQLD process. The Ndb table handler maps Storage Engine Api calls onto NdbApi calls, which eventually result in communication with data nodes. In terms of layers, we have SQL -> Handler Api -> NdbApi -> Communication. At each of these layer boundaries, the mapping from operations at the upper layer to operations at the lower layer is non-trivial, and depends on runtime state, statistics, optimisations etc.

    The MySQL status variables can be used to understand the behaviour of the MySQL Server in terms of user commands processed, and also how these map to some of the Storage Engine Handler Api calls.

    Status variables



      [Read more...]
    Data distribution in MySQL Cluster
    MySQL Cluster distributes rows amongst the data nodes in a cluster, and also provides data replication. How does this work? What are the trade-offs?

    Table fragments

    Tables are 'horizontally fragmented' into table fragments each containing a disjoint subset of the rows of the table. The union of rows in all table fragments is the set of rows in the table. Rows are always identified by their primary key. Tables with no primary key are given a hidden primary key by MySQLD.

    By default, one table fragment is created for each data node in the cluster at the time the table is created.

    Node groups and Fragment replicas

    The data nodes in a cluster are logically divided into Node groups. The size of each Node group is controlled by the NoOfReplicas parameter. All data nodes in a Node group store the same data. In









      [Read more...]
    How fast is parallel replication? See it live today
    I talked about parallel replication last month. Since then, there has been considerable interest in this feature. As far as I know, Tungsten's is the only implementation of this much coveted feature, so I can only compare with MySQL native replication.
    The most compelling question is "how fast is it?"
    That's a tricky one. The answer is the same one I give when someone asks me "how fast is MySQL". I always say: it depends.
    Running replication in a single thread is sometimes slower than the operations in the master. Many users complain that the single thread can't keep up with the master, and the slave lags behind. True. There is, however, a hidden benefit of single threaded replication: it requires fewer resources. There is no contention for


      [Read more...]
    Advanced replication for the masses - Part II - Parallel replication
    I hope you liked the first part of this series of lessons. And I really hope that you have followed the instructions and got your little replication cluster up and working.
    If you haven't done that, thinking that you would spare your energies for more juicy matters, I have news for you. What I explained in the previous part is exactly what you need to do to set up parallel replication. With just a tiny additional detail.
    For the sake of the diligent readers who have followed the instructions with the first lessons, I won't repeat them, but I'll invite you

      [Read more...]
    Advanced replication for the masses - Part I - Getting started with Tungsten Replicator
    MySQL DBAs and developers: oil your fingers and get ready to experience a new dimension of data replication. I am pleased to announce that Continuent has just released Tungsten Replicator 2.0, an open source data replication engine that can replace MySQL native replication with a set of advanced features.
    A note about the source code. The current version of Tungsten Replicator available on the website is free to use, but it is not yet the open source version. We need a few more weeks to extract the code from the enterprise tree and make a new build. But we
      [Read more...]
    Low latency distributed parallel joins
    When MySQL AB bought Sun Microsystems in 2008 (or did Sun buy MySQL?), most of the MySQL team merged with the existing Database Technology Group (DBTG) within Sun. The DBTG group had been busy working on JavaDB, Postgres and other DB related projects as well as 'High Availability DB' (HADB), which was Sun's name for the database formerly known as Clustra.

    Clustra originated as a University research project which spun out into a startup company and was then acquired by Sun around the dot-com era. A number of technical papers describing aspects of Clustra's design and history can be found online, and it is in many ways similar to Ndb Cluster, not just in their shared Scandinavian roots. Both are shared-nothing parallel databases originally aimed at the Telecoms market, supporting high availability

      [Read more...]
    Some MySQL projects I think are cool - Spider Storage Engine
    One thing that has puzzled me about MySQL Server is that it became famous for sharded scale-out deployments in well known web sites and yet has no visible support for such deployments. The MySQL killer feature for some time has been built-in asynchronous replication and gigabytes of blogs have been written about how to set up, use, debug and optimise replication, but when it comes to 'sharding' there is nothing built in. Perhaps to have attempted to implement something would have artificially constrained users' imaginations, whereas having no support at all has allowed 1,000 solutions to sprout? Perhaps there just wasn't MySQL developer bandwidth available, or perhaps it just wasn't the best use of the available time. In any case, it remains unclaimed territory to this day.

    On first hearing of the

      [Read more...]
    Ndb software architecture
    I'm sure that someone else can describe the actual history of Ndb development much better, but here's my limited and vague understanding.

    • Ndb was developed in an environment (Ericsson AXE telecoms switch) where Ericsson's PLEX is the language of choice
      PLEX supports multiple state machines (known as blocks) sending messages (known as signals) between them with some system-level conventions for starting up, restart and message classes. Blocks maintain internal state and define signal handling routines for different signal types. Very little abstraction within a block beyond subroutines is supported. (I'd love to hear some more detail on PLEX and how it has evolved). This architecture maps directly to the AXE processor


      [Read more...]
    Sorting a Terabyte in 197 seconds

    I just returned from The 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry to the sorting contest.  I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today.  (And it will likely remain standing, since terabyte sorting is now deprecated because it’s too fast.  Now the challenge is to sort 100TB.)

    For many years Jim Gray ran a sorting contest to see how fast anyone could sort a terabyte's worth of 100-byte records, how much data could be sorted in one minute, and how much data could be sorted for a penny.  After Jim’s

      [Read more...]

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.