As promised, this is the final post in a series looking at
eventual consistency with MySQL Cluster asynchronous replication.
This time I'll describe the transaction dependency tracking used
with NDB$EPOCH_TRANS and review some of the implementation
properties.
Transaction-based conflict handling with NDB$EPOCH_TRANS
NDB$EPOCH_TRANS is almost exactly the same as NDB$EPOCH, except
that when a conflict is detected on a row, the whole user
transaction which made the conflicting row change is marked as
conflicting, along with any dependent transactions. The row
operations from all of the rejected transactions are then handled
using inserts to an exceptions table and realignment operations.
This helps avoid the row-shear problems described …
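As a concrete sketch of switching this on : conflict functions are
requested per-table via the mysql.ndb_replication control table, as
documented for MySQL Cluster, with test.t1 here a hypothetical table :

  -- Create the replication control table if it does not already exist
  CREATE TABLE IF NOT EXISTS mysql.ndb_replication (
    db VARBINARY(63),
    table_name VARBINARY(63),
    server_id INT UNSIGNED,
    binlog_type INT UNSIGNED,
    conflict_fn VARBINARY(128),
    PRIMARY KEY USING HASH (db, table_name, server_id)
  ) ENGINE=NDB PARTITION BY KEY(db, table_name);

  -- Request transactional conflict detection for test.t1 on all
  -- servers (server_id = 0 means 'any server')
  INSERT INTO mysql.ndb_replication VALUES
    ('test', 't1', 0, NULL, 'NDB$EPOCH_TRANS()');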
In previous posts I described how row conflicts are detected
using epochs. In this post I describe how they are handled.
Row-based conflict handling with NDB$EPOCH
Once a row conflict is detected, as well as rejecting the row
change, row-based conflict handling in the Slave will :
- Increment conflict counters
- Optionally insert a row into an exceptions table
For NDB$EPOCH, conflict detection and handling operates on one Cluster in an Active-Active pair designated as the Primary. When a Slave MySQLD attached to the Primary Cluster detects a conflict between data stored in the Primary and a replicated event from the Secondary, it needs to realign the Secondary to store the same values for the conflicting data. Realignment …
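A sketch of what an exceptions table can look like, assuming a
hypothetical table test.t1 with a single integer primary key column
pk; the first four columns are fixed, and the original table's
primary key columns follow :

  CREATE TABLE test.t1$EX (
    server_id        INT UNSIGNED,    -- server that detected the conflict
    master_server_id INT UNSIGNED,    -- server the conflicting change came from
    master_epoch     BIGINT UNSIGNED, -- epoch of the conflicting change
    count            INT UNSIGNED,    -- detection counter
    pk               INT NOT NULL,    -- primary key column(s) of test.t1
    PRIMARY KEY (server_id, master_server_id, master_epoch, count)
  ) ENGINE=NDB;

  -- The conflict counters mentioned above are visible as status variables
  SHOW GLOBAL STATUS LIKE 'Ndb_conflict%';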
The last post described MySQL Cluster epochs and why
they provide a good basis for conflict detection, with a few
enhancements required. This post describes the
enhancements.
The following four mechanisms are required to implement conflict
detection via epochs :
- Slaves should 'reflect' information about replicated epochs
they have applied
Applied epoch numbers should be included in the Slave's Binlog events returning to the originating cluster, at a Binlog position corresponding to the commit time of the replicated epoch transaction relative to the Slave's local transactions.
- Masters should maintain a maximum replicated epoch
…
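Both of these pieces of state can be inspected from SQL; a sketch
using the documented mysql.ndb_apply_status table, ignoring for
simplicity the question of which server_ids belong to which cluster :

  -- One row per originating server; epoch is the most recently
  -- applied epoch from that server
  SELECT server_id, epoch, log_name, end_pos
    FROM mysql.ndb_apply_status;

  -- A maximum replicated epoch could then be read back as :
  SELECT MAX(epoch) AS max_replicated_epoch
    FROM mysql.ndb_apply_status;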
Before getting to the details of how eventual consistency is
implemented, we need to look at epochs. Ndb Cluster maintains an
internal distributed logical clock known as the epoch,
represented as a 64-bit number. This epoch serves a number of
internal functions, and is atomically advanced across all data
nodes.
Epochs and consistent distributed state
Ndb is a parallel database, with multiple internal transaction
coordinator components starting, executing and committing
transactions against rows stored in different data nodes.
Concurrent transactions only interact where they attempt to lock
the same row. This design minimises unnecessary system-wide
synchronisation, enabling linear scalability of reads and
writes.
The stream of changes made to rows stored at a …
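For illustration, epoch boundaries are visible from SQL on a MySQLD
attached to the cluster; a sketch using the documented
mysql.ndb_binlog_index table, splitting the 64-bit epoch into its
32-bit global checkpoint number and the sub-epoch remainder :

  SELECT epoch,
         epoch >> 32        AS gci,       -- global checkpoint number
         epoch & 0xFFFFFFFF AS micro_gci, -- position within the checkpoint
         File, Position                   -- where the epoch starts in the Binlog
    FROM mysql.ndb_binlog_index
   ORDER BY epoch DESC
   LIMIT 5;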
In my previous posts I introduced two new conflict detection
functions, NDB$EPOCH and NDB$EPOCH_TRANS, without explaining how
these functions actually detect conflicts. To simplify the
explanation I'll initially consider two circularly replicating
MySQL Servers, A and B, rather than two replicating Clusters, but
the principles are the same.
Commit ordering
Avoiding conflicts requires that data is only modified on one
Server at a time. This can be done by defining Master/Slave roles
or Active/Passive partitions etc. Where this is not done, and
data can be …
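One simple way to enforce an Active/Passive split at the Server level
(a sketch, not specific to Ndb) is to make the passive side read-only
for ordinary clients; the replication applier thread is exempt from
read_only, so replicated changes still flow :

  -- On the passive Server; note that users with the SUPER
  -- privilege are not blocked by read_only
  SET GLOBAL read_only = ON;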
I've already described Justin Swanhart's Flexviews project as
something I think is cool. Since then Justin appears to
have been working more on Shard-Query, which I also think is cool, perhaps
even more so than Flexviews.
On the page linked above, Shard-Query is described using the
following statements :
"Shard-Query is a distributed parallel query engine for
MySQL"
"ShardQuery is a PHP class which is intended to make working with
a partitioned dataset easier"
"ParallelPipelining - MPP distributed query engines runs fragments
of queries in parallel, combining the results at the end. Like
map/reduce except it speaks SQL directly."
The things I like from the above description :
- Distributed …
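As a hypothetical illustration of the 'runs fragments of queries in
parallel' idea (this is not Shard-Query's actual generated SQL), an
aggregate over a sharded table can be computed as per-shard partial
aggregates plus a combining step :

  -- Query as the application writes it
  SELECT customer_id, SUM(amount) AS total
    FROM orders
   GROUP BY customer_id;

  -- Fragment executed on every shard in parallel
  SELECT customer_id, SUM(amount) AS partial_total
    FROM orders
   GROUP BY customer_id;

  -- Combining step over the union of all shard results
  SELECT customer_id, SUM(partial_total) AS total
    FROM all_shard_results
   GROUP BY customer_id;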
In my last post I described the motivation for the new NDB$EPOCH
conflict detection function in MySQL
Cluster. This function detects when a row has been
concurrently updated on two asynchronously replicating MySQL
Cluster databases, and takes steps to keep the databases in
alignment.
With NDB$EPOCH, conflicts are detected and handled at row
granularity, as opposed to column granularity, as this is the
granularity of the epoch metadata used to detect conflicts.
Dealing with conflicts on a row-by-row basis has implications for
schema and application design. The NDB$EPOCH_TRANS function
extends NDB$EPOCH, giving …
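A minimal sketch of enabling the row-granularity function, using the
same mysql.ndb_replication control table as in the NDB$EPOCH_TRANS
example earlier on this page (test.t1 is again a hypothetical table) :

  -- Row-granularity, epoch-based conflict detection on test.t1;
  -- no extra application schema (e.g. a timestamp column) is needed
  INSERT INTO mysql.ndb_replication VALUES
    ('test', 't1', 0, NULL, 'NDB$EPOCH()');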
tl;dr : New 'automatic' optimistic conflict detection functions
available, giving the best of both optimistic and pessimistic
replication on the same data
MySQL replication supports a number of topologies, and one of the
most interesting is an active-active, or master-master topology,
where two or more Servers accept read and write traffic, with
asynchronous replication between them.
This topology has a number of attractions, including :
- Potentially higher availability
- Potentially low impact on read/write latency
- Service availability insensitive to replication failures
- Conceptually simple
However, data consistency is hard to maintain in this
environment. Data, and access to it, must usually be partitioned …
The HandlerSocket project is described in Yoshinori Matsunobu's blog entry under the
title 'Using MySQL as a NoSQL - A story for exceeding 750,000 qps
on a commodity server'. It's a great headline and has generated a
lot of buzz. Quite a few early commentators were a little
confused about what it was - a new NoSQL system using InnoDB? A
cache? In memory only? Where does Memcached come in? Does it
support the Memcached protocol? If not, why not? Why is it called
HandlerSocket?
Inspirations from Memcache may include the focus on simplicity,
performance and a simple, human-readable protocol. As Yoshinori
says, Kazuho Oku has already implemented a MySQLD-embedded
Memcached server, no need to do it again. What's more, the
Memcache protocol …
Unlike most other MySQL storage engines, Ndb does not perform all
of its work in the MySQLD process. The Ndb table handler maps
Storage Engine Api calls onto NdbApi calls, which eventually result in
communication with data nodes. In terms of layers, we have SQL
-> Handler Api -> NdbApi -> Communication. At each of
these layer boundaries, the mapping from operations at the
upper layer to operations at the lower layer is non-trivial,
based on runtime state, statistics, optimisations etc.
The MySQL status variables can be used to understand the
behaviour of the MySQL Server in terms of user commands
processed, and also how these map to some of the Storage Engine
Handler Api calls.
Status variables tracking user commands start with …
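For example, the mapping can be observed at two of these boundaries
using standard status variables :

  -- User commands processed by the Server (one counter per statement type)
  SHOW GLOBAL STATUS LIKE 'Com%';

  -- Row operations issued through the Storage Engine Handler Api
  SHOW GLOBAL STATUS LIKE 'Handler%';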