Introduction
Enterprises often start with a single-server setup.
As the enterprise grows, so does the data and the number of
requests for the data. Using a single server setup makes it
difficult to manage the increasing load. This creates the
requirement to scale out.
One solution would be to replicate to read servers. The writes go
to the master and the reads go to the slaves.
Although this setup handles the
increased read load, it still cannot handle the increasing write
load. Trying to handle this by adding another master just
compounds the problem. The write load must be repeated on every
master (more work for the application and each master is just as
busy as if there were just one).
Thus we need to tackle the problem in a different way. Instead of
focusing on our setup we shift focus to our data. By splitting
the data we can correspondingly distribute the load on the
data.
Fabric Sharding allows the rows of a MySQL table to be distributed over multiple database servers or physical locations.
MySQL Fabric Sharding - Basic Architecture
The basic architecture consists of the following components:
- Fabric Aware Connector
- Fabric Server
- Server Groups - That store the shard data.
MySQL Fabric Aware Connector
The connector contacts the Fabric server to find the location of a particular shard key. The connector also caches the sharding information to avoid repeated round trips to the Fabric server. By building this functionality into the connector, the architecture avoids the extra hop through a proxy/routing node and so minimizes latency. In the future we may add the option of a standalone routing node to enable Fabric-unaware connectors to work with MySQL Fabric.
MySQL Fabric Server
The Fabric server:
- Stores the sharding information
- Range Based Sharding
- Hash Based Sharding
- Handles Re-Sharding
- Shard Moving
- Shard Splitting
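The connector-side caching described above can be sketched in a few lines. This is an illustrative model, not the actual Connector/Python Fabric API: the class and callable names are invented, and the "Fabric server" is stood in for by a plain function. The point is that only the first lookup for a shard key costs a round trip; later lookups are served locally.

```python
# Hypothetical sketch of a Fabric-aware connector's lookup cache.
class FabricAwareConnector:
    def __init__(self, fabric_lookup):
        # fabric_lookup: callable(shard_key) -> server group name,
        # standing in for a round trip to the Fabric server.
        self._fabric_lookup = fabric_lookup
        self._cache = {}
        self.round_trips = 0

    def server_for_key(self, shard_key):
        if shard_key not in self._cache:
            self.round_trips += 1  # a real round trip to Fabric
            self._cache[shard_key] = self._fabric_lookup(shard_key)
        return self._cache[shard_key]

# Toy "Fabric server": two shards split at EmpID 200.
def fabric_server(emp_id):
    return "group-1" if emp_id <= 200 else "group-2"

conn = FabricAwareConnector(fabric_server)
conn.server_for_key(42)    # miss -> round trip
conn.server_for_key(42)    # hit  -> served from the cache
conn.server_for_key(512)   # miss -> round trip
```

A real connector would also have to invalidate this cache when shards are moved or split; that detail is omitted here.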
Server Groups - Storing the Shard Data
Each server group
represents an HA configuration of MySQL Servers, with one of the
servers acting as master and the others acting as slaves.
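The read/write routing inside a server group can be sketched as follows. Names here are invented for illustration; the sketch just shows the rule stated above: writes go to the master, reads are spread over the slaves.

```python
# Illustrative model of an HA server group: one master takes writes,
# the slaves take reads (simple round-robin).
class ServerGroup:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next = 0

    def route(self, is_write):
        if is_write or not self.slaves:
            return self.master
        server = self.slaves[self._next % len(self.slaves)]
        self._next += 1
        return server

group = ServerGroup("db-master", ["db-slave-1", "db-slave-2"])
writes = [group.route(is_write=True) for _ in range(2)]
reads = [group.route(is_write=False) for _ in range(3)]
```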
Sharding Schemes
The data can be partitioned using either a RANGE-based or a HASH-based sharding scheme. In the current version the application must provide the sharding key, but this is something we aim to make more transparent in the future.
Range Based Sharding
The Range-based sharding scheme partitions a given table on user-defined ranges of the values of a column chosen as the sharding key. For example, consider the following Employee table schema:
| EmpID | EmpName | EmpAddress | EmpPhone |
The table can be partitioned on EmpID. Let us assume that the EmpID values range between 0 and 10000. We can create five partitions with lower_bounds = 0, 201, 401, 601, 801. The partition with lower_bound = 0 stores all the rows with EmpID between 0 and 200, the partition with lower_bound = 201 stores all the rows with EmpID between 201 and 400, and so on; the last partition, with lower_bound = 801, stores all the remaining rows (EmpID 801 and above).
Hash Based Sharding
The Hash based sharding scheme uses the md5
cryptographic hash and builds on the underlying RANGE based
sharding scheme. The lower_bounds for the underlying RANGE based
sharding scheme are generated by applying the cryptographic hash
on the group ID for the particular shard.
In order to decide whether a given row of a table must be in a
particular shard, we apply the md5 algorithm on the value of the
column chosen as the shard key. We compare this value with the
lower_bound of the shard definition to decide if the row should
be placed in this shard.
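Both lookup schemes can be sketched with the same routine: keep the lower_bounds sorted and pick the shard with the largest lower_bound not exceeding the key. This is an illustrative sketch, not Fabric's internal code; the group names are invented.

```python
import bisect
import hashlib

def shard_for(lower_bounds, groups, key):
    """Return the group whose lower_bound is the largest one <= key."""
    i = bisect.bisect_right(lower_bounds, key) - 1
    # i == -1 can only occur in the hash scheme: a hashed key below every
    # lower_bound wraps around to the last shard (Python's index -1).
    return groups[i]

# RANGE-based: lower_bounds compared directly against EmpID.
range_bounds = [0, 201, 401, 601, 801]
range_groups = ["g1", "g2", "g3", "g4", "g5"]

# HASH-based: each lower_bound is the md5 digest of a group ID, and the
# shard key is hashed with md5 before the comparison.
def md5_hex(value):
    return hashlib.md5(str(value).encode()).hexdigest()

hash_groups = sorted(["GROUPID-1", "GROUPID-2", "GROUPID-3"], key=md5_hex)
hash_bounds = [md5_hex(g) for g in hash_groups]

range_hit = shard_for(range_bounds, range_groups, 250)        # EmpID 250 falls in 201-400
hash_hit = shard_for(hash_bounds, hash_groups, md5_hex(250))  # same row, hash scheme
```

Note how the hash scheme reuses the range machinery: only the domain being compared changes, from raw key values to md5 digests.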
In the case of the Employee table schema, assume that we have a Fabric group with the ID GROUPID-1. We set lower_bound = md5(“GROUPID-1”). Now, when we want to insert a row into this shard, we check that md5(EmpID of the row) is greater than or equal to md5(“GROUPID-1”) (and less than the lower_bound of the next shard, if there is one) before inserting.
Handling Global Operations
To handle transactions that need to be executed on all the shards, e.g. schema updates or updates on global tables, Fabric introduces the concept of a global group.
The global group stores the schema (IMPORTANT: not the data) of the tables that are sharded. It also stores, in full (IMPORTANT: including the data), the global tables that need to be present on all the shards (e.g. a Pincode table that might need to participate in joins across all the shards).
When we need to fire a global operation, for example a schema
update on a sharded table, we run this on the global group. The
update automatically gets propagated across all the shards.
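The propagation above can be modelled with a toy example. This is purely illustrative, with invented names; in the real system the update flows through replication rather than a direct call, but the observable effect is the same: the DDL runs once on the global group and appears on every shard.

```python
# Toy model of the global group propagating a schema change to shards.
class Group:
    def __init__(self, name):
        self.name = name
        self.schema = {"employees": ["EmpID", "EmpName"]}

class GlobalGroup(Group):
    def __init__(self, shards):
        super().__init__("global")
        self.shards = shards

    def alter_table(self, table, column):
        # Run the DDL on the global group...
        self.schema[table].append(column)
        # ...then replication propagates it to each shard group.
        for shard in self.shards:
            shard.schema[table].append(column)

shards = [Group("g1"), Group("g2")]
global_group = GlobalGroup(shards)
global_group.alter_table("employees", "EmpPhone")
```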
Moving the Shard
When a given shard is overloaded, we may decide to move the shard to a new Fabric group (for example, one composed of servers running on more powerful hardware). While the
shard move takes place we would like to ensure minimal impact on
ongoing transactions.
Fabric allows moving a shard registered with the Fabric system
into another Fabric group seamlessly.
Splitting the Shard
When the data in a shard grows beyond the size optimal for good performance, we can split the data in this shard across two server groups. While the shard split takes place we would like to ensure minimal impact on ongoing transactions. Fabric allows seamlessly splitting a shard registered with the Fabric system, moving part of its data into another Fabric group.
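A RANGE shard split can be sketched as follows. This is a simplified, illustrative model (the function and group names are invented, and real Fabric does the data movement online): rows at or above a chosen split point move to a new server group, and the shard map gains a new lower_bound.

```python
# Sketch of splitting one RANGE shard into two server groups.
def split_shard(shard_map, rows_by_group, group, new_group, split_point):
    # shard_map: sorted list of (lower_bound, group) pairs
    rows = rows_by_group.pop(group)
    rows_by_group[group] = [r for r in rows if r < split_point]
    rows_by_group[new_group] = [r for r in rows if r >= split_point]
    shard_map.append((split_point, new_group))
    shard_map.sort()
    return shard_map

shard_map = [(0, "g1"), (201, "g2")]
rows = {"g1": [10, 150, 190], "g2": [250, 300, 390]}
split_shard(shard_map, rows, "g2", "g3", 301)
```

A shard move is the degenerate case of the same bookkeeping: all rows relocate and the lower_bound simply points at the new group.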
Find a use case for starting with a single server (single-shard setup) and scaling gradually as the application and its data grow here: http://vnwrites.blogspot.in/2013/09/mysqlfabric-sharding-example.html