We regularly receive questions from our user community regarding
which AMIs to use when deploying database clusters on Amazon EC2.
As part of our ongoing development work on the Severalnines Configurator and ClusterControl, we recently tested deploying MySQL Cluster on EC2 with Severalnines on three different AMIs. We thought we should share the results of these tests, hence this week's blog!
If you would like to test such a deployment yourself, feel free
to use the parameters and guidelines below to do so. You can also
check out these new videos to see Severalnines technology in …
Not every company or application needs an elastic database. Some
applications can get by just fine on a single database server,
rendering database elasticity moot from their perspective. To
make this determination, simply ask yourself:
1. Will I need more than a single database server? Look at your
current load and your projected growth and ask yourself whether
it will exceed the capacity of a single server. If it doesn’t
now, and won’t in the future, then you don’t need an elastic
database.
2. Will my load fluctuate sufficiently to warrant the investment
in elasticity? If your database requirements won’t experience
fluctuations in demand (e.g. daily, weekly, monthly or seasonal
changes in the number of servers required), then elasticity isn’t
important. For example, if you have a social networking
application that requires 2 database nodes 24x7, but peaks at 10
nodes for 2 hours a night, then elasticity is important; a quick
node-hour comparison follows this list. If your …
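To make the trade-off concrete, here is a minimal node-hour sketch in Python, using the numbers from the social networking example above (the hourly rate is a hypothetical placeholder):

```python
# Node-hour comparison for the social networking example:
# 2 nodes around the clock, peaking at 10 nodes for 2 hours a night.
HOURLY_RATE = 0.50  # $/node-hour -- hypothetical placeholder

static_node_hours = 10 * 24            # provisioned for peak, 24x7
elastic_node_hours = 2 * 22 + 10 * 2   # 2 nodes off-peak, 10 at peak

print(f"static:  {static_node_hours} node-hours/day, "
      f"${static_node_hours * HOURLY_RATE:.2f}")
print(f"elastic: {elastic_node_hours} node-hours/day, "
      f"${elastic_node_hours * HOURLY_RATE:.2f}")
# static:  240 node-hours/day, $120.00
# elastic: 64 node-hours/day, $32.00
```

With those numbers, the elastic deployment consumes roughly a quarter of the node-hours of a cluster sized for the peak all day.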
The primary reasons people are moving to the public cloud are:
(1) replace capital expenses with operating expenses (pay as you
go); (2) use shared resources for processes like backup,
maintenance, networking (shared expenses); (3) use shared
infrastructure that enables you to pay only for those resources
you actually use, instead of paying for your maximum-load
resources at all times (pay-per-use). The first thing you’ll
notice is that all 3 cloud benefits have their basis in finances
or the cloud business model.
We will focus on #3 above: Pay-Per-Use. The old-school model
was to build your compute infrastructure for today's maximum
load, plus growth over the life-cycle of the equipment, plus
some buffer so the systems don’t get overloaded by spikes in
usage. The net result is that your average usage might run 10% of
the potential of the infrastructure you mortgaged your home to
buy. In other words, you were paying 10X more than …
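The arithmetic behind that claim is a one-liner (a sketch, using the 10% utilization figure from the paragraph above):

```python
# If average utilization is 10% of provisioned capacity, you pay for
# 100% of the capacity while using only 10% of it.
avg_utilization = 0.10
overpayment_factor = 1 / avg_utilization
print(f"paying {overpayment_factor:.0f}X more than pay-per-use")
# paying 10X more than pay-per-use
```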
The primary database architectures, shared-disk and
shared-nothing, each have their advantages. Shared-disk offers
functional advantages such as high availability, elasticity, ease
of set-up and maintenance, and the elimination of
partitioning/sharding and master-slave replication. The
shared-nothing advantages are better performance and lower costs.
What if you could offer a database that is a hybrid of the two,
one that offers the advantages of both? This sounds too good to
be true, but it is in fact what ScaleDB has done.
The underlying architecture is shared-disk, but in
many situations it can operate like shared-nothing.
You see, the problems with shared-disk arise from the messaging
necessary to (a) ship data among nodes and storage, and (b)
synchronize the nodes in the cluster. The trick is to move the
messaging outside of the transaction so it doesn’t impact
performance. The way to achieve that is to exploit locality. Let …
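The post is cut off before the details, and ScaleDB's routing internals are not published here, so the following Python sketch is purely illustrative of the locality idea: steer transactions on the same keys to the same node, so the data is already in that node's cache and cross-node messaging can be deferred to outside the transaction.

```python
# Illustrative only: exploit locality by routing work on the same
# keys to the same node, so the owning node's cache stays warm and
# synchronization messages can move outside the commit path.
NODES = ["node-a", "node-b", "node-c"]

def route(key: str) -> str:
    """Map a key to a preferred node; repeated access to the same
    key lands on the same node (and the same warm cache)."""
    return NODES[hash(key) % len(NODES)]

# Both transactions on customer:42 hit the same node, so no other
# node needs its cache synchronized inside the transaction.
assert route("customer:42") == route("customer:42")
print(route("customer:42"))
```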
The CAP Theorem has become a convenient excuse for throwing data
consistency under the bus. It is automatically assumed that every
distributed system falls prey to CAP and therefore must sacrifice
one of the three objectives, with consistency being the
consistent fall guy. This automatic assumption is simply false. I
am not debating the validity of the CAP Theorem, but instead
positing that the onset of CAP limitations—what I call the CAP
event horizon—does not start as soon as you move to a second
master database node. Certain approaches can, in fact, extend the
CAP event horizon.
Physics tells us that different properties apply at different
scales. For example, quantum physics displays properties that do
not apply at larger scales. We see similar nuances in scaling
databases. For example, if you are running a master-slave
database, using synchronous replication with a single slave is no
problem. Add nine more slaves and it slows the …
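A quick simulation makes the point (a sketch, assuming a commit must wait for every slave's acknowledgment and that ack times are exponentially distributed; the 5 ms mean is a made-up figure):

```python
import random

def avg_commit_latency(num_slaves: int, mean_ack_ms: float = 5.0,
                       trials: int = 10_000) -> float:
    """Synchronous replication: the master waits for the SLOWEST
    slave, so commit latency is the max of N random ack times."""
    total = 0.0
    for _ in range(trials):
        acks = [random.expovariate(1 / mean_ack_ms)
                for _ in range(num_slaves)]
        total += max(acks)
    return total / trials

for n in (1, 10):
    print(f"{n:2d} slave(s): ~{avg_commit_latency(n):.1f} ms/commit")
# The expected max of 10 exponentials is about 2.9x the single-slave
# mean -- every slave you add slows every commit.
```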
ScaleDB and Oracle RAC are both clustered databases that use a
shared-disk architecture. As I have mentioned previously, they
both actually share data via a shared cache, so it might be more
appropriate to call them shared-cache databases.
Whether it is called shared-disk or shared-cache, these databases
must orchestrate the sharing of a single set of data amongst
multiple nodes. This introduces two challenges: the physical
sharing of the data and the logical sharing of the data.
Physical Sharing:
Raw storage is meant to work on a 1:1 basis with a single server.
In order to share that data amongst multiple servers, you need
either a Network File System (NFS), which shares whole files, or
a Cluster File System (CFS), which shares data blocks.
Logical Sharing:
This is specific to …
As described in the prior post, the shared-disk performance
dilemma is simple:
1. If each node stores/processes data in memory, versus disk, it
is much faster.
2. Each node must expose the most recent data to the other nodes,
so those other nodes are not using old data.
In other words, #1 above says flush data to disk VERY
INFREQUENTLY for better performance, while #2 says flush
everything to disk IMMEDIATELY for data consistency.
Oracle recognized this dilemma when they built Oracle Parallel
Server (OPS), the precursor to Oracle Real Application Clusters
(RAC). In order to address the problem, Oracle developed Cache
Fusion.
Cache Fusion is a peer-based shared cache. Each node works with a
certain set of data in its local cache, until another node needs
that data. When one node …
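To make the idea concrete, here is a toy Python sketch of a peer-based shared cache (my illustration, not Oracle's or ScaleDB's actual protocol): each block has a current owner, and when another node needs it, the block is shipped cache-to-cache instead of being flushed to disk first.

```python
# Toy peer-based shared cache: each block lives in one node's cache;
# a requesting node fetches the block from its current owner
# (cache-to-cache transfer) rather than forcing a flush to disk.
class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # block_id -> data

class SharedCacheCluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.owner = {}  # block_id -> name of node holding it

    def read(self, requester: str, block_id: str):
        node = self.nodes[requester]
        if block_id in node.cache:            # local cache hit
            return node.cache[block_id]
        if block_id in self.owner:            # another node owns it:
            prev = self.nodes[self.owner[block_id]]
            data = prev.cache.pop(block_id)   # ship cache-to-cache,
        else:                                 # no disk flush required
            data = f"<block {block_id} read from disk>"
        node.cache[block_id] = data
        self.owner[block_id] = requester
        return data

cluster = SharedCacheCluster([Node("n1"), Node("n2")])
cluster.read("n1", "b7")          # first read: comes from disk
print(cluster.read("n2", "b7"))   # second read: shipped from n1's cache
```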
For decades the debate between shared-disk and shared-nothing
databases has raged. The shared-disk camp points to the laundry
list of functional benefits such as improved data consistency,
high-availability, scalability and elimination of
partitioning/replication/promotion. The shared-nothing camp
shoots back with superior performance and reduced costs. Both
sides have a point.
First, let’s look at the performance issue. RAM (average access
time of 200 nanoseconds) is considerably faster than disk
(average access time of 12,000,000 nanoseconds). Let me put this
200:12,000,000 ratio into perspective: it makes disk 60,000 times
slower, so a task that takes a single minute in RAM would take
about 41 days on disk. So why do I bring this up?
Shared-Nothing: Since the shared-nothing database has sole
ownership of its data—it doesn’t share the data with other
nodes—it can operate in the machine’s local RAM, only writing
infrequently to disk (flushing the data …
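For the skeptical, the figures from the performance paragraph above check out (a quick sanity check in Python):

```python
ram_ns, disk_ns = 200, 12_000_000        # average access times
ratio = disk_ns / ram_ns                 # disk is 60,000x slower
days_on_disk = 1 * ratio / 60 / 24       # 1 RAM-minute, in disk-days
print(f"disk is {ratio:,.0f}x slower; "
      f"1 minute in RAM = {days_on_disk:.1f} days on disk")
# disk is 60,000x slower; 1 minute in RAM = 41.7 days on disk
```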
ScaleDB is proud to announce the introduction of a database that
takes data storage to a new level, and a new altitude. ScaleDB’s
patent-pending “molecular-flipping technology” enables low-energy
molecular flipping that changes selected water molecules from H2O
to HOH, representing positive and negative states that mimic the
storage mechanism used on hard disk drives.
“Because we act at the molecular level, we achieve massive
storage density with minimal energy consumption, which is
critical in today’s data centers, where energy consumption is the
primary cost,” said Mike Hogan, ScaleDB CEO. “A single thimble of
water vapor provides the same storage capacity as a high-end
SAN.”
The technology does have one small challenge: persistence. Clouds
are not known for their persistence. ScaleDB relies on the
Cumulus formation, since it is far beefier than some of those
wimpy cirrus clouds. However, when deployed …