There have been a number of comments that Amazon RDS does
not allow users access to MySQL replication capabilities (for
example here and here). This is a pity. Replication is
one of the great strengths of MySQL and the lack of it is a
show-stopper for many users. As of the latest build
of Tungsten Replicator half of this problem
is on the way to being solved. You can now set up real-time
replication from an external MySQL master into an Amazon RDS
instance.
In the remainder of this article I will explain how to set up
Tungsten replication to an Amazon RDS slave, then add a few
thoughts about why this feature is useful along with some
suggestions for improvement. To keep the article
reasonably short I assume you understand the basics of installing
Tungsten Replicator. If you need more information, check
out the online documentation.
Readying an RDS Test Instance
Amazon RDS is Amazon's on-demand relational database. RDS
supports several database types including MySQL, which I will use
for this demonstration. Launching a new instance is simple.
Login to the Amazon AWS Console using an account that has
RDS enabled, then switch to the Amazon RDS
Console. Press the Launch a DB Instance button,
whereupon a screen like the following appears:
RDS Database Selection |
Press the Select button for MySQL Community Edition, which starts
the configuration window. (You can also replicate into
Oracle if you are up for a challenge. If you do this,
please post what you did to get it to work!) Next fill out
properties for MySQL.
MySQL Instance Configuration |
Among other things, you create a master login and password.
Note these carefully as you will need them to configure
replication. Then continue for another couple of screens,
at which point you can launch your instance. It takes about
10 minutes for new instances to spin up, after which you can see
the instance properties in the AWS RDS Console. Here's a
screen shot of my test instance.
AWS RDS Console |
Once the instance is up, test access. This is of course
necessary to prove the instance is running properly and that we
can login from a remote location. Note the host name in the
Endpoint field. This is a DNS entry for the new MySQL
instance. Using that and the master login, we can now fire
up the mysql client from a remote host where we plan to run
replication.
$ mysql -utungsten -p
-htest.c4villnbpuq1.us-west-1.rds.amazonaws.com
Enter password:
Welcome to the MySQL monitor. Commands end with ; or
\g.
Your MySQL connection id is 2125
Server version: 5.5.27-log Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current
input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| innodb |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.01 sec)
This looks quite good. We are now ready to install and
start replication.
Important note! If you have trouble connecting you may need
to tweak your Amazon security group settings to open up ports,
especially if you are replicating from non-Amazon locations.
Amazon has a very cool feature that can guess your
originating host IP and offer a CIDR address that covers the port range from
which you are operating. I used this when configuring my
security groups.
Setting Up Tungsten Replication
It is possible to set up replication from a MySQL master directly
to Amazon RDS using a single Tungsten Replicator process.
However, it is more versatile and simpler to set up two
replicators: one to read from the MySQL master, and another
replicator to apply to the Amazon RDS slave. I will
therefore demonstrate this configuration.
We will assume you have a MySQL master already running and that
it meets prerequisites for running Tungsten.
Let's now grab the Tungsten code and install a master
replicator. You can get fresh builds from the Tungsten Replicator builds page.
We will take a recent replicator build that contains the RDS
changes, which are documented in Issue 425. Use 2.0.7 build 177 or later.
The main improvement is to add a non-privileged slave mode that
avoids invoking any of the operations forbidden by Amazon.
Among other things Tungsten normally uses commands like
'SET SESSION SQL_LOG_BIN=0' to suppress writing to the binlog
when it applies data on a slave. This command requires
SUPER privilege, hence causes problems for underprivileged RDS
logins.
Unpack the code and install the master replicator in
/opt/continuent. This is no different from installing a
normal Tungsten master. My master host is logos1.
Here are sample commands to pull the code and set up the
master. The example shows the minimum options--if you have
MySQL installed in a non-standard location or otherwise differ
from a stock installation you may need to add additional
options.
mkdir ~/staging
cd staging
wget --no-check-certificate
https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.0.7-177.tar.gz
tar -xf tungsten-replicator-2.0.7-177.tar.gz
tungsten-replicator-2.0.7-177/tools/tungsten-installer \
--master-slave \
--master-host=logos1 \
--datasource-user=tungsten \
--datasource-password=your_passord \
--service-name=aws \
--home-directory=/opt/continuent \
--cluster-hosts=logos1 \
--start-and-report
Next, set up the slave replicator. For convenience I am
going to install the slave on a separate host, named logos2, to
avoid port clashes between the two replicators. If you
install on the same host, you'll need to install into another
release directory location and use the --rmi-port and --thl-port
options to avoid port overlaps. Here is the command to set
up the Amazon RDS slave. Note that the tungsten-installer
program can install code between hosts, which is an extremely
useful feature.
tungsten-replicator-2.0.7-177/tools/tungsten-installer \
--master-slave \
--cluster-hosts=logos2 \
--master-host=logos1 \
--datasource-host=test.c4villnbpuq1.us-west-1.rds.amazonaws.com
\
--datasource-user=tungsten \
--datasource-password=your_password \
--service-name=aws \
--slave-privileged-updates=false \
--home-directory=/opt/continuent \
--skip-validation-check=InstallerMasterSlaveCheck \
--skip-validation-check=MySQLPermissionsCheck \
--start-and-report
You may see a few warnings during the RDS installation, as the
tungsten-installer cannot verify some settings on the Amazon RDS
host. These can be ignored. If everything goes well,
you now have two replicators up and running. You can check
the status of the master and slave using the trepctl command, as
in:
/opt/continuent/tungsten/tungsten-replicator/bin/trepctl -host
logos1 status
/opt/continuent/tungsten/tungsten-replicator/bin/trepctl -host
logos2 status
Both replicators should report that they are online. Now
complete the exercise by proving that replication works between
the replicators. We start by logging into the local MySQL
instance, creating a new table in the test schema, and adding
data.
$ mysql -uroot test
Welcome to the MySQL monitor. Commands end with ; or
\g.
Your MySQL connection id is 231488
Server version: 5.5.21-rel25.1-log Percona Server with XtraDB
(GPL), Release rel25.1, Revision 234
Type 'help;' or '\h' for help. Type '\c' to clear the current
input statement.
mysql> create table foo(id int primary key);
Query OK, 0 rows affected (0.24 sec)
mysql> insert into foo values (256);
Query OK, 1 row affected (0.00 sec)
Now login to the Amazon RDS instance and look for table foo.
$ mysql -utungsten -p
-htest.c4villnbpuq1.us-west-1.rds.amazonaws.com test Enter
password: Reading table information for completion of table
and column names You can turn off this feature to get a quicker
startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2161 Server version: 5.5.27-log
Source distribution
Type 'help;' or '\h' for help. Type '\c' to clear the current
input statement.
mysql> select * from foo; +-----+ | id | +-----+ | 256 |
+-----+ 1 row in set (0.01 sec)
Mission accomplished! We have real-time replication enabled
from a MySQL master to Amazon RDS. At this point you can
replicate more or less normally. There are some obvious
limitations due to the fact that Amazon RDS is locked down and
does not grant our login full privileges.
- Temp table replication may not work. Tungsten depends on being able to issue commands of the form "set @@session.pseudo_thread_id=23531" and the like. This prevents clashes between temp tables of the same name on different master sessions. You may need to enable row replication on the master, which suppresses temp table replication. (For other approaches, see my previous article on temp tables and the binlog.)
- Any command that requires SUPER privilege will not work. As an obvious example, you will not be able to grant SUPER privilege to new accounts. Such commands will break replication.
- All replicated commands go into the binlog, which is a potential performance drag and may slow down Amazon RDS slaves. Parallel replication may not help in this case, since committing to the binlog is a serialization point that blocks other transactions. This problem may be cured if Amazon picks up group commit fixes from MySQL 5.6 and/or MariaDB.
All things considered, however, these are minor inconveniences.
Most applications should be able to replicate without
difficulties, especially if the master transaction rate is not
too high.
Configuring SSL for Connections to RDS
In the previous demonstration I used a master host running
outside Amazon. This means my test transactions traveled
across the Internet, where they were visible to all and sundry
along the way. To illustrate, we can run tcpdump and watch
traffic as it goes by.
$ sudo tcpdump -A -vvv -s 256 host
test.c4villnbpuq1.us-west-1.rds.amazonaws.com
...
logos2.46657 >
ec2-54-241-56-140.us-west-1.compute.amazonaws.com.mysql: Flags
[P.], cksum 0xb0f7 (incorrect -> 0x995d), seq 3012:3073, ack
3412, win 94, options [nop,nop,TS val 89630514 ecr 74223746],
length 61
E..qw>@.@......n6.8..A..4..L...f...^.......
.W.2.l..9....insert into foo values (256) /* ___SERVICE___ =
[aws] */
22:55:36.316049 IP (tos 0x8, ttl 51, id 26226, offset 0, flags
[DF], proto TCP (6), length 63)
If we were handling confidential data, exposing traffic like this
to possible evildoers would be a serious problem.
Fortunately, Amazon RDS supports SSL encrypted connections
from clients. Here is how to use it with Tungsten.
First, you need to get the Amazon RDS certificate, which is used
to sign certificates for individual RDS instances.
mkdir /opt/continuent/certs
cd /opt/continuent/certs
wget https://rds.amazonaws.com/doc/mysql-ssl-ca-cert.pem
Next, you need to create a trust store that Java can access
containing the certificates of signing authorities whom you
trust. For this you will need the Java keytool utility, which is included in the
JDK. If you are just using the Java runtime in production,
you will need to generate the store on another host, then copy it
over to your test hosts. I used the password "secret" in
this example.
$ keytool -import -alias rds -file mysql-ssl-ca-cert.pem
-keystore truststore
Enter keystore password:
Re-enter new password:
Owner: CN=aws.amazon.com/rds/, OU=RDS, O=Amazon.com, L=Seattle,
ST=Washington, C=US
Issuer: CN=aws.amazon.com/rds/, OU=RDS, O=Amazon.com, L=Seattle,
ST=Washington, C=US
...
Trust this certificate? [no]: yes
Certificate was added to keystore
We now need to tell the slave replicator about the truststore
file using Java VM options. On the slave host, edit
/opt/continuent/tungsten/tungsten-replicator/conf/wrapper.conf
and add the extra options shown in bold face.
# Java Additional Parameters
wrapper.java.additional.1=-Dreplicator.home.dir=../../tungsten-replicator/
wrapper.java.additional.2=-Dreplicator.log.dir=../../tungsten-replicator/log
wrapper.java.additional.3=-Dcom.sun.management.jmxremote
wrapper.java.additional.4=-Djavax.net.ssl.trustStore=/opt/continuent/certs/truststore
wrapper.java.additional.5=-Djavax.net.ssl.trustStorePassword=secret
The last step is to enable SSL encryption when applying data.
We need to set an extra URL option on the drizzle JDBC
driver to turn on SSL. For this we need to edit the
static-svc.properties file that configures replication. In
my example this file is located
in /opt/continuent/tungsten/tungsten-replicator/conf/static-aws.properties.
Open the file and look for the section that starts with
APPLIERS. Edit the text to add additional urlOptions line as
shown below.
############ # APPLIERS # ############
replicator.applier.dbms=com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier
replicator.applier.dbms.host=${replicator.global.db.host}
replicator.applier.dbms.port=${replicator.global.db.port}
replicator.applier.dbms.user=${replicator.global.db.user}
replicator.applier.dbms.password=${replicator.global.db.password}
replicator.applier.dbms.urlOptions=?useSSL=true
replicator.applier.dbms.ignoreSessionVars=autocommit
replicator.applier.dbms.getColumnMetadataFromDB=true
Restart the replicator process
(/opt/continuent/tungsten/tungsten-replicator/bin/replicator
restart) and you will now be using SSL encryption. If we
now look back at the tcpdump outout, it looks like garbage as the
following example shows.
... 23:46:52.370904 IP (tos 0x0, ttl 64, id 5899, offset 0, flags
[DF], proto TCP (6), length 105) logos2.44717 >
ec2-54-241-56-140.us-west-1.compute.amazonaws.com.mysql: Flags
[P.], cksum 0xb0ef (incorrect -> 0xd834), seq 3087:3140, ack
4838, win 102, options [nop,nop,TS val 89938121 ecr 74992771],
length 53 E..i..@.@.r....n6.8......zSR.b.....f.......
.\X..xL.....07..|..x.)...T..888.H/...iz...^.W8....'...... ..J
...
This is much better. We are replicating to an Amazon RDS,
and the transactions are safe from prying eyes. If you have
gotten this far you are ready to try your own applications.
Benefits of Replication into Amazon RDS
Amazon RDS is convenient thanks to its quick and simple setup,
but the lack of replication is a severe limitation in building
systems that need more than a single MySQL instance. In
particular it makes it hard to integrate RDS into systems that
consist of more than Amazon RDS itself. Adding the ability
to replicate in real-time into RDS therefore has a number of
benefits. The most obvious include using RDS to extend
existing systems.
First, Amazon RDS can offer a quick way to add read capacity to
existing MySQL applications. This is especially useful if
you have a cluster, such as Tungsten, which handles your
transaction processing and overall HA. You can now add
Amazon RDS read slaves that you discard when no longer needed.
Tungsten Replicator has a number of other useful features
like the ability to read from a group of nodes, not just one,
that make such topologies easy to set up and maintain.
Clusters other than Tungsten will likewise benefit from
this feature.
Second, Amazon RDS is suitable for applications that do not need
24x7 high availability (limitations include slow failover, no
online maintenance, no cross-cloud capabilities, etc.) You
can now pull data from other sources and send them to Amazon RDS
slaves for processing, which amounts to extending overall
processing capacity. For example, you could use RDS to run
back-office tasks using transactions replicated in from MySQL
masters. Tungsten Replicator also replicates data from
Oracle, so this is an additional source of transactions.
There are of course other ways to replicate data to and from RDS,
for example using batch ETL tools like Talend.
However, these are not real-time and often require
application changes to add timestamp columns or otherwise mark
transactions that need to be extracted. Log-based
replication as implemented by Tungsten is fast and has minimal
impact on applications or MySQL itself.
Thoughts about Further Improvements for Amazon RDS
Replication
On our side, i.e., at Continuent, we need to do more testing, add
documentation, and fix problems as they arise. We are
starting a beta test with one of our customers in the next few
days, who incidentally was the same customer who requested this
feature in the first place after hacking it for themselves.
RDS also has some interesting provisioning capabilities
that I would like to understand better. We are also adding
options that eliminate the need for manual configuration of
security settings. This will keep us busy for a
while.
Other improvements depend on changes to RDS itself. An
obvious and huge improvement would be to permit replication
out of Amazon RDS. Unfortunately, Tungsten needs a
login that has REPLICATION_SLAVE privilege so that we can
download binlog data. That privilege is not yet available
to Amazon users. Once it is, Tungsten extraction will also
work in very short order. We actually don't need other commands
like START/STOP SLAVE or FLUSH LOGS--just ability to issue a
COM_BINLOG_EVENT from a client connection
and receive binlog records. I am sure other products would
use this capability as well. (RDS developers, if you are
listening here's an easy way to extend your product usability
significantly...)
Replication is such a valuable feature of MySQL that Amazon RDS
feels somewhat crippled without it. For this reason I would
imagine that Amazon will open up additional capabilities
in the future. Until then we will polish up replication
from MySQL masters to Amazon RDS slaves, awaiting a time when we
can add more features.
In the meantime, I hope you will try the new replication to
Amazon RDS. As noted in this article you can grab the
latest builds and try it yourself. Please report your
experiences through either Continuent Support if you are a
customer or the Tungsten discussion list if you use the open source Tungsten Replicator. I look
forward to your feedback and suggestions for making Amazon RDS
support better.