Showing entries 1 to 10 of 13
3 Older Entries »
Displaying posts with tag: Coalface (reset)
Kafka Replication from MySQL and Oracle

Hello again everybody.

Well, I promised it a couple of weeks ago, and I’m sorry it has been so long (I’ve been working on other fun stuff in addition to this). But I’m pleased to say that we now have a fully working applier that takes data from an incoming THL stream, whether that is Oracle or MySQL, and converts that into a JSON document and message for distribution over a Kafka topic.

Currently, the configuration is organised with the following parameters:

  • The topic name is set according to the incoming schema and table. You can optionally add a prefix. So, for example, if you have a table ‘invoices’ in the schema ‘sales’, your Kafka topic will be sales_invoices, or if you’ve added a prefix, ‘myprefix_schema_table’.
  • Data is marshalled into a JSON document as part of the message, and the structure is to have a bunch of metadata and then an embedded record. You’ll see an …
[Read more]
Replicating into ElasticSearch

So here at Continuent we are working on multiple new targets for applying data using Tungsten Replicator. There are so many potential targets out there where people want to replicate data directly into a specific system, sometimes just for a specific data set, table, database or requirements.

Yesterday afternoon, I started working on ElasticSearch – this morning I have it finished!

As with all solutions, the same basic principles apply – want to pull out of MySQL or Oracle and into something else? That’s fine. Want to replicate to HDFS and ElasticSearch? We do that too!

So what does it look like?

Installation operates just our normal appliers – you just specify the datasource type (ElasticSearch) and the EL host name and port:

tools/tpm configure alpha \
--datasource-type=elasticsearch \
--install-directory=/opt/continuent \ …
[Read more]
Revisiting ZFS and MySQL

While at Percona Live this year I was reminded about ZFS and running MySQL on top of a ZFS-based storage platform.

Now I’m a big fan of ZFS (although sadly I don’t get to use it as much as I used to after I shutdown my home server farm), and I did a lot of different testing back while at MySQL to ensure that MySQL, InnoDB and ZFS worked correctly together.

Of course today we have a completely new range of ZFS compatible environments, not least of which are FreeBSD and ZFS on Linux, I think it’s time to revisit some of my original advice on using this combination.

Unfortunately the presentations and MySQL University sessions back then have all been taken down. But that doesn’t mean the advice is any less valid.

Some of the core advice for using InnoDB on ZFS:

  • Configure a single InnoDB tablespace, rather than configuring multiple tablespaces …
[Read more]
Tungsten Replicator 3.0 is Cloudera Enterprise 5 Certified

One of the key platforms I’ve been testing on for the MySQL to Hadoop replication has been Cloudera, largely driven by customer requirements, but it’s also one of the easiest way to get started with Hadoop.

What I’m even more pleased about is the fact that we are proud to announce that Tungsten Replicator 3.0 is certified for use on the new Cloudera Enterprise 5 platform. That means that we’re sure that replicating your data from MySQL to Cloudera 5 and have it work without causing problems or difficulties on the Hadoop loading and materialisation.

Cloudera is a great product, and we’re very happy to be working so effectively with the new Cloudera Enterprise 5. Cloudera …

[Read more]
Continuent Replication to Hadoop – Now in Stereo!

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, HortonWorks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my long association with the IBM InfoSphere BigInsights Hadoop product, and I’m pleased to say that it’s working there too. I’ve had to adapt Robert’s original script to work with the different versions of the underlying Hadoop tools and systems to make it compatible. The actual performance and process is unchanged; you just use a different JS-based batchloader script to work with different tools.

Robert has also been simplifying some of the core functionality, such as configuring some fixed pre-determined formats, so you no longer have to explicitly set the field and record separators.

I’ve also been …

[Read more]
Parallel Extractor for Provisioning

Coming up as a new feature in Tungsten Replicator (and written by our replicator expert Stephane Giron) is the ability to provision a new database by using data from an existing database. This new feature comes in the form of a tool called the Parallel Extractor.

The principles are very simple. On the master side:

  • Start the master replicator offline.
  • Switch the replicator to the online provision state.
  • The master replicator pulls the data out of the existing database and writes that information into the Transaction History Log (THL). At this point, the normal replicator thread is not extracting events from the source database.
  • Once the parallel replication has completed, the replicator switches over to normal extraction mode, and starts writing change data into the THL.

On the slave side, the THL events are read as usual from the master and applied to the slave, but …

[Read more]
MySQL to Hadoop Step-By-Step

We had a great webinar on Thursday about replicating from MySQL to Hadoop (watch the whole thing). It was great, but one of the questions at the end was ‘is there an easy way to test’.

Sadly we can’t go giving out convenient ready-to-run downloads of these things because of licensing and and other complexities, so I want to try and make it as simple and straightforward as possible by giving you the directions to complete. I’m going to be point to the Continuent Documentation every now and then so this is not too crowded, but we should get through it pretty easily.

Major Decisions

For this to work: 

  • We’ll setup two VMs, one the master (running MySQL), the other the slave (Running Cloudera) …
[Read more]
Real-Time Replication from MySQL to Cassandra

Earlier this month I blogged about our new Hadoop applier, I published the docs for that this week (http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html) as part of the Tungsten Replicator 3.0 documentation (http://docs.continuent.com/tungsten-replicator-3.0/index.html). It contains some additional interesting nuggets that will appear in future blog posts.

The main part of that functionality that performs the actual applier for Hadoop is based around a JavaScript applier engine – there will eventually be docs for that as part of the Batch Applier content ( …

[Read more]
Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL) into a more structured Oracle.

Today I spend some of my time working in Big Data, more often than not, migrating information from existing data stores into Big Data so that they can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html

[Read more]
Anonymizing Data During Replication

If you happen to work with personal data, chances are you are subject to SOX (Sarbanes-Oxley) whether you like it or not.

One of the worst aspects of this is that if you want to be able to analyse your data and you replicate out to another host, you have to find a way of anonymizing the information. There are of course lots of ways of doing this, but if you are replicating the data, why not anonymize it during the replication?

Of the many cool features in Tungsten Replicator, one of my favorites is filtering. This allows you to process the stream of changes that are coming from the data extracted from the master and perform operations on it. We use it a lot in the replicator for ignoring tables, schemas and columns, and for ensuring that we have the correct information within the THL.

Given this, let’s use it to anonymize the data as it is being replicated so that we don’t need to post-process it for analysis, and …

[Read more]
Showing entries 1 to 10 of 13
3 Older Entries »