Showing entries 1 to 30 of 38 Next 8 Older Entries

Displaying posts with tag: tokumx

ClusterControl Module for Puppet
July 7, 2014 By Severalnines

If you are automating your infrastructure using Puppet, then this blog is for you. We are glad to announce the availability of a Puppet module for ClusterControl. For those using Chef, we already published Chef cookbooks for Galera Cluster and ClusterControl some time back.  

ClusterControl on Puppet Forge

The ClusterControl module’s initial release is available on Puppet Forge, installing the

  [Read more...]
Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics
June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes, and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

In this blog post, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then

  [Read more...]
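
The transform step in a job like this can be pictured with a small Python sketch. It is not Talend’s generated code and not the post’s job; it assumes clickstream documents shaped like `{"_id": ..., "ts": ..., "url": ..., "user": {"id": ...}}` (hypothetical field names) and flattens each one into a tab-separated line, a plain format that Hadoop jobs can read from HDFS.

```python
import json

# Hypothetical clickstream fields; dotted names reach into nested documents.
FIELDS = ["_id", "ts", "url", "user.id"]


def get_path(doc: dict, dotted: str):
    """Follow a dotted path such as 'user.id' into nested dicts; None if missing."""
    value = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def to_tsv_line(doc: dict) -> str:
    """Flatten one document into a tab-separated line (empty cell for missing fields)."""
    cells = []
    for field in FIELDS:
        value = get_path(doc, field)
        cell = "" if value is None else str(value)
        # Tabs or newlines inside a value would break the row, so replace them.
        cells.append(cell.replace("\t", " ").replace("\n", " "))
    return "\t".join(cells)


if __name__ == "__main__":
    raw = '{"_id": "a1", "ts": 1402900000, "url": "/home", "user": {"id": 42}}'
    print(to_tsv_line(json.loads(raw)))
```

Missing fields become empty cells rather than errors, so one malformed document does not stop a whole batch.
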
Best Practices for Partitioned Collections and Tables in TokuDB and TokuMX

In my last post, I gave a technical explanation of the performance characteristics of partitioned collections in TokuMX 1.5 (which is right around the corner) and partitioned tables in relational databases. Given those performance characteristics, in this post, I will present some best practices when using this feature in TokuMX or TokuDB. Note that these best practices are designed for TokuMX and TokuDB only, which

  [Read more...]
Understanding the Performance Characteristics of Partitioned Collections

In TokuMX 1.5, which is right around the corner, the big feature will be partitioned collections. This feature is similar to partitioned tables in Oracle, MySQL, SQL Server, and Postgres. A question many have is “why should I use partitioned tables?” In short, it’s complicated. The answer depends on your workload, your schema, and your database of choice. For example, this Oracle-related post states “Anyone with un-partitioned databases over 500 gigabytes is courting disaster.” That’s not true for TokuDB or TokuMX. Nevertheless,

  [Read more...]
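
One way to picture a partitioned collection is a toy Python model: each document is routed to a partition by a range on a key (an integer timestamp here), and discarding old data means dropping a whole partition instead of deleting documents one at a time. This is an illustrative sketch under those assumptions, not TokuMX’s implementation or API; the class and method names are hypothetical.

```python
import bisect


class PartitionedCollection:
    """Toy range-partitioned collection keyed on an integer 'ts' field (hypothetical)."""

    def __init__(self):
        self.bounds = []      # sorted lower bounds, one per partition
        self.partitions = []  # lists of documents, parallel to self.bounds

    def add_partition(self, lower_bound: int) -> None:
        # Partitions cover increasing ranges, as with time-ordered data.
        if self.bounds and lower_bound <= self.bounds[-1]:
            raise ValueError("partition bounds must increase")
        self.bounds.append(lower_bound)
        self.partitions.append([])

    def insert(self, doc: dict) -> None:
        # Route to the last partition whose lower bound is <= doc["ts"].
        i = bisect.bisect_right(self.bounds, doc["ts"]) - 1
        if i < 0:
            raise ValueError("no partition covers this timestamp")
        self.partitions[i].append(doc)

    def drop_oldest(self) -> int:
        """Discard the oldest partition wholesale; return how many documents it held."""
        self.bounds.pop(0)
        return len(self.partitions.pop(0))

    def count(self) -> int:
        return sum(len(p) for p in self.partitions)
```

In the toy, dropping the oldest partition is one operation however many documents it holds, which is the usual argument for partitioning time-ordered data. Whether that pays off for a real workload is the “it’s complicated” question the post raises.
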
Webinar Replay, Slides & Q&A: Introducing ClusterControl 1.2.6 - Managing your MySQL, MariaDB & MongoDB Clusters
May 19, 2014 By Severalnines

Thanks to everyone who attended and participated in last week’s joint webinar on ClusterControl 1.2.6! We had great questions from participants (thank you), most of which are transcribed below with our answers.

If you missed the sessions or would like to watch the webinar again & browse through the slides, they are now available online.

Webinar topics discussed: 

  • Database Infrastructure Lifecycle
  • Deploy, Monitor, Manage,
  [Read more...]
Thoughts on Small Datum – Part 3

Background: If you did not read my first blog post about why I am sharing my thoughts on the benchmarks published by Mark Callaghan on Small Datum, you may want to skim through it now for a little context: “Thoughts on Small Datum – Part 1”.

~~~~~~~~~~~~~~~~~~~~~~~~

Last time, in Thoughts on Small Datum – Part 2, I shared my cliff notes and a graph on Mark Callaghan’s (@markcallaghan) March 11th insertion rate benchmarks using flash storage media. In those tests he compares MySQL (http://www.mysql.com/) outfitted with the

  [Read more...]
New Release Webinar on May 13th: Introducing ClusterControl 1.2.6 - Live Demo
May 7, 2014 By Severalnines

Following the release of ClusterControl 1.2.6 a couple of weeks ago, we are now looking forward to demonstrating this latest version of the product on Tuesday next week, May 13th.

This release contains key new features (along with performance improvements and bug fixes), which we will be demonstrating live during the webinar. 

Highlights include:

  • Centralized
  [Read more...]
Maybe You Should Try Taking a Walk in My Shoes

The title of this post should really be, “Maybe He Should Try Taking a Walk in Your Shoes.”

The he I’m referring to is economist and author Tim Harford. The you refers to the people who use NewSQL and NoSQL approaches to mine big data with database platforms like MySQL (http://www.mysql.com) and MongoDB (or, preferably, our high-performance distributions of them, TokuDB and TokuMX).

Why should Mr. Harford take that walk? Well, he recently

  [Read more...]
Thoughts on Small Datum – Part 2

If you did not read my first blog post about Mark Callaghan’s (@markcallaghan) benchmarks as documented in his blog, Small Datum, you may want to skim through it now for a little context.

——————-

On March 11th, Mark, a former Google and now Facebook database guru, published an insertion rate benchmark comparing MySQL (http://www.mysql.com) outfitted with the InnoDB storage engine with two NoSQL alternatives — basic MongoDB and TokuMX (the Tokutek high-performance

  [Read more...]
How TokuMX Secondaries Work in Replication

As I’ve mentioned in previous posts, TokuMX replication differs quite a bit from MongoDB’s replication. The differences are large enough that we’ve completely redone some of MongoDB’s existing algorithms. One such area is how secondaries apply oplog data from a primary. In this post, I’ll explain how.

In designing how secondaries apply oplog data, we did not look closely at how MongoDB does it. In fact, I’ve currently forgotten all I’ve learned about MongoDB’s implementation, so I am not in a position to compare the two. I think I recall that MongoDB’s oplog idempotency was a key to their

  [Read more...]
My Favorite MongoDB Replication Feature: Crash Safety

At an extremely high level, replication in MongoDB and MySQL is similar. Both databases have exactly one machine, the primary (or master), that accepts writes from clients. In a single transaction (or atomic operation, in MongoDB’s case), the tables and the oplog (or binary log, in MySQL) are modified to reflect the change. The log captures what the change is so that secondaries (or slaves) can read the changes and process them, making the slaves identical to the master. (Note that I am NOT talking about multi-master replication.)

Underneath the covers, their implementations are quite different. And in peeking underneath the covers while developing TokuMX, I learned

  [Read more...]
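
The high-level model in that first paragraph (the primary changes its data and appends to the log in one atomic step; secondaries read the log and replay it) can be sketched in Python. This is a hypothetical in-memory illustration, not MongoDB’s or MySQL’s code: a lock stands in for the primary’s transaction, and each secondary tracks how many log entries it has applied.

```python
import threading


class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []               # (key, value) entries in commit order
        self._txn = threading.Lock()  # stands in for the single transaction

    def write(self, key, value) -> None:
        # Data and log change together, so no reader sees one without the other.
        with self._txn:
            self.data[key] = value
            self.oplog.append((key, value))

    def read_log(self, start: int):
        with self._txn:
            return list(self.oplog[start:])


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far

    def sync_from(self, primary: Primary) -> None:
        # Replay only the entries after the last one this secondary applied.
        for key, value in primary.read_log(self.applied):
            self.data[key] = value
            self.applied += 1
```

Here `data` and `applied` live only in memory. Persisting `applied` together with the data changes is one way to let replay resume correctly after a crash; that part is not modeled in this sketch.
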
ClusterControl 1.2.5 Released
March 5, 2014 By Severalnines

The Severalnines team is pleased to announce the release of ClusterControl 1.2.5. This release contains key new features along with performance improvements and bug fixes. We have outlined some of the key features below. 

For additional details about the release:

  [Read more...]
How TokuMX was Born

With TokuMX 1.4 coming out soon, with (teaser) wonderful improvements made to sharding and updates (and plenty of other goodies), I’ve recently reminisced about how we got TokuMX to this point. We (actually, really John) started dabbling with integrating Fractal Tree® indexes into MongoDB in the summer of 2012, where we (really, he) prototyped using Fractal Tree indexes only for secondary indexes. As cool as that prototype was, it

  [Read more...]
The Effects of Database Heap Storage Choices in MongoDB

William Zola over at MongoDB gave a great talk called “The (Only) Three Reasons for Slow MongoDB Performance”. It reminded me of an interesting characteristic of updates in MongoDB. Because MongoDB’s main data store is a flat file and secondary indexes store offsets into the flat file (as I explain here), if the location of a document changes, corresponding entries in secondary indexes must also change. So, an update to an unindexed field that causes the document to move also causes modifications to every secondary index, which, as William points out, can be expensive. If a document has indexed an array, this

  [Read more...]
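
The cost William Zola points out follows from that layout. Below is a toy Python model, assuming a heap of fixed-size slots (a simplification of how MongoDB pads records) and secondary indexes that map a field value to a heap offset. When an update grows a document past its slot, the document moves, and every secondary index entry for it is rewritten even though the updated field is unindexed. The names and the size rule (the length of the document’s `repr`) are hypothetical.

```python
class Heap:
    """Toy flat-file heap: documents live at integer offsets in fixed-size slots."""

    def __init__(self, slot_size: int):
        self.slot_size = slot_size
        self.slots = []  # offset -> document, or None once a document moved away

    def insert(self, doc: dict) -> int:
        self.slots.append(doc)
        return len(self.slots) - 1


class Collection:
    """Assumes unique indexed values and updates that touch only unindexed fields."""

    def __init__(self, indexed_fields, slot_size=64):
        self.heap = Heap(slot_size)
        # One secondary index per field: value -> offset into the heap.
        self.indexes = {f: {} for f in indexed_fields}
        self.index_writes = 0  # secondary-index entries modified so far

    def insert(self, doc: dict) -> int:
        offset = self.heap.insert(doc)
        for field, index in self.indexes.items():
            index[doc[field]] = offset
            self.index_writes += 1
        return offset

    def update(self, offset: int, changes: dict) -> int:
        doc = {**self.heap.slots[offset], **changes}
        if len(repr(doc)) <= self.heap.slot_size:
            self.heap.slots[offset] = doc  # fits in place: indexes untouched
            return offset
        # Too big for its slot: move the document to the end of the heap...
        self.heap.slots[offset] = None
        new_offset = self.heap.insert(doc)
        # ...and repoint every secondary index entry at the new offset.
        for field, index in self.indexes.items():
            index[doc[field]] = new_offset
            self.index_writes += 1
        return new_offset
```

With three indexed fields, one growing update to the unindexed `note` field costs three index writes, while an update that still fits costs none.
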
Webinar Replay & Slides: Repair & Recovery for Your MySQL, MariaDB & MongoDB / TokuMX Clusters
January 23, 2014 By Severalnines

Thanks to everyone who attended this week’s webinar; if you missed the sessions or would like to watch the webinar again and browse through the slides, they are now available online.

Special thanks to Seppo Jaakola from Codership, the creators of Galera Cluster, for walking us through the various scenarios of Galera recovery. 

Webinar topics discussed: 

  • Redundancy models for Galera, NDB and MongoDB / TokuMX
  • Failover & Recovery (Automatic vs Manual)
  [Read more...]
New Webinar: Repair and Recovery for your MySQL, MariaDB and MongoDB/TokuMX Clusters
December 19, 2013 By Severalnines

Database clusters are pretty sophisticated distributed systems with complex dependencies between nodes. The failure of a node will generally impact the overall cluster, as the remaining nodes need to reconfigure themselves to continue to operate without the failed node. Since re-introducing a node will also affect the existing cluster, the timing could therefore be dependent on the state of the other nodes in the cluster. Repair and restarts often need to be performed

  [Read more...]
December 17 Webinar: Use Your MySQL Knowledge to Become a MongoDB Guru

Use your MySQL expertise to analyze the strengths and weaknesses of MongoDB.

SPEAKER: Tim Callaghan, VP of Engineering at Tokutek
DATE: Tuesday, December 17th
TIME: 1pm ET

MongoDB is a popular NoSQL DBMS that shares the ease-of-use and quick setup that made MySQL famous. But is MongoDB really up to the job? Is it right for your applications? If you understand MySQL well, you know how database systems work.

Join Tim Callaghan, VP/Engineering at Tokutek, as he recaps the session he gave with Robert Hodges, CEO of Continuent, at Percona Live London 2013. Learn how to lean on your knowledge of topics like schema design, query optimization, indexing, sharding, and high availability to analyze the strengths and weaknesses of MongoDB. System design is all about asking the right

  [Read more...]
ClusterControl 1.2.4 Released
November 19, 2013 By Severalnines

The Severalnines team is pleased to announce the release of ClusterControl 1.2.4. This release contains key new features along with performance improvements and bug fixes.

We have outlined some of the key features below. For additional details about the release:

  [Read more...]
Put your MySQL Knowledge to Good Use with Tim Callaghan at Percona Live-London, November 12

Attending Percona Live in London next week?

Don’t miss the chance to hear Tokutek’s Vice President of Engineering, Tim Callaghan, discuss how to use your MySQL knowledge to become an instant MongoDB Guru and the advantages of using Fractal Tree® indexes in MySQL and MongoDB. Tim will be speaking about these topics in two separate sessions at 12:00pm and 5:00pm on November 12.

For more information on these sessions and Percona Live-London, visit https://www.percona.com/live/london-2013/users/tim-callaghan.

Introducing TokuMX Transactions for MongoDB Applications

Since our initial release last summer, TokuMX has supported fully ACID and MVCC multi-statement transactions. I’d like to use this post to explain exactly what we’ve done and what features are now available to the user.

But before beginning, an important note: we have implemented this for non-sharded clusters only. We do not support distributed transactions across different shards.

At a high level, what have we done?

We have taken MongoDB’s basic transactional behavior, and extended it. MongoDB is transactional with respect to one, and only one, document. MongoDB guarantees single document atomicity. Journaling provides durability

  [Read more...]
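
To make the distinction concrete: single-document atomicity means each document update is all-or-nothing on its own, while a multi-statement transaction groups several updates so that either all of them take effect or none do. The Python toy below illustrates only that second idea, using an undo log over an in-memory dict. It is a hypothetical sketch of the concept, not TokuMX’s API (which this excerpt does not show), and it ignores isolation and MVCC entirely.

```python
class ToyStore:
    """In-memory key/document store with all-or-nothing multi-statement transactions."""

    _MISSING = object()  # marks keys that did not exist before the transaction

    def __init__(self):
        self.docs = {}
        self._undo = None  # list of (key, previous value) while a transaction is open

    def begin(self) -> None:
        if self._undo is not None:
            raise RuntimeError("transaction already open")
        self._undo = []

    def put(self, key, doc) -> None:
        # Inside a transaction, remember the old value before overwriting it.
        if self._undo is not None:
            self._undo.append((key, self.docs.get(key, self._MISSING)))
        self.docs[key] = doc

    def commit(self) -> None:
        if self._undo is None:
            raise RuntimeError("no open transaction")
        self._undo = None  # keep every change

    def rollback(self) -> None:
        if self._undo is None:
            raise RuntimeError("no open transaction")
        # Undo in reverse order so the oldest saved value wins for repeated keys.
        for key, old in reversed(self._undo):
            if old is self._MISSING:
                self.docs.pop(key, None)
            else:
                self.docs[key] = old
        self._undo = None
```

A transfer that debits one document and credits another either lands completely on `commit()` or disappears completely on `rollback()`, which single-document atomicity alone cannot guarantee.
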
Introducing TokuMX Clustering Indexes for MongoDB

Since introducing TokuMX, we’ve discussed benefits that TokuMX has for existing MongoDB applications that require no changes. In this post, I introduce an extension we’ve made to the indexing API: clustering indexes, a tool that can tremendously improve query performance. If I were to speak to someone about clustering indexes, I think the conversation could go something like this…

What is a Clustering Index?

A clustering index is an index that stores the entire document, not just the defined key.

A common example is

  [Read more...]
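
The practical difference shows up in how many lookups a query needs. In this hypothetical Python model, a regular secondary index maps a key to the document’s primary key, so each match costs a second fetch from the main store. A clustering index stores the whole document alongside the key, so it answers the query from the index alone. Counting fetches stands in for I/O; this is an illustration, not TokuMX’s implementation.

```python
from collections import defaultdict


class Table:
    def __init__(self, docs):
        self.by_id = {d["_id"]: d for d in docs}  # primary store
        self.lookups = 0  # primary-store fetches, a stand-in for random I/O
        # Regular secondary index on "a": value -> list of primary keys.
        self.index_a = defaultdict(list)
        # Clustering index on "a": value -> list of full documents.
        self.clustering_a = defaultdict(list)
        for d in docs:
            self.index_a[d["a"]].append(d["_id"])
            self.clustering_a[d["a"]].append(d)

    def find_via_secondary(self, a):
        results = []
        for _id in self.index_a.get(a, []):
            self.lookups += 1  # extra fetch of the full document
            results.append(self.by_id[_id])
        return results

    def find_via_clustering(self, a):
        # The index already holds the documents: no extra fetches.
        return list(self.clustering_a.get(a, []))
```

The trade-off, not modeled here, is that a clustering index keeps a full copy of each document, so it uses more space and every write to a document must update that copy too.
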
A TokuDB Stall Caused by Conflicting Transactions When Opening a Table

One of our customers reported that ‘create table select from’ statements stall for a period of time equal to the TokuDB lock timeout. This indicated a lock conflict between multiple transactions. In addition, other MySQL clients that were opening unrelated tables were also stalled. This indicated that some shared mutex was held too long. We discuss the details of this bug and how it was fixed. The bug fix will be distributed in TokuDB 7.1.0.

Example
Suppose that we set the TokuDB lock timeout to 60 seconds (the variable is in milliseconds, hence 60000) just to exaggerate the stall.

mysql> set global tokudb_lock_timeout=60000;
Query OK, 0 rows affected (0.00 sec)

We then create a simple table.

mysql> create table s (id int primary key);
Query OK, 0 rows affected (0.02

  [Read more...]
How we built TokuMX

When I get to talk to people about TokuMX and how it’s an optimized MongoDB, I sometimes get follow-up questions like:

  • “Is it an in-memory proxy?”
  • “Write optimized? So you buffer all of the writes in memory and lose them on crash?”
  • “Did you re-implement the server and match the protocol?”

None of these things describe TokuMX, but they demonstrate that there are many schools of thought on how to optimize databases, and MongoDB in particular. I’d like to elaborate on what TokuMX really is and how we built it. First, let’s talk about what MongoDB is.

MongoDB consists of a server process, mongod, that stores data and executes queries; a sharding router process, mongos; and a wire protocol for interacting with these

  [Read more...]
TokuMX Hot Backup – Part 3

Last week I described TokuDB’s new Hot Backup feature.  This week we are going to briefly discuss the same feature, but as it was added to TokuMX, our version of MongoDB.

Since the Hot Backup library is essentially a shim between MySQL and the Linux kernel, intercepting file system calls for the life of the process, it should be easy to add it to other systems, including TokuMX. Indeed, with our addition of transactions and logging to TokuMX, we can take a consistent backup of any data set at any time.

Unlike MySQL, where system tables use the non-transactional MyISAM storage engine, TokuMX uses internal

  [Read more...]
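
The copy-while-writing idea behind such a shim can be illustrated with a hypothetical Python model. Instead of intercepting real file system calls, which is what the post describes the library doing, the sketch routes writes through a wrapper: while a background copy of a file is in progress, any write that lands in a region already copied is mirrored into the backup, so the backup ends up matching the source without blocking writers. File contents are byte arrays, and the class and method names are made up for this example.

```python
class BackupShim:
    """Toy copy-while-writing backup of one in-memory 'file' (a bytearray)."""

    def __init__(self, source: bytearray, chunk: int = 4):
        self.source = source
        self.backup = bytearray()
        self.copied = 0  # bytes of the source already copied to the backup
        self.chunk = chunk

    def write(self, offset: int, data: bytes) -> None:
        """Intercepted write: always hits the source, mirrored if already copied."""
        end = offset + len(data)
        if end > len(self.source):
            self.source.extend(b"\0" * (end - len(self.source)))
        self.source[offset:end] = data
        # Mirror only the part of the write inside the already-copied prefix;
        # the rest will be picked up when the copy reaches it.
        mirror_end = min(end, self.copied)
        if offset < mirror_end:
            self.backup[offset:mirror_end] = data[: mirror_end - offset]

    def copy_step(self) -> bool:
        """Copy the next chunk; return False once the backup has caught up."""
        if self.copied >= len(self.source):
            return False
        end = min(self.copied + self.chunk, len(self.source))
        self.backup[self.copied:end] = self.source[self.copied:end]
        self.copied = end
        return True
```

A real implementation would also have to deal with file creation, renames, deletes, and many files at once; this sketch covers only overwrites and appends to a single file.
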
A TokuDB Stall Caused by a Big Transaction and How It was Fixed

One of our customers sometimes observed lots of simple insertions taking far longer than expected to complete. Usually these insertions completed in milliseconds, but the insertions sometimes were taking hundreds of seconds. These stalls indicated the existence of a serialization bug in the Fractal Tree index software, so the hunt was on. We found that these stalls occurred when a big transaction was committing and the Fractal Tree index software was taking a checkpoint. This problem was fixed in both TokuDB 7.0.3 and TokuMX 1.0.3. Please read on as we describe some details about this bug and how we fixed it. We describe some of the relevant Fractal Tree index algorithms first.

What is a Big Transaction?

Each transaction builds a rollback log as it performs Fractal Tree index operations. The rollback log is maintained in

  [Read more...]
Lock Diagnostics and Index Usage Statistics in TokuMX v1.2.1

TokuMX v1.2.1 introduces two simple new features to help you understand the performance characteristics of your database: lock diagnostics and index usage statistics. We’d like to take you through a few examples of what these features are and how to use them.

Lock Diagnostics

Since we introduced TokuMX, one of the most frequent complaints has been about “lock not granted” errors. These arise when a long-running operation takes document-level locks, and other clients time out while waiting to acquire the same locks.

This is a new problem in TokuMX that doesn’t exist in MongoDB, because MongoDB

  [Read more...]
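
The failure mode described above can be reproduced with a toy document-level lock table in Python: one client holds a document’s lock during a long operation, and a second client that needs the same document gives up after a timeout. The `LockTable` class, the timeout values, and the error text are hypothetical; the sketch mirrors only the shape of a “lock not granted” error, not TokuMX’s lock manager.

```python
import threading
import time


class LockNotGranted(Exception):
    pass


class LockTable:
    """One lock per document id, acquired with a timeout (hypothetical model)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._locks = {}
        self._guard = threading.Lock()  # protects the _locks dict itself

    def _lock_for(self, doc_id):
        with self._guard:
            return self._locks.setdefault(doc_id, threading.Lock())

    def acquire(self, doc_id) -> None:
        if not self._lock_for(doc_id).acquire(timeout=self.timeout_s):
            raise LockNotGranted(f"lock not granted on document {doc_id!r}")

    def release(self, doc_id) -> None:
        self._lock_for(doc_id).release()


def demo() -> str:
    table = LockTable(timeout_s=0.1)
    holding = threading.Event()

    def long_running_op():
        table.acquire("doc1")
        holding.set()
        time.sleep(0.5)  # the long-running operation keeps the lock
        table.release("doc1")

    worker = threading.Thread(target=long_running_op)
    worker.start()
    holding.wait()
    try:
        table.acquire("doc1")  # the second client waits 0.1 s, then gives up
        table.release("doc1")
        outcome = "granted"
    except LockNotGranted as err:
        outcome = str(err)
    worker.join()
    return outcome
```

Calling `demo()` returns the timeout message because the holder keeps `doc1` for longer than the waiter’s timeout. In this toy, diagnosing the error would mean recording which operation holds each lock, which the sketch does not do.
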
TokuMX vs. MongoDB : In-Memory Sysbench Performance

In talking to existing MongoDB users and TokuMX evaluators, I’ve often heard that the performance of MongoDB is very good as long as your working data set fits in RAM. The story continues that if your working data set grows to be larger than the RAM on your server, the built-in sharding capabilities of MongoDB allow you to scale horizontally.

As my benchmarking presentation at Percona Live 2013 pointed out, I’m never one to accept something without at least running it once myself. I decided to run my

  [Read more...]
Announcing TokuMX v1.2: Hot Backup

We’ve been hard at work on TokuMX since its initial release just over two months ago. Today we released TokuMX v1.2, which includes Hot Backup in the Enterprise Edition.

Hot Backup allows users to create a backup of a running TokuMX primary or secondary server in a replica set, with no blocking of writes for clients. We will be blogging more about the Hot Backup technology in the coming weeks. This same technology is used for Hot Backup in TokuDB.

Also worth noting are the features we’ve added since the initial TokuMX release:

  • Migration Tools. Migrate to TokuMX from MongoDB using our tool that replays MongoDB replication. This allows a TokuMX server to stay in sync with a MongoDB replica set, reducing downtime for production go-live.
  • Bulk Loading.
  [Read more...]
Building TokuMX and TokuDB for Production

Recently, we’ve seen a few people ask us about building TokuMX from scratch. While it’s best if you just use the binaries you can get from us (they have all the right optimizations, we’ve tested them, and we can interpret coredumps they generate), we recognize there are other reasons you might need to do a custom build.

Since we actually build six distinct products all using the Fractal Tree indexing® library (community and enterprise versions of TokuDB for MySQL, TokuDB for MariaDB, and TokuMX), our build process is pretty complicated, compared to software packages that might, for example, just involve one source repository and link against a few standard libraries. Our TokuMX builds involve four git repositories, three

  [Read more...]
Slides from Boston MongoDB User Group Meetup on 7/31/13

On Wednesday night, the Boston MongoDB User group was kind enough to have me speak about TokuMX Internals. I spoke about Fractal Tree® indexes and the technical reasons behind the benefits they provide to MongoDB applications. Although the talk mostly references TokuMX and MongoDB, all the theory applies to TokuDB and MySQL as well.

My slides are on our technology overview page, along with other great content.

Opportunities to present technical material to an engaged audience asking tough questions are rare and much appreciated. So thank you to the Boston MongoDB User group for having me present.


Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.