Home |  MySQL Buzz |  FAQ |  Feeds |  Submit your blog feed |  Feedback |  Archive |  Aggregate feed RSS 2.0 English Deutsch Español Français Italiano 日本語 Русский Português 中文
Showing entries 1 to 30 of 177 Next 30 Older Entries

Displaying posts with tag: mongodb (reset)

How to answer questions
+1 Vote Up -0Vote Down
Are you somehow responsible for operating a DBMS in production? Congratulations! You are going to get a lot of questions so learn how to be efficient while answering them. Questions can be cheap to ask because speculation is free. A non-speculative answer is expensive when it requires research and experiments. Good and bad questions will arrive faster than good answers can be provided. How do you remain productive assuming you don't ignore the questions? Make the answer as inexpensive as the question. Problem solved? Here is a short guide.

Preferred version

I prefer these questions over the ones that follow. I don't know whether the non-preferred variants are asked more frequently or whether I am more likely to remember them. My blogs, smalldatum and mysqlha, have a
  [Read more...]
ClusterControl Module for Puppet
+0 Vote Up -0Vote Down
July 7, 2014 By Severalnines

If you are automating your infrastructure using Puppet, then this blog is for you. We are glad to announce the availability of a Puppet module for ClusterControl. For those using Chef, we already published Chef cookbooks for Galera Cluster and ClusterControl some time back.  

 

 

ClusterControl on Puppet Forge

 

The ClusterControl module initial release is available on Puppet Forge, installing the

  [Read more...]
Benchmark(et)ing
+1 Vote Up -0Vote Down
Benchmarking and benchmarketing both have a purpose. Both also have a bad reputation. A frequently expressed opinion is that benchmark results are useless. I usually disagree. I don't mind benchmarketing and think it is a required part of product development but I am not fond of benchmarketing disguised as benchmarking.

Benchmarketing is a common activity for many DBMS products whether they are closed or open source. Most products need new users to maintain viability and marketing is part of the process. The goal for benchmarketing is to show that A is better than B. Either by accident or on purpose good benchmarketing results focus on the message A is better than B rather than A is better than B in this context. Note that the context can be critical and includes the hardware, workload, whether both systems were properly configured and some attempt to

  [Read more...]
Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics
+1 Vote Up -0Vote Down
June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split on multiple nodes and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

 

In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then

  [Read more...]
Best Practices for Partitioned Collections and Tables in TokuDB and TokuMX
+0 Vote Up -1Vote Down

In my last post, I gave a technical explanation of the performance characteristics of partitioned collections in TokuMX 1.5 (which is right around the corner) and partitioned tables in relational databases. Given those performance characteristics, in this post, I will present some best practices when using this feature in TokuMX or TokuDB. Note that these best practices are designed for TokuMX and TokuDB only, which

  [Read more...]
Understanding the Performance Characteristics of Partitioned Collections
+0 Vote Up -0Vote Down

In TokuMX 1.5 that is right around the corner, the big feature will be partitioned collections. This feature is similar to partitioned tables in Oracle, MySQL, SQL Server, and Postgres. A question many have is “why should I use partitioned tables?” In short, it’s complicated. The answer depends on your workload, your schema, and your database of choice. For example, this Oracle related post states “Anyone with un-partitioned databases over 500 gigabytes is courting disaster.” That’s not true for TokuDB or TokuMX. Nevertheless,

  [Read more...]
Webinar Replay, Slides & Q&A: Introducing ClusterControl 1.2.6 - Managing your MySQL, MariaDB & MongoDB Clusters
+0 Vote Up -0Vote Down
May 19, 2014 By Severalnines

 

Thanks to everyone who attended and participated last week’s joint webinar on ClusterControl 1.2.6! We had great questions from participants (thank you), most of which are transcribed below with our answers to them.

 

If you missed the sessions or would like to watch the webinar again & browse through the slides, they are now available online.

 

Webinar topics discussed: 

  • Database Infrastructure Lifecycle
  • Deploy, Monitor, Manage,
  [Read more...]
Write amplification from log writes
+0 Vote Up -0Vote Down
MongoDB, TokuMX and MySQL use log files with high-value data. For MongoDB this is the journal that uses direct IO. For MySQL this is the binlog, relay log and InnoDB redo log,  all use buffered IO by default, the InnoDB redo log uses 512 bytes as the "page size"  and the replication logs have no notion of page size.

The minimum size for a write to a storage device is either the sector size or filesystem page size. The sector size is either 512 or 4096 bytes today. In the future it will be 4096 or larger. The filesystem page size on my servers is 4096 bytes. When a DBMS tries to append 309 bytes to the end of a log file then more than 309 bytes are written to the storage device. Depending on the filesystem and choice of buffered or direct IO either a disk sector or

  [Read more...]
Thoughts on Small Datum – Part 3
+0 Vote Up -0Vote Down

Background: If you did not read my first blog post about why I am sharing my thoughts on the benchmarks published by Mark Callaghan on Small Datum you may want to skim through it now for a little context: Thoughts on Small Datum – Part 1”

~~~~~~~~~~~~~~~~~~~~~~~~

Last time, in Thoughts on Small Datum – Part 2 I shared my cliff notes and a graph on Mark Callaghan’s (@markcallaghan) March 11th insertion rate benchmarks using flash storage media. In those tests he compares MySQL (http://www.mysql.com/) outfitted with the

  [Read more...]
Interview with John Partridge, President & CEO of Tokutek, Inc.
+0 Vote Up -0Vote Down

“As the database gets used, shards can grow at an uneven rate and one shard might carry a majority of the load. MongoDB corrects this by balancing shards, but because of MongoDB’s lack of concurrency this operation can stall the database unacceptably.”–John Partridge.

I have interviewed John Partridge, President & CEO of Tokutek, Inc.

RVZ

Q1. Tokutek recently announced to have eliminated performance issues of MongoDB sharding. What was the problem?

John Partridge: The problem occurs after a shard is created. As the database gets used, shards can grow at an uneven rate and one shard might carry a majority of the load. MongoDB corrects this by balancing

  [Read more...]
New Release Webinar on May 13th: Introducing ClusterControl 1.2.6 - Live Demo
+0 Vote Up -0Vote Down
May 7, 2014 By Severalnines

 

Following the release of ClusterControl 1.2.6 a couple of weeks ago, we are now looking forward to demonstrating this latest version of the product on Tuesday next week, May 13th.

 

This release contains key new features (along with performance improvements and bug fixes), which we will be demonstrating live during the webinar. 

 

Highlights include:

  • Centralized
  [Read more...]
Eventual consistency of NoSQL marketing
+0 Vote Up -0Vote Down
Yesterday I learnt an important lesson about an important difference between NoSQL and MySQL, at least when it comes to the marketing and hype.

I saw a tweet from around marketing of one of NoSQL leaders:

Most people apparently would just conclude from the tweet's text, however I actually clicked the link, and couldn't believe eyes:

I guess that in NoSQL, when it comes to the integrity of data as well as hype - it is eventually consistent...



The impact of read ahead and read size on MongoDB, TokuMX and MySQL
+0 Vote Up -0Vote Down
This continues my work on using a very simple workload (read-only, fetch 1 document by PK, database larger than RAM, fast storage) to understand how to get more QPS from TokuMX and MongoDB. I need to get more random read IOPs from storage to get more QPS from the DBMS. Beyond getting more throughput I ran tests to understand the impact of filesystem readahead on MongoDB and TokuMX and the impact of the read page size on TokuMX and InnoDB. My results are based on fast flash storage that can do more than 100,000 4kb reads/second. Be careful when trying to use these results for slower storage devices, especially disks.

In my previous post I wrote that I was able to get

  [Read more...]
Maybe You Should Try Taking a Walk in My Shoes
+0 Vote Up -0Vote Down

The title of this post should really be, “Maybe He Should Try Taking a Walk in Your Shoes.”

The he I’m referring to is economist and author, Tim Harford. The you is the people who use NewSQL and NoSQL approaches to mine big data with database platforms like MySQL (http://www.mysql.com" target="_blank) and MongoDB (or, preferably, our high-performance distributions of them, TokuDB and TokuMX).

Why should Mr. Harford take that walk? Well, he recently

  [Read more...]
Thoughts on Small Datum – Part 2
+0 Vote Up -0Vote Down

If you did not read my first blog post about Mark Callaghan’s (@markcallaghan) benchmarks as documented in his blog, Small Datum, you may want to skim through it now for a little context.

——————-

On March 11th, Mark, a former Google and now Facebook database guru, published an insertion rate benchmark comparing MySQL (http://www.mysql.com) outfitted with the InnoDB storage engine with two NoSQL alternatives — basic MongoDB and TokuMX (the Tokutek high-performance

  [Read more...]
How to Setup Centralized Authentication of ClusterControl Users with LDAP
+0 Vote Up -0Vote Down
April 24, 2014 By Severalnines

ClusterControl 1.2.6 introduces integration with Active Directory and LDAP authentication. This allows users to log into ClusterControl by using their corporate credentials instead of a separate password. LDAP groups can be mapped onto ClusterControl user groups to apply roles to the entire group, so it is very convenient for larger organizations who have a centralized LDAP-compliant authentication system. This blog shows you how to configure LDAP authentication in ClusterControl, and allow users to use their Active Directory or LDAP username and password to log in to ClusterControl. 

  [Read more...]
Concurrent, read-only, not cached: MongoDB, TokuMX, MySQL
+0 Vote Up -0Vote Down
I repeated the tests described here using a database larger than RAM. The test database has 8 collections/tables with 400M documents/rows per table. I previously reported results for this workload using a server with 24 CPU cores and a slightly different flash storage device. This time I provide a graph and use a server with more CPU cores. The goal for this test is to determine whether the DBMS can use the capacity of a high-performance storage device, the impact from different filesystem readahead settings for MongoDB and TokuMX and the impact from different read page sizes for TokuMX and InnoDB. It will take two blog posts to share everything. I think I will have much better QPS  [Read more...]
RW locks are hard
+1 Vote Up -0Vote Down
MongoDB and TokuMX saturated at a lower QPS rate then MySQL when running read-only workloads on a cached database with high concurrency. Many of the stalls were on the per-database RW-lock and I was curious about the benefit from removing that lock. I hacked MongoDB to not use the RW-lock per query (not safe for production) and repeated the test. I got less than 5% more QPS at 32 concurrent clients. I expected more, looked at performance with PMP and quickly realized there were several other sources of mutex contention that are largely hidden by contention on the per-database RW-lock. So this problem won't be easy to fix but I think it can be fixed.

The easy way to implement a reader-writer lock uses the pattern

  [Read more...]
Concurrent, read-only & cached: MongoDB, TokuMX, MySQL
+1 Vote Up -0Vote Down
This has results for a read-only workload where all data is cached. The test query fetches all columns in one doucment/row by PK. For InnoDB all data is in the buffer pool. For TokuMX and MongoDB all data is in the OS filesystem cache and accessed via mmap'd files. The test server has 40 CPU cores with HT enabled and the test clients share the host with mysqld/mongod to reduce variance from network latency. This was similar to a previous test, except the database is in cache and the test host has more CPU cores. The summary of my results is:
  • MongoDB 2.6 has a performance regression from using more CPU per query. The regression might be limited to simple queries that do single row lookups on the _id index. I spent a bit of time rediscovering how to get hierarchical

  [Read more...]
Biebermarks
+1 Vote Up -0Vote Down
Yet another microbenchmark result. This one is based on behavior that has caused problems in the past for a variety of reasons which lead to a few interesting discoveries. The first was that using a short lock-wait timeout was better than the InnoDB deadlock detection code. The second was that no-stored procedures could overcome network latency.

The workload is a large database where all updates are done to a small number of rows. I think it is important to use a large database to include the overhead from searching multiple levels of a b-tree. The inspiration for this is maintaining counts for popular entities like Justin Bieber and

  [Read more...]
TokuMX, MongoDB and InnoDB, IO-bound update-only with fast storage
+0 Vote Up -0Vote Down
I repeated the update-only IO-bound tests using pure-flash servers to compare TokuMX, MongoDB and InnoDB. The test setup was the same as on the pure-disk servers except for the hardware. In this case the servers have fast flash storage, 144G of RAM and 24 CPU cores with HT enabled. As a reminder, the InnoDB change buffer and TokuMX fractal tree don't help on this workload because there are no secondary indexes to maintain. Note that all collections/tables are in one database for this workload thus showing the worst-case for the MongoDB per-database RW-lock. The result summary:
  • InnoDB is much faster than MongoDB and TokuMX. This test requires a high rate of dirty page writeback and thanks to a lot of work from the InnoDB team at MySQL with help from Percona and

  [Read more...]
MongoDB, TokuMX and InnoDB for disk IO-bound, update-only by PK
+0 Vote Up -0Vote Down
I used sysbench to measure TPS for a workload that does 1 update by primary key per transaction. The database was much larger than RAM and the server has a SAS disk array that can do at least 2000 IOPs with a lot of concurrency. The update is to a non-indexed column so there is no secondary index maintenance which also means there is no benefit from a fractal tree in TokuMX or the change buffer in InnoDB. I also modified the benchmark client to avoid creating a secondary index. Despite that TokuMX gets almost 2X more TPS than InnoDB and InnoDB gets 3X to 5X more TPS than MongoDB.
  • TokuMX is faster because it doesn't use (or waste) random IOPs on writes so more IO capacity is

  [Read more...]
How TokuMX Secondaries Work in Replication
+0 Vote Up -0Vote Down

As I’ve mentioned in previous posts, TokuMX replication differs quite a bit from MongoDB’s replication. The differences are large enough such that we’ve completely redone some of MongoDB’s existing algorithms. One such area is how secondaries apply oplog data from a primary. In this post, I’ll explain how.

In designing how secondaries apply oplog data, we did not look closely at how MongoDB does it. In fact, I’ve currently forgotten all I’ve learned about MongoDB’s implementation, so I am not in a position to compare the two. I think I recall that MongoDB’s oplog idempotency was a key to their

  [Read more...]
Why aren't you using X, version 2
+1 Vote Up -0Vote Down
Sometimes I get asked why am I not using product X where X is anything but MySQL. The products that are suggested change over time and the value of X very much depends on the person asking the question. An ex-manager from my days at Oracle told me that Oracle would be better and developers from the SQL Server team told me the same. For those keeping score there was a social network that ran SQL Server and they were kind of enough to explain why.

Too often this is an assertion rather than a question and it would be more clear to say "I think you should be using X". A better question would be "Why are you using MySQL". This is the burden we carry for running MySQL at scale, but I am not in search of

  [Read more...]
MongoDB, TokuMX and InnoDB for concurrent inserts
+0 Vote Up -0Vote Down
I used the insert benchmark with concurrent insert threads to understand performance limits in MongoDB, TokuMX and InnoDB. The database started empty and eventually was much larger than RAM. The benchmark requires many random writes for secondary index maintenance for an update-in-place b-tree used by MongoDB and InnoDB. The test server has fast flash storage. The work per transaction for this test is inserting 1000 documents/rows where each document/row is small (100 bytes) and has 3 secondary indexes to maintain. The test used 10 client connections to run these transactions concurrently and each client uses a separate collection/table. The performance summaries listed below are based on the context for this test -- fast storage, insert heavy with secondary index maintenance. My conclusion from running many insert benchmark tests is that I don't want to load big databases with  [Read more...]
MongoDB, TokuMX and InnoDB for disk IO-bound, read-only point queries
+0 Vote Up -0Vote Down
This repeats a test that was done on pure-flash servers. The goals are to determine whether the DBMS can efficiently use the IO capacity of a pure-disk server.  The primary metrics are the QPS that the DBMS can sustain and the ratio of disk reads per query. The summary is that a clustered primary key index makes TokuMX and InnoDB much more IO efficient for PK lookups on IO-bound workloads.

TokuMX and InnoDB get much more QPS than MongoDB from the same IO capacity for this workload. TokuMX and InnoDB have a clustered primary index. There is at most 1 disk read per query assuming all non-leaf nodes from the index are in memory and all leaf nodes are not in memory. With MongoDB the primary key index is not clustered so there can be a disk read for the leaf node of

  [Read more...]
Notes on the storage stack
+0 Vote Up -0Vote Down
If you want high performance and quality of service from a DBMS then you need the same from the OS. The MySQL/Postgres/MongoDB crowd doesn't always speak with the Linux crowd. On the bright side there is a good collection of experts from the Linux side of things at my employer and we have begun speaking. There were several long threads on the PG hackers lists about PG+Linux and this lead to a meeting at the LSFMM summit. I am very happy these groups met. We have a lot to learn from each other. DBMS people can explain our IO patterns and get motivated to write DBMS workload simulators (like innosim,   [Read more...]
Insert benchmark on disks, part 2
+1 Vote Up -0Vote Down
I ran more insert benchmark tests for InnoDB on pure disk servers. The previous results with a lot more detail are here. My goal in this case was to use better configuration options for InnoDB on disk and to understand the impact of innodb_flush_neighbors. With the better settings InnoDB sustains a much higher insert rate.

The first problem in the previous tests was that I used a few settings that are better for flash than disk with InnoDB so I increased innodb_write_io_threads from 4 to 32, reduced

  [Read more...]
Insert benchmark for flash, part 2
+1 Vote Up -0Vote Down
I repeated a few of the long running tests for the insert benchmark using flash storage. My goals were to test at least one new configuration, repeat a few tests to confirm the configuration was what I claimed it was and to confirm the impact of doing fsync-on-commit during this test. In this test the write operation adds 1000 small documents and the redo log write is not small. From casual observation I did not see a big impact from doing fsync-on-commit (or a big benefit from not doing it) but that is the point of this post. From the results here the impact from fsync-on-commit still appears to be minor. But remember that this is specific to one workload for which many KB of data are written to the redo log or journal per commit and for which the eventual bottleneck is random  [Read more...]
TokuMX, MongoDB and InnoDB on IO-bound point queries
+1 Vote Up -0Vote Down
I used sysbench to understand whether TokuMX, MongoDB and InnoDB can use most of the IOPs provided by a fast flash device and whether their use is efficient. The workload query fetches one document/row by primary key. This is a very simple workload but helps me to understand how disk read requests are processed. The primary metric is the QPS that can be sustained when the database is much larger than RAM. A secondary metric is the number of disk reads per query during the test. Efficiency, not doing too many reads per query, matters when you want to support many concurrent users.

tl;dr - new database engines are usually worse on multi-core than old database engines. I know there are exceptions to this rule for both new engines, like WiredTiger and RethinkDB, and old engines that won't be

  [Read more...]
Showing entries 1 to 30 of 177 Next 30 Older Entries

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.