Displaying posts with tag: mongodb
Much ado about nothing

Kyle does amazing work with Jepsen and I am happy that he devotes some of his skill and time to making MongoDB better. This week a new problem was reported: stale reads despite the use of the majority write concern. Let me compress the bug report and blog post for you, but first note that this isn't a code bug; it is expected behavior for asynchronous replication.

  1. MongoDB implements asynchronous master-slave replication
  2. Commits can be visible to others on the master before oplog entries are sent to the slave

The problem occurs when transaction 1 commits a change on the master, transaction 2 views that change on the master, the master disappears and a slave is promoted to be the new master where the oplog entries from transaction 1 …
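
To make the failure concrete, here is a toy Python timeline of the scenario above. This is my own sketch of the race, not MongoDB code and not the real oplog protocol:

    # Async replication: the master acknowledges commits before the
    # slave has received the corresponding oplog entries.
    master_oplog = []
    slave_oplog = []

    # Transaction 1 commits on the master; the entry has not shipped yet.
    master_oplog.append("txn1: x = 1")

    # Transaction 2 reads on the master and observes txn1's write.
    assert "txn1: x = 1" in master_oplog

    # The master dies before replicating, and the slave is promoted.
    master_oplog = None
    new_master_oplog = slave_oplog

    # Txn1's write is gone. Transaction 2's earlier read is now a stale
    # read of state that no longer exists anywhere.
    assert "txn1: x = 1" not in new_master_oplog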

[Read more]
Why Percona Acquired Tokutek: by Peter Zaitsev

It is my pleasure to announce that Percona has acquired Tokutek and will take over development and support for TokuDB® and TokuMX™ as well as the revolutionary Fractal Tree® indexing technology that enables those products to deliver improved performance, reliability and compression for modern Big Data applications.

At Percona we have been working with the Tokutek team since 2009, helping to improve performance and scalability. The TokuDB storage engine has been available for Percona Server for about a year, so joining forces is quite a natural step for us.

Fractal Tree indexing technology, developed through years of data science research at MIT, Stony Brook University and Rutgers University, is a new-generation data structure that, for many workloads, leapfrogs traditional B-tree technology, which was invented in 1972 (over 40 years ago!). It is also often …

[Read more]
How to win at IO-bound benchmarks

I spend time evaluating database performance on IO-bound workloads and have learned a few things that can get you much better results on benchmarks than you will ever see in production. In other words, this is really about benchmarketing.

The tips are:

  • Load the database in key order to avoid a fragmented B-Tree index or a randomized LSM tree. If there is a secondary index then create it after the load to avoid fragmentation there. This makes the database smaller and can make searches faster (see the sketch after this list).
  • Only use 10% of the storage device. This reduces the average seek latency for disks. This also reduces the overhead from flash GC because there is less live data to copy out during erase block cleaning. This also makes it less likely that erase block cleaning will be needed during the test. Erase-block cleaning can also be avoided by …
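
As an illustration of the first tip, here is a minimal Python sketch of a key-ordered load with deferred secondary index creation. It uses sqlite3 as a stand-in for any B-Tree engine, and the table and row counts are made up:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

    # Load in primary-key order so pages fill sequentially instead of
    # splitting; a random-order load leaves the B-Tree fragmented.
    con.executemany("INSERT INTO t VALUES (?, ?)",
                    ((i, "row-%d" % i) for i in range(100000)))

    # Create the secondary index after the load: it is built from sorted
    # data in one pass instead of being maintained (and fragmented) by
    # every insert.
    con.execute("CREATE INDEX t_val ON t (val)")
    con.commit()
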
[Read more]
How to Purchase [Benchmarking] Hardware on a Budget

One of my goals at Acmebenchmarking is to make sure I'm running on hardware that is representative of real-world infrastructure, while at the same time doing it as inexpensively as possible.

To date I've been running on two custom-built "desktops" (for lack of a better term). Both have an Intel Core i7 4790K processor (quad core plus hyperthreading, 4GHz), 32GB of RAM (dual channel), and a quality SSD. They are named acmebench01 and acmebench02.

Alas, it is time to expand. MUST...PURCHASE...MORE...HARDWARE!

In order to maintain the inexpensive theme I tend to buy used hardware, and my goal for this purchase was to get many more cores and greater memory bandwidth than my existing machines can provide. Keep in mind that used hardware is great for benchmarking (and likely for development and QA environments) but you might want to avoid it for production. For years now I've been purchasing used hardware …

[Read more]
Capacity planning and QPS

As much as I like the technology from in-memory DBMS vendors like MemSQL, VoltDB and MySQL Cluster, I am amused at some of the marketing that describes capacity planning as a function of QPS. It usually isn't true that when a workload has a peak QPS of X and the peak per-server QPS for the product is Y, then X/Y nodes are sufficient. The Oracle keynote from Mr. Big is an example of this.

Work in web-scale database land is more complex courtesy of additional constraints, including database size, network latency and network bandwidth. Workloads don't require X QPS; they do …
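
To make the point concrete, here is a toy Python capacity model. All names and numbers are hypothetical, invented for illustration:

    import math

    def servers_needed(peak_qps, db_size_gb, working_set_gb,
                       per_server_qps, per_server_disk_gb, per_server_ram_gb):
        # Capacity is the max over several constraints, not QPS alone.
        by_qps = math.ceil(peak_qps / float(per_server_qps))
        by_disk = math.ceil(db_size_gb / float(per_server_disk_gb))
        by_ram = math.ceil(working_set_gb / float(per_server_ram_gb))
        return int(max(by_qps, by_disk, by_ram))

    # QPS alone suggests 2 servers, but database size forces 10.
    print(servers_needed(peak_qps=200000, db_size_gb=10000,
                         working_set_gb=500, per_server_qps=100000,
                         per_server_disk_gb=1000, per_server_ram_gb=128))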

[Read more]
Compared to what?

It is common to share big numbers for web-scale database deployments. I do it frequently in presentations. I am not alone in this practice. It is easy to get large values for QPS with web-scale OLTP (small data). The same is true for database size and rows-read rates with web-scale data warehousing (big data).

I hope that "So what?" is the first reaction when these big numbers are shared. Big numbers only mean that a lot of hardware has been used. Context is what makes these big numbers more or less interesting. Note that I am not saying this to take away from the work done by my peers. I have been fortunate to work with extremely talented teams.

"Compared to what?" is another useful response. When considering stock mutual funds we look at the performance of the fund relative to a benchmark such as the S&P 500. When considering database performance it helps to understand whether an alternative product would …

[Read more]
Fast index create

What is a fast way to create secondary indexes for a write-optimized database engine?

Back in the day the only way to create a secondary index for InnoDB was via incremental maintenance. Insert, update and delete statements would maintain secondary indexes as needed. The CREATE INDEX command for a secondary index would make a copy of the table, define all indexes (primary and secondary) on the copy, and then scan the source table in PK order, inserting rows from the scan into the copy while maintaining the secondary indexes after each insert. In most cases the result is a secondary index that received its entries in random order, which means it is fully fragmented immediately after index create, and there was no way to defragment a secondary index. Fragmentation is bad because it wastes space: a fragmented index can use about 1.5X the space of an unfragmented one.
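
For contrast, one well-known fast approach is to scan once, sort, and bulk load; this is how InnoDB fast index creation works, and the truncated text below may describe something similar for write-optimized engines. A minimal Python sketch of the idea, my illustration rather than anything from the post:

    # Source table as (pk, name) rows, already in PK order.
    rows = [(3, "carol"), (7, "alice"), (10, "bob")]

    # 1. Scan the table in PK order, extracting (secondary_key, pk) pairs.
    entries = [(name, pk) for pk, name in rows]

    # 2. Sort the pairs by secondary key.
    entries.sort()

    # 3. Append the sorted pairs to the new index. Every insert lands at
    #    the right edge, so pages pack tightly and the index starts life
    #    unfragmented.
    secondary_index = entries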

Today …

[Read more]
MySQL shell prompt vs MongoDB shell prompt

Recently Todd Farmer shared an interesting story about the mysql command line prompt in MySQL 5.7: how it was changed to provide more context and why the change was finally reverted. After using the command line client for MongoDB for a while, this made me think that I would love to see a much more modern mysql shell prompt. Here are a few examples of what a modern command line client can do.

Add dynamic information to the prompt

If you use replication with MongoDB, you have probably noticed a nice feature of the prompt: it is replication aware. What I mean is that for a standalone instance, the prompt is simply:

>

When you configure this instance to be the primary of a replica set named RS, the prompt automatically becomes:

RS:PRIMARY>

and for secondaries, you will see:

[Read more]
Choosing a good sharding key in MongoDB (and MySQL)

MongoDB 3.0 was recently released. Instead of focusing on what's new (that is easy enough to find), let's talk about something that has not changed much since the early MongoDB days: sharding, and more specifically, how to choose a good sharding key. Note that most of the discussion also applies to MySQL, so if you are more interested in sharding than in MongoDB, it could still be worth reading.

When do you want to shard?

In general sharding is recommended with MongoDB as soon as any of these conditions is met:

  • #1: A single server can no longer handle the write workload.
  • #2: The working set no longer fits in memory.
  • #3: The dataset is too large to easily fit in a single server.

Note that #1 and #2 are by far the most common reasons why people need sharding. Also note that in the MySQL world, #2 does not imply that you need sharding.
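
To see why the key choice matters, here is a toy Python sketch contrasting range-based routing on a monotonically increasing key with hashed routing. The shard count and chunk boundaries are made up, and real MongoDB chunk splitting is dynamic:

    import hashlib

    SHARDS = 4

    def shard_for_range_key(key, boundaries=(250, 500, 750)):
        # Range routing: monotonically increasing keys (timestamps,
        # auto-increment ids) always fall past the last boundary, so one
        # shard absorbs every insert.
        for shard, upper in enumerate(boundaries):
            if key < upper:
                return shard
        return SHARDS - 1

    def shard_for_hashed_key(key):
        # Hashed routing spreads inserts evenly, at the cost of turning
        # range scans into scatter-gather queries.
        digest = hashlib.md5(str(key).encode()).hexdigest()
        return int(digest, 16) % SHARDS

    print({shard_for_range_key(k) for k in range(1000, 1010)})   # {3}
    print({shard_for_hashed_key(k) for k in range(1000, 1010)})  # spread out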

[Read more]
Advanced JSON for MySQL

What is JSON

JSON is a text-based, human-readable format for transmitting data between systems, for serializing objects, and for storing documents whose attributes and schema vary from document to document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL

It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with in MySQL. Until recently, JSON was just a TEXT document. I said until recently, so what has changed? The biggest thing is the new JSON UDFs by Sveta Smirnova, which are part of the MySQL 5.7 labs releases. Currently the JSON UDFs are at version 0.0.4. While these new UDFs are a welcome addition to the MySQL database, they don't solve the really tough …
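
To illustrate why JSON stored as plain TEXT is painful, here is a small Python sketch. It uses sqlite3 as a stand-in for a MySQL TEXT column, and the schema is invented:

    import json
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    con.execute("INSERT INTO docs VALUES (1, ?)",
                (json.dumps({"name": "alice", "age": 30}),))
    con.execute("INSERT INTO docs VALUES (2, ?)",
                (json.dumps({"name": "bob", "city": "nyc"}),))

    # With JSON as opaque TEXT the server cannot filter on an attribute:
    # every row must be fetched and parsed client-side.
    matches = [doc for (doc,) in con.execute("SELECT body FROM docs")
               if json.loads(doc).get("age", 0) > 25]
    print(matches)  # only alice's document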

[Read more]