Showing entries 31 to 40 of 277
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: mongodb (reset)
Percona Server for MongoDB storage engines in iiBench insert workload

We recently released the GA version of Percona Server for MongoDB, which comes with a variety of storage engines: RocksDB, PerconaFT and WiredTiger.

Both RocksDB and PerconaFT are write-optimized engines, so I wanted to compare all engines in a workload oriented to data ingestions.

For a benchmark I used iiBench-mongo (https://github.com/mdcallag/iibench-mongodb), and I inserted one billion (bln) rows into a collection with three indexes. Inserts were done in ten parallel threads.

For memory limits, I used a 10GB as the cache size, with a total limit of 20GB available for the mongod process, …

[Read more]
Read, write & space amplification - B-Tree vs LSM

This post compares a B-Tree and LSM for read, write and space amplification. The comparison is done in theory and practice so expect some handwaving mixed with data from iostat and vmstat collected while running the Linkbench workload. For the LSM I consider leveled compaction rather than size-tiered compaction. For the B-Tree I consider a clustered index like InnoDB.

The comparison in practice provides values for read, write and space amplification on real workloads. The comparison in theory attempts to explain those values.

B-Tree vs LSM in …

[Read more]
Read, write & space amplification - pick 2

Good things come in threes, then reality bites and you must choose at most two. This choice is well known in distributed systems with CAPPACELC and FIT. There is a similar choice for database engines. An algorithm can optimize for at most two from readwrite and space amplification. These are metrics for efficiency …

[Read more]
Define better for a small-data DBMS

There are many dimensions by which a DBMS can be better for small data workloads: performance, efficiency, manageability, usability and availability. By small data I mean OLTP. Performance gets too much attention from both industry and academia while the other dimensions are at least as important in the success of a product. Note that this discussion is about which DBMS is likely to get the majority of new workloads. The decision to migrate a solved problem is much more complex.

  • Performance - makes marketing happy
  • Efficiency - makes management happy
  • Manageability - makes operations happy
  • Usability - makes databased-backed application developers happy
  • Availability - makes users happy


Performance makes marketing happy because they can publish a whitepaper to show their product …

[Read more]
Percona Live 2016 Call for Papers Open! Introducing: Community Voting

The Call for Papers for the fifth annual Percona Live Data Performance Conference and Expo (formerly MySQL Conference and Expo) taking place April 18-21, 2016, in Santa Clara, California, is now officially open!

Ask yourself… “Do I have…”

  • Fresh ideas?
  • Enlightening case studies?
  • Insight on best practices?
  • In-depth technical knowlege?

If the answer to any of these is “YES,” then you need to submit your proposal for a chance to speak at Percona Live 2016. Speaking is a great way to broaden not only the awareness of your company with an intelligent and engaged audience of software …

[Read more]
Losing it?

Many years ago the MySQL team at Google implemented semi-sync replication courtesy of Wei Li. The use case for it was limited and the community was disappointed that it did not provide the semantics they wanted. Eventually group commit was implemented for binlog+InnoDB: first by the MySQL team at Facebook, then much better by MariaDB, and finally in upstream MySQL. With group commit some magic was added to give us lossless semisync and now we have automated, lossless and fast failover in MySQL without using extra replicas. This feature is a big deal. I look forward to solutions for MariaDB (via binlog server and MaxScale) and MySQL (via …

[Read more]
Problems not worth fixing

I worked on a closed-source DBMS years ago and the development cycle was 1) code for 6 months 2) debug for 18 months. Part 2 was longer than part 1 because the customers have high expectations for quality and the product delivers on that. But it was also longer because some co-workers might not have been code complete after 6 months and were quietly extending part 1 into part 2.

I fixed many bugs during part 2. One that I remember was in the optimizer. Someone added code to detect and remove duplicate expressions (A and A and B --> A and B). Alas the algorithm was O(N*N). Analytics queries can have complex WHERE clauses and all queries were forced to pay the O(N*N) cost whether or not they had any duplicate expressions. 
The overhead for this feature was reasonable after I fixed the code but it raises an interesting question. Are some problems not worth the cost of prevention? Several times a year I see …

[Read more]
My Guidebook for Percona Live Amsterdam, 2015 - Part I

Unfortunately I am not going to Percona Live Amsterdam 2015 that starts next week. Somebody has to work and keep those customers happy who prefer to work as usual even during such a festival.

I am still asking myself: if I'd be able to go there, what tutorials and sessions I'd like to attend (besides those I'd present)? "All of them" is not a correct answer, as often great and useful sessions happens at the same time in different places. So, I decided to create a list of sessions for myself (as a reminder to check slides at least after they are published), and I'd like to share it.

I tried to pick up one session …

[Read more]
MongoDB and Percona TokuMX Security Guidelines

Several reports we’re published in the news about how easy it is to access data stored in some NoSQL systems, including MongoDB. This is not surprising because security was rather relaxed in earlier versions of MongoDB . This post lists some of the common vulnerabilities in MongoDB and Percona TokuMX.

Network Security

One key point is to ensure that the bind_ip setting is correctly adjusted: in MongoDB 2.4 and Percona TokuMX, it is not set which means that the server will listen to all available network interfaces. If proper firewall rules (iptables, Security Groups in AWS, …) are not in place, your dataset could easily be queried from anywhere in the world!

In MongoDB 2.6+, bind_ip is set by default to 127.0.0.1 in the official .deb and .rpm packages. This is great from a security point of view, but remember that you’ll still have to adjust the setting if the application servers are not …

[Read more]
Transactions with RocksDB

RocksDB has work-in-progress to support transactions via optimistic and pessimistic concurrency control. The features need more documentation but we have shared the API, additional code for pessimistic and optimistic and examples for pessimistic and optimistic. Concurrency control …

[Read more]
Showing entries 31 to 40 of 277
« 10 Newer Entries | 10 Older Entries »