Showing entries 61 to 70 of 274
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: mongodb (reset)
How to Purchase [Benchmarking] Hardware on a Budget

One of my goals at Acmebenchmarking is make sure I'm running on hardware that is representative of real-world infrastructure, while at the same time doing it as inexpensively as possible.

To date I've been running on two custom built "desktops" (for lack of a better term). Both have an Intel Core i7 4790K processor (quad core plus hyperthreading, 4Ghz), 32GB RAM (dual channel), and a quality SSD. They are named acmebench01 and acmebench02.

Alas, it is time to expand. MUST...PURCHASE...MORE...HARDWARE!

In order to maintain the inexpensive theme I tend to buy used hardware, my goal on this purchase was to achieve many more cores and greater memory bandwidth than my existing machines can provide. Keep in mind that used hardware is great for benchmarking (and likely development and QA environments) but you might want to avoid it for production. For years now I've been purchasing used hardware …

[Read more]
Capacity planning and QPS

As much as I like the technology from in-memory DBMS vendors like MemSQL, VoltDB and MySQL Cluster I am amused at some of the marketing that describes capacity planning as a function of QPS. It usually isn't true that when a workload has a peak QPS of X and the peak per-server QPS for the product is Y, then X/Y nodes is sufficient. The Oracle keynote from Mr. Big is a an example of this.

Work in web-scale database land is more complex courtesy of additional constraints including database size, network latency and network bandwidth. Workloads don't require X QPS, they do …

[Read more]
Compared to what?

It is common to share big numbers for web-scale database deployments. I do it frequently in presentations. I am not alone in this practice. Is is easy to get large values for QPS with web-scale OLTP (small data). The same is true for database size and rows read rates with web-scale data warehousing (big data).

I hope that So What? is the first reaction when these big numbers are shared. Big numbers only mean that a lot of hardware has been used. Context is what makes these big numbers more or less interesting. Note that I am not saying this to take away from the work done by my peers. I have been fortunate to work with extremely talented teams.

Compared to what? is another useful response. When considering stock mutual funds we look at the performance of the fund relative to a benchmark such as S&P 500. When considering database performance it helps to understand whether an alternative product would …

[Read more]
Fast index create

What is a fast way to create secondary indexes for a write-optimized database engine?

Back in the day the only way to create a secondary index for InnoDB was via incremental maintenance. Insert, update and delete statements would maintain secondary indexes as needed. The CREATE INDEX command for a secondary index would make a copy of the table, define all indexes (primary and secondary) and then scan the source table in PK order and insert rows from the scan into the copy table while maintaining secondary indexes after each insert.  In most cases the result from this is a secondary index subject to changes in a random sequence. That means the secondary index is fully fragmented immediately after index create and there was no way to defragment a secondary index. Fragmentation is bad as it means you are wasting space and using about 1.5X the space for the index compared to the index without fragmentation.

Today …

[Read more]
MySQL shell prompt vs MongoDB shell prompt

Recently Todd Farmer shared an interesting story about the mysql command line prompt in MySQL 5.7: how it was changed to provide more context and why the change was finally reverted. This made me think that after using the command line client for MongoDB for awhile, I would love seeing a much more modern mysql shell prompt. Here are a few examples of what a modern command line client can do.

Add dynamic information to the prompt

If you use replication with MongoDB, you have probably noticed a nice feature of the prompt: it is replication aware. What I mean is that for a standalone instance, the prompt is simply:


When you configure this instance to be the primary of a replica set named RS, the prompt automatically becomes:


and for secondaries, you will see:

[Read more]
Choosing a good sharding key in MongoDB (and MySQL)

MongoDB 3.0 was recently released. Instead of focusing on what’s new – that is so easy to find, let’s rather talk about something that has not changed a lot since the early MongoDB days. This topic is sharding and most specifically: how to choose a good sharding key. Note that most of the discussion will also apply to MySQL, so if you are more interested in sharding than in MongoDB, it could still be worth reading.

When do you want to shard?

In general sharding is recommended with MongoDB as soon as any of these conditions is met:

  • #1: A single server can no longer handle the write workload.
  • #2: The working set no longer fits in memory.
  • #3: The dataset is too large to easily fit in a single server.

Note that #1 and #2 are by far the most common reason why people need sharding. Also note that in the MySQL world, #2 does not imply that you need sharding.

[Read more]
Advanced JSON for MySQL

What is JSON

JSON is an text based, human readable format for transmitting data between systems, for serializing objects and for storing document store data for documents that have different attributes/schema for each document. Popular document store databases use JSON (and the related BSON) for storing and transmitting data.

Problems with JSON in MySQL

It is difficult to inter-operate between MySQL and MongoDB (or other document databases) because JSON has traditionally been very difficult to work with. Up until recently, JSON is just a TEXT document. I said up until recently, so what has changed? The biggest thing is that there are new JSON UDF by Sveta Smirnova, which are part of the MySQL 5.7 Labs releases. Currently the JSON UDF are up to version 0.0.4. While these new UDF are a welcome edition to the MySQL database, they don’t solve the really tough …

[Read more]
Bad Benchmarketing and the Bar Chart

Technical conferences are flooded with visual [mis]representations of a particular product's performance, compression, cost effectiveness, micro-transactions per flux-capacitor, or whatever two-axis comparison someone dreams up. Lets be honest, benchmarketers like to believe we all suffer from innumeracy.

The Merriam-Webster dictionary defines innumeracy as follows:
innumeracy (noun): marked by an ignorance of mathematics and the scientific approach Mark Callaghan has been a long time advocate of explaining benchmark results, but that's not the point of the bar chart. Oh no, the bar chart only exists to catch your eye and …

[Read more]
Increasing Cloud Database Efficiency – Like Crows in a Closet

In Mo’ Data, Mo’ Problems, we explored the paradox that “Big Data” projects pose to organizations and how Tokutek is taking an innovative approach to solving those problems. In this post, we’re going to talk about another hot topic in IT, “The Cloud,” and how enterprises undertaking Cloud efforts often struggle with idea of “problem trading.” Also, for some reason, databases are just given a pass as traditionally “noisy neighbors” and that there is nothing that can be done about it. Lets take a look at why we disagree.

With the birth of the information age came a coupling of business and IT. Increasingly strategic business projects and objectives were reliant on information infrastructure to provide information storage and retrieval instead of paper and filing cabinets. This was the dawn of the database and what gave rise to companies like Oracle, Sybase and MySQL. With the appearance of true Enterprise Grade …

[Read more]
Real-time data loading from Oracle and MySQL to data warehouses, analytics

Analyzing transactional data is becoming increasingly common, especially as the data sizes and complexity increase and transactional stores are no longer to keep pace with the ever-increasing storage. Although there are many techniques available for loading data, getting effective data in real-time into your data warehouse store is a more difficult problem.In this webinar-on-demand we showcase

Showing entries 61 to 70 of 274
« 10 Newer Entries | 10 Older Entries »