Apache Hadoop is commonly used for data analysis: it is fast for bulk data loads and it scales well. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data into Cloudera Impala (columnar format) and run reporting on top of it. For the examples below I will use the “ontime flight performance” data from my previous post (Increasing MySQL performance with parallel query execution). I used Cloudera Manager v.4 to install Apache Hadoop and Impala. For this test …
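A minimal sketch of that pipeline, assuming a simplified two-column version of the ontime table and illustrative file paths (the full flight-performance schema and real locations come from the post itself):

-- MySQL side: export the table to a delimited flat file
SELECT FlightDate, DepDelay
FROM ontime
INTO OUTFILE '/tmp/ontime.csv'
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

-- Impala side (after copying the file into HDFS, e.g. with hadoop fs -put):
-- point an external text table at the HDFS directory, then rewrite it as Parquet
CREATE EXTERNAL TABLE ontime_csv (FlightDate STRING, DepDelay INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/impala/ontime';

CREATE TABLE ontime_parquet STORED AS PARQUET AS SELECT * FROM ontime_csv;

Note that older Impala versions spell the last clause STORED AS PARQUETFILE.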
[Read more]
A little background…
When I ventured into sales and marketing (I’m an engineer by education), I learned I would often have to interpret and distill the business value that is sometimes hidden in benchmarks. Simply put, the people who approve the purchase of products like TokuDB® and TokuMX™ appreciate the executive summary.
Therefore, I plan to publish a multipart series here on TokuView where I will share my simple summaries and thoughts on business value for the benchmarks Mark Callaghan (@markcallaghan), a former Google and now Facebook database guru, is publishing on his blog, Small Datum.
I’m going to start with his first benchmark post and work my way forward to …
[Read more]
This week we talk to Justin Swanhart about Shard-Query, a massively parallel processing query engine for MySQL. Ear candy is about mydumper, and At the Movies is "Common Deadly MySQL Development Mistakes."
Events
Percona Live
DB Hangops - every other Wednesday at noon Pacific time
Upcoming MySQL events
Training
SkySQL Trainings
This is “A new dimension to MySQL query optimizations – part 2”. If you haven’t read part 1 yet, I suggest you skim through it before reading on.
To summarize, the problem at hand is this: given a query that joins two or more tables, the MySQL optimizer’s mission is to find the combination of join order and access methods that yields the lowest response time. The optimizer does this by calculating the cost of each combination and then picking the cheapest one.
Consider the following query:
SELECT * FROM employee JOIN department ON employee.dept_no=department.dept_no WHERE employee.first_name="John" AND employee.hire_date BETWEEN "2012-01-01" AND "2012-06-01"
The optimizer will calculate the cost of the alternative plans as follows:
total cost = …
[Read more]
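If you want to see the cost calculations from the post above happen on your own server, MySQL 5.6’s optimizer trace exposes them; a quick sketch (the exact JSON keys vary between versions):

SET optimizer_trace='enabled=on';
SELECT * FROM employee JOIN department ON employee.dept_no=department.dept_no WHERE employee.first_name="John" AND employee.hire_date BETWEEN "2012-01-01" AND "2012-06-01";
-- the per-plan cost estimates appear under "considered_execution_plans"
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace='enabled=off';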
A quick list of highlights from the Percona Live 2014 event.
Yahoo’s Performance Analyzer
Yahoo is developing a MySQL performance analyzer that should be released as open source later this year. From the demo, it pulls in most of the MySQL metrics, shows you the processlist, and lets you drill into individual queries with EXPLAIN. I will have to keep an eye out for this.
ChatOps with Hubot
GitHub’s Sam Lambert has a set of hubot chatops scripts for MySQL. I was already looking at deploying hubot for the ability to push messages from a remote source into an IRC channel, so this would be a natural fit. He also mentioned using Janky to tie CI into hubot.
DevOps at Outbrain
Shlomi …
[Read more]
We have released some code in a labs release that does compression at the InnoDB IO layer. Let me answer the most frequently asked question first: it will work on any OS/file system that supports sparse files and has “punch hole” support. It is not specific to FusionIO. However, I’ve been told by the FusionIO developers that you will get two benefits from FusionIO + NVMFS: no fragmentation issues, and more space savings because of a smaller file system block size. Why the block size matters I will attempt to explain next.
The high-level idea is rather simple. Given a 16K page, we compress it using your favorite compression algorithm and write out only the compressed data. After writing out the data we “punch a hole” to release the unused part of the original 16K block back to the file system. Let me illustrate with an example:
[DDDDDDDDDDDDDDDD] -> Compress -> [CCCCFFFFFFFFFFFF]
D – Data
F – Free …
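This excerpt doesn’t show the labs build’s syntax, but in the shape this feature later shipped as transparent page compression in MySQL 5.7, it is enabled per table; a sketch (the labs syntax may differ):

-- requires a file system with sparse file / punch hole support (e.g. XFS, ext4, NVMFS)
CREATE TABLE t1 (id INT PRIMARY KEY, payload BLOB) COMPRESSION='zlib';
-- switching an existing table only affects pages written from now on;
-- rebuild it to compress the pages already on disk
ALTER TABLE t1 COMPRESSION='lz4';
OPTIMIZE TABLE t1;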
We are refactoring the SQL parser: the sql_yacc.yy file and friends.
Refactoring the parser consists of a base task that provides the common framework for refactoring, and several follow-up tasks to refactor individual types of SQL statements. We have already completed two follow-up tasks: one to refactor SELECT statements, and another to refactor SET statements.
Parser refactoring goals and tasks
The old parser had critical limitations because of its grammar complexity and top-down parsing style:
- The top-down parsing style is not natural for Bison/YACC parser generators (they generate bottom-up parsers), which leads …
It’s not radical to claim that one of the most important tasks of any DBMS query optimizer is to find the best join order for the tables participating in a query. Is it, e.g., better to read country or city first in
SELECT * FROM country JOIN city ON country.id=city.countryid WHERE city.population > 1000000 AND country.region="EMEA"
employee or department first in
SELECT * FROM employee JOIN department ON employee.dept_no=department.dept_no WHERE employee.first_name="John" AND employee.hire_date BETWEEN "2012-01-01" AND "2012-06-01" AND department.location="Paris"
If the optimizer gets this wrong, the resulting response time may be disastrous (or hilarious, depending on your sense of humour).
Simply put (and ignoring some edge cases), the MySQL optimizer does the following to find the cheapest combination of access methods and join order for the second query above:
Calculate …
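To see which join order the optimizer actually picked for queries like those above, EXPLAIN lists the tables in the chosen order; STRAIGHT_JOIN then lets you force the written order and compare the estimated row counts (a sketch, reusing a trimmed version of the second query):

-- rows appear in EXPLAIN output in the chosen join order
EXPLAIN SELECT * FROM employee JOIN department ON employee.dept_no=department.dept_no WHERE employee.first_name="John" AND department.location="Paris";

-- force the opposite order to compare the alternatives
EXPLAIN SELECT * FROM department STRAIGHT_JOIN employee ON department.dept_no=employee.dept_no WHERE employee.first_name="John" AND department.location="Paris";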
[Read more]
The first version of MySQL Cluster 7.4 has now been released on MySQL Labs. Note that labs loads are not suitable for production use (in fact they’re even less mature than Development Milestone Releases); their purpose is to give users a chance to see what’s in the works, try it for themselves and then provide feedback. With that said, if you’d like to try it out, download MySQL Cluster 7.4 from MySQL Labs.
The focus of this first Cluster 7.4 load is performance and data node restart times.
Performance
MySQL Cluster was designed from the outset to be a distributed, in-memory database and …
[Read more]
Background
InnoDB is a transactional storage engine. Two parts of the acronym ACID (atomicity and durability) are guaranteed by write-ahead logging (WAL) implemented by the InnoDB redo log.
A statement within a user transaction can consist of multiple operations, such as inserting a record into an index B-tree. Each low-level operation is encapsulated in a mini-transaction that groups page-level locking and redo logging. For example, if an insert would cause a page to be split, a mini-transaction will lock and modify multiple B-tree pages, splitting the needed pages, and finally inserting the record.
On mini-transaction commit, the local mini-transaction log will be appended to the global redo log buffer, the page locks will be released and the modified pages will be inserted into the flush list. Only after the log buffer has been written …
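A hands-on way to watch the write-ahead logging described above is the LOG section of SHOW ENGINE INNODB STATUS; a sketch of what to look for:

SHOW ENGINE INNODB STATUS\G
-- in the LOG section:
--   Log sequence number : LSN up to which redo has been generated in the log buffer
--   Log flushed up to   : LSN up to which the redo log has been written to disk
--   Last checkpoint at  : all page changes below this LSN are already in the data files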
[Read more]