Oracle released a bunch of MySQL stuff they’ve been working on since the last huge release, and my blog reader filled up with a few dozen posts I’m gonna have to read through so I don’t feel ignorant. Dear MySQL Engineering Team, could you take pity on me and release these gradually over the course of a month or so next time? Especially since Google discontinued Reader, and I’m using Feedly now, and it has a bug that I can’t figure …
[Read more]

It’s common wisdom that large-scale database systems require distributing the data across machines. But what seems to be missing in a lot of discussions is distributing the query processing too. By this I mean the actual computation that’s performed on the data.
I just had a conversation with Peter Zaitsev yesterday that helped make concrete some thoughts I’ve been having about Cassandra for a while. Because Cassandra doesn’t allow you to really do any computation in the data (aggregating, evaluating expressions, and so on), if you’re going to use it for truly Big data, you’re going to fetch enormous amounts of data across the network. Sure, you’re distributing the storage and retrieval across many machines — but you’re locating your data far from your processing. You have a distant low-level key-value store, in essence, and you have to write a database wrapper on top of it if you’re going to use it for anything …
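A minimal illustration of what computation near the data buys you (the table and columns here are hypothetical): an in-database aggregate reduces billions of stored rows to one summary row per group before anything crosses the network, whereas a store with no server-side computation makes the client fetch the raw rows and aggregate them itself.

-- Hypothetical billion-row metrics table. The aggregation runs where the
-- data lives, so only one summary row per device crosses the network:
SELECT device_id,
       COUNT(*)        AS events,
       SUM(bytes_sent) AS total_bytes
FROM   metrics
WHERE  collected_at >= '2013-08-01'
GROUP BY device_id;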
[Read more]

Continuing with my wishlist, I’ll add windowing functions. They’re enormously powerful. They allow you to extend relational logic beyond the strict boundaries of tuples. In MySQL at present, one must use ugly hacks to preserve state from one row to the next, such as user variables — which are not guaranteed to work if the optimizer changes the query plan.
And yeah, PostgreSQL and SQL Server have windowing functions too, and once you’ve used them it’s a little hard to go back. This is in fact one of the main things I hear from people who love PostgreSQL for what I consider to be legitimate reasons.
Windowing functions extend the uses of SQL (sometimes awkwardly, sometimes elegantly) into areas you can’t really reach without them: time-series analysis, for example, or more powerful graph processing. Otherwise these things must be done externally to SQL, in ugly procedural logic.
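For a concrete sketch of the difference (the table and column names are made up): the first query below is the user-variable running-total hack mentioned above, which breaks if the optimizer changes the plan; the second is the same running total expressed as a window function, using syntax that PostgreSQL and SQL Server already support.

-- The user-variable hack: state carried from row to row, with no
-- guarantee the optimizer evaluates rows in the order you expect.
SET @running := 0;
SELECT sales_date, amount,
       (@running := @running + amount) AS running_total
FROM   daily_sales
ORDER BY sales_date;

-- The same result with a window function (not available in MySQL
-- at the time of writing):
SELECT sales_date, amount,
       SUM(amount) OVER (ORDER BY sales_date) AS running_total
FROM   daily_sales;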
Windowing functions together with CTEs …
[Read more]

The pace of MySQL engineering has been pretty brisk for the last few years. I think that most of the credit is due to Oracle, but one should not ignore Percona, Monty Program, Facebook, Google, Twitter, and others. Not only are these organizations (and the individuals I haven’t mentioned) innovating a lot, they’re also putting pressure on Oracle to keep the improvements coming.
But if you look back over the last few years, MySQL is still functionally a lot like it used to be. OK, we’ve got row-based binary logging — but we had binary logging and replication before; this is just a variation on a theme. Partitioning — that’s a variation on a theme (partitioned tables are a variation on non-partitioned tables). Performance — same thing, only faster. And so on.
I’m painting things with too broad a brush. There’s actually a lot of stuff that’s NOT just a variation.
But if you look around at what’s out there …
[Read more]

I’ve been considering using TokuDB for a large dataset, primarily because of its high compression. The data is append-only, never updated, rarely read, and purged after a configurable time.
I use partitions to drop old data a day at a time. It’s much more efficient than deleting rows, and it lets me avoid indexing the data on the time dimension. Partitioning serves as a crude form of indexing, as well as helping purge old data.
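As a rough sketch of that setup (the names are invented; the real schema differs):

-- Append-only log, range-partitioned by day. Dropping a partition is a
-- cheap metadata operation, unlike DELETEing a day's worth of rows, and
-- partition pruning acts as a crude index on the time dimension.
CREATE TABLE event_log (
  collected_at DATETIME NOT NULL,
  payload      VARBINARY(255)
)
PARTITION BY RANGE (TO_DAYS(collected_at)) (
  PARTITION p20130910 VALUES LESS THAN (TO_DAYS('2013-09-11')),
  PARTITION p20130911 VALUES LESS THAN (TO_DAYS('2013-09-12')),
  PARTITION p_future  VALUES LESS THAN MAXVALUE
);

-- Purge a day of old data:
ALTER TABLE event_log DROP PARTITION p20130910;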
I wondered if TokuDB supports partitioning. Then I remembered some older posts from the Tokutek blog about partitioning. The claim is that “there are almost always better (higher performing, more robust, lower maintenance) alternatives to partitioning.”
I’m not sure this is true for my use case, for a couple of reasons.
First, I clearly fall into the only category that the flowchart …
[Read more]

If you’re in the Washington, DC area on Sept 12th, be sure to attend Percona University. This is a free 1-day mini-conference to bring developers and system architects up to speed on the latest MySQL products, services and technologies. Some of the topics being covered include Continuent Tungsten; Percona XtraDB Cluster; MySQL Backups in the Real World; MariaDB 10.0; MySQL 5.6 and Percona Server 5.6; Apache Hadoop.
I’ll be speaking about using MySQL with Go. I’ll talk about idiomatic database/sql code, available drivers for MySQL, and tips and tricks that will save you time and frustration.
Continuent is sponsoring a complimentary breakfast and Percona will also provide refreshments throughout the day, along with a raffle for a chance to win cool t-shirts, copies of “High Performance MySQL,” and a few other great prizes. …
[Read more]
The MySQL Utilities Team is pleased to announce the latest GA release of MySQL Utilities. This release marks a milestone of concentrated effort to expand the use of utilities in more diverse installations through improved robustness, error handling, and quality.
Many Improvements
There are a number of such enhancements in this release. In this post we will highlight a few of the more significant improvements.
- (new utility) MySQL .frm Reader (mysqlfrm) - read .frm files and generate CREATE statements with or without a server connection.
- (revised) improved documentation including a section on example administrative tasks - see http://dev.mysql.com/doc/workbench/en/mysql-utilities.html
- MySQL Utilities is packaged for .msi, .rpm, .deb platforms and source .tar/.zip …
In my recent SQL work for a large Swiss bank, I have maintained nested database view monsters whose unnested SQL code added up to 5k lines, joining the same table over and over again in separate subselects combined via UNION operations. This monster performed in way under 50ms, no matter how we queried … Continue reading The Myth About Slow SQL JOIN Operations →
I’ll be joining Percona for a free day of MySQL education and insight at their upcoming Percona University Washington DC event on September 12th. My topic is accessing MySQL from Google’s Go programming language. I’ve learned a lot about this over the past year or so, and hopefully I can help you get a quick-start.
If you’re not familiar with Go, it’s the darling of the Hacker News crowd these days. Anything with “Go” in its title gets to the front page for at least a little while! Go is a great systems programming language. It’s safe to say I’ve fallen in love with it, and it’s now my favorite programming language of all those I’ve used over my entire career. I chose it because it’s ideally suited for VividCortex’s agent programs (zero dependencies, compiled, lightweight, high performance, robust, makes concurrency …
[Read more]
Over time, some options have crept into the MySQL server whose values are comma-separated lists of flags. These include SQL_MODE, optimizer_switch, optimizer_trace, and a few other variables.
Optimizer_switch is particularly problematic to work with, as it contains many options and is hard to interpret. This is what you usually see when you examine optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,subquery_materialization_cost_based=on,use_index_extensions=on
As you can see, telling which option is on or off is rather difficult. You can use the REPLACE function to make this easier:
mysql> select replace(@@optimizer_switch, ',','\n')\G …[Read more]
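Based on the optimizer_switch value shown above, that query’s output looks roughly like this (the exact set of flags varies by server version):

*************************** 1. row ***************************
replace(@@optimizer_switch, ',','\n'): index_merge=on
index_merge_union=on
index_merge_sort_union=on
index_merge_intersection=on
engine_condition_pushdown=on
index_condition_pushdown=on
mrr=on
mrr_cost_based=on
block_nested_loop=on
batched_key_access=off
materialization=on
semijoin=on
loosescan=on
firstmatch=on
subquery_materialization_cost_based=on
use_index_extensions=on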