In my previous post,http://database-scalability.blogspot.com/2012/05/oltp-vs-analytics.html, I
reviewed the differences between OLTP and Analytics
databases.
Scale challenges are different between those 2 worlds of
databases.
Scale challenges in the Analytics world are with the growing
amounts of data. Most solutions have been leveraging those 3 main
aspects: Columnar storage, RAM and parallelism.
Columnar storage makes scans and data filtering more precise and
focused. After that – it all goes down to the I/O - the faster
the I/O is, the faster the query will finish and bring results.
Faster disks and also SSD can play good role, but above all: RAM! …
Have you ever tried to kill a query, but rather than just go away, it remained among the running ones for an extended period of time? Or perhaps you have noticed some threads makred with killed showing up from time to time and not actually dying. What are these zombies? Why does MySQL sometimes seem to fail to terminate queries quickly? Is there any way to force the kill command to actually work instantaneously? This article sheds some light on it.
Threads and connections
MySQL uses a separate thread for each client connection. A query sent to MySQL is handled by a thread that was previously associated with the connection over which the query arrived. Anyone with sufficient privileges can see the list of currently active threads, along with some additional details, by running SHOW PROCESSLIST command, which returns a table-like view where each connection becomes a separate row:
…[Read more]
My old post (http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for) on the performance gains from batch rewritten statements gets regurgitated in various forums, and conference sessions, and often people miss the nuances of it.
Under the hood, what is happening with this feature is the following:
(1) The driver attempts to detect that the SQL being prepared is
an INSERT. We (on purpose) don‘t ship a full-fledged parser in
the driver, so this works 95% of the time. For the other 5%,
you‘re out of luck unless you can simplify your query text.
(2) If the statement is an INSERT, the driver attempts to
determine if it can be rewritten as a multi-value INSERT. From
the code itself, the conditions are:
// Needs to be INSERT, can‘t have INSERT … SELECT or
// INSERT … ON DUPLICATE KEY UPDATE …
Believe me managing EC2 instances is not as simple or magical as
marketers would like for you to believe. The cloud gets
complicated when it gets bigger. EC2 really only enables a person
to ignore power, network layout (which is bad), and getting more
servers that end up costing more then actual servers fast.
Things that EC2 is missing to make life easier for the
developer:
Ability to update all servers with packages and code. Natively
they do not support the ability to push files or install new
software packages to server groups. Thus install cluster-it and
puppet and write your own deploy program.
Server names and the EC2 AWS management console do not match.
Everything is referenced by instance ids. The interface does not
allow one to launch many instances in a named pattern so you have
to go back and sync up the instance with …
My first computer program was written almost quarter a century ago on a BK-0010 computer. It was very simple: the program asked the user to enter their name and then greeted the user using the entered name, like “Hello, Artem!”. I was fascinated. A couple of lines written in Vilnius BASIC transformed a piece of metal and silicon into a considerate thing that cared about a person’s name enough to remember it :-). Of course, the first experience doesn’t represent the day-to-day routine of software development, but the moments when I see a couple of lines making an amazing transformation still enchant me, and remind me why I’ve been writing code all this time.
I’ve just experienced this very same first-time feeling as we’ve released …
[Read more]Our previous GA release of Percona XtraDB Cluster caused a lot of interest and feedback. I am happy to announce next version Percona XtraDB Cluster 5.5.23, which comes with bug fixes and improvements.
List of changes:
- Fixes merged from upstream (Codership-mysql)
- Support for MyISAM, now changes to MyISAM tables are replicated to other nodes
- Improvements to XtraBackup SST methods, better error handling
- New SST wsrep_sst_method=skip, useful when you start all nodes from the same sources (i.e. backup)
- Ability to pass list of IP addresses for a new node, it will connect to the first available
Binaries are available from downloads area or from our repositories.
For …
[Read more]We are excited to have Gerry Narvaja start today at Tokutek! Gerry has spent more than 25 years in the software industry, most of them working with databases for different kinds of applications, from embedded to large-scale web products. Gerry worked first at MySQL, and then Sun Microsystems supporting the Sales teams. In 2008 he transitioned into being a Senior MySQL DBA. Gerry graduated as an Electronic Engineer from I.T.B.A (Instituto Tecnológico de Buenos Aires) and has an M.B.A. from Universidad del Salvador in collaboration with S.U.N.Y.A (State University of NY at Albany).
Gerry enjoys helping users to solve complex database production issues. For almost a year he has been co-hosting the popular MySQL Community podcast, OurSQL, which was given the …
[Read more]
In an earlier blog we've described the general high-level idea of
how to achieve 10X better performance for MySQL Cluster 7.2
compared to MySQL Cluster 7.1.
Naturally the development is never as straightforward as the
high-level view looks like. In this blog I'll mention a few of
the most important roadblocks on the path to improved performance
of MySQL Cluster 7.2 that we met and resolved.
Initially when we increased the number of LQH threads from 4 to
16 we only saw scaling to 8 LQH threads and we saw no scaling in
going to 16 LQH threads. This was very puzzling since we don't
really have any mutexes that should be an issue. However we
looked into the mutexes that we had and managed to decrease the
number of conflicts on the send mutexes by a factor of 15. This
did however not improve performance at all.
Next we noted using oprofile that there was a few functions that
for some reason 50% of the CPU time …
Why demos?
For as long as I can remember in my public speaking activities, I have always planned my presentations with some sort of live demo in it. I am always surprised when a conference venue asks me to provide my slides in advance, to be loaded in an anonymous computer with no chance of demos. I always turn down such offers, as I always want to provide a demo.
There have been times when technical or time constraints prevented me from demoing something, and in these cases I felt that the presentation was lacking a vital part. But I always try. I have even given demos during lightning talks, and those were the ones that made me feel really good.
I have given hundreds of presentations, and hundreds of demos, and as in every human activity, I have made plenty of mistakes. I believe I have learned some valuable lesson from my mistakes, and this article is my attempt at sharing the joy with wannabe presenters and also with …
[Read more]Reddit takes SQL injection very seriously.
How seriously?
Check their headers:
scabral-07890:~ scabral$ curl --head www.reddit.com
HTTP/1.1 200
OK
Content-Type: text/html;
charset=UTF-8
Set-Cookie:
reddit_first=%7B%22organic_pos%22%3A%201%2C%20%22firsttime%22%3A%20%22first%22%7D;
Domain=reddit.com; expires=Thu, 31 Dec 2037 23:59:59 GMT;
Path=/
Server: '; DROP TABLE servertypes;
--
Date: Sat, 12 May 2012 13:54:20
GMT
Connection: keep-alive
scabral-07890:~ scabral$
A colleague at PICC showed me this when he learned of my talk on MySQL security!