Standard MySQL is configurable such that a single master server can be clustered with a number of read-only slave servers. To enable this master-slave replication, the master’s transaction logs are communicated to the slaves (log shipping). Log shipping is a form of asynchronous replication. Under this configuration, the data on the slave always remains behind the master, a condition referred to as slave lag or replication lag. The extent of the slave lag depends on workload, network bandwidth and network latency. Database reads can be served out of the slaves, assuming the application has been designed to tolerate the slave lag and the resulting staleness of data (eventual consistency), which can at times be variable and opaque. MySQL master-slave replication offers the possibility of promoting a slave to become the new master should the master fail, but this is very painful to do in practice. The cluster has to stop taking ANY writes while it waits for …
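As a rough illustration of the slave lag described above, here is a toy model in Python. It is only a sketch: the `Master` and `Slave` classes, the `max_entries` knob and the `balance` key are all hypothetical, and nothing here reflects how MySQL actually ships or applies its logs.

```python
class Master:
    """Toy master: applies writes locally and appends them to a log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Toy read-only slave that replays the master's log asynchronously."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries replayed so far

    def apply_from(self, master, max_entries):
        # Replay at most max_entries log entries. max_entries is a
        # hypothetical stand-in for bandwidth and latency limits.
        end = min(len(master.log), self.applied + max_entries)
        for key, value in master.log[self.applied:end]:
            self.data[key] = value
        self.applied = end

    def lag(self, master):
        # Lag measured in log entries not yet applied on the slave.
        return len(master.log) - self.applied


master, slave = Master(), Slave()
master.write("balance", 100)
master.write("balance", 80)
master.write("balance", 50)

# The slave has only caught up with the first write, so a read is stale.
slave.apply_from(master, max_entries=1)
stale_read = slave.data["balance"]
lag_after_one = slave.lag(master)

# Once the remaining entries are applied, the slave converges.
slave.apply_from(master, max_entries=10)
caught_up_read = slave.data["balance"]
```

An application reading `balance` from the slave between the two `apply_from` calls sees the old value, which is the staleness it has to be designed to tolerate.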
The next Helsinki MySQL User Group is set for Tuesday, February 19. Lari Pulkkinen from Arbitron Mobile will talk about their project adopting SSD disks for better MySQL performance. Yes, there are benchmarks included.
Note the changed location: the Oracle office at Gräsantörmä 2, Espoo. We are glad to have Oracle Finland sponsoring the user group by taking turns as meetup host. Food and sauna will be available after the talk, as is customary.
Wow! We at GenieDB have been working on a geo-distributed, multi-datacenter, relational database engine for some time. We’ve believed in this vision of providing a distributed RDBMS/SQL database, but had to endure the NoSQL movement and other attempts at refuting the need for such a thing. One whitepaper, and what a big difference it makes!
The Spanner whitepaper does just as good a job as any marketing speak of describing what we are after. “Even though many projects happily use Bigtable, we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. … …
GenieDB is building a database with global distribution as its core thesis. It is no secret that customers demand near-instantaneous and highly reliable service, and that they are becoming more globally dispersed than ever before. We believe that data custodianship must ultimately be moved to the “edge of the web” where it can be dynamically managed in order to improve user experience, optimize network/hardware utilization and reduce TCO. A database and application stack hosted in a single datacenter runs afoul of this fundamental thesis in a number of ways. In this article we will focus on the issue of improving response time for users even when they are globally distributed. This is simply a matter of physics and how long it takes to transmit a packet between the two locations. No amount of application tuning can overcome this obstacle.
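The physics argument above can be made concrete with a little arithmetic. The sketch below assumes that signals in optical fiber travel at roughly two thirds of the speed of light in vacuum, and uses a hypothetical 10,000 km path between user and datacenter; neither figure comes from the article. It ignores routing, queuing and processing delays, so real round trips are longer.

```python
# Speed of light in vacuum, in km/s.
SPEED_OF_LIGHT_KM_S = 299_792.458
# Assumption: light in optical fiber travels at about 2/3 of that speed.
FIBER_FACTOR = 2 / 3


def min_round_trip_ms(distance_km):
    """Lower bound on the round-trip time over a fiber path, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


# Hypothetical 10,000 km path between a user and a single datacenter.
rtt = min_round_trip_ms(10_000)
print(f"minimum round trip: {rtt:.1f} ms")
```

Under these assumptions the floor is about 100 ms per round trip, and a page that needs several sequential round trips to the database pays that cost each time.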
The obvious solution is to have multiple copies of the …
By last Friday morning the open bug count had risen above the 150 mark, and we managed to bring it down to under 25 by the end of the day, thanks to a dedicated effort by the team. One of those bugs was to make the Node.js server run continuously. In our application we are using the Node.js […]
I’ve been working on a data archival project over the last couple of weeks and thought it would be interesting to discuss something a bit counter-intuitive. Absolutes are never true, but when getting rid of data, it’s usually more efficient to insert the data being kept into a new table rather than deleting the old data from the existing table.
Here is our example table from the IMDB database.
mysql> show create table title\G
*************************** 1. row ***************************
Table: title
Create Table: CREATE TABLE `title` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` text NOT NULL,
`imdb_index` varchar(12) DEFAULT NULL,
`kind_id` int(11) NOT NULL,
`production_year` int(11) DEFAULT NULL,
`imdb_id` int(11) DEFAULT NULL,
`phonetic_code` varchar(5) DEFAULT NULL,
`episode_of_id` int(11) DEFAULT NULL,
`season_nr` int(11) …
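The "insert what you keep, then swap" idea from this archival post can be sketched as follows. This is a minimal illustration and not the author's procedure: it runs against an in-memory SQLite database from Python rather than MySQL, uses only three of the columns of `title`, and the sample rows and the `CUTOFF` year are hypothetical. In MySQL the two renames would typically be a single `RENAME TABLE title TO title_old, title_new TO title` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the `title` table (subset of columns only).
conn.execute(
    "CREATE TABLE title ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " production_year INTEGER)"
)
# Hypothetical sample data: one title per year from 1950 through 2009.
conn.executemany(
    "INSERT INTO title (title, production_year) VALUES (?, ?)",
    [(f"movie {year}", year) for year in range(1950, 2010)],
)

CUTOFF = 2000  # hypothetical: keep titles produced in or after this year

# 1. Create an empty table with the same structure.
conn.execute(
    "CREATE TABLE title_new ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " production_year INTEGER)"
)
# 2. Copy only the rows being kept, instead of deleting the rest in place.
conn.execute(
    "INSERT INTO title_new"
    " SELECT id, title, production_year FROM title"
    " WHERE production_year >= ?",
    (CUTOFF,),
)
# 3. Swap the tables and drop the old data in one cheap operation.
conn.execute("ALTER TABLE title RENAME TO title_old")
conn.execute("ALTER TABLE title_new RENAME TO title")
conn.execute("DROP TABLE title_old")
conn.commit()

kept = conn.execute(
    "SELECT COUNT(*), MIN(production_year) FROM title"
).fetchone()
```

The trade-off is that writes arriving on the old table during the copy are not carried over, so a real migration needs to either pause writes or reconcile them before the swap.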
A coworker came to me with a perplexing issue. He wanted to know why these two queries were not returning the same results:
mysql> SELECT COUNT(*)
-> FROM parent
-> WHERE id NOT IN (SELECT parent_id FROM child);
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (7.84 sec)
mysql> SELECT COUNT(*)
-> FROM parent p
-> WHERE NOT EXISTS(SELECT 1
-> FROM child c
-> WHERE p.id = c.parent_id);
+----------+
| count(*) |
+----------+
| 5575 |
+----------+
1 row in set (2.95 sec)
At first (and second, and third) glance these two queries look identical. It obviously is an exclusion join and because the MySQL optimizer is what it is, I decided to rewrite it as a LEFT JOIN to see what results came back:
mysql> SELECT …
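The excerpt is cut off before the explanation, so the following is an assumption rather than the author's diagnosis: a common cause of exactly this discrepancy is a NULL in `child.parent_id`. With a NULL in the subquery result, `id NOT IN (...)` evaluates to NULL instead of true for every non-matching row, so no rows qualify, while `NOT EXISTS` and the LEFT JOIN form are unaffected. The sketch below reproduces the effect with Python's `sqlite3` module and hypothetical sample rows; SQLite is used only so the example runs on its own, and it applies the same three-valued logic here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent (id) VALUES (1), (2), (3);
    -- One child points at parent 1, one has a NULL parent_id.
    INSERT INTO child (parent_id) VALUES (1), (NULL);
    """
)

# NOT IN: the NULL in the subquery makes the predicate unknown for
# parents 2 and 3, so they are filtered out.
not_in = conn.execute(
    "SELECT COUNT(*) FROM parent"
    " WHERE id NOT IN (SELECT parent_id FROM child)"
).fetchone()[0]

# NOT EXISTS: the correlated equality never matches the NULL row,
# so parents 2 and 3 are counted.
not_exists = conn.execute(
    "SELECT COUNT(*) FROM parent p"
    " WHERE NOT EXISTS (SELECT 1 FROM child c WHERE p.id = c.parent_id)"
).fetchone()[0]

# LEFT JOIN exclusion form: same result as NOT EXISTS.
left_join = conn.execute(
    "SELECT COUNT(*) FROM parent p"
    " LEFT JOIN child c ON p.id = c.parent_id"
    " WHERE c.parent_id IS NULL"
).fetchone()[0]

print(not_in, not_exists, left_join)
```

If NULLs are the cause, adding `WHERE parent_id IS NOT NULL` to the `NOT IN` subquery makes the two queries agree.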
This is the second and final part of my notes from the MySQL conference. In this part I'll focus on the technical substance of talks I saw, and didn't see.
More than ever before I was a contributor rather than an attendee at this conference. Looking back, this resulted in seeing fewer talks than I would have wanted to, since I was speaking or preparing to speak myself. Sometimes it was worse than speaking, for instance I spent half a day picking up pewter goblets from an engraving shop... (congratulations to all the winners again :-) ) Luckily, I can make up for some of that by going back and browsing the slides. This is especially important whenever two good talks are scheduled in the same slot, or in a slot when I was speaking. So I have categorized topics here along various axes, but also along the "things I did see" versus "things I missed" axis.
My own talks
…
Have you ever wanted to get a list of indexes and their columns for all tables in a MySQL database without having to iterate over SHOW INDEXES FROM ‘[table]’? Here are a couple of ways…
The following query using the INFORMATION_SCHEMA STATISTICS table will work prior to MySQL GA 5.6 and Percona Server 5.5.
SELECT table_name AS `Table`,
index_name AS `Index`,
GROUP_CONCAT(column_name ORDER BY seq_in_index) AS `Columns`
FROM information_schema.statistics
WHERE table_schema = 'sakila'
GROUP BY 1,2;
This query uses the INNODB_SYS_TABLES, …
Note: I’ve decided not to use Veewee due to silly compatibility issues for now.
Quoting from Vagrant’s web site:
Vagrant is a tool for building and distributing virtualized development environments. By providing automated creation and provisioning of virtual machines using Oracle’s VirtualBox, Vagrant provides the tools to create and configure lightweight, reproducible, and portable virtual environments.
A complementary technology called Veewee makes building VirtualBox VMs easier by automating away a lot of manual steps. Marius Ducea has a great blog post on how to use it.
My observations:
1. According to Vagrant’s web site, it should work on Windows.
I’ve …