Amazon EBS volumes come with a very cool feature called "lazy
loading". In a nutshell: if a volume is created from an existing
snapshot, it can become available almost immediately without
waiting for all data to be restored. This allows for extremely
fast provisioning of large data sets as long as you don't
explicitly require the entire data set to be present before you
start using it.
When an EBS volume is restored from snapshot, its blocks are
fetched from Amazon S3. It happens either lazily in the
background or explicitly on demand (think of a pagefault-like
mechanism) and of course, fetching pieces of data from Amazon S3
is going to be one-two orders of magnitude slower than reading
blocks directly from a volume.
In this short article, I will try to give you an idea of how this
may impact the crash recovery time of your MySQL databases. Why
talk about this? Depending on the workload and data set layout,
crash recovery of a MySQL …
It goes without saying that crash recovery of busy MySQL servers
(and many other RDBMS for that matter) is not an extremely quick
process. In MySQL context, one of the worst case scenarios is
when the server is used for multi-tenant application hosting i.e.
when the MySQL instance contains hundreds or thousands of schemas
and (tens/hundreds of) thousands of tablespaces. In such
scenario, the server may spend a considerable amount of time in
the tablespace discovery phase, during which MySQL
builds a mapping between tablespace IDs and names of actual
tablespace files on disk.
MySQL 5.7 promises to put an end to tablespace discovery. The
documentation lists the following improvements introduced in
versions 5.7.5 and up:
- Elimination of file system scans prior to redo log application. The MLOG_FILE_NAME redo log …
The OPTIMIZE TABLE statement provides allows MySQL DBAs to
reorganize physical table storage in order to achieve two main
goals:
- Improve IO efficiency for reads/writes against the table,
- Reclaim unused storage space.
For InnoDB tables, OPTIMIZE is implemented via ALTER TABLE
operation. This is true in all MySQL versions, however MySQL
5.7.4 introduced a change that may catch many users off guard and
is therefore worth highlighting. Theory The InnoDB storage engine
does not directly support OPTIMIZE. Instead, when a DBA invokes
OPTIMIZE TABLE x, InnoDB executes ALTER TABE x FORCE. In
MySQL versions prior to 5.7.4, OPTIMIZE is not an online
operation i.e. the table is locked and DML operations against the
table are not allowed until OPTIMIZE finishes.
MySQL 5.7.4 introduces a significant change around OPTIMIZE,
namely the ALTER TABLE operation now uses …
The "innodb_read_only" parameter is perhaps not
among the most commonly used MySQL parameters. It may however
influence server behaviour in a way that you may not expect. This
post is a record of an investigation that revealed one of MySQL
quirks caused by said parameter.
Background The MySQL 5.6 server topology I was looking at during
this investigation consisted of a single main server (R/W) and
multiple replicas serving read-only statements.
This is relatively simple and you wouldn't expect to run into
issues on the replicas as long as you stick to statements that do
not modify data, right?
Not quite. Issue I was asked to investigate can be described as
follows:
The DBA created two tables connected using a foreign key
relationship:
CREATE TABLE `employee` ( `e_id` int(11) NOT …[Read more]
Recently, I happened to have an onsite engagement and the goal of the engagement was to move a database service to RDS Aurora. Like probably most of you, I knew the service by name but I couldn’t say much about it, so, I Googled, I listened to talks and I read about it. Now that my onsite engagement is over, here’s my first impression of Aurora.
First, let’s describe the service itself. It is part of RDS and, at first glance, very similar to a regular RDS instance. In order to setup an Aurora instance, you go to the RDS console and you either launch a new instance choosing Aurora as type or you create a snapshot of a RDS 5.6 instance and migrate it to Aurora. While with a regular MySQL RDS instance you can create slaves, with Aurora you can add reader nodes to an existing cluster. An Aurora cluster minimally consists of a …
[Read more]People at Intel started the pmem library project some time ago, it’s open to the broader community at GitHub and other developers, including Linux kernel devs, are actively involved.
While the library does allow interaction with an SSD using a good-old-filesystem, we know that addressing SSD through SATA or SAS is very inefficient. That said, the type of storage architecture that SSD uses does require significant management for write levelling and verifying so that the device as a whole actually lasts, and your data is kept safe: in theory you could write to an NVRAM chip, and not know when it didn’t actually store your data properly.
But there are other technologies, such as Memristor (RRAM) and Phase Change Memory …
[Read more]
Indexes are separate data structures that provide alternate
pathways to finding data. They can and do generally speed up the
processing of queries and other DML commands, like the
INSERT, UPDATE, REPLACE
INTO, and DELETE statements. Indexes are also
called fast access paths.
In the scope of the InnoDB Database Engine, the MySQL database
maintains the integrity of indexes after you create them. The
upside of indexes is that they can improve SQL statement
performance. The downside is that they impose overhead on every
INSERT, UPDATE, REPLACE
INTO, and DELETE statement, because the
database maintains them by inserting, updating, or deleting items
for each related change in the tables that the indexes support.
Indexes have two key properties—usability and visibility. Indexes are both usable and visible by default. That means they …
[Read more]Recently Aphyr (Kyle Kingsbury) published https://aphyr.com/posts/327-call-me-maybe-mariadb-galera-cluster
The article is technically valid, I am not going to dispute a conclusion Aphyr made, but it is also quite technically involved, so users who just jump to conclusion may get the wrong impression and we’re left with more questions than ever.
So, let me state what is the real conclusion of this
article:
“Galera cluster does not support SNAPSHOT ISOLATION
LEVEL, in contract to what was stated in the
documentation”.
Following that conclusion is using Galera cluster may result in
“corrupted” data.
I do not quite like the usage of the word “corrupted” here. For me, the more correct word be to use is “inconsistent”.
So with this clarification, the …
[Read more]In my last Booking.com Hackathon, I worked on MyISAM vs InnoDB for data loading (LOAD DATA IN FILE) and bulk index creation. My motivation was the following: knowing that some are still using MyISAM for this particular use-case, I wanted to verify/understand if/why InnoDB is slower than MyISAM. I do not yet have complete results on this specific subject but I found some interesting things that
Thank you for attending my July 22 webinar titled “Advanced Query Tuning in MySQL 5.6 and 5.7” (my slides and a replay available here). As promised here is the list of questions and my answers (thank you for your great questions).
Q: Here is the explain example:
mysql> explain extended select id, site_id from test_index_id where site_id=1
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test_index_id
type: ref
possible_keys: key_site_id
key: key_site_id
key_len: 5
ref: const
rows: 1
filtered: 100.00
Extra: Using where; Using index
why is site_id a covered index for the query, given the fact that a) we are selecting “id”, b) key_site_id only …
[Read more]