How do you run non-blocking schema changes in MySQL? This is an eternal question. With a plethora of 3rd party solutions and with recent advancements in MySQL, it's difficult to track which solution is preferable for a given schema migration. In this post, we provide a high level overview of the state of MySQL online schema migrations in 2024. We limit the discussion to ALTER TABLE statements, as other DDL statements are typically fast (DROP TABLE is somewhat of an exception, but out of scope of this post). We'll first examine the native MySQL options: INPLACE and INSTANT. For reference, see Online DDL Operations MySQL 8.0 documentation. INPLACE, aka InnoDB Online DDL This is MySQL's first take on non-blocking schema changes. Some types of ALTER TABLE (see above link for exhaustive list of supported changes) are eligible to run with ALGORITHM=INPLACE. An INPLACE schema change is technically non-blocking, with quite a few caveats: On the server where …
[Read more]If you're using MySQL, you likely have indexes that are powered by B-trees.The B-tree is a powerful data structure, and is frequently used to construct indexes in relational databases.If you are using the InnoDB storage engine, it is the only choice for your index, save for spatial indexes.However, MySQL has a secret weapon for making lookups with these types of indexes even faster: the Adaptive Hash Index, or AHI.Before jumping in to how this works, let's take a few moments to review B-trees, the InnoDB buffer pool, and how these work together during index lookups. B-Tree indexes The B-tree data structure has been used by computer systems for decades.It is particularly useful in the context of file system and data storage applications, due to the fact that each node can store many values.This is useful in algorithms that interface with storage systems, as the size of each node can be set to work well with the unit(s) of storage, for example aligning …
[Read more]When considering the performance of any software, there's a classic trade-off between time and space.In the process of assessing the performance of a MySQL query, we often focus on execution time (or query latency) and use it as the primary metric for query performance.This is a good metric to use, as ultimately, we want to get query results as quickly as possible. I recently released a blog post about how to identify and profile problematic MySQL queries, with a discussion centered around measuring poor performance in terms of execution time and row reads.However, in this discussion, memory consumption was largely ignored. Though it may not be needed as often, MySQL also has built-in mechanisms for gaining a deep understanding of both how much memory a query is using and also what that memory is being used for.Let's take a deep dive through this functionality and see how we can perform live monitoring of the memory usage of a MySQL connection. …
[Read more]Though we try our best to avoid it, it's easy to let underperforming queries slip through the cracks in our workloads, negatively impacting the performance of a database system.This is especially true in large-scale database environments, with many gigabytes or terabytes of data, hundreds of tables, and thousands of query patterns being executed on a daily basis. Thankfully, MySQL has the ability to collect data that can be leveraged for identifying problematic queries, and can also do profiling on them in order to drill into their poor performance.In this article, I'll go over several built-in techniques for how to do this in native MySQL.If you use PlanetScale, this type of information can be gathered more easily and intuitively using the PlanetScale Insights dashboard.I'll include a brief discussion of this feature later on. For this article, I'll be using the following schema as an example.
A fake workload has been run on this database, …
[Read more]Universally Unique Identifiers, also known as UUIDs, are designed to allow developers to generate unique IDs in a way that guarantees uniqueness without knowledge of other systems. These are especially useful in a distributed architecture, where you have a number of systems and databases responsible for creating records. You might think that using UUIDs as a primary key in a database is a great idea, but when used incorrectly, they can drastically hurt database performance. In this article, you'll learn about the downsides of using UUIDs as a primary key in your MySQL database. The many versions of UUIDs At the time of this writing, there are five official versions of UUIDs and three proposed versions. Let's take a look at each version to better understand how they work. UUIDv1 A UUID version 1 is known as a time-based UUID and can be broken down as follows:
While much of modern computing uses the UNIX epoch time (Jan 1, 1970) as its base, …
[Read more]Many years ago, I worked for a telematics company that ingested data from hundreds of thousands of devices worldwide. There was a point of incredible growth where we onboarded a customer that gave us a massive number of new devices and a huge bump in revenue. It was a great moment for the company's trajectory, but the increased amount of data being processed highlighted a massive flaw in our system. The ID column of the data history table (which logged every event that occurred across all devices) was created with the INT data type, and it was quickly running out of space. It wasn't an issue immediately, but if that column ran out of space, our entire system would come to a halt. Funnily enough, we built a quick tool called “the doomsday clock,” which would roughly calculate the date this would occur. Had we expected this, we would have designed the database with a different type that would have more easily accommodated growth like this, but of …
[Read more]Data is abstract. Geospatial design management is how engineers across various disciplines make sense of complex data to make more informed decisions and better understand the spatial relationships in the world around us. In this blog post, I explore how complex data and geographic features can be represented in MySQL. Geospatial data, often referred to in technical documentation as geodata, includes information related to locations on the Earth's surface. In MySQL, geographic features represent anything in the real world with a location and are defined as either Entities or Space. Type Definition Examples Entities Specific objects with defined boundaries and individual properties Landmarks: Mountains, rivers, forests, buildings Infrastructure: Roads, bridges, power lines Administrative areas: Countries, cities, states Points of interest: Restaurants, shops, ATMs Spaces Continuous areas defined by their location and characteristics Land cover: …
[Read more]In the early 1980’s, computer scientists Andreas Reuter and Theo Harder coined the term ACID to describe a set of properties related to database transactions designed to keep data stored reliably and with integrity. Most (if not all) modern database systems are built around ACID compliance. By adhering to these fundamentals, businesses can confidently trust the data within their database, whether it’s for a small project management app, or a large banking system. Isolation levels, as well as the related concepts, are cornerstones that enable MySQL to fulfill ACID guarantees. In this article, we’ll break down how multiple clients can work with a single database and maintaining data consistency by using isolation levels. What is a MySQL isolation level? A MySQL isolation level is one of four modes that can be set on a MySQL session that controls how transactions should behave when executing concurrently. This concept relates directly to the …
[Read more]Over 15 years ago, GitHub started as a Ruby on Rails application with a single MySQL database. Since then, GitHub has evolved its MySQL architecture to meet the scaling and resiliency needs of the platform—including building for high availability, implementing testing automation, and partitioning the data. Today, MySQL remains a core part of GitHub’s infrastructure and our relational database of choice.
This is the story of how we upgraded our fleet of 1200+ MySQL hosts to 8.0. Upgrading the fleet with no impact to our Service Level Objectives (SLO) was no small feat–planning, testing and the upgrade itself took over a year and …
[Read more]Organizations often shard their database to scale beyond what simply adding resources to a single server can provide. When you horizontally shard your database, you essentially break the data up and split it across multiple database servers. Hearing this, you might think that adding more servers means adding more maintenance overhead to your staff, and more expenses on your budget, with the tradeoff that your organization can handle more database traffic. While there is definitely some truth to that in certain situations, there’s oftentimes more to the story that's not as obvious. In this article, we’ll cover three ways that sharding your database can benefit your organization beyond additional throughput. Minimized impact on failures There’s an old saying in architecting infrastructure: two is one, and one is none. The implication is that you should never have one of anything, as it creates a single point of failure. This is true for your …
[Read more]