MySQL offers a wide array of options to configure replication, but with all of those options, how can you be sure you are doing it right? Replication is the first step to providing a higher level of availability to your MySQL database. A well-configured replication architecture can be the difference between your data being highly available and your MySQL setup becoming a management nightmare. At PlanetScale, we support hundreds of thousands of database clusters, all using replication to provide high availability, so we have a little bit of experience in this arena! In this article, we’re going to explore some of the best practices when it comes to replication, both locally and across longer distances. Use an active/passive configuration When replicating with active/passive mode, one MySQL server acts as the source and all other servers are read-only replicas from that source. In this configuration, the replicas can be used to serve up read-only …
[Read more]Have you heard about MySQL replication but aren’t sure exactly why you should care? Having multiple servers for any workload is typically considered best practice. After all, a workload split across multiple servers helps balance out the performance of any application. When it comes to working with your database, though, the benefits may not be as clear. In this article, you’ll learn about five real-world use cases for implementing MySQL replication. What is MySQL replication? Before we get into its use cases, let us briefly describe what MySQL replication is. MySQL replication is a process that is used to keep multiple MySQL servers in sync. When you first set up a MySQL environment, it is typically with a single server to run your databases. One approach to scaling your database environment is to configure additional servers to contain copies of your database (replicas) that match the primary MySQL server (source). As data is updated, written …
[Read more]As the usage of your app grows, performance can steadily decline. There’s nothing necessarily surprising about that statement, but what is surprising is the number of bottlenecks that can surface and the options available to you to fix them. One such bottleneck can be directly related to the time it takes to read from and write to your database. After all, behind the complexities of a relational database, you’re still working with storage systems that have inherent IO latency. This is where a well-architected caching system can help. A good caching system can reduce the load on your database and increase the general performance of your application. A faster application results in happier users and potentially more revenue, which is always a good thing! However, caching systems have their own setup complexities, along with a number of gotchas that might crop up unexpectedly. In this article, we’ll explore backend caching, how to implement it, how …
[Read more]When the performance of your database server starts to decline from general usage, you have several options for optimizing it. One common method for optimizing your MySQL database is through partitioning. In this article, we’ll cover the basics of MySQL partitioning, how to apply partitioning to your database, and we'll discuss how it’s related to sharding. The basics of MySQL partitioning Partitioning is the idea of splitting something large into smaller chunks. In MySQL, the term “partitioning” means splitting up individual tables of a database. When you partition a table in MySQL, the table is split up into several logical units known as partitions, which are stored separately on disk. When data is written to the table, a partitioning function will be used by MySQL to decide which partition to store the data in. The value for one or more columns in a given row is used for this sorting process. MySQL provides several partitioning functions …
[Read more]Deep dive into MySQL’s connection handling mechanisms for optimal connection pooling and improved concurrency.
Scaling a database presents challenges. As you grow, you might begin having trouble handling ever-increasing throughput or data size. You might find that query latency is getting worse. You might be pushing the limits of your hardware. When this happens, a classic option is vertically scaling your database by getting better hardware, but is there a better way? And what happens when you reach the vertical limits? This is where horizontal sharding comes in. In this article, we'll cover some common indicators that your database may be ready for horizontal sharding. We'll also look at some measures you can implement until then. Let's dig in. Hitting the limits There are lots of different limits that you can run into when you're scaling up. At the database level, you might be maxing out CPU, memory, disk space, or IOPS. Running into these limits can have real consequences for your business. Database operations like schema changes will start taking longer, …
[Read more]For developers building out an application, a transactional datastore is the obvious and proven choice, but with success comes scale limitations. A monolithic database works well initially, but as an application sees growth, the size of its data will eventually grow beyond what is optimal for a single server. Implementing read replicas can improve your performance but will likely add lag between your primary and your replicas, leading to performance or correctness issues for your application. These complexities can sometimes require major architectural changes, leading to a suboptimal user experience and a difficult choice between application performance and data consistency. Scaling write traffic is even more challenging; for example, even the largest MySQL database will see performance issues at a certain point. Horizontal sharding This is not a new challenge; organizations have faced it for years, and horizontal sharding is one …
[Read more]Every day, PlanetScale processes more than 10 billion of our customers’ queries. We need to collect, store, and serve telemetry data generated by these queries to power Insights, our built-in query performance tool. This post describes how we built a scalable telemetry pipeline using Apache Kafka and a sharded PlanetScale database. Insights requirements To show you Insights, we pull from the following datasets: database-level time series data (e.g., queries per second across the entire database); query pattern-level time series data (e.g., p95 for a single query pattern like SELECT email FROM users where id = %); and data on specific query executions for slow/expensive queries (the “slow query log”). The database-level data fits well into a time series database like Prometheus, but we run into several issues when trying to store the query pattern-level data in a time series database. On any given day, there are tens of millions of unique query …
[Read more]There are several different ways to store dates and times in MySQL, and knowing which one to use requires understanding what you'll be storing and how MySQL handles each type. There are five column types that you can use to store temporal data in MySQL: DATE, DATETIME, TIMESTAMP, YEAR, and TIME. Each column type stores slightly different data, has different minimum and maximum values, and requires different amounts of storage. The table below lists each column type and its attributes.

| Column    | Data        | Bytes | Min                 | Max                 |
|-----------|-------------|-------|---------------------|---------------------|
| DATE      | Date only   | 3     | 1000-01-01          | 9999-12-31          |
| DATETIME  | Date + time | 8     | 1000-01-01 00:00:00 | 9999-12-31 23:59:59 |
| TIMESTAMP | Date + time | 4     | 1970-01-01 00:00:01 | 2038-01-19 03:14:07 |
| YEAR      | Year only   | 1     | 1901                | 2155                |
| TIME      | Time only   | 3     | -838:59:59          | 838:59:59           |
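
The TIMESTAMP upper bound can be sanity-checked from its 4-byte size. The sketch below is illustrative only: it assumes the value is held as a signed 32-bit count of seconds since the Unix epoch in UTC, and the names `INT32_MAX`, `ts_max`, and `formatted` are made up for this example rather than taken from MySQL.

```python
from datetime import datetime, timezone

# Assumption: a 4-byte TIMESTAMP holds a signed 32-bit number of seconds
# since 1970-01-01 00:00:00 UTC, so the largest storable value is 2**31 - 1.
INT32_MAX = 2**31 - 1

# Convert that second count to a UTC calendar date and time.
ts_max = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
formatted = ts_max.strftime("%Y-%m-%d %H:%M:%S")

print(formatted)  # 2038-01-19 03:14:07
```

Under that assumption, the last representable moment is 2038-01-19 03:14:07 UTC, which is why TIMESTAMP covers a much narrower range than DATETIME.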
Dates, years, and times Based on this …
[Read more]One of the hidden gems in the MySQL documentation is this note in section 8.3.6: As an alternative to a composite index, you can introduce a column that is “hashed” based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a “wide” index on many columns. We will build on this idea by creating generated hash columns for indexed lookups on large values and enforcing uniqueness across many columns. Instead of creating huge composite indexes, we'll index the compact generated hashes for fast lookups. Before diving into generated hash columns, let's look at generated columns in general. Generated columns in MySQL A generated column can be considered a calculated, computed, or derived column. It is a column whose value results from an expression rather than direct data input. The expression can contain literal values, built-in functions, or references to other columns. The result of …
[Read more]