MySQL gave us the JSON data type back in mid-2015 with the release of MySQL 5.7.8. Since then, it has been used as a way to escape rigid column definitions and store JSON documents of all shapes and sizes: audit logs, configuration settings, 3rd party payloads, user-defined fields, and more. Although MySQL gives us functions for reading and writing JSON data, you’ll quickly discover something that is conspicuously missing: the ability to directly index your JSON columns. In other databases, the best way to directly index a JSON column is usually through a type of index known as a Generalized Inverted Index, or GIN for short. Since MySQL doesn’t offer GIN indexes, we’re unable to directly index an entire stored JSON document. All is not lost though, because MySQL does give us a way to indirectly index parts of our stored JSON documents. Depending on the version of MySQL that you're using, you have two options for indexing JSON. In MySQL 5.7 you …
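As a rough illustration of indirect indexing, the sketch below assembles the DDL for one common approach: a virtual generated column that extracts a single JSON path, plus an ordinary index on that column. The table name `activity_log`, the JSON column `properties`, the path `$.request.email`, and the helper function itself are all hypothetical. The statement is only built as a string here; it is not executed against a server.

```python
def generated_column_index_ddl(table, json_column, json_path, new_column, sql_type):
    """Build MySQL DDL that indexes one JSON path via a virtual generated column.

    All identifiers are hypothetical examples supplied by the caller. They are
    interpolated as-is, so only pass trusted names.
    """
    return (
        f"ALTER TABLE {table}\n"
        f"  ADD COLUMN {new_column} {sql_type}\n"
        f"    GENERATED ALWAYS AS "
        f"(JSON_UNQUOTE(JSON_EXTRACT({json_column}, '{json_path}'))) VIRTUAL,\n"
        f"  ADD INDEX idx_{new_column} ({new_column});"
    )


# Hypothetical table and path, used only for illustration.
print(
    generated_column_index_ddl(
        "activity_log", "properties", "$.request.email", "email", "VARCHAR(255)"
    )
)
```

Queries that filter on the generated column (here, `email`) can then use the ordinary index instead of scanning every JSON document.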
Overview
Ever find yourself building a database only to start questioning what data types you should use for a specific column? In this entry of the MySQL data types series, we’ll explore the various ways you can save strings and text to a database to help demystify the options you have as a developer, starting with VARCHAR and CHAR.
VARCHAR vs CHAR
VARCHAR is probably the most widely used data type for strings. It stores a string of variable length up to a maximum of 65,535 characters, although the effective limit depends on the maximum row size (65,535 bytes, shared among all columns) and the character set used. When creating a VARCHAR field, you specify the maximum number of characters the field will accept using the VARCHAR(n) format, where n is the maximum number of characters to be stored. Because it is variable length, it only allocates enough disk space to store the contents of the string, not the full declared length. VARCHAR also allocates a little bit of extra space with each value stored. Depending on the space …
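To make the storage behaviour concrete, here is a minimal sketch that estimates how many bytes a single VARCHAR value occupies: the encoded string plus a length prefix of one byte when the column can never need more than 255 bytes, and two bytes otherwise. The function name and parameters are hypothetical, and Python's `utf-8` codec stands in for MySQL's `utf8mb4` when counting bytes; row-format overhead is ignored.

```python
def varchar_storage_bytes(value: str, max_chars: int, max_bytes_per_char: int = 4) -> int:
    """Estimate the bytes used to store `value` in a VARCHAR(max_chars) column.

    Assumes a character set whose byte lengths match UTF-8 (for example
    utf8mb4, at up to 4 bytes per character). Ignores per-row overhead.
    """
    if len(value) > max_chars:
        raise ValueError("value is longer than the declared column length")

    data_bytes = len(value.encode("utf-8"))

    # One length byte if the column can never exceed 255 bytes, otherwise two.
    prefix_bytes = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return prefix_bytes + data_bytes


# A short string in VARCHAR(10) with a single-byte character set.
print(varchar_storage_bytes("abc", max_chars=10, max_bytes_per_char=1))
# The same string in VARCHAR(255) with a 4-byte character set.
print(varchar_storage_bytes("abc", max_chars=255))
```

Only the stored contents and the small prefix count toward the size, which is why a mostly empty VARCHAR(255) column costs far less than its declared length suggests.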
Overview
JavaScript Object Notation (JSON) is a lightweight, text-based data format, similar to YAML or XML, that simplifies data exchange. It was invented by Douglas Crockford in the early 2000s and became increasingly popular with the rise of document-based (also called NoSQL) databases. JSON supports strings, numbers, booleans, objects, and arrays as well as null values. A simple JSON example containing key-value pairs, an object "bandMembers", and an array "songs" would look like this:
{
  "artist": "Starlord Band",
  "bandMembers": {
    "vocals": "Steve Szczepkowski",
    "guitar": "Yohann Boudreault",
    "bass": "Yannick T.",
    "drums": "Vince T."
  },
  "bandMembersCount": 4,
  "album": "Space Rider",
  "releaseDate": "2021-10-25",
  "songs": [
    "Zero to Hero",
    "Space Riders with No Names",
    "Ghost",
    "Bit of Good (Bit of Bad)",
    "Watch me shine",
    "We’re Here",
    "The Darkness inside",
    "No Guts No Glory",
    "All for One",
    "Solar Skies"
  ],
  "songsCount": 10
}
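As a quick check of how those JSON types behave once parsed, the sketch below loads the same document with Python's standard `json` module and inspects the resulting values. The variable names are illustrative only.

```python
import json

raw = """
{
  "artist": "Starlord Band",
  "bandMembers": {
    "vocals": "Steve Szczepkowski",
    "guitar": "Yohann Boudreault",
    "bass": "Yannick T.",
    "drums": "Vince T."
  },
  "bandMembersCount": 4,
  "album": "Space Rider",
  "releaseDate": "2021-10-25",
  "songs": [
    "Zero to Hero", "Space Riders with No Names", "Ghost",
    "Bit of Good (Bit of Bad)", "Watch me shine", "We’re Here",
    "The Darkness inside", "No Guts No Glory", "All for One", "Solar Skies"
  ],
  "songsCount": 10
}
"""

doc = json.loads(raw)

# JSON objects become dicts, arrays become lists, numbers become int/float.
for key, value in doc.items():
    print(f"{key}: {type(value).__name__}")

# The counts stored in the document agree with the nested structures.
print(len(doc["bandMembers"]) == doc["bandMembersCount"])
print(len(doc["songs"]) == doc["songsCount"])
```

Note that `releaseDate` parses as a plain string: JSON has no date type, so any date handling is left to the application or the database.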
MySQL has …
Knowing your database can scale provides great peace of mind. We built PlanetScale on top of Vitess so that we could harness its ability to massively scale. One of the core strengths in our ability to scale is horizontal sharding. To demonstrate the power of horizontal sharding, we decided to run some benchmarking. We set up a PlanetScale database and started running some benchmarks with a common TPC-C sysbench workload. We weren’t aiming for a rigorous academic benchmark here, but we wanted to use a well-known and realistic workload. We will have more benchmark posts coming and have partnered with an academic institution that will be releasing its work soon. For this post, there are two goals. The first is to demonstrate PlanetScale’s ability to handle large query volumes. For this, we set a goal of a million queries per second. In Vitess terms, this is not a large cluster. There are many Vitess clusters running at much higher query volumes, …
MySQL semi-sync is a plugin mechanism on top of asynchronous replication that can offer better durability and even consistency (term defined later). It helps in high availability solutions, but can in itself reduce availability. We look at some basics and follow up to present scenarios that require higher level intervention to ensure availability and to avoid split brains from taking place. I recommend reading this semi-sync blog post by Jean-François Gagné (aka JFG), which illustrates the internals of the semi-sync implementation, and debunks some myths about semi-sync. We will overlap a bit with another recommended post by JFG, about high availability and recovery. Note: in this post we adopt the term “primary” over the term “master” in the context of MySQL replication. However, at this time there is no alternative to using the actual names of some configuration and status variables that use “master” terminology, and some duality is …
Since writing this blog we have released a new version of PlanetScale. Learn more about what we’ve built and give it a try, and be sure to check out our docs. Please note, this blog refers to PlanetScaleDB v1 and is not applicable to our latest product. At PlanetScale, we have built PlanetScaleDB, a fully managed database-as-a-service on top of open source Vitess that enables horizontal scaling of MySQL far beyond what you can do with a single instance. In this blog, we’ll explain how sharding works in Vitess and on PlanetScaleDB. A sharded database is a collection of multiple databases (shards) with identical relational schemas. Vitess allows your application to treat a sharded database as though it is a humongous monolithic database without having to worry about the complexities of sharding. Because of this, you can start with a small database on PlanetScaleDB and grow to massive scale without changing your application logic. In this blog post, …
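To picture how rows can be spread across shards with identical schemas, here is a minimal sketch of range-based routing: a sharding key is hashed into a 64-bit number, and the 64-bit space is split into equal ranges, one per shard. This is a simplified illustration under stated assumptions, not Vitess's actual routing code; the function names and the choice of SHA-256 as the hash are hypothetical.

```python
import hashlib


def keyspace_id(sharding_key: str) -> int:
    """Map a sharding key to a 64-bit integer (illustrative hash only)."""
    digest = hashlib.sha256(sharding_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for(sharding_key: str, shard_count: int) -> int:
    """Pick a shard index by splitting the 64-bit space into equal ranges."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return (keyspace_id(sharding_key) * shard_count) >> 64


# Hypothetical customer identifiers routed across four shards.
for customer in ["customer-1", "customer-2", "customer-3"]:
    print(customer, "-> shard", shard_for(customer, 4))
```

Because the routing depends only on the key, the application can keep issuing the same queries while a routing layer decides which shard holds each row.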
In the past year, GitHub engineers shipped GitHub Packages, Actions, Sponsors, Mobile, security advisories and updates, notifications, code navigation, and more. Needless to say, the development pace at GitHub is accelerated.
With MySQL serving our backends, updating code requires changes to the underlying database schema. New features may require new tables, columns, changes to existing columns or indexes, dropping unused tables, and so on. On average, we have two schema migrations running daily on our production servers. Some days we have a half dozen migrations to run. We’ll cover how this amounted to significant toil for the database infrastructure team, and how we searched for a solution to automate the manual parts of the process.
At first …
Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks…
The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.
Tumblr is a big user of MySQL, and MySQL automation at Tumblr is centered around a tool we built called Jetpants. Jetpants does an incredible job making risky operations safe and reliable, even fairly complex tasks like replacing failed master servers, or splitting a shard.
While Jetpants is an incredibly effective and valuable tool for Tumblr’s day-to-day operation, it has remained very difficult to implement a meaningful testing framework. Integration testing at this level is very challenging. In this article I’ll go through these challenges and how we’ve tackled them at Tumblr.
Requirements
Jetpants operates under the assumption you’re managing MySQL daemons on a fully functional host, and that it can:
- ssh to the target system
- manage processes via service or systemctl commands
- copy data around between systems …