Displaying posts with tag: failover
RDS Aurora MySQL and Service Interruptions

In Amazon space, any EC2 or service instance can “disappear” at any time.  Depending on which service is affected, it will be automatically restarted.  In EC2 you can choose whether an interrupted instance will be restarted, or left shut down.

An interrupted Aurora instance is always restarted. Makes sense.

The restart timing, and other consequences during the process, are noted in our post on Aurora Failovers.
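
As an aside (not from the original post), a failover can also be triggered deliberately rather than waiting for an interruption; a minimal AWS CLI sketch, with placeholder cluster and instance identifiers:

# Force a failover of an Aurora cluster, promoting a chosen replica.
aws rds failover-db-cluster \
    --db-cluster-identifier my-aurora-cluster \
    --target-db-instance-identifier my-aurora-replica-1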

Aurora Testing Limitations

As mentioned earlier, we love testing “uncontrolled” failovers.  That is, we want to be able to pull any plug on any service, and see that the environment as a whole continues to do its job.  We can’t do that with Aurora, because we can’t control the essentials:

  • power button;
  • reset switch;
[Read more]
RDS Aurora MySQL Failover

Right now Aurora only allows a single master, with up to 15 read-only replicas.

Master/Replica Failover

We love testing failure scenarios; however, our options for such tests with Aurora are limited (we might get back to that later).  Anyhow, we told the system, through the RDS Aurora dashboard, to do a failover. These were our observations:

Role Change Method

Both master and replica instances are actually restarted (the MySQL uptime resets to 0).

This is quite unusual these days: we can do a fully controlled role change in classic asynchronous replication without a restart (CHANGE MASTER TO …), and Galera doesn’t have read/write roles as such (all instances are technically writers), so it doesn’t need role changes at all.
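
For contrast, here is a minimal sketch (not from the post) of such a controlled role change in classic asynchronous replication; hostnames, credentials and binlog coordinates are placeholders, and no mysqld restart is involved:

# 1. Stop writes on the old master and wait for the replica to catch up.
mysql -h old-master -e "SET GLOBAL read_only = ON;"
mysql -h replica    -e "SELECT MASTER_POS_WAIT('mysql-bin.000123', 456789);"  # old master's current coordinates (placeholders)

# 2. Promote the replica: clear its replication configuration and allow writes.
mysql -h replica -e "STOP SLAVE; RESET SLAVE ALL; SET GLOBAL read_only = OFF;"

# 3. Re-point the old master at the promoted replica, using coordinates taken
#    from the replica's SHOW MASTER STATUS (placeholders here).
mysql -h old-master -e "CHANGE MASTER TO MASTER_HOST='replica', MASTER_USER='repl', \
    MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4; START SLAVE;"

As a sanity check, MySQL uptime keeps counting on both servers throughout such a role change, which is exactly what does not happen with Aurora's restart-based approach.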

Failover Timing

Failover between running instances takes about 30 seconds.  This is in line with information provided in the …

[Read more]
orchestrator/raft: Pre-Release 3.0

orchestrator 3.0 Pre-Release is now available. Most notable are Raft consensus, SQLite backend support, and the no-binary-required orchestrator-client script.

TL;DR

You may now set up high availability for orchestrator via raft consensus, without the need to set up high availability for orchestrator's backend MySQL servers (such as Galera/InnoDB Cluster). In fact, you can run an orchestrator/raft setup using the embedded SQLite backend DB. Read on.

orchestrator still supports the existing shared backend DB paradigm; nothing dramatic changes if you upgrade to 3.0 and do not configure raft.
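
As an illustration (not from the announcement), the relevant part of /etc/orchestrator.conf.json for a three-node orchestrator/raft setup with the embedded SQLite backend might look like the sketch below; IPs and paths are placeholders, and the key names should be checked against the orchestrator documentation for your version:

# Sketch: write a minimal raft + SQLite configuration (placeholder values).
cat > /etc/orchestrator.conf.json <<'EOF'
{
  "RaftEnabled": true,
  "RaftDataDir": "/var/lib/orchestrator",
  "RaftBind": "10.0.0.1",
  "DefaultRaftPort": 10008,
  "RaftNodes": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db"
}
EOF

Each of the three nodes would get the same file with its own RaftBind address; the nodes then elect a leader among themselves.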

orchestrator/raft

[Read more]
Barcelona MySQL Users Group Meetup on 5th July

If you’re in Barcelona next week you may be interested in the MySQL Meetup being held there by the Barcelona MySQL Meetup group on Wednesday 5th July at 7pm. I’ll be doing a talk on MySQL Failover and Orchestration, and there will be an opportunity to talk about MySQL and related topics afterwards. More information can …

Webinar Thursday January 26, 2017: Overcoming the Challenges of Databases in the Cloud

Please join Jon Tobin, Director of Solution Engineering at Percona, and Rob Masson, Solutions Architect Manager at ScaleArc, on Thursday, January 26, 2017, at 9:00 am PST / 12:00 pm EST (UTC-8) for their webinar “Overcoming the Challenges of Databases in the Cloud.”

Enterprises enjoy the flexibility and simplified operations of using the cloud, but applying those advantages to database workloads has proven challenging. Resource contention, cross-region failover and elasticity at the data tier all introduce limitations. In addition, cloud providers support different services within their database offerings.

Jon and Rob’s webinar is a …

[Read more]
Webinar Wednesday January 18, 2017: Lessons from Database Failures

Join Percona’s Chief Evangelist Colin Charles on Wednesday, January 18, 2017, at 7:00 am (PST) / 10:00 am (EST) (UTC-8) as he presents “Lessons from Database Failures.”

MySQL failures at scale can teach a great deal, and can lead to discussions about topics such as high availability (HA), geographical redundancy and automatic failover. In this webinar, Colin will present case study material (how automatic failover caused GitHub to go offline, why Facebook uses assisted failover rather than fully automated failover, and other scenarios) to look at how the MySQL world is making things better. One way, for example, is using …

[Read more]
Orchestrator: Moving VIPs During Failover

In this post, I’ll discuss how to move VIPs during a failover using Orchestrator.

In our previous post, we showed you how Orchestrator works. In this post, I am going to give you a proof of concept of how Orchestrator can move VIPs in case of failover. For this post, I’m assuming Orchestrator is already installed and able to manage the topology.

Hooks

Orchestrator is a topology manager. Nothing less, nothing more. In the case of failover, it will reorganize the topology, promote a new master and connect the slaves to it. But it won’t do any DNS changes, and it won’t move VIPs (or anything else).

However, Orchestrator supports hooks. Hooks are external scripts …
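
To make this concrete, here is a proof-of-concept sketch (not part of the excerpt) of such a hook script; the VIP, interface, script path, and the use of the PostMasterFailoverProcesses hook with its {successorHost} placeholder are assumptions to adapt to your own environment:

#!/bin/bash
# move-vip.sh: proof-of-concept hook that brings a VIP up on the newly promoted
# master. Orchestrator might call it via a recovery hook such as:
#   "PostMasterFailoverProcesses": ["/usr/local/bin/move-vip.sh {successorHost}"]
set -eu

VIP="10.0.0.100/32"   # placeholder virtual IP
IFACE="eth0"          # placeholder network interface
NEW_MASTER="$1"       # hostname of the promoted master, passed in by the hook

# Add the VIP on the new master and send gratuitous ARP so clients re-learn it.
# (Removing the VIP from the failed host is omitted; it may be unreachable anyway.)
ssh "$NEW_MASTER" "sudo ip addr add $VIP dev $IFACE && sudo arping -c 3 -U -I $IFACE ${VIP%/*}"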

[Read more]
State of automated recovery via Pseudo-GTID & Orchestrator @ Booking.com

This post sums up some of my work on MySQL resilience and high availability at Booking.com by presenting the current state of automated master and intermediate master recoveries via Pseudo-GTID & Orchestrator.

Booking.com uses many different MySQL topologies, of varying vendors, configurations and workloads: Oracle MySQL, MariaDB, statement-based replication, row-based replication, hybrid, OLTP, OLAP, GTID (few), no GTID (most), Binlog Servers, filters, hybrid of all the above.

Topology sizes vary from a single server to many-many-many. Our typical topology has a master in one datacenter, a bunch of slaves in the same DC, a slave in another DC acting as an …

[Read more]
Orchestrator & Pseudo-GTID for binlog reader failover

One of our internal apps at Booking.com audits changes to our tables on various clusters. We used to use Tungsten Replicator, but have since migrated to our own solution.

We have a binlog reader (it uses open-replicator) running on a slave. It expects Row Based Replication, hence our slave runs with log-slave-updates and binlog-format='ROW' to translate from the master's Statement Based Replication. The binlog reader reads what it needs to read, audits what it needs to audit, and we're happy.
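
For reference, the replica settings mentioned above would look roughly like this in an option file (a sketch; server_id, paths and the file name are placeholders, and binary logging must be enabled for log-slave-updates to take effect):

# Sketch: option file for the auditing slave (placeholder values).
cat > /etc/mysql/conf.d/binlog-audit.cnf <<'EOF'
[mysqld]
server_id         = 1234                      # placeholder, unique per server
log_bin           = /var/lib/mysql/mysql-bin  # log-slave-updates requires the binlog
log_slave_updates = 1
binlog_format     = ROW
EOF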

However, what happens if that slave dies?

In such a case, we need to be able to point our binlog reader to another slave, and it needs to be able to pick up auditing from the same point.

This sounds an awful lot like slave repointing in case of master/intermediate master failure, and …

[Read more]
Thoughts on MaxScale automated failover (and Orchestrator)

Having attended a talk (as part of the MariaDB Developer Meeting in Amsterdam) about recent developments in MaxScale's automated failover, I have some (late) observations of my own.

I will begin by noting that the project is stated to be pre-production, so of course none of the below are complaints, but rather food for thought, points for action and general recommendations.

Some functionality of the MaxScale failover is also implemented by orchestrator, which I author. Orchestrator was built in production environments by and for operational people. In this respect it has gained many insights and had to cope with many real-world cases, special cases …

[Read more]