Showing entries 1 to 10 of 29
10 Older Entries »
Displaying posts with tag: Kafka (reset)
Leader election and sharding practices at Wix Microservices

Wix’s distributed system of 2000 clustered microservices is required to process billions of business events every day with very high speed in a highly concurrent fashion.

There is a need to balance the load between the various cluster nodes, such that no bottlenecks are created. For serving HTTP requests, this can be done by load balancers such as NGINX or Amazon’s ELB — this is out of scope for this article.

A service acting as a client

[Read more]
Streaming Vitess at Bolt

Previously posted on link at Nov 3, 2020. Traditionally, MySQL has been used to power most of the backend services at Bolt. We’ve designed our schemas in a way that they’re sharded into different MySQL clusters. Each MySQL cluster contains a subset of data and consists of one primary and multiple replication nodes. Once data is persisted to the database, we use the Debezium MySQL Connector to capture data change events and send them to Kafka.
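
As a rough illustration of what a consumer of those change events might do, the sketch below parses a Debezium-style change event and summarizes it. The sample event, the `orders.rides` table, and the `summarize_change_event` helper are hypothetical; only the envelope fields (`before`, `after`, `source`, `op`) and the operation codes follow Debezium's documented event format.

```python
import json

# Operation codes used in the Debezium change event envelope.
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def summarize_change_event(raw_value: str) -> dict:
    """Reduce a JSON change event to the fields a consumer usually needs."""
    event = json.loads(raw_value)
    # With the JSON converter's schemas enabled the data sits under "payload";
    # without them the envelope is the top-level object.
    payload = event.get("payload", event)
    source = payload.get("source", {})
    op = payload["op"]
    return {
        "operation": OP_NAMES.get(op, op),
        "table": f'{source.get("db")}.{source.get("table")}',
        "before": payload.get("before"),
        "after": payload.get("after"),
    }


# Hypothetical event for an UPDATE on a made-up table.
sample_event = json.dumps({
    "payload": {
        "before": {"id": 7, "status": "requested"},
        "after": {"id": 7, "status": "accepted"},
        "source": {"db": "orders", "table": "rides"},
        "op": "u",
    }
})

summary = summarize_change_event(sample_event)
print(summary["operation"], summary["table"])
```

In a real pipeline the raw value would come from a Kafka consumer rather than a hand-built string.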

On Vertica 10.0 Interview with Mark Lyons

“Supporting arrays, maps and structs allows customers to simplify data pipelines, unify more of their semi-structured data with their data warehouse as well as maintain better real world representation of their data from relationships between entities to customer orders with item level detail. A good example is groups of cell phone towers that are used for one call while driving on the highway.” –Mark Lyons

I interviewed Mark Lyons, Director of Product Management at Vertica. We talked about the new Vertica 10.0.


Q1. What is your role at Vertica?

Mark Lyons: My role at Vertica is Director of Product Management. I have a team of 5 product managers covering analytics, security, storage integrations and cloud.

Q2. You recently announced Vertica Version 10. What is …

[Read more]
Install librdkafka on alpine docker

I made a few optimizations to reduce the image size from 342 MB to 218 MB and then to 53 MB.

Here is the gist of the Dockerfile.

FROM alpine:3.9
#vishnus-MacBook-Pro:librd vrao$ docker images |grep lib
#lib proper_cleanup 675073279e9c 4 seconds ago 53.3MB
#lib cleanup 7456af7df73b 2 minutes ago 218MB
#lib simple 9724aed9519c 7 minutes ago 342MB …
[Read more]
Kafka – Restoring Under replicated Partition & Broker to ISR

I recently came across a scenario in Kafka where our brokers had been pushed out of the ISR and partitions had been declared under-replicated. This situation had been going on for weeks, and our brokers couldn’t seem to catch up. Due to this paralysis, partition reassignment was also failing or stuck.

Inspired by this blog post on Kafka storage internals, I came up with a procedure to bring the brokers back into the ISR.

Before I go into the procedure, here are a few things to understand:

  1. The offset of a message in a partition is the same on both the leader broker and the replica broker.
  2. Position of message having a certain offset in the log file in leader broker …
[Read more]
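
To make the difference between a message's offset and its position in the log file concrete, here is a toy sketch in which offsets are logical sequence numbers and positions are byte locations within a segment file. The segment contents and helper names are made up, and the index here keeps one entry per fixed number of messages purely for simplicity (Kafka spaces its index entries by bytes written), so treat it as a mental model rather than Kafka's actual implementation.

```python
from bisect import bisect_right


def byte_positions(messages):
    """Map each offset to the byte position where its message starts."""
    positions, pos = {}, 0
    for offset, size in messages:
        positions[offset] = pos
        pos += size
    return positions


def sparse_index(positions, every=2):
    """Keep only every Nth (offset, position) pair, like a sparse offset index."""
    return sorted(positions.items())[::every]


def nearest_entry(index, target_offset):
    """Find the last index entry at or before target_offset via binary search."""
    offsets = [offset for offset, _ in index]
    i = bisect_right(offsets, target_offset) - 1
    if i < 0:
        raise KeyError(f"offset {target_offset} precedes this segment")
    return index[i]


# Hypothetical segment: (offset, message size in bytes).
segment = [(100, 40), (101, 25), (102, 60), (103, 10), (104, 35)]

positions = byte_positions(segment)
index = sparse_index(positions)
entry = nearest_entry(index, 103)
print(f"offset 103 starts at byte {positions[103]}; scan begins from entry {entry}")
```

The offsets are fixed by the order of the messages, while the byte positions depend on how the messages happen to be laid out in a particular file.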
Debezium MySQL Snapshot For CloudSQL(MySQL) From Replica

The snapshot in Debezium does a historical data load from the source database into the Kafka topics. But it’s generally not good practice to do this if you have a huge amount of data in your tables. Recently I published several blog posts on performing this snapshot from a read replica (with/without GTID, AWS Aurora). One reader commented that in GCP the managed MySQL service is called CloudSQL, and there we don’t have much control to stop replication or perform the modifications that we want. So how can we avoid snapshots in CloudSQL and take Debezium snapshots from a CloudSQL read replica? I spent some time today and figured out a way to do this.

The Approach:

We can’t enable binlogs on a read replica, so we have to set up an external read replica for this. If the external replica is a VM, we can enable log-slave-updates with GTID. Then we can …

[Read more]
Grafana Dashboard For Monitoring Debezium MySQL Connector

Debezium comes packed with monitoring metrics as well. We just need to consume them and expose them to Prometheus. A lot of useful metrics are available in Debezium. Unfortunately, we didn’t find any Grafana dashboards for visualizing the Debezium metrics, so we built a dashboard and shared it with the Debezium community. A few things still need to improve, but almost all the metrics are covered in one single dashboard.

Debezium MySQL monitoring metrics:

Debezium MySQL connector has three types of metrics.

  1. Schema History — Tracks schema-level changes.
  2. Snapshot — Tracks the progress of the snapshot.
  3. Binlog — Tracks the real-time reading of binlog events.

Setup Monitoring for MySQL connector:

We need to install the JMX exporter to monitor the Debezium MySQL connector. We have already blogged about this with detailed steps.

[Read more]
Debezium MySQL Snapshot For AWS RDS Aurora From Backup Snapshot

I have published several Debezium MySQL connector tutorials on taking snapshots from a read replica. To continue my research, I wanted to do something for AWS RDS Aurora as well. But Aurora does not use binlog-based replication, so we can’t use the tutorials that I have already published. In Aurora, we can get the binlog file name and its position from a snapshot of the source cluster. So I used a snapshot to load the historical data, and once it’s loaded we can resume the CDC from the main cluster.


  1. A running Aurora cluster.
  2. The Aurora cluster must have binlogs enabled.
  3. Set the binlog retention period to a minimum of 3 days (it’s a best practice).
  4. The Debezium connector should be able to access both clusters.
  5. Make sure you have different security …
[Read more]
Debezium MySQL Snapshot From Read Replica And Resume From Master

In my previous post, I showed how to take a snapshot from a read replica with GTID for the Debezium MySQL connector. The GTID concept is awesome, but many of us still use replication without GTID. For these cases, we can take a snapshot from the read replica and then manually push the master binlog information to the offsets topic. Manually injecting an entry into the offsets topic is already documented by Debezium. I’m just guiding you through taking a snapshot from a read replica without GTID.


  • Set up master-slave replication.
  • The slave must have log-slave-updates=ON, or else the connector will fail to read from the beginning onwards.
  • Debezium connector should be able to …
[Read more]
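
As a hedged sketch of what "pushing the master binlog information to the offsets topic" could look like, the snippet below only builds the key and value of such a record; it does not send anything to Kafka. The record layout (a key of connector name plus server name, a value holding the binlog `file` and `pos`) is an assumption based on how offset records for this connector are commonly shown, and the connector name, server name, and binlog coordinates are hypothetical. Compare it against an existing record in your own offsets topic before producing anything.

```python
import json


def build_offset_record(connector_name, server_name, binlog_file, binlog_pos):
    """Build the (key, value) JSON strings for a manual offsets-topic entry.

    Compact separators are used so the key has no extra whitespace; the key
    must match the existing one byte for byte, otherwise the new record will
    not supersede the old offset.
    """
    compact = (",", ":")
    key = json.dumps([connector_name, {"server": server_name}], separators=compact)
    value = json.dumps({"file": binlog_file, "pos": binlog_pos}, separators=compact)
    return key, value


# Hypothetical names and master binlog coordinates.
key, value = build_offset_record("mysql-connector", "dbserver1", "mysql-bin.000003", 154)
print(key)
print(value)
```

The resulting pair would then be produced to the Kafka Connect offsets topic, on the same partition as the connector's existing offset record, while the connector is stopped.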
Debezium MySQL Snapshot From Read Replica With GTID

When you install the Debezium MySQL connector, it will start reading your historical data and push all of it into the Kafka topics. This setting can be changed via the snapshot.mode parameter in the connector. But if you are going to start a new sync, Debezium will load the existing data; this is called a snapshot. Unfortunately, if you have a busy transactional MySQL database, it may lead to performance issues, and your DBA will never agree to read the data from the master node. [Disclaimer: I’m a DBA :) ] So I was trying to figure out how to take the snapshot from the read replica and, once the snapshot is done, start reading the real-time data from the master. I found this useful information in a StackOverflow answer.

If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica, then when that’s done, switch to the master to read the binlog. But if you don’t use …

[Read more]
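
For reference, here is a minimal sketch of where snapshot.mode sits in a connector registration payload. The host name, credentials, server ID, and connector name are placeholders, and this is not a complete configuration: a real one also needs the database history settings, and newer Debezium releases rename some properties (for example `database.server.name`).

```python
import json


def connector_config(name, host, server_name, snapshot_mode="initial"):
    """Build a partial Debezium MySQL connector payload for Kafka Connect."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": host,           # point this at the read replica
            "database.port": "3306",
            "database.user": "debezium",         # placeholder user
            "database.password": "change-me",    # placeholder password
            "database.server.id": "184054",      # any ID unique in the topology
            "database.server.name": server_name,
            # "initial" takes a snapshot only when no offsets exist yet.
            "snapshot.mode": snapshot_mode,
        },
    }


# Hypothetical replica host and names.
replica_config = connector_config("mysql-connector", "replica.example.internal", "dbserver1")
print(json.dumps(replica_config, indent=2))
```

The JSON printed here is the shape of body that would be posted to the Kafka Connect REST API to register the connector.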