Showing entries 1 to 10 of 24
10 Older Entries »
Displaying posts with tag: Kafka (reset)
Streaming Vitess at Bolt

Previously posted on link at Nov 3, 2020. Traditionally, MySQL has been used to power most of the backend services at Bolt. We've designed our schemas in a way that they're sharded into different MySQL clusters. Each MySQL cluster contains a subset of data and consists of one primary and multiple replication nodes. Once data is persisted to the database, we use the Debezium MySQL Connector to capture data change events and send them to Kafka.

Debezium MySQL Snapshot For CloudSQL(MySQL) From Replica

The snapshot in Debezium will do a historical data load from the source database to the Kafka topics. But generally its not a good practice to this if you have a huge data in your tables. Recently I have published many blog posts to perform this snapshot from Read Replica(with/without GTID, AWS Aurora). One guy commented that, in GCP the MySQL managed service is called CloudSQL. There we don’t have much control to stop replication, perform the modifications that we want. So how can we avoid snapshots in CloudSQL and take debezium snapshots from CloudSQL Read Replica? I have spent some time today and figured out a way to do this.

The Approach:

We can’t enable binlogs on read replica. So we have to setup an external read replica for this. If the external replica is a VM, then we can enable the log-slave-updates with GTID. Then we can …

[Read more]
Grafana Dashboard For Monitoring Debezium MySQL Connector

Debezium has packed with monitoring metrics as well. We just need to consume and expose it to the Prometheus. A lot of use of useful metrics are available in Debezium. But unfortunately, we didn’t find any Grafana dashboards to visualizing the Debezium metrics. So we built a dashboard and share it with the Debezium community. Still, a few things need to improve, but almost all the metrics are covered in one single dashboard.

Debezium MySQL monitoring metrics:

Debezium MySQL connector has three types of metrics.

  1. Schema History — Track the schema level changes.
  2. Snapshot — Track the progress about the snapshot.
  3. Binlog — Real-time reading binlog events.

Setup Monitoring for MySQL connector:

We need to install JMX exporter for monitoring the debezium MySQL connector. We have already blogged about this with detailed steps.

[Read more]
Debezium MySQL Snapshot For AWS RDS Aurora From Backup Snaphot

I have published enough Debezium MySQL connector tutorials for taking snapshots from Read Replica. To continue my research I wanted to do something for AWS RDS Aurora as well. But aurora is not using binlog bases replication. So we can’t use the list of tutorials that I published already. In Aurora, we can get the binlog file name and its position from its snapshot of the source Cluster. So I used a snapshot for loading the historical data, and once it’s loaded we can resume the CDC from the main cluster.

Requirements:

  1. Running aurora cluster.
  2. Aurora cluster must have binlogs enabled.
  3. Make binlog retention period to a minimum 3 days(its a best practice).
  4. Debezium connector should be able to access both the clusters.
  5. Make sure you have different security …
[Read more]
Debezium MySQL Snapshot From Read Replica And Resume From Master

In my previous post, I have shown you how to take the snapshot from Read Replica with GTID for Debezium MySQL connector. GTID concept is awesome, but still many of us using the replication without GTID. For these cases, we can take a snapshot from Read replica and then manually push the Master binlog information to the offsets topic. Injecting manual entry for offsets topic is already documented in Debezium. I’m just guiding you the way to take snapshot from Read replica without GTID.

Requirements:

  • Setup master slave replication.
  • The slave must have log-slave-updates=ON else connector will fail to read from beginning onwards.
  • Debezium connector should be able to …
[Read more]
Debezium MySQL Snapshot From Read Replica With GTID

When you installed the Debezium MySQL connector, then it’ll start read your historical data and push all of them into the Kafka topics. This setting can we changed via snapshot.mode parameter in the connector. But if you are going to start a new sync, then Debezium will load the existing data its called Snapshot. Unfortunately, if you have a busy transactional MySQL database, then it may lead to some performance issues. And your DBA will never agree to read the data from Master Node.[Disclaimer: I’m a DBA :) ]. So I was thinking of figuring out to take the snapshot from the Read Replica, once the snapshot is done, then start read the realtime data from the Master. I found this useful information in a StackOverflow answer.

If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica, then when that’s done, switch to the master to read the binlog. But if you don’t use …

[Read more]
Monitor Debezium MySQL Connector With Prometheus And Grafana

Debezium is providing out of the box CDC solution from various databases. In my last blog post, I have published how to configure the Debezium MySQL connector. This is the next part of that post. Once we deployed the debezium, to we need some kind of monitoring to keep track of whats happening in the debezium connector. Luckily Debezium has its own metrics that are already integrated with the connectors. We just need to capture them using the JMX exporter agent. Here I have written how to monitor Debezium MySQL connector with Prometheus and Grafana. But the dashboard is having the basic metrics only. You can build your own dashboard for more detailed monitoring.

Reference: List of Debezium monitoring metrics

Install JMX exporter in …

[Read more]
RealTime CDC From MySQL Using AWS MSK With Debezium

Credit: AWS

CDC is becoming more popular nowadays. Many organizations that want to build a realtime/near realtime data pipe and reports are using the CDC as a backbone to powering their real-time reports. Debezium is an opensource product from RedHat and it supports multiple databases (both SQL and NoSQL). Apache Kafka is the core of this.

Managing and scaling Kafka clusters is not easy for everyone. If you are using AWS for your infra then let AWS manage the cluster. AWS has MSK is a managed Kafka service. We are going to configure Debezium with AWS MSK.

Configuration File:

If you are already worked with AWS MSK, then you might be familiar with this configuration file. This is similar to the RDS Parameter group but here you need to upload a with your parameter name and its value. If you are using MSK for the first time, then it’ll make you a bit confused. No worries, I’ll give you the steps to do …

[Read more]
Build Production Grade Dedezium Cluster With Confluent Kafka

We are living in the DataLake world. Now almost every oraganization wants their reporting in Near Real Time. Kafka is of the best streaming platform for realtime reporting. Based on the Kafka connector, RedHat designed the Debezium which is an OpenSource product and high recommended for real time CDC from transnational databases. I referred many blogs to setup this cluster. But I found just basic installation steps. So I setup this cluster for AWS with Production grade and publishing this blog.

A shot intro:

Debezium is a set of distributed services to capture changes in your databases so that your applications can see those changes and respond to them. Debezium records all row-level changes within each database table in a change event stream, and applications simply read these streams to see the change events in the same order in which they occurred.

Basic Tech Terms:

  • Kafka …
[Read more]
Build Production Grade Debezium Cluster With Confluent Kafka

We are living in the DataLake world. Now almost every organizations wants their reporting in Near Real Time. Kafka is of the best streaming platform for realtime reporting. Based on the Kafka connector, RedHat designed the Debezium which is an OpenSource product and high recommended for real time CDC from transnational databases. I referred many blogs to setup this cluster. But I found just basic installation steps. So I setup this cluster for AWS with Production grade and publishing this blog.

A shot intro:

Debezium is a set of distributed services to capture changes in your databases so that your applications can see those changes and respond to them. Debezium records all row-level changes within each database table in a change event stream, and applications simply read these streams to see the change events in the same order in which they occurred.

Basic Tech Terms:

  • Kafka …
[Read more]
Showing entries 1 to 10 of 24
10 Older Entries »