Displaying posts with tag: streaming
Real-Time CDC From MySQL Using AWS MSK With Debezium

Credit: AWS

CDC (change data capture) is becoming more popular nowadays. Many organizations that want to build real-time or near-real-time data pipelines and reports use CDC as the backbone powering their real-time reporting. Debezium is an open-source product from Red Hat that supports multiple databases (both SQL and NoSQL). Apache Kafka is the core of this architecture.

Managing and scaling Kafka clusters is not easy for everyone. If you are using AWS for your infrastructure, then let AWS manage the cluster: AWS offers MSK, a managed Kafka service. We are going to configure Debezium with AWS MSK.
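
As background, the Debezium MySQL source connector is registered with Kafka Connect as a JSON document. A minimal sketch only — host names and credentials here are placeholders, and the property names follow Debezium 1.x:

    {
      "name": "mysql-cdc-connector",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.internal",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "<secret>",
        "database.server.id": "184054",
        "database.server.name": "mysqlcdc",
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "<msk-bootstrap-brokers>",
        "database.history.kafka.topic": "schema-changes.inventory"
      }
    }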

Configuration File:

If you have already worked with AWS MSK, then you might be familiar with this configuration file. It is similar to an RDS parameter group, but here you need to upload a file with your parameter names and their values. If you are using MSK for the first time, it may be a bit confusing. No worries, I’ll give you the steps to do …
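
For orientation, an MSK configuration file is a plain key=value properties file, much like Kafka's own server.properties. A minimal sketch with illustrative values only (these are examples, not taken from this post):

    # Illustrative MSK/Kafka broker properties -- values are examples only.
    # Let Debezium auto-create its change-event topics:
    auto.create.topics.enable=true
    # Typical durability settings for a three-broker cluster:
    default.replication.factor=3
    min.insync.replicas=2
    # Keep a week of change events:
    log.retention.hours=168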

[Read more]
Complete Megalist: 25 Helpful Tools For Back-End Developers

The website or mobile app is the storefront for participating in the modern digital era. It’s your portal for inviting users to come and survey your products and services. Much attention focuses on front-end development; this is where the HTML5, CSS, and JavaScript are coded to develop the landing page that everyone sees when they visit your site.

But the real magic happens on the back end. This is the ecosystem that really powers your website. One writer has articulated this point very nicely as follows:

The technology and programming that “power” a site—what your end user doesn’t see but what makes the site run—is called the back end. Consisting of the server, the database, and the server-side applications, it’s the behind-the-scenes functionality—the brain of a site. …

[Read more]
A new big data structure for streaming counters - bit length encoding

One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps have to be taken to work with the data, which is very expensive. This is why a BIGINT is 64 bits. In MySQL, DECIMAL can store more than 64 bits of data using fixed precision. Larger numbers can fall back to FLOAT or DECIMAL, but FLOAT is lossy and DECIMAL eventually runs out of precision.

DECIMAL is an expensive encoding: fixed-precision math is costly, and once you run out of precision you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL? FLOAT is lossy. What if you need an /exact/ count of a very big number without using very much space?

I’ve developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters …

[Read more]
Real-time streaming data aggregation

Dear Kettle users,

Most of you usually use a data integration engine to process data in a batch-oriented way. Pentaho Data Integration (Kettle) is typically deployed to run monthly, nightly, or hourly workloads. Sometimes folks run micro-batches of work every minute or so. However, it’s less well known that our beloved transformation engine can also be used to stream data indefinitely (never ending) from a source to a target. This sort of data integration is sometimes referred to as “streaming”, “real-time”, “near real-time”, “continuous”, and so on. Typical examples of situations where you have a never-ending supply of data that needs to be processed the instant it becomes available are JMS (Java Message Service), RDBMS log sniffing, on-line fraud analyses, web or application …

[Read more]
OpenSQL Camp Europe and FrOSCon: A summary

With OpenSQL Camp and FrOSCon being over for almost a week now, it's time to come up with a short summary. I traveled home on Monday morning and then took Tuesday off, so I had some catching up to do...

As in past years, FrOSCon rocked again! According to the closing keynote, they had around 1,500 (unique) visitors, and I had a great time there. I really enjoyed meeting all the old and new faces of the various Open Source communities. The lineup of speakers was excellent; Jon "maddog" Hall's keynote about "Free and Open Source Software in the Developing World" was quite insightful and inspiring.

Most of the time I was busy with speaking at and …

[Read more]
Live video stream from OpenSQL Camp

Greetings from Sankt Augustin, Germany! I arrived by train today and just returned from the FrOSCon venue; the conference starts tomorrow. The organizers are still busy with the preparations, but things already seem to be in good shape.

It was a mild and sunny evening today. Hopefully it will be the same again tomorrow, so we can enjoy a relaxed BBQ outside! The social event at FrOSCon is always a nice opportunity to meet and talk with fellow open source enthusiasts, users and developers.

And finally some good news for those of you who can't make it to FrOSCon this year: there will be live video streams from selected lecture rooms! So you will be able to attend the OpenSQL Camp sessions virtually - just head over to http://live.froscon.org/ and select room "HS6". It'll be …

[Read more]
PBMS in Drizzle

Some of you may have noticed that BLOB streaming has been merged into the main Drizzle tree recently. There are a few hooks inside the Drizzle kernel that PBMS uses, and everything else is just in the plugin.

For those not familiar with PBMS, it does two things: provide a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3), and provide an HTTP interface to get and store BLOBs.

This means you can do really neat things such as have your BLOBs replicated, consistent, and all those nice databasey things, as well as easily access them in a scalable way (everybody knows how to cache HTTP).
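
To give a feel for the flow, storing and fetching a BLOB over an HTTP interface like PBMS's looks roughly like the following; the URL layout here is an assumption for illustration, not taken from the post:

    # Hypothetical URL layout -- illustration only:
    curl -T photo.jpg http://dbhost:8080/mydb/photos        # upload; the server returns a BLOB reference
    curl http://dbhost:8080/mydb/photos/<blob-reference>    # stream the BLOB back over plain HTTP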

This is a great addition to the AlsoSQL arsenal of Drizzle. I’m looking forward to it advancing and being adopted (now much easier given that it’s in the main repository).

[Read more]
A join I/O manipulator for IOStream

I started playing around with protobuf when doing some stuff in Drizzle (more about that later), and since the examples were using IOStreams, the table reader and writer that Brian wrote use IOStreams as well. Now, IOStreams is pretty powerful, but it can be a pain to use, so of course I started tossing together some utilities to make it easier to work with.

Having been a serious Perl addict for 20 years, I of course miss a lot of nice functions for manipulating strings, and the most immediate one is join, so I wrote a C++ IOStream manipulator to join the elements of an arbitrary sequence and output them to an std::ostream.

In this case, since the I/O Manipulator takes arguments, it has to be written as a class. Recall that …
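
In rough outline, such an argument-carrying manipulator looks like this (a simplified sketch of the general pattern, not the actual implementation from the post):

    // Sketch of an argument-taking manipulator: the "manipulator" is an object
    // carrying its arguments, and a friend operator<< renders it.
    #include <iostream>
    #include <string>
    #include <vector>

    template <typename Iter>
    class join {
    public:
      join(Iter first, Iter last, const std::string &sep)
          : first_(first), last_(last), sep_(sep) {}

      friend std::ostream &operator<<(std::ostream &os, const join &j) {
        for (Iter it = j.first_; it != j.last_; ++it) {
          if (it != j.first_) os << j.sep_;
          os << *it;
        }
        return os;
      }

    private:
      Iter first_, last_;
      std::string sep_;
    };

    // Helper so the iterator type is deduced at the call site.
    template <typename Iter>
    join<Iter> make_join(Iter first, Iter last, const std::string &sep) {
      return join<Iter>(first, last, sep);
    }

    int main() {
      std::vector<std::string> words{"red", "green", "blue"};
      std::cout << make_join(words.begin(), words.end(), ", ") << std::endl;
      // prints: red, green, blue
      return 0;
    }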

[Read more]
MySQL Conference Liveblogging: Introduction To The BLOB Streaming Project (Wednesday 3:00PM)
  • Paul McCullagh presents
  • BLOB
    • invented by Jim Starkey
    • Basic Large OBject
    • Binary Large OBject
    • photos, films, MP4 files, PDFs, etc.
  • how MySQL handles BLOBs
    • mysql client send buffer -> receive buffer on the server, bounded by max_allowed_packet (see the SQL sketch after these notes)
    • streaming a BLOB
      • continuous data stream
      • stream BLOB data directly in and out of the database
      • store BLOBs of any size (>4GB) in the database
      • create a scalable back-end that can handle any throughput and storage requirements. Wouldn't need to know in advance how big the database will get
      • provide an open system that can be used by all engines
      • provide extensions for …
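
To make the conventional, packet-bounded path concrete, here is a small illustrative SQL sketch (table and file names are made up):

    -- On the conventional path the whole BLOB travels in one client/server
    -- packet, so max_allowed_packet caps the BLOB size.
    SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;  -- 64MB per-packet ceiling
    CREATE TABLE media (id INT PRIMARY KEY, content LONGBLOB);
    -- LOAD_FILE() returns NULL for files larger than max_allowed_packet,
    -- so the BLOB silently fails to load once the file outgrows the limit:
    INSERT INTO media VALUES (1, LOAD_FILE('/tmp/film.mp4'));
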
[Read more]