Displaying posts with tag: streaming
A new big data structure for streaming counters - bit length encoding

One of the challenges of big data is that it is, well, big. Computers are optimized for math on values of 64 bits or less; anything bigger requires extra steps to work with the data, and those steps are expensive. This is why a BIGINT is 64 bits. In MySQL, DECIMAL can store numbers wider than 64 bits using fixed-precision encoding, and very large numbers can also use FLOAT or DOUBLE, but floating-point types are lossy.

DECIMAL is an expensive encoding: fixed-precision math is costly, and you eventually run out of precision, at which point you can't store a bigger number, right?

What happens when you want to store a counter that is bigger than the largest DECIMAL? FLOAT is lossy. What if you need an exact count of a very big number without using very much space?

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters …
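The excerpt cuts off before the encoding itself is explained, so the sketch below is general background rather than the author's method: a minimal Java illustration of the underlying idea that an exact counter only needs space proportional to its bit length (here, a length-prefixed magnitude) instead of a fixed-precision column.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

// General illustration, not the author's "bit length encoding":
// store an exact, arbitrarily large counter as a 2-byte length
// prefix followed by the counter's magnitude bytes.
public class CounterCodec {
    static byte[] encode(BigInteger counter) {
        byte[] magnitude = counter.toByteArray();
        return ByteBuffer.allocate(2 + magnitude.length)
                .putShort((short) magnitude.length)
                .put(magnitude)
                .array();
    }

    static BigInteger decode(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte[] magnitude = new byte[buf.getShort()];
        buf.get(magnitude);
        return new BigInteger(magnitude);
    }

    public static void main(String[] args) {
        BigInteger big = BigInteger.valueOf(2).pow(1000); // far beyond DECIMAL's range
        byte[] bytes = encode(big);
        System.out.println(bytes.length + " bytes, exact round-trip: "
                + decode(bytes).equals(big));
    }
}
```

For 2^1000, this stores the exact value in 128 bytes and round-trips it losslessly.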

[Read more]
Real-time streaming data aggregation

Dear Kettle users,

Most of you usually use a data integration engine to process data in a batch-oriented way.  Pentaho Data Integration (Kettle) is typically deployed to run monthly, nightly, or hourly workloads.  Sometimes folks run micro-batches of work every minute or so.  However, it is less well known that our beloved transformation engine can also be used to stream data indefinitely (never ending) from a source to a target.  This sort of data integration is sometimes referred to as "streaming", "real-time", "near real-time", "continuous" and so on.  Typical examples of situations where you have a never-ending supply of data that needs to be processed the instant it becomes available are JMS (Java Message Service), RDBMS log sniffing, on-line fraud analysis, web or application …

[Read more]
OpenSQL Camp Europe and FrOSCon: A summary

With OpenSQL Camp and FrOSCon over for almost a week now, it's time to come up with a short summary. I traveled home on Monday morning and then took Tuesday off, so I had some catching up to do...

As in past years, FrOSCon rocked again! According to the closing keynote, they had around 1,500 (unique) visitors, and I had a great time there. I really enjoyed meeting all the old and new faces of the various Open Source communities. The lineup of speakers was excellent; Jon "maddog" Hall's keynote about "Free and Open Source Software in the Developing World" was quite insightful and inspiring.

Most of the time I was busy with speaking at and …

[Read more]
Live video stream from OpenSQL Camp

Greetings from Sankt Augustin, Germany! I arrived by train today and just returned from the FrOSCon venue; the conference starts tomorrow. The organizers are still busy with the preparations, but things already seem to be in good shape.

It was a mild and sunny evening today. Hopefully it will be the same tomorrow again, so we can enjoy a relaxed BBQ outside! The social event at FrOSCon is always a nice opportunity to meet and talk with fellow open source enthusiasts, users and developers.

And finally some good news for those of you who can't make it to FrOSCon this year: there will be live video streams from selected lecture rooms! So you will be able to attend the OpenSQL Camp sessions virtually - just head over to and select room "HS6". It'll be …

[Read more]
PBMS in Drizzle

Some of you may have noticed that BLOB streaming was recently merged into the main Drizzle tree. There are a few hooks inside the Drizzle kernel that PBMS uses; everything else lives in the plugin.

For those not familiar with PBMS, it does two things: it provides a place (not in the table) for BLOBs to be stored (locally on disk, or even out on S3), and it provides an HTTP interface for fetching and storing BLOBs.

This means you can do really neat things: have your BLOBs replicated, consistent, and all those nice databasey things, while still accessing them in a scalable way (everybody knows how to cache HTTP).
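As a hedged illustration of that HTTP access path, here is a minimal Java sketch of fetching a BLOB; the host, port, and URL layout are invented for the example and are not PBMS's documented API.

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical endpoint: host, port, and path are assumptions,
// not the documented PBMS URL scheme.
public class BlobFetch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/database/table/blob-reference");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            // Stream straight to disk; no need to buffer the whole BLOB.
            Files.copy(in, Paths.get("photo.jpg"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```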

This is a great addition to the AlsoSQL arsenal of Drizzle. I'm looking forward to it advancing and being adopted, which should be much easier now that it's in the main repository.

[Read more]
A join I/O manipulator for IOStream

I started playing around with protobuf when doing some stuff in Drizzle (more about that later), and since the protobuf examples use IOStreams, the table reader and writer that Brian wrote use IOStreams as well. Now, IOStreams is pretty powerful, but it can be a pain to use, so of course I started tossing together some utilities to make it easier to work with.

Having been a serious Perl addict for 20 years, I naturally found myself missing a lot of nice functions for manipulating strings; the most immediate one is join, so I wrote a C++ IOStream manipulator that joins the elements of an arbitrary sequence and outputs them to a std::ostream.

In this case, since the I/O Manipulator takes arguments, it has to be written as a class. Recall that …
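The excerpt breaks off before the manipulator class is shown, so the C++ code isn't reproduced here; purely to illustrate what the manipulator achieves, here is the same join-the-elements-of-a-sequence idea expressed in Java.

```java
import java.util.List;
import java.util.stream.Collectors;

// What join does: concatenate the elements of an arbitrary sequence
// with a delimiter between them, then write the result to a stream.
public class JoinDemo {
    public static void main(String[] args) {
        List<Integer> fields = List.of(1, 2, 3);
        System.out.println(fields.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "))); // prints: 1, 2, 3
    }
}
```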

[Read more]
MySQL Conference Liveblogging: Introduction To The BLOB Streaming Project (Wednesday 3:00PM)
  • Paul McCullagh presents
  • BLOB
    • invented by Jim Starkey
    • Basic Large OBject
    • Binary Large OBject
    • photos, films, MP4 files, PDFs, etc.
  • how MySQL handles BLOBs
    • MySQL client send buffer -> receive buffer on the server (limited by max_allowed_packet)
    • streaming a BLOB
      • continuous data stream
      • stream BLOB data directly in and out of the database
      • store BLOBs of any size (>4GB) in the database
      • create a scalable back-end that can handle any throughput and storage requirements. Wouldn't need to know in advance how big the database will get
      • provide an open system that can be used by all engines
      • provide extensions for …
[Read more]
New PBXT/MyBS release enables JDBC-based BLOB streaming!

This is quite a milestone for me! At last it is possible to actually do some practical work with the BLOB streaming engine (MyBS)!

For this release I have completed changes to MySQL Connector/J 5.0.7 that allow BLOB data to be transparently stored in and retrieved from the MyBS BLOB repository. The new version of the driver is called MySQL Connector/J SE (streaming enabled).

Uploading a BLOB is as simple as using setBinaryStream() or setBlob() on INSERT or UPDATE. By using getBinaryStream() or getBlob() after a SELECT you get direct access to the data stream coming from the repository. More information and some examples are provided in the documentation at:
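A sketch of that usage pattern with the streaming-enabled driver follows; the JDBC URL, credentials, and table are illustrative assumptions, not taken from the documentation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative sketch only: connection details and table are assumptions.
public class BlobStreamDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/test", "user", "password")) {

            // Upload: setBinaryStream() lets the streaming-enabled driver
            // feed the data into the BLOB repository as it is read.
            File file = new File("blob_eg.txt");
            try (PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO notes_tab (n_id, n_text) VALUES (?, ?)");
                 InputStream data = new FileInputStream(file)) {
                ins.setInt(1, 1);
                ins.setBinaryStream(2, data, (int) file.length());
                ins.executeUpdate();
            }

            // Download: getBinaryStream() reads directly from the
            // repository rather than buffering the whole BLOB in memory.
            try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT n_text FROM notes_tab WHERE n_id = ?")) {
                sel.setInt(1, 1);
                try (ResultSet rs = sel.executeQuery()) {
                    if (rs.next()) {
                        try (InputStream in = rs.getBinaryStream(1)) {
                            System.out.write(in.readAllBytes());
                        }
                    }
                }
            }
        }
    }
}
```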

To try this out you need to install the latest versions of PBXT and MyBS. Both are available from: …

[Read more]
BLOB streaming engine (MyBS), version 0.5 Alpha released!

With some effort just before my holiday, I have managed to complete the release of the next version of MyBS, the BLOB streaming engine for MySQL.

This version includes all the basic functionality required to stream BLOB data in and out of MySQL tables.

The main features are:

  • Upload of BLOB data directly into the database using HTTP PUT or POST.
  • Download of BLOB data directly from the database using HTTP GET.
  • BLOB size may exceed 4GB - theoretical BLOB size limit of 256 Terabytes.
  • BLOBs are stored in a repository which manages references from other storage engine tables.
  • BLOBs are referenced by a URL.
  • URLs referencing BLOBs in the repository have a unique access code, for security.
  • The theoretical maximum repository size is 4 Zettabytes (2^72 bytes) per …
[Read more]
The MyBS Engine and the BLOB Repository

After some consideration I have decided to move the BLOB repository from PBXT to MyBS (§). This has the advantage that any engine that does not have its own BLOB repository (or is otherwise not suitable for storing large amounts of BLOB data) can reference BLOBs in the MyBS BLOB repository.

(§) MyBS stands for "BLOB Streaming for MySQL". The BLOB Streaming engine is a new storage engine for MySQL which allows you to stream media data directly in and out of the database. More info at

Let's look at an example of this. Assume my standard example table:

CREATE TABLE notes_tab (
  n_id int PRIMARY KEY,
  n_text longblob
);

And assume we have a file called blob_eg.txt with the contents "This is a BLOB Streaming upload test".

Firstly, I can upload a BLOB to the MyBS BLOB …
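The excerpt cuts off here, but as a hedged sketch (host, port, URL layout, and response handling are assumptions, not the documented MyBS interface), such an upload over HTTP might look roughly like this:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical endpoint: "test" database, "notes_tab" table; the real
// URL scheme is described in the MyBS documentation.
public class BlobUpload {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/test/notes_tab");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            // Stream the file body; the engine stores it in the BLOB
            // repository and answers with a reference for the table row.
            Files.copy(Paths.get("blob_eg.txt"), out);
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```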

[Read more]