Dear Kettle users,
Most of you use a data integration engine to process data in a batch-oriented way: Pentaho Data Integration (Kettle) is typically deployed to run monthly, nightly, or hourly workloads, and sometimes folks run micro-batches of work every minute or so. However, it is less well known that our beloved transformation engine can also be used to stream data indefinitely (never-ending) from a source to a target. This sort of data integration is sometimes referred to as "streaming", "real-time", "near real-time", "continuous", and so on. Typical examples of situations where you have a never-ending supply of data that needs to be processed the instant it becomes available include JMS (Java Message Service), RDBMS log sniffing, online fraud analysis, web or application …
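To make the idea concrete, here is a minimal sketch of driving such a never-ending transformation from Java using the PDI 4.x+ API. The file name jms_stream.ktr is a hypothetical example; any transformation whose input step never signals end-of-data (a JMS consumer, for instance) will keep the engine streaming rows until it is explicitly stopped.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class StreamingTransRunner {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle engine (loads plugins, etc.)
        KettleEnvironment.init();

        // "jms_stream.ktr" is a hypothetical transformation whose
        // input step produces rows indefinitely.
        TransMeta transMeta = new TransMeta("jms_stream.ktr");
        Trans trans = new Trans(transMeta);

        // Start all step threads; rows begin flowing from source to target.
        trans.execute(null);

        // For a streaming source this call blocks indefinitely: the
        // transformation only finishes when trans.stopAll() is invoked
        // (e.g. from a shutdown hook) or the JVM exits.
        trans.waitUntilFinished();
    }
}
```

The point of the sketch is simply that nothing in the engine assumes a finite input: a transformation ends when its input steps do, so an input that never ends yields a continuously running pipeline.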