Showing entries 1 to 10 of 17
Displaying posts with tag: ETL
Using JSON’s Arrays for MariaDB Dynamic Columns

The JSON format includes the concept of an array. A JSON object can contain an attribute of array type. We have seen that we can use the UDFs (user-defined functions) provided by the MariaDB CONNECT Storage Engine to implement dynamic columns.

Let us create a table with a text column containing a JSON string and let [...]
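The post's own example is truncated above; purely as a hypothetical sketch of the idea in Python, assuming a MariaDB server where the CONNECT engine's JSON UDFs (such as Json_Array_Add) are installed, and with made-up table and column names:

```python
# Hypothetical sketch (not the post's truncated example): keep a JSON array
# in a plain TEXT column and manipulate it with a CONNECT JSON UDF.
# Assumes mysql-connector-python and a server with the CONNECT UDFs loaded.
import mysql.connector

conn = mysql.connector.connect(user="root", password="", database="test")
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS t1 (name VARCHAR(30), sizes TEXT)")
cur.execute("INSERT INTO t1 VALUES ('shirt', '[38, 40, 42]')")

# Append one more value to the stored JSON array via the UDF.
cur.execute("UPDATE t1 SET sizes = Json_Array_Add(sizes, 44) "
            "WHERE name = 'shirt'")

cur.execute("SELECT name, sizes FROM t1")
print(cur.fetchall())  # expected: [('shirt', '[38, 40, 42, 44]')]
conn.commit()
```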

MariaDB CONNECT Storage Engine JSON Autodiscovery

The MariaDB CONNECT storage engine offers access to JSON files and allows you to see an external JSON file as a MariaDB table. A nice feature of the CONNECT storage engine is its ability to auto-discover a table's structure when the table corresponds to external data. In our case the CONNECT storage engine will automatically [...]
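As a minimal sketch of what auto-discovery looks like in practice (the file name below is hypothetical, and the JSON file must live where the server can read it, here the database's data directory):

```python
# Minimal sketch of CONNECT JSON auto-discovery: create the table without
# a column list and let the engine derive the structure from the file.
# Assumes the CONNECT engine is installed and biblio.json sits in the
# database's data directory (file and table names are hypothetical).
import mysql.connector

conn = mysql.connector.connect(user="root", password="", database="test")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE biblio
    ENGINE=CONNECT TABLE_TYPE=JSON FILE_NAME='biblio.json'
""")

cur.execute("DESCRIBE biblio")  # shows the discovered columns
for row in cur.fetchall():
    print(row)
```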

Log Buffer #429: A Carnival of the Vanities for DBAs

This Log Buffer Edition gathers a wide sample of blog posts and distills the best ones from Oracle, SQL Server, and MySQL.

Oracle:

  • If you take a look at the “alter user” command in the old 9i documentation, you’ll see this: DEFAULT ROLE Clause.
  • There’s been an interesting recent discussion on the OTN Database forum regarding “Index blank blocks after a large update that was rolled back.”
  • 12c Parallel Execution New Features: 1 SLAVE distribution
  • Index Tree Dumps in Oracle 12c …
[Read more]
Resources for Database Clusters: Performance Tuning for HAProxy, Support for MariaDB 10, Technical Blogs & More

August 28, 2014 By Severalnines

Check Out Our Latest Resources for MySQL, MariaDB & MongoDB Clusters

Here is a summary of the resources & tools that we’ve made available to you over the past few weeks. If you have any questions on these, feel free to contact us!

New Technical Webinars

Performance Tuning of HAProxy for Database Load Balancing

09 September 2014 - with Baptiste Assmann of HAProxy Technologies

Do you know what HAProxy can tell you about your application and database instances? Do you know the difference …
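As a small taste of that introspection (not material from the webinar itself), HAProxy can report detailed per-frontend, per-backend, and per-server statistics over its admin socket; the sketch below assumes haproxy.cfg contains a line like `stats socket /var/run/haproxy.sock`:

```python
# Hedged sketch: read HAProxy's statistics from its UNIX admin socket.
# Assumes "stats socket /var/run/haproxy.sock" is configured in haproxy.cfg.
import socket

def haproxy_stats(path="/var/run/haproxy.sock"):
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(path)
    s.sendall(b"show stat\n")  # returns CSV, one row per proxy/server
    data = b""
    while chunk := s.recv(4096):
        data += chunk
    s.close()
    return data.decode()

for line in haproxy_stats().splitlines()[:5]:
    print(line)  # pxname, svname, qcur, scur, status, ...
```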

[Read more]
Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics

June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes, and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).


In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then load them into HDFS. We will also show you how to schedule this job to be executed every 5 minutes.
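The Talend job itself is in the full post; purely as a hypothetical illustration of the same extract-and-load step, here is a plain-Python equivalent using pymongo and the hdfs CLI (all host, database, collection, and path names are made up):

```python
# Hypothetical sketch of the extract-and-load step, in Python rather than
# the post's Talend job. Assumes pymongo is installed and the `hdfs` CLI
# is on PATH; every name below is made up.
import json
import subprocess
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["analytics"]["clickstream"]

# Extract: one JSON document per line, dropping Mongo's internal _id.
local_file = "/tmp/clickstream.json"
with open(local_file, "w") as f:
    for doc in coll.find({}, {"_id": 0}):
        f.write(json.dumps(doc, default=str) + "\n")

# Load: push the file into HDFS for downstream Hadoop jobs.
subprocess.run(["hdfs", "dfs", "-put", "-f", local_file,
                "/data/clickstream/clickstream.json"], check=True)
```

To mimic the 5-minute schedule, such a script could be run from cron with an entry like `*/5 * * * *`.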


Test Case


We have an application …

[Read more]
MariaDB CONNECT Storage Engine as an ETL (or ELT)?

The MariaDB CONNECT Storage Engine allows access to heterogeneous data sources. In my previous post I showed you how to use the MariaDB CONNECT Storage Engine to access an Oracle database. This is quite easy through the CONNECT Storage Engine ODBC table type.

For most architectures where heterogeneous databases are involved, an ETL (Extract-Transform-Load) is [...]
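As a rough sketch of that ODBC table type used as a simple ELT path (the DSN, credentials, and table names below are hypothetical, and an Oracle ODBC driver must be configured on the MariaDB host):

```python
# Rough sketch: expose an Oracle table in MariaDB via CONNECT's ODBC table
# type, then copy it locally with plain SQL ("ELT"). The DSN, credentials,
# and table names are hypothetical.
import mysql.connector

conn = mysql.connector.connect(user="root", password="", database="test")
cur = conn.cursor()

# Map the remote Oracle table EMP into MariaDB through ODBC.
cur.execute("""
    CREATE TABLE ora_emp
    ENGINE=CONNECT TABLE_TYPE=ODBC TABNAME='EMP'
    CONNECTION='DSN=orcl;UID=scott;PWD=tiger'
""")

# Load (and optionally transform with SQL) into a local InnoDB table.
cur.execute("CREATE TABLE emp_local ENGINE=InnoDB AS SELECT * FROM ora_emp")
conn.commit()
```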

Exploring SAP HANA – Powering Next Generation Analytics

SAP HANA, having entered the data 2.0/3.0 space at the right time, has been getting traction lately, and there will be a lot of users like me who want to [...]

Take the time now for gains later.

Regardless of which data warehouse paradigm you follow or have heard of, Kimball or Inmon, we should all agree that the data warehouse is often a business requirement. Different people want different things, and they all want it from your data. The data warehouse is not a new concept, and yet it is overlooked at times. A warehouse is never complete; it is an evolving entity that adjusts to the requirements it is given. It is up to us to make sure that accurate and timely access to enterprise data is easy and the standard. MySQL can handle a data warehouse perfectly.
MySQL databases are designed in numerous ways, some good, some bad. A warehouse can take that data and organize it for the best use of others. What concerns or issues do you often hear when it comes to gathering data from your database? Is it easy for all of your developers to query and get the same data? How many ways does your company slice and dice data? …

[Read more]
HPCC vs Hadoop at a glance

Update

Since this article was written, HPCC has undergone a number of significant changes and updates. These address some of the criticisms voiced in this blog post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments left by Flavio Villanustre and Azana Baksh.

The original article can be read unaltered below:

Yesterday I noticed a tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the HPCC Systems …

[Read more]
Memory tuning fast-paced ETL

Dear Kettle friends,

on occasion we need to support environments where a lot of data needs to be processed, and in frequent batches. For example, a new data file with hundreds of thousands of rows arrives in a folder every few seconds.

In this setting we want to use clustering to harness “commodity” computing resources in parallel. In this blog post I’ll detail what the general architecture looks like and how to tune memory usage in this environment.

Clustering was first created around the end of 2006.  Back then it looked like this.

The master

This is the most important part of our cluster. It takes care of administering the network configuration and topology. It also keeps track of the state of dynamically added slave servers.
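For a concrete picture, here is a hedged sketch of asking a Carte master which slave servers have registered with it; it assumes a master running on localhost:8080 with Carte's default cluster/cluster credentials:

```python
# Hedged sketch: query a Carte master for its registered slave servers.
# Assumes a master at localhost:8080 with the default cluster/cluster
# credentials; the response is an XML list of slave server detections.
import urllib.request

url = "http://localhost:8080/kettle/getSlaves/"
mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, url, "cluster", "cluster")
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))

with opener.open(url) as resp:
    print(resp.read().decode())
```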

The master …

[Read more]