Showing entries 1 to 10 of 22
10 Older Entries »
Displaying posts with tag: ETL (reset)
On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan

“The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood.”–John Ryan

I have interviewed John Ryan, Data Warehouse Solution Architect (Director) at UBS.

RVZ

Q1. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?

John Ryan: The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the …

[Read more]
Incremental MYSQL loads to BigQuery using Matillion

As part of building an enterprise DW for one of our customers we had to sync a bunch of tables from a MYSQL slave to BigQuery at 30 min intervals. Considering the range of other non-relational data sources which will be part of the this load, we chose Matillion as ETL tool. Matillion is easy to setup (just provision the VM and start authoring jobs) and long list of integrations so it made sense.

This post explains building a Matillion job that does the following:

  1. Full Load
  2. Incremental load for tables with larger row count and an ID that can be looked up for new rows since last load.

MYSQL Drivers

If you came from a Google search looking for Matillion — I am assuming you are done with provisioning the instance, setting up default project etc are done, so I am skipping those. While Matillion ships with PostgreSQL drivers, for some reason it doesn’t have MYSQL …

[Read more]
What products & improvements are new on AWS?

Amazon is releasing new products & services to it’s global cloud compute network at a rate that has all of our heads spinning. Join 32,000 others and follow Sean Hull on twitter @hullsean. Here’s new stuff worth mentioning around databases & data. 1. For ETL – AWS GLUE Moving data from your transactional MySQL or … Continue reading What products & improvements are new on AWS? →

Using JSON’s Arrays for MariaDB Dynamic Columns

The JSON format includes the concept of array. A JSON object cant contain an attribute of array type. We have seen that we can use the MariaDB CONNECT Storage Engine provided UDFs (user defined functions) to implement dynamic columns. Let us create a table with a text column containing a a JSON string and let ...continue reading "Using JSON’s Arrays for MariaDB Dynamic Columns"

MariaDB CONNECT Storage Engine JSON Autodiscovery

The MariaDB CONNECT storage engine offers access to JSON file and allows you to see a external JSON file as a MariaDB table. A nice feature of the CONNECT storage Engine is its capability to auto discover a table structure when the table correspond to external data. In our case the CONNECT storage engine will automatically [...]

Log Buffer #429: A Carnival of the Vanities for DBAs

This Log Buffer Edition gathers a wide sample of blogs and then purifies the best ones from Oracle, SQL Server and MySQL.

Oracle:

  • If you take a look at the “alter user” command in the old 9i documentation, you’ll see this: DEFAULT ROLE Clause.
  • There’s been an interesting recent discussion on the OTN Database forum regarding “Index blank blocks after a large update that was rolled back.”
  • 12c Parallel Execution New Features: 1 SLAVE distribution
  • Index Tree Dumps in Oracle 12c …
[Read more]
Resources for Database Clusters: Performance Tuning for HAProxy, Support for MariaDB 10, Technical Blogs & More

August 28, 2014 By Severalnines Check Out Our Latest Resources for MySQL, MariaDB & MongoDB Clusters

 

Here is a summary of resources & tools that we’ve made available to you in the past weeks. If you have any questions on these, feel free to contact us!

 

New Technical Webinars

 

Performance Tuning of HAProxy for Database Load Balancing

09 September 2014 - with Baptiste Assmann of HAProxy Technologies

Do you know what HAProxy can tell you about your application and database instances? Do you know the difference …

[Read more]
Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics

June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split on multiple nodes and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

 

In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then load them into HDFS. We will also show you how to schedule this job to be executed every 5 minutes.

 

Test Case

 

We have an application …

[Read more]
MariaDB CONNECT Storage Engine as an ETL (or ELT) ?

The MariaDB CONNECT Storage Engine allows to access heterogeneous data sources. In my previous post I show you how to use the MariaDB CONNECT Storage Engine to access an Oracle database. This is quite easy through the CONNECT Storage Engine ODBC table type.

For most architectures where heterogeneous databases are involved an ETL (Extract-Transform-Load) is [...]

Exploring SAP HANA – Powering Next Generation Analytics

SAP HANA , having entered the data 2.0/3.0 space at the right time, has been getting traction lately; and there will be lot of users like me who wants to[...]

Showing entries 1 to 10 of 22
10 Older Entries »