Here is a typical “Big” data architecture that covers most of the components involved in the data pipeline. More or less, we have the same architecture in production in a number of places [...]
Inspired by a post from Juice Analytics.
We are a conflicted people. We love our TV and movie violence but worry that it ruins our children’s minds. We want to reduce healthcare costs, but don’t want to restrict the free market.
Conflicts like these leave little room for a satisfactory answer. Basic principles are in conflict and deeply-rooted desires run up against painful consequences. We …
Regardless of which data warehouse paradigm you follow or have heard of, Kimball or Inmon, we should all agree that a data warehouse is often a requirement for the business. Different people want different things, and they all want it from your data. The data warehouse is not a new concept, and yet it is overlooked at times. A warehouse is never complete; it is an evolving entity that adjusts to the requirements it is given. It is up to us to make sure that accurate, timely access to enterprise data is easy and is the standard. MySQL can handle a data warehouse perfectly.
MySQL databases are designed in numerous ways, some good, some bad. A warehouse can take that data and organize it for the best use of others. What concerns or issues do you often hear when it comes to gathering data from your database? Is it easy for all of your developers to query and get the same data? How many ways does your company slice and dice data? …
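To make that concrete, here is a minimal sketch of what “organizing the data for the best use of others” can look like in MySQL: a tiny star schema plus the kind of query everyone can then run against it. The table and column names are made up for illustration and are not from the original post.

-- Hypothetical dimensional model (illustrative names only)
CREATE TABLE dim_date (
  date_id     INT PRIMARY KEY,        -- e.g. 20110315
  full_date   DATE NOT NULL,
  year_number SMALLINT NOT NULL,
  month_name  VARCHAR(10) NOT NULL
);

CREATE TABLE dim_product (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100) NOT NULL,
  category     VARCHAR(50) NOT NULL
);

CREATE TABLE fact_sales (
  date_id    INT NOT NULL,
  product_id INT NOT NULL,
  quantity   INT NOT NULL,
  amount     DECIMAL(12,2) NOT NULL,
  KEY (date_id),
  KEY (product_id)
);

-- Everyone slices and dices from the same, agreed-upon place:
SELECT d.year_number, p.category, SUM(f.amount) AS revenue
FROM fact_sales f
JOIN dim_date d    ON d.date_id = f.date_id
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY d.year_number, p.category;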
Update
Since this article was written, HPCC has undergone a number of significant changes and updates. These address some of the criticism voiced in this blog post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments left by Flavio Villanustre and Azana Baksh.
The original article can be read unaltered below:
Yesterday I noticed this tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the HPCC Systems …
[Read more]

Dear Kettle friends,
on occasion we need to support environments where a lot of data needs to be processed, and in frequent batches at that. For example, a new data file with hundreds of thousands of rows arrives in a folder every few seconds.
In this setting we want to use clustering to harness “commodity” computing resources in parallel. In this blog post I’ll detail what the general architecture looks like and how to tune memory usage in this environment.
Clustering was first created around the end of 2006. Back then it looked like this.
The master
This is the most important part of our cluster. It takes care of administering the network configuration and topology. It also keeps track of the state of dynamically added slave servers.
The master …
[Read more]

Dear Kettlers,
A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person who wants to create BI solutions. The obvious advice I gave back then was not to use those tables in the first place if you’re serious about a BI solution. And if you have to, do some denormalization.
However, there are occasions when you need to query a source system and get some reports going on it. Let’s take a look at an example:
mysql> select * from person;
+----+-------+----------+
| id | name  | lastname |
+----+-------+----------+
|  1 | Lex   | Luthor   |
|  2 | Clark | Kent     |
|  3 | Lois  | Lane     |
+----+-------+----------+
3 rows in set (0.00 sec)

mysql> select * from person_attribute;
+----+-----------+---------------+------------+
| id | person_id | attr_key      | attr_value |
…
[Read more]
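As a side note, one common way to flatten a key/value table like person_attribute above back into columns in plain SQL is conditional aggregation. The attribute names used below ('city' and 'dob') are assumptions for illustration; the post's actual attributes are in the truncated output above.

-- Pivot key/value rows into columns (attribute names are assumed)
SELECT p.id,
       p.name,
       p.lastname,
       MAX(CASE WHEN a.attr_key = 'city' THEN a.attr_value END) AS city,
       MAX(CASE WHEN a.attr_key = 'dob'  THEN a.attr_value END) AS dob
FROM person p
LEFT JOIN person_attribute a ON a.person_id = p.id
GROUP BY p.id, p.name, p.lastname;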
Dear Kettle friends,
Last year, right after the summer, in version 4.1 of Pentaho Data Integration, we introduced the notion of dynamically inserted ETL metadata (YouTube video here). Since then we have received a lot of positive feedback on this functionality, which encouraged me to extend it to a few more steps. Already with support for “CSV Input” and “Select Values” we could do a lot of dynamic things. However, we can clearly do a lot better by extending our initiative to a few more steps: “Microsoft Excel Input” (which can also read ODS, by the way), “Row Normalizer” and “Row De-normalizer”.
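For readers who haven’t used those last two steps: “Row Normalizer” turns repeated columns into key/value style rows, and “Row De-normalizer” pivots such rows back into columns. Kettle does this inside the transformation engine, but the equivalent reshaping, sketched here in SQL with made-up table and column names, looks roughly like this:

-- "Row Normalizer" direction: wide columns -> key/value rows (illustrative names)
SELECT id, 'q1' AS period, q1_sales AS sales FROM wide_sales
UNION ALL
SELECT id, 'q2' AS period, q2_sales AS sales FROM wide_sales;

-- "Row De-normalizer" direction: key/value rows -> wide columns
SELECT id,
       MAX(CASE WHEN period = 'q1' THEN sales END) AS q1_sales,
       MAX(CASE WHEN period = 'q2' THEN sales END) AS q2_sales
FROM narrow_sales
GROUP BY id;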
Below I’ll describe an actual (obfuscated) example that you will probably recognize, as it is as hideous as it is simple in its horrible complexity.
Take a look at this file:
Let’s assume that this spreadsheet …
[Read more]

This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!
This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal), an excellent conference I hope to return to next year. I learned a lot from Confoo, especially the session I attended on blending NoSQL and SQL.
This week was also the Hotsos Symposium. …
[Read more]

The Kickfire appliance is designed for business intelligence and analytical workloads, as opposed to OLTP (online transaction processing) environments. Most of the focus in the MySQL area right now revolves around increasing performance for OLTP-type workloads, which makes sense as this is the traditional workload that MySQL has been used for. In contrast, Kickfire focuses squarely on analytic environments, delivering high-performance execution of analytical and reporting queries.
A MySQL server with fast processors, fast disks (or SSDs) and lots of memory will answer many OLTP queries easily. Kickfire will outperform such a server for typical analytical queries, such as aggregation over a large number of rows.
A typical OLTP query might ask “What is the shipping address for this invoice?”. Contrast this with a typical analytical query, which asks “How much of this item did we sell in all of …
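Expressed as SQL against a hypothetical invoicing schema (the table and column names below are assumptions, not from Kickfire or the original post), the contrast looks like this:

-- OLTP: fetch a single row by key
SELECT shipping_address
FROM invoices
WHERE invoice_id = 1001;

-- Analytical: aggregate over a large number of rows
SELECT SUM(quantity) AS units_sold
FROM order_lines
WHERE item_id = 42
  AND order_date >= '2010-01-01';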
[Read more]

I am, for the most part, a do-it-yourself type of person. I fix my own car if I can; I even have four healthy tomato plants growing in pots outside as we speak. The plants will take that little extra CO2 out of the air and give me great-tasting tomatoes (soon… I hope!)
But I digress.
Whether to use an ETL tool such as Kettle (aka Pentaho Data Integration) for a project involving large data transfers is a typical “build vs. buy” decision, one that is fairly well understood, so I don’t wish to repeat it all here. By putting together some Perl scripts to do the job, you typically get great performance, development speed and accessibility. This would need to be balanced against the benefits of ETL tools and their potential drawbacks (development speed, license costs and performance …
[Read more]