Displaying posts with tag: data warehouse
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

By Reza Shiftehfar

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying …

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Databook: Turning Big Data into Knowledge with Metadata at Uber

By Luyao Li, Kaan Onuk, Lauren Tindal

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data. Data powers Uber’s global marketplace, enabling more reliable and seamless …

The post Databook: Turning Big Data into Knowledge with Metadata at Uber appeared first on Uber Engineering Blog.

Easy and Effective Way of Building External Dictionaries for ClickHouse with Pentaho Data Integration Tool

In this post, I provide an illustration of how to use the Pentaho Data Integration (PDI) tool to set up external dictionaries in MySQL to support ClickHouse. Although I use MySQL in this example, you can use any PDI-supported source.

ClickHouse

ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing. Source: Wikipedia.
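
As a point of reference only (the post itself may use ClickHouse's XML dictionary configuration files rather than DDL), here is a minimal sketch of a ClickHouse external dictionary backed by a MySQL table, the kind of table a PDI transformation would keep populated. It assumes the third-party clickhouse-driver Python package and a ClickHouse version that supports CREATE DICTIONARY; hosts, credentials, and columns are placeholders.

    # Sketch only: create a dictionary whose SOURCE is a MySQL table and query it.
    from clickhouse_driver import Client

    client = Client(host="clickhouse-host")

    client.execute("""
        CREATE DICTIONARY IF NOT EXISTS default.country_dict
        (
            id   UInt64,
            name String
        )
        PRIMARY KEY id
        SOURCE(MYSQL(
            host 'mysql-host' port 3306
            user 'dict_user' password 'dict_pass'
            db 'dictionaries' table 'countries'
        ))
        LAYOUT(HASHED())
        LIFETIME(MIN 300 MAX 600)
    """)

    # Look up an attribute by key; ClickHouse pulls the data from MySQL and
    # refreshes it according to the LIFETIME settings above.
    print(client.execute("SELECT dictGet('default.country_dict', 'name', toUInt64(1))"))

Because LIFETIME controls how often ClickHouse re-reads the MySQL source, rows loaded into the source table (by PDI or anything else) become visible to dictGet lookups without any change on the ClickHouse side.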

Pentaho Data Integration

Information from the Pentaho wiki: Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:

  • Migrating data between …
[Read more]
On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan

“The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood.”–John Ryan

I have interviewed John Ryan, Data Warehouse Solution Architect (Director) at UBS.

RVZ

Q1. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?

John Ryan: The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the …

[Read more]
On the future of Data Warehousing. Interview with Jacque Istok and Mike Waas

“Open source software comes with a promise, and that promise is not about looking at the code, rather it’s about avoiding vendor lock-in.” –Jacque Istok.

“The cloud has out-paced the data center by far and we should expect to see the entire database market being replatformed into the cloud within the next 5-10 years.” –Mike Waas.

I have interviewed Jacque Istok, Head of Data Technical Field for Pivotal, and Mike Waas, founder and CEO of Datometry.
The main topics of the interview are: the future of Data Warehousing, how open source and the Cloud are affecting the Data Warehouse market, and Datometry Hyper-Q and Pivotal Greenplum.

RVZ

Q1. What is the future of Data Warehouses?

Jacque Istok: I believe that what we’re seeing …

[Read more]
Oracle HA, DR, data warehouse loading, and license reduction through edge apps

Database replication is a vital enterprise technology, but the market is dominated by inflexible, high-cost incumbents, especially in the case of Oracle. This webinar-on-demand introduces VMware Continuent replication and shows how it solves database replication problems in environments ranging from bare metal to clouds.

We introduce exciting new Oracle replication improvements that allow users to apply VMware …

Replication in real-time from Oracle and MySQL into data warehouses and analytics

Practical tips and a live demo of how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from MySQL and Oracle into Amazon Redshift, HP Vertica and Hadoop.

Webinar-on-demand. Recorded 07/23/15.

Replication in real-time from Oracle and MySQL into data warehouses and analytics

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with ever-increasing storage requirements. Although there are many techniques available for loading data, getting data into your data warehouse effectively and in real time is a more difficult problem. VMware Continuent provides …

Real-time data loading from Oracle and MySQL to data warehouses, analytics

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with ever-increasing storage requirements. Although there are many techniques available for loading data, getting data into your data warehouse effectively and in real time is a more difficult problem. In this webinar-on-demand we showcase …
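
The technique behind this kind of real-time loading is change data capture: reading the database's transaction log and streaming the row changes onward. Purely as an illustration of that idea (not of Continuent's implementation), here is a minimal Python sketch that tails a MySQL binary log using the third-party python-mysql-replication package; it assumes row-based binlogging is enabled, and the hosts and credentials are placeholders.

    # Sketch only: capture row changes from the MySQL binlog in real time.
    # A real replicator would batch these changes and apply them to Redshift,
    # Vertica or Hadoop; this only shows the capture side.
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
        DeleteRowsEvent,
        UpdateRowsEvent,
        WriteRowsEvent,
    )

    MYSQL_SETTINGS = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"}

    stream = BinLogStreamReader(
        connection_settings=MYSQL_SETTINGS,
        server_id=100,  # must be unique among servers attached as replicas
        only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
        blocking=True,  # keep waiting for new events, i.e. run continuously
    )

    for event in stream:
        for row in event.rows:
            # Each row dict carries the changed column values for one table row.
            print(event.schema, event.table, type(event).__name__, row)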

Data Warehouse in the Cloud - How to Upload MySQL data into Amazon Redshift for reporting and analytics

October 27, 2014 By Severalnines

The term data warehousing often brings to mind things like large complex projects, big businesses, proprietary hardware and expensive software licenses. With Hadoop came open source data analysis software that ran on commodity hardware, which helped address at least some of the cost aspects. We had previously blogged about moving data from MongoDB and MySQL into Hadoop. But setting up and maintaining a Hadoop infrastructure might still be out of reach for small businesses or small projects with limited budgets. Well, perhaps then you might want to have a look at Redshift.
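
Before reading the full post, a rough sketch of the typical load path may help: export from MySQL to CSV, stage the file on S3, then COPY it into Redshift. This is not taken from the article; the table names, bucket, IAM role, and credentials are placeholders, and it assumes the pymysql, boto3, and psycopg2 packages.

    # Sketch only: batch-load one MySQL table into Amazon Redshift via S3.
    import csv

    import boto3
    import psycopg2
    import pymysql

    # 1. Export the MySQL table to a local CSV file.
    mysql_conn = pymysql.connect(host="mysql-host", user="app", password="secret", db="shop")
    with mysql_conn.cursor() as cur, open("orders.csv", "w", newline="") as f:
        cur.execute("SELECT id, customer_id, total, created_at FROM orders")
        csv.writer(f).writerows(cur.fetchall())

    # 2. Stage the file on S3; Redshift's COPY reads from S3 (or EMR/DynamoDB/SSH).
    boto3.client("s3").upload_file("orders.csv", "my-staging-bucket", "orders/orders.csv")

    # 3. COPY into Redshift; the IAM role must allow Redshift to read the bucket.
    rs_conn = psycopg2.connect(host="example.redshift.amazonaws.com", port=5439,
                               dbname="analytics", user="admin", password="secret")
    with rs_conn, rs_conn.cursor() as cur:
        cur.execute("""
            COPY orders
            FROM 's3://my-staging-bucket/orders/orders.csv'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            CSV
        """)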

[Read more]