Displaying posts with tag: data warehouse
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

By Reza Shiftehfar

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying …

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Databook: Turning Big Data into Knowledge with Metadata at Uber

By Luyao Li, Kaan Onuk, Lauren Tindal

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data. Data powers Uber’s global marketplace, enabling more reliable and seamless …

The post Databook: Turning Big Data into Knowledge with Metadata at Uber appeared first on Uber Engineering Blog.

Easy and Effective Way of Building External Dictionaries for ClickHouse with Pentaho Data Integration Tool

In this post, I provide an illustration of how to use the Pentaho Data Integration (PDI) tool to set up external dictionaries in MySQL to support ClickHouse. Although I use MySQL in this example, you can use any PDI-supported source.

ClickHouse

ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing. Source: Wikipedia.
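
As a point of reference only (the post itself may use ClickHouse's XML dictionary configuration files rather than DDL), here is a minimal sketch of a ClickHouse external dictionary backed by a MySQL table, the kind of table a PDI transformation would keep populated. It assumes the third-party clickhouse-driver Python package and a ClickHouse version that supports CREATE DICTIONARY; hosts, credentials, and columns are placeholders.

    # Sketch only: create a dictionary whose SOURCE is a MySQL table and query it.
    from clickhouse_driver import Client

    client = Client(host="clickhouse-host")

    client.execute("""
        CREATE DICTIONARY IF NOT EXISTS default.country_dict
        (
            id   UInt64,
            name String
        )
        PRIMARY KEY id
        SOURCE(MYSQL(
            host 'mysql-host' port 3306
            user 'dict_user' password 'dict_pass'
            db 'dictionaries' table 'countries'
        ))
        LAYOUT(HASHED())
        LIFETIME(MIN 300 MAX 600)
    """)

    # Look up an attribute by key; ClickHouse pulls the data from MySQL and
    # refreshes it according to the LIFETIME settings above.
    print(client.execute("SELECT dictGet('default.country_dict', 'name', toUInt64(1))"))

Because LIFETIME controls how often ClickHouse re-reads the MySQL source, rows loaded into the source table (by PDI or anything else) become visible to dictGet lookups without any change on the ClickHouse side.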

Pentaho Data Integration

Information from the Pentaho wiki: Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:

  • Migrating data between …
[Read more]
On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan

“The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood.”–John Ryan

I have interviewed John Ryan, Data Warehouse Solution Architect (Director) at UBS.

RVZ

Q1. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?

John Ryan: The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the …

[Read more]
On the future of Data Warehousing. Interview with Jacque Istok and Mike Waas

“Open source software comes with a promise, and that promise is not about looking at the code, rather it’s about avoiding vendor lock-in.” –Jacque Istok.

“The cloud has out-paced the data center by far and we should expect to see the entire database market being replatformed into the cloud within the next 5-10 years.” –Mike Waas.

I have interviewed Jacque Istok, Head of Data Technical Field for Pivotal, and Mike Waas, founder and CEO of Datometry.
The main topics of the interview are: the future of Data Warehousing, how open source and the Cloud are affecting the Data Warehouse market, and Datometry Hyper-Q and Pivotal Greenplum.

RVZ

Q1. What is the future of Data Warehouses?

Jacque Istok: I believe that what we’re seeing …

[Read more]
Oracle HA, DR, data warehouse loading, and license reduction through edge apps

Database replication is a vital enterprise technology, but the market is dominated by inflexible, high-cost incumbents, especially in the case of Oracle. This webinar-on-demand introduces VMware Continuent replication and shows how it solves database replication problems in environments ranging from bare metal to clouds.

We introduce exciting new Oracle replication improvements that allow users to apply VMware …

Replication in real-time from Oracle and MySQL into data warehouses and analytics

Practical tips and a live demo of how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from MySQL and Oracle into Amazon Redshift, HP Vertica and Hadoop.

Webinar-on-demand. Recorded 07/23/15.

Replication in real-time from Oracle and MySQL into data warehouses and analytics

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with ever-increasing storage requirements. Although there are many techniques available for loading data, getting data into your data warehouse effectively and in real time is a more difficult problem. VMware Continuent provides …

Real-time data loading from Oracle and MySQL to data warehouses, analytics

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with ever-increasing storage requirements. Although there are many techniques available for loading data, getting data into your data warehouse effectively and in real time is a more difficult problem. In this webinar-on-demand we showcase …
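
The technique behind this kind of real-time loading is change data capture: reading the database's transaction log and streaming the row changes onward. Purely as an illustration of that idea (not of Continuent's implementation), here is a minimal Python sketch that tails a MySQL binary log using the third-party python-mysql-replication package; it assumes row-based binlogging is enabled, and the hosts and credentials are placeholders.

    # Sketch only: capture row changes from the MySQL binlog in real time.
    # A real replicator would batch these changes and apply them to Redshift,
    # Vertica or Hadoop; this only shows the capture side.
    from pymysqlreplication import BinLogStreamReader
    from pymysqlreplication.row_event import (
        DeleteRowsEvent,
        UpdateRowsEvent,
        WriteRowsEvent,
    )

    MYSQL_SETTINGS = {"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"}

    stream = BinLogStreamReader(
        connection_settings=MYSQL_SETTINGS,
        server_id=100,  # must be unique among servers attached as replicas
        only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
        blocking=True,  # keep waiting for new events, i.e. run continuously
    )

    for event in stream:
        for row in event.rows:
            # Each row dict carries the changed column values for one table row.
            print(event.schema, event.table, type(event).__name__, row)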

Data Warehouse in the Cloud - How to Upload MySQL data into Amazon Redshift for reporting and analytics

October 27, 2014 By Severalnines

The term data warehousing often brings to mind things like large complex projects, big businesses, proprietary hardware and expensive software licenses. With Hadoop came open source data analysis software that ran on commodity hardware, which helped address at least some of the cost aspects. We had previously blogged about moving data from MongoDB and MySQL into Hadoop. But setting up and maintaining a Hadoop infrastructure might still be out of reach for small businesses or small projects with limited budgets. Well, perhaps then you might want to have a look at Redshift.
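
Before reading the full post, a rough sketch of the typical load path may help: export from MySQL to CSV, stage the file on S3, then COPY it into Redshift. This is not taken from the article; the table names, bucket, IAM role, and credentials are placeholders, and it assumes the pymysql, boto3, and psycopg2 packages.

    # Sketch only: batch-load one MySQL table into Amazon Redshift via S3.
    import csv

    import boto3
    import psycopg2
    import pymysql

    # 1. Export the MySQL table to a local CSV file.
    mysql_conn = pymysql.connect(host="mysql-host", user="app", password="secret", db="shop")
    with mysql_conn.cursor() as cur, open("orders.csv", "w", newline="") as f:
        cur.execute("SELECT id, customer_id, total, created_at FROM orders")
        csv.writer(f).writerows(cur.fetchall())

    # 2. Stage the file on S3; Redshift's COPY reads from S3 (or EMR/DynamoDB/SSH).
    boto3.client("s3").upload_file("orders.csv", "my-staging-bucket", "orders/orders.csv")

    # 3. COPY into Redshift; the IAM role must allow Redshift to read the bucket.
    rs_conn = psycopg2.connect(host="example.redshift.amazonaws.com", port=5439,
                               dbname="analytics", user="admin", password="secret")
    with rs_conn, rs_conn.cursor() as cur:
        cur.execute("""
            COPY orders
            FROM 's3://my-staging-bucket/orders/orders.csv'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            CSV
        """)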

[Read more]