Showing entries 1 to 7
Displaying posts with tag: Data Modeling (reset)
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL

The making of Schemaless, Uber Engineering’s custom designed datastore using MySQL, which has allowed us to scale from 2014 to beyond. This is part one of a three-part series on Schemaless.

In Project Mezzanine we described how we migrated Uber’s …

The post Designing Schemaless, Uber Engineering’s Scalable Datastore Using MySQL appeared first on Uber Engineering Blog.

MySQL Workbench 5.2: changing the life of MySQL developers and DBAs

MySQL Workbench provides several necessary tools in order to manage a MySQL environment. MySQL Workbench 5.2 is available for Windows, Mac and Linux platforms, it encapsulates three main functionalities:

1.    SQL Development: Allows connecting to existing databases and run SQL Queries, SQL scripts, edit data and manage database objects. This functionality replaces the ones previously provided by the Query Browser stand-alone application.

2.    Data Modeling: Helps to create and manage models, allows forward & reverse engineering and provides capability to compare and synchronize schemas

3.    Server Administration: Helps to configure a database server, to setup user accounts and browse status variables and server logs


Let’s have a look deeper in each of those features:

A setting star

On my second blog I stated I would demonstrate a case where a denormalized Star schema is 50 times slower than a normalized data model.  Well, while writing this I looked at the tables again and realized that I didn't have optimal indexes and my statistics weren't optimal.  Yeah, I feel a bit foolish.  Once I fixed those issues there wasn't a performance difference.  I'll go more into this during the post test discussion. 

First, these are the table that will be used to test the performance of the normalized and denormalized data models.  The ProductNormal table represents the normalized table, has 10 million rows, and is about 0.8 gig. 


create table ProductNormal (
     productId int(11) not null,
     productName varchar(32) not null,
     productGroupId int(11) not null,

[Read more]
Denormalized data, not so fast?

The last article demonstrated that a denormalized data model was a bit faster than a normalized data model when all the data fit into memory, but only by 20-30 percent or so, at least for the experiments I ran.  This article deals with larger data volumes and shows a case where normalized data is faster than denormalized data and then details another case where a denormalized data model faster. 

First, this is the what the tables in question look like.  All the product tables have 10 million rows.  The denormalized ProductAll table is 5.2 gig, and as innodb_buffer_pool_size is set to 4 gig, this table won't fit into memory.  The filler column represents other product attributes without having to specify them in detail.


create table ProductAll (
    productAllId int(11) not null,
    productAllName varchar(32) not null,

[Read more]
Denormalized data is a bit faster than normalized data (but by enough?)

In the last article the performance impact of joins was shown.  This one will demonstrate cases where denormalized joins are a bit faster, as will the third article with larger data volumes.  The fourth article, the most interesting one, will show where a denormalized data model can be 50 times faster than a normalized data model. 

Here are the tables that will be involved in the sql.  The normalized ProductSmall table has a 100 million rows and is about 0.67 gig. 


create table ProductSmall (

    productSmallId int(11) not null,

    productSmallName varchar(32) not null,

    productGroupId int(11) not null,

    primary key (productSmallId),

    key idx_ProductSmall_productGroup (productGroupId),

    constraint con_ProductSmall_productGroup

[Read more]
Joins are slow, memory is fast

I've been working with various databases for a number of years. In that time I've found there is plenty of documentation on various features but not much on how database modeling impacts performance. Some people say one should use a fully normalized data model with a few denormalizations that are required for performance reasons, but those denormalizations are never discussed. And Ralph Kimball has many articles and a book concentrating on the physical data modeling, but they are more of the "here is how you solve the problem" and they don't detail why his method works better.

I've always found this odd as the physical data model has a major impact on database performance. I want this to be more of the whys behind various physical data modeling options with some examples showing the magnitude of the performance differences.

The first goal of this blog (over a few articles) will be to show cases where a denormalized dimensional data …

[Read more]
Showing entries 1 to 7