Displaying posts with tag: big data
Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

By Min Cai & Mayank Bansal

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data …

The post Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads appeared first on Uber Engineering Blog.

Percona Live Europe Presents: ClickHouse at MessageBird: Analysing Billions of Events in Real-Time

We’ll look into how ClickHouse allows us to ingest a large amount of data and run complex interactive analytical queries at MessageBird. We also present the business needs that brought ClickHouse to our attention and detail the journey to its deployment. We cover the problems we faced and how we dealt with them, and we talk about our current cloud production setup and how we deployed and use it.

We are really enthusiastic to share a ClickHouse use case and how it helped us scale our analytics stack: the good, the bad, and the ugly.

The talk could be useful to newcomers and to anyone wondering whether ClickHouse could be useful to them.
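To make the ingest-and-query pattern concrete, here is a minimal ClickHouse sketch; the table and column names are illustrative assumptions, not MessageBird’s actual schema:

-- Hypothetical MergeTree table for high-volume event ingestion
CREATE TABLE events
(
    event_date Date,
    event_time DateTime,
    event_type String,
    channel    String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, event_time);

-- Interactive analytical query over the raw events
SELECT event_type, channel, count() AS events
FROM events
WHERE event_date >= today() - 7
GROUP BY event_type, channel
ORDER BY events DESC;

MergeTree’s sorted, partitioned storage is what lets aggregations like this stay interactive at billions of rows.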

What we’re looking forward to…

There are many talks, but these are the ones we’re looking forward to most:

[Read more]
One Billion Tables in MySQL 8.0 with ZFS

The short version

I created more than one billion InnoDB tables in MySQL 8.0 (tables, not rows) just for fun. Here is the proof:

$ mysql -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1425329
Server version: 8.0.12 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select count(*) from information_schema.tables;
+------------+
| count(*)   |
+------------+
| 1011570298 |
+------------+
1 row in set (6 hours 57 min 6.31 sec)

Yes, it took 6 hours and 57 minutes to count them all!
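The excerpt doesn’t show how the tables were created; judging by the title, the trick involves ZFS rather than issuing a billion individual DDL statements. For contrast, a naive stored-procedure loop (a sketch with hypothetical table names, not the author’s method) looks like this, and would be far too slow at this scale:

DELIMITER $$
CREATE PROCEDURE create_tables(IN n INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    -- Build and execute one CREATE TABLE per iteration (illustrative naming)
    SET @ddl = CONCAT('CREATE TABLE t_', i, ' (id INT PRIMARY KEY)');
    PREPARE stmt FROM @ddl;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

CALL create_tables(1000000);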

Why does anyone need one billion tables?

In my previous blog post, I created and tested  …

[Read more]
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

By Reza Shiftehfar

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying …

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

40 million tables in MySQL 8.0 with ZFS

In my previous blog post about millions of tables in MySQL 8, I was able to create one million tables and test their performance. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this blog post I show how to do it and what challenges to expect.
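As a rough sketch of what “one tablespace per schema” can look like in MySQL 8.0 (all names here are illustrative), each customer schema gets a general tablespace and its tables are placed there explicitly:

-- One general tablespace per customer schema
CREATE TABLESPACE ts_customer_0001
  ADD DATAFILE 'ts_customer_0001.ibd' ENGINE=InnoDB;

CREATE DATABASE customer_0001;

-- Every table for this customer lives in the shared tablespace
CREATE TABLE customer_0001.orders (
  id INT PRIMARY KEY,
  created_at DATETIME
) TABLESPACE ts_customer_0001;

This keeps one .ibd file per schema instead of one per table, which is what makes tens of millions of tables more tractable at the filesystem level.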

Background

Once again: why do we need so many tables in MySQL, and what is the use case? The main reason is customer isolation. With the new focus on security and privacy (take GDPR, for example), it is much easier and more beneficial to create a separate schema (or “database” in MySQL terms) for each customer. That creates a new set of challenges that we will need to solve. Here is the summary:

[Read more]
On Innovation. Interview with Scott McNealy

“We made it a point to hire really smart, visionary people and then let them do their work.
I wanted to delegate and let people be in charge of things. My own decision-making process was to decide who got to decide. To make decisions, you have to first outline the problem, and if you hire really great people, they’re going to know more about the problem they’re dealing with than you ever will.”–Scott McNealy

I have interviewed Scott McNealy. Scott is a Silicon Valley pioneer, most famous for co-founding Sun Microsystems in 1982. We talked about Innovation, AI, Big Data, Redis, Curriki and Wayin.

RVZ

Q1. You co-founded Sun Microsystems in 1982, and served as CEO and Chairman of the Board for 22 years. What are the main lessons learned in all these years?

Scott …

[Read more]
Don’t Drown in your Data Lake

A data lake is “…a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms…”. Many companies find value in using a data lake, but aren’t always clear that they need to plan for it and maintain it properly in order to prevent issues.

The idea of a data lake rose from the need to store data in a raw format that is accessible to a variety of applications and authorized users. Hadoop is often used to query the data, and the necessary structures for querying are created through the query tool (schema on read) rather than as part of the data design (schema on write). There are other tools available for analysis, and many cloud providers are actively developing additional options for creating …
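“Schema on read” means the structure is declared at query time over files already sitting in the lake. A minimal Hive-style sketch (the path and columns are hypothetical):

-- Project a schema onto raw files at query time (schema on read)
CREATE EXTERNAL TABLE clickstream (
  user_id  STRING,
  url      STRING,
  event_ts TIMESTAMP
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/datalake/raw/clickstream/';

-- The files themselves are untouched; only this query sees the structure
SELECT url, count(*) AS hits
FROM clickstream
GROUP BY url
ORDER BY hits DESC
LIMIT 10;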

[Read more]
Percona Live 2018: Securing Access to Facebook’s Databases

We’re moving along at Percona Live 2018, and the talks after lunch are still packed and energetic.

My next session was with Andrew Regner, Production Engineer at Facebook. His talk was on securing access to Facebook’s databases.

Since the beginning, Facebook has used a conventional username/password to secure access to production MySQL instances. Over the last few years, they’ve been working on moving to x509 TLS client certificate authenticated connections. Given the many languages and systems at Facebook that use MySQL in some way, this required a massive number of changes across a lot of teams.
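Stock MySQL supports this style of authentication out of the box; a minimal sketch (generic MySQL, not Facebook’s internal tooling, with a hypothetical account and certificate subject) ties an account to a TLS client certificate:

-- Require a client certificate with a specific subject and issuer
CREATE USER 'svc_app'@'%'
  REQUIRE SUBJECT '/C=US/O=ExampleCorp/CN=svc_app'
  AND ISSUER '/C=US/O=ExampleCorp/CN=Internal CA';

The client then presents its certificate and key when connecting, e.g. mysql --ssl-cert=client-cert.pem --ssl-key=client-key.pem.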

This talk was both a technical overview of how their new solution works and a set of hard-learned tricks for getting an entire company to change its underlying MySQL client libraries.

After his talk, …

[Read more]
On RDBMS, NoSQL and NewSQL databases. Interview with John Ryan

“The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood.”–John Ryan

I have interviewed John Ryan, Data Warehouse Solution Architect (Director) at UBS.

RVZ

Q1. You are an experienced Data Warehouse architect, designer and developer. What are the main lessons you have learned in your career?

John Ryan: The single most important lesson I’ve learned is to keep it simple. I find designers sometimes deliver over-complex, generic solutions that could (in theory) do anything, but in reality are remarkably difficult to operate, and often misunderstood. I believe this stems from a lack of understanding of the …

[Read more]
Archiving MySQL Tables in ClickHouse

In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

Why Archive?

Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:

  1. The larger the table and index, the slower the performance of all operations (both writes and reads)
  2. Backup and restore for terabytes of data is more challenging, and if we need to have redundancy (replication slave, clustering, etc.) we will have to store all the data N times

The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (e.g., AWS S3) or loaded into a …
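Presumably the load target here is ClickHouse. As one hedged sketch of that path (the table and column names are illustrative), ClickHouse’s mysql() table function can pull old rows straight from the source table into a local MergeTree table:

-- Destination table in ClickHouse for the archived rows
CREATE TABLE archive_comments
(
    id         UInt64,
    created_at DateTime,
    body       String
)
ENGINE = MergeTree()
ORDER BY (created_at, id);

-- Pull old rows directly from MySQL via the mysql() table function
INSERT INTO archive_comments
SELECT id, created_at, body
FROM mysql('mysql-host:3306', 'blog', 'comments', 'user', 'password')
WHERE created_at < '2015-01-01';

Once the copy is verified in ClickHouse, the original rows can be deleted on the MySQL side.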

[Read more]