Displaying posts with tag: big data
MySQL table to JSON with 10 lines of Spark

Apache Spark is the de facto framework of the big data world. Any serious organization dealing with big data uses Spark almost exclusively. That said, it has some caveats. For starters, it's hard to use, and it's confusing to get started with, even for those with a solid …
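The post's actual ten lines are behind the link, but a minimal PySpark sketch of the idea (placeholder host, credentials, and table name; the MySQL JDBC driver jar must be supplied separately, e.g. via --jars) looks roughly like this:

# Minimal sketch: read a MySQL table over JDBC and dump it as line-delimited JSON.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mysql-to-json").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/mydb")  # placeholder URL
      .option("dbtable", "my_table")                      # placeholder table
      .option("user", "user")
      .option("password", "secret")
      .load())

df.write.json("/tmp/my_table_json")  # one JSON object per line, per part-file

spark.stop()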

The post MySQL table to JSON with 10 lines of Spark appeared first on Geeky Hacker.

Managing Big Data with MySQL Can be Challenging

Author: Robert Agar

MySQL is an extremely popular open-source database platform, originally developed by MySQL AB and now owned by Oracle. It is currently the second most popular database management system in the world, trailing only Oracle's proprietary offering. If you aim to be a professional database administrator, knowledge of MySQL is almost a prerequisite. It is an important part of the multi-platform database environment found in the majority of IT departments.

A recent addition to the complexity of managing a MySQL environment is the introduction of big data. Big data is characterized by the volume, velocity, and variety of the information that is gathered and needs to be processed. It can be used to provide an organization with the business …

[Read more]
DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake

Keeping the Uber platform reliable and real-time across our global markets is a 24/7 business. People may be going to sleep in San Francisco, but in Paris they’re getting ready for work, requesting rides from Uber driver-partners. At that same …

The post DBEvents: A Standardized Framework for Efficiently Ingesting Data into Uber’s Apache Hadoop Data Lake appeared first on Uber Engineering Blog.

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management …

The post Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads appeared first on Uber Engineering Blog.

Percona Live Europe Presents: ClickHouse at MessageBird: Analysing Billions of Events in Real-Time

We'll look at how ClickHouse allows us to ingest a large amount of data and run complex, interactive analytical queries at MessageBird. We also present the business needs that brought ClickHouse to our attention and detail the journey to its deployment. We cover the problems we faced and how we dealt with them, and we talk about our current cloud production setup and how we deployed and use it.

We are really enthusiastic to share a ClickHouse use case: how it helped us scale our analytics stack, with the good, the bad, and the ugly.

The talk could be useful to newcomers and to anyone wondering whether ClickHouse could be useful to them.
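For readers who want to try this kind of workload themselves, here is a hedged sketch of an interactive aggregation over a large events table using the third-party clickhouse-driver package for Python (the host, database, and column names are placeholders, not details from the talk):

# Sketch: run an interactive aggregation against ClickHouse from Python.
from clickhouse_driver import Client

client = Client(host="localhost")  # placeholder host

rows = client.execute(
    """
    SELECT toStartOfHour(event_time) AS hour, count() AS events
    FROM analytics.events              -- hypothetical table
    WHERE event_time >= now() - INTERVAL 1 DAY
    GROUP BY hour
    ORDER BY hour
    """
)
for hour, events in rows:
    print(hour, events)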

What we’re looking forward to…

There are many talks, but these are the ones we're particularly looking forward to:

[Read more]
One Billion Tables in MySQL 8.0 with ZFS

The short version

I created > one billion InnoDB tables in MySQL 8.0 (tables, not rows) just for fun. Here is the proof:

$ mysql -A
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1425329
Server version: 8.0.12 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select count(*) from information_schema.tables;
+------------+
| count(*)   |
+------------+
| 1011570298 |
+------------+
1 row in set (6 hours 57 min 6.31 sec)

Yes, it took 6 hours and 57 minutes to count them all!
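How the tables were actually created is behind the link; purely as an illustration of the shape of such an experiment, a bulk-creation loop with mysql-connector-python (placeholder credentials and table definition, not the post's script) might look like this:

# Rough sketch only: bulk-create empty InnoDB tables.
import mysql.connector

conn = mysql.connector.connect(user="root", password="", host="localhost")
cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS manytables")

for i in range(1_000_000):  # scale this loop up toward one billion
    cur.execute(f"CREATE TABLE IF NOT EXISTS manytables.t{i} (id INT PRIMARY KEY)")

conn.close()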

Why does anyone need one billion tables?

In my previous blog post, I created and tested …

[Read more]
Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks …

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

40 million tables in MySQL 8.0 with ZFS

In my previous blog post about millions of tables in MySQL 8, I was able to create one million tables and test their performance. My next challenge is to create 40 million tables in MySQL 8 using shared tablespaces (one tablespace per schema). In this blog post, I show how to do it and what challenges to expect.
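As a hedged illustration of what "one tablespace per schema" looks like in MySQL 8.0 DDL (my own sketch with placeholder names and counts, not the post's scripts), driven from Python:

# Sketch: one general tablespace per customer schema; every table in the
# schema is placed in that shared .ibd file. Names and counts are placeholders.
import mysql.connector

conn = mysql.connector.connect(user="root", password="", host="localhost")
cur = conn.cursor()

for c in range(10):  # scale toward 40 million tables overall
    schema = f"customer_{c}"
    cur.execute(f"CREATE DATABASE IF NOT EXISTS {schema}")
    # One shared datafile backs all of this customer's tables.
    cur.execute(f"CREATE TABLESPACE ts_{schema} ADD DATAFILE 'ts_{schema}.ibd'")
    for t in range(100):
        cur.execute(
            f"CREATE TABLE {schema}.t{t} (id INT PRIMARY KEY) TABLESPACE ts_{schema}"
        )

conn.close()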

Background

Once again: why do we need so many tables in MySQL, and what is the use case? The main reason is customer isolation. With the new focus on security and privacy (take GDPR, for example), it is much easier and more beneficial to create a separate schema (or "database" in MySQL terms) for each customer. That creates a new set of challenges that we will need to solve. Here is the summary:

[Read more]
On Innovation. Interview with Scott McNealy

“We made it a point to hire really smart, visionary people and then let them do their work.
I wanted to delegate and let people be in charge of things. My own decision-making process was to decide who got to decide. To make decisions, you have to first outline the problem, and if you hire really great people, they’re going to know more about the problem they’re dealing with than you ever will.”–Scott McNealy

I have interviewed Scott McNealy. Scott is a Silicon Valley pioneer, most famous for co-founding Sun Microsystems in 1982. We talked about Innovation, AI, Big Data, Redis, Curriki and Wayin.

RVZ

Q1. You co-founded Sun Microsystems in 1982 and served as CEO and Chairman of the Board for 22 years. What are the main lessons learned in all these years?

Scott …

[Read more]
Don’t Drown in your Data Lake

A data lake is “…a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms…”1. Many companies find value in using a data lake but aren’t clear that they need to properly plan for it and maintain it in order to prevent issues.

The idea of a data lake arose from the need to store data in a raw format that is accessible to a variety of applications and authorized users. Hadoop is often used to query the data, and the necessary structures for querying are created through the query tool (schema on read) rather than as part of the data design (schema on write). There are other tools available for analysis, and many cloud providers are actively developing additional options for creating …
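"Schema on read" is easy to see in code: the stored files carry no schema, and structure is applied only when a query needs it. A small PySpark sketch (my illustration with hypothetical paths and columns, not from the post):

# Sketch: the schema lives with the query (schema on read), not with the files.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("schema-on-read").getOrCreate()

schema = StructType([
    StructField("user_id", StringType()),
    StructField("amount", DoubleType()),
])

df = spark.read.schema(schema).json("/datalake/raw/orders/")  # placeholder path
df.groupBy("user_id").sum("amount").show()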

[Read more]