Showing entries 21 to 30 of 211
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: big data (reset)
Percona Live 2017 Tutorials Day

Welcome to the first day of the Percona Live Open Source Database Conference: Percona Live 2017 tutorials day! While technically the first day of the conference, this day focused on provided hands-on tutorials for people interested in learning directly how to use open source tools and technologies.

Today attendees went to training sessions taught by open source database experts and got first-hand experience configuring, working with, and experimenting with various open source technologies and software.

The first full day (which includes opening keynote speakers and breakout sessions) starts Tuesday 4/25 at 9:00 am.

Some of the …

[Read more]
Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark

This blog shares some column store database benchmark results, and compares the query performance of MariaDB ColumnStore v. 1.0.7 (based on InfiniDB), Clickhouse and Apache Spark.

I’ve already written about ClickHouse (Column Store database).

The purpose of the benchmark is to see how these three solutions work on a single big server, with many CPU cores and large amounts of RAM. Both systems are massively parallel (MPP) database systems, so they should use many cores for SELECT queries.

For the benchmarks, I chose …

[Read more]
Database Challenges and Innovations. Interview with Jim Starkey

“Isn’t it ironic that in 2016 a non-skilled user can find a web page from Google’s untold petabytes of data in millisecond time, but a highly trained SQL expert can’t do the same thing in a relational database one billionth the size?.–Jim Starkey.

I have interviewed Jim Starkey. A database legendJim’s career as an entrepreneur, architect, and innovator spans more than three decades of database history.

RVZ

Q1. In your opinion, what are the most significant advances in databases in the last few years?

Jim Starkey: I’d have to say the “atom programming model” where a database is layered on a substrate of peer-to-peer replicating distributed objects rather than disk files. The atom programming model enables scalability, redundancy, high availability, and distribution not available in traditional, disk-based database …

[Read more]
LinkedIn China new Social Platform Chitu. Interview with Dong Bin.

“Complicated queries, like looking for second degree friends, is really hard to traditional databases.” –Dong Bin

I have interviewed Dong Bin, Engineer Manager at LinkedIn China. The LinkedIn China development team launched a new social platform — known as Chitu — to attract a meaningful segment of the Chinese professional networking market.

RVZ

Q1. What is your role at LinkedIn China?

Dong Bin: I am an Engineer Manager in charge of the backend services for Chitu. The backend includes all Chitu`s consumer based features, like feeds, chat, event, etc.

Q2. You recently launched a new social platform, called Chitu. Which segment of the …

[Read more]
The Uber Engineering Tech Stack, Part II: The Edge and Beyond

Uber Engineering

Uber’s mission is transportation as reliable as running water, everywhere, for everyone. Last time, we talked about the foundation that powers Uber Engineering. Now, we’ll explore the parts of the stack that face riders and drivers, starting …

The post The Uber Engineering Tech Stack, Part II: The Edge and Beyond appeared first on Uber Engineering Blog.

A Grand Tour of Big Data. Interview with Alan Morrison

“Leading enterprises have a firm grasp of the technology edge that’s relevant to them. Better data analysis and disambiguation through semantics is central to how they gain competitive advantage today.”–Alan Morrison.

I have interviewed Alan Morrison, senior research fellow at PwC, Center for Technology and Innovation.
Main topic of the interview is how the Big Data market is evolving.

RVZ

Q1. How do you see the Big Data market evolving? 

Alan Morrison: We should note first of all how true Big Data and analytics methods emerged and what has been disruptive. Over the course of a decade, web companies have donated IP and millions of lines of code that serves as the foundation for what’s being built on top.  In the …

[Read more]
How to Deploy a Cluster

 

In this blog post I will talk about how to deploy a cluster, the methods I tried and my solution to resolving the prerequisites problem.

I’m fairly new to the big data field. Learning about Hadoop, I kept hearing the term “clusters”, deploying a cluster, and installing some services on namenode, some on datanode and so on. I also heard about Cloudera manager which helps me to deploy services on my cluster, so I set up a VM and followed several tutorials including the Cloudera documentation to install cloudera manager. However, every time I reached the “cluster installation” step my installation failed. I later found out that there are several prerequisites for a Cloudera Manager Installation, which was the reason for the failure to install.

 

Deploy a Cluster

Though I discuss 3 other methods in detail, ultimately I recommend method …

[Read more]
New VMware Continuent 5.0 – A powerful and cost efficient Oracle GoldenGate alternative!

VMware Continuent 5.0 is a complete data replication solution that includes all the functionality you need at one low price. In this webinar-on-demand, you’ll see how VMware Continuent delivers:  Migration. Replicate from an old version of Oracle, often running on non-Linux platform (Windows, AIX, HP-UX, Solaris), to a new version of Oracle (often running in Linux). VMware Continuent supports

Big Data: InfiniDB vs Spider: What else ?

Many of my recent engagements have been all around strategy to implement Real Time Big Data Analytics: Computing hardware cost of extending a single table collection with MariaDB and Parallel Query found in the Spider storage engine to offload columnar MPP storage like InfiniDB or Vertica.

As of today Parallel Query is only available from releases of MariaDB Spider supported by spiral arms. The more efficient way to use parallel query with Spider can be done on group by, and count queries that use a single spider table. In such case Spider Engine will execute query push down AKA map reduce.

Spider gets multiple levels of parallel execution for a single partitioned tables.

First level is per backend server:
The way to actually tell spider to scan different backends in concurrency is to set  spider_sts_bg_mode=1

Other level is per …

[Read more]
Replication in real-time from Oracle and MySQL into data warehouses and analytics

Practical tips and a live demo of how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from MySQL and Oracle into Amazon Redshift, HP Vertica and Hadoop.

Webinar-on-demand. Recorded 07/23/15.

Showing entries 21 to 30 of 211
« 10 Newer Entries | 10 Older Entries »