Showing entries 11 to 20 of 164
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: hadoop
Rosetta Stone: MySQL, Pig and Spark (Basics)

In a world where new data processing languages appear every day, it can be helpful to have tutorials explaining language characteristics in detail from the ground up. This blog post is not such a tutorial. It also isn’t a tutorial on getting started with MySQL or Hadoop, nor is it a list of best practices for the various languages I’ll reference here – there are bound to be better ways to accomplish certain tasks, and where a choice was required, I’ve emphasized clarity and readability over performance. Finally, this isn’t meant to be a quickstart for SQL experts to access Hadoop – there are a number of SQL interfaces to Hadoop such as Impala or Hive that make Hadoop incredibly accessible to those with existing SQL skills.

Instead, this post is a pale equivalent of the …

[Read more]
A Grand Tour of Big Data. Interview with Alan Morrison

“Leading enterprises have a firm grasp of the technology edge that’s relevant to them. Better data analysis and disambiguation through semantics is central to how they gain competitive advantage today.”–Alan Morrison.

I have interviewed Alan Morrison, senior research fellow at PwC, Center for Technology and Innovation.
The main topic of the interview is how the Big Data market is evolving.

RVZ

Q1. How do you see the Big Data market evolving? 

Alan Morrison: We should note first of all how true Big Data and analytics methods emerged and what has been disruptive. Over the course of a decade, web companies have donated IP and millions of lines of code that serve as the foundation for what’s being built on top. In the …

[Read more]
How to Deploy a Cluster

In this blog post I will talk about how to deploy a cluster, the methods I tried, and my solution to the prerequisites problem.

I’m fairly new to the big data field. While learning about Hadoop, I kept hearing about “clusters”, deploying a cluster, installing some services on the namenode and some on the datanode, and so on. I also heard about Cloudera Manager, which helps deploy services on a cluster, so I set up a VM and followed several tutorials, including the Cloudera documentation, to install Cloudera Manager. However, every time I reached the “cluster installation” step, my installation failed. I later found out that a Cloudera Manager installation has several prerequisites, and missing them was the reason for the failure.

Deploy a Cluster

Though I discuss 3 other methods in detail, ultimately I recommend method …

[Read more]
Using Apache Spark and MySQL for Data Analysis

What is Spark

Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it:

Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s multi-stage in-memory primitives provides performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

Contrary to popular belief, Spark does not require all data to fit into memory but will use caching to speed up the operations …

[Read more]
Log Buffer #443: A Carnival of the Vanities for DBAs

This Log Buffer Edition finds and publishes blog posts from Oracle, SQL Server and MySQL.

Oracle:

  • SCRIPT execution errors when creating a DBaaS instance with local and cloud backups.
  • Designing Surrogate vs Natural Keys with Oracle SQL Developer.
  • EBS General Ledger – Accounting Hub Reporting Cloud Service.
  • Oracle Database Standard Edition 2 is available.
  • Disable “Build After Save at …
[Read more]
Replication in real-time from Oracle and MySQL into data warehouses and analytics

Analyzing transactional data is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with the ever-increasing storage. Although there are many techniques available for loading data, getting effective data in real-time into your data warehouse store is a more difficult problem. VMware Continuent provides

What’s the latest with Hadoop

The Big Data explosion in recent years has created a vast number of new technologies in the area of data processing, storage, and management. One of the biggest names to appear on the scene is Hadoop. In case you need a quick review, Hadoop is a Big Data storage system that takes in large amounts of data from servers and breaks it into smaller, manageable chunks. The technology is complex but at a high level the Hadoop ecosystem essentially takes a “divide and conquer” approach to processing Big Data instead of processing data in tables, as in a relational database like Oracle or MySQL.

One projection expects …

[Read more]
2015: More innovation, but still a year of transition

First things first: I could use this title for every year; it is an evergreen. For this title to make sense, there must be a specific context, and in this case the context is Big Data. We have seen new ideas and many announcements in 2014, and in 2015 those ideas will take shape and early versions of innovative products will start flourishing. Like many other people, I prepared some comments and opinions to post back in early January; then, soon after the season’s break, I started flying around the world and the daily routine kept me away from the blog for some time. So, as a good last blogger, it may be time for me to post my own predictions, for the joy of my usual 25 readers.

Small Data, Big Data, Any Data

The term Big Data is often misused. Many different architectures, objectives, projects and issues deviate from its initial meaning. Everything today seems to be “Big Data” – whether you collect structured or …

[Read more]
On Hadoop RDBMS. Interview with Monte Zweben.

“HBase and Hadoop are the only technologies proven to scale to dozens of petabytes on commodity servers, currently being used by companies such as Facebook, Twitter, Adobe and Salesforce.com.”–Monte Zweben.

Is it possible to turn Hadoop into an RDBMS? On this topic, I have interviewed Monte Zweben, Co-Founder and Chief Executive Officer of Splice Machine.

RVZ

Q1. What are the main challenges of applications and operational analytics that support real-time, interactive queries on data updated in real-time for Big Data?

Monte Zweben: Let’s break down “real-time, interactive queries on data updated in real-time for Big Data”. “Real-time, interactive queries” means that results need to be returned in milliseconds to a few seconds.
For “Data updated in real-time” to happen, …

[Read more]
An Ending and a Beginning: VMware Has Acquired Continuent

As of today, Continuent is part of VMware. We are absolutely over the moon about it.

You can read more about the news on the VMware vCloud blog by Ajay Patel, our new boss. There’s also an official post on our Continuent company blog. In a nutshell the Continuent team is joining the VMware Cloud Services Division. We will continue to improve, sell, and support our Tungsten products and work on innovative integration into VMware’s product line.

So why do I feel exhilarated about joining VMware? There are three reasons.

1. Continuent is joining a world-class company that is the leader in virtualization and cloud infrastructure solutions. Even …

[Read more]