Showing entries 1 to 10 of 156
10 Older Entries »
Displaying posts with tag: hadoop (reset)
The Uber Engineering Tech Stack, Part II: The Edge and Beyond

The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.

The post The Uber Engineering Tech Stack, Part II: The Edge and Beyond appeared first on Uber Engineering Blog.

Eight Ways To Ensure Your Applications Are Enterprise-Ready

When it comes to building database applications and solutions, developers, DBAs, engineers and architects have a lot of new and exciting tools and technologies to play with, especially with the Hadoop and NoSQL environments growing so rapidly.

While it’s easy to geek out about these cool and revolutionary new technologies, at some point in the development cycle you’ll need to stop to consider the real-world business implications of the application you’re proposing. After all, you’re bound to face some tough questions, like:

Why did you choose that particular database for our mission-critical application? Can your team provide 24/7 support for the app? Do you have a plan to train people on this new technology? Do we have the right hardware infrastructure to support the app’s deployment? How are you going to ensure there won’t be any bugs or security vulnerabilities?

If you don’t have a plan for …

[Read more]
On Using HPE Vertica. Interview with Eva Donaldson.

“After you have built out your data lake, use it. Ask it questions. You will begin to see patterns where you want to dig deeper. The Hadoop ecosystem doesn’t allow for that digging and not at a speed that is customer facing. For that, you need some sort of analytical database.”– Eva Donaldson.

I have interviewed Eva Donaldson, software engineer and data architect at iContact. Main topic of the interview is her experience in using HPE Vertica.

RVZ

Q1. What is the business of iContact?

Eva Donaldson: iContact is a provider of cloud based email marketing, marketing automation and social media marketing products. We offer expert advice, design services, and an award-winning Salesforce email integration and Google Analytics tracking features specializing in …

[Read more]
Rosetta Stone: MySQL, Pig and Spark (Basics)

In a world where new data processing languages appear every day, it can be helpful to have tutorials explaining language characteristics in detail from the ground up.  This blog post is not such a tutorial.   It also isn’t a tutorial on getting started with MySQL or Hadoop, nor is it a list of best practices for the various languages I’ll reference here – there are bound to be better ways to accomplish certain tasks, and where a choice was required, I’ve emphasized clarity and readability over performance.  Finally, this isn’t meant to be a quickstart for SQL experts to access Hadoop – there are a number of SQL interfaces to Hadoop such as Impala or Hive that make Hadoop incredibly accessible to those with existing SQL skills.

Instead, this post is a pale equivalent of the …

[Read more]
A Grand Tour of Big Data. Interview with Alan Morrison

“Leading enterprises have a firm grasp of the technology edge that’s relevant to them. Better data analysis and disambiguation through semantics is central to how they gain competitive advantage today.”–Alan Morrison.

I have interviewed Alan Morrison, senior research fellow at PwC, Center for Technology and Innovation.
Main topic of the interview is how the Big Data market is evolving.

RVZ

Q1. How do you see the Big Data market evolving? 

Alan Morrison: We should note first of all how true Big Data and analytics methods emerged and what has been disruptive. Over the course of a decade, web companies have donated IP and millions of lines of code that serves as the foundation for what’s being built on top.  In the …

[Read more]
How to Deploy a Cluster

 

In this blog post I will talk about how to deploy a cluster, the methods I tried and my solution to resolving the prerequisites problem.

I’m fairly new to the big data field. Learning about Hadoop, I kept hearing the term “clusters”, deploying a cluster, and installing some services on namenode, some on datanode and so on. I also heard about Cloudera manager which helps me to deploy services on my cluster, so I set up a VM and followed several tutorials including the Cloudera documentation to install cloudera manager. However, every time I reached the “cluster installation” step my installation failed. I later found out that there are several prerequisites for a Cloudera Manager Installation, which was the reason for the failure to install.

 

Deploy a Cluster

Though I discuss 3 other methods in detail, ultimately I recommend method …

[Read more]
Using Apache Spark and MySQL for Data Analysis

What is Spark

Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it:

Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s multi-stage in-memory primitives provides performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

In contrast to popular belief, Spark does not require all data to fit into memory but will use caching to speed up the operations …

[Read more]
Log Buffer #443: A Carnival of the Vanities for DBAs

This Log Buffer Edition finds and publishes blog posts from Oracle, SQL Server and MySQL.

Oracle:

  • SCRIPT execution errors when creating a DBaaS instance with local and cloud backups.
  • Designing Surrogate vs Natural Keys with Oracle SQL Developer.
  • EBS General Ledger – Accounting Hub Reporting Cloud Service.
  • Oracle Database Standard Edition 2 is available.
  • Disable “Build After Save at …
[Read more]
Replication in real-time from Oracle and MySQL into data warehouses and analytics

Analyzing transactional data is becoming increasingly common, especially as the data sizes and complexity increase and transactional stores are no longer to keep pace with the ever-increasing storage. Although there are many techniques available for loading data, getting effective data in real-time into your data warehouse store is a more difficult problem. VMware Continuent provides

What’s the latest with Hadoop

The Big Data explosion in recent years has created a vast number of new technologies in the area of data processing, storage, and management. One of the biggest names to appear on the scene is Hadoop. In case you need a quick review, Hadoop is a Big Data storage system that takes in large amounts of data from servers and breaks it into smaller, manageable chunks. The technology is complex but at a high level the Hadoop ecosystem essentially takes a “divide and conquer” approach to processing Big Data instead of processing data in tables, as in a relational database like Oracle or MySQL.

 

 

One projection expects …

[Read more]
Showing entries 1 to 10 of 156
10 Older Entries »