Showing entries 1 to 10 of 163
10 Older Entries »
Displaying posts with tag: hadoop (reset)
Don’t Drown in your Data Lake

A data lake is “…a method of storing data within a system or repository, in its natural format, that facilitates the collocation of data in various schemata and structural forms…”1. Many companies find value in using a data lake but aren’t clear that they need to properly plan for it and maintain it in order to prevent issues.

The idea of a data lake rose from the need to store data in a raw format that is accessible to a variety of applications and authorized users. Hadoop is often used to query the data, and the necessary structures for querying are created through the query tool (schema on read) rather than as part of the data design (schema on write). There are other tools available for analysis, and many cloud providers are actively developing additional options for creating …

[Read more]
MariaDB to Hadoop in Spanish

Nicolas Tobias has written an awesome guide to setting up replication from MariaDB to Hadoop/HDFS using Tungsten Replicator, in Spanish! He’s planning more of these so if you like what you see, please let him know!

Semana santa y yo con nuevas batallas que contar.
Me hayaba yo en el trabajo, pensando en que iba a invertir la calma que acompa;a a los dias de vacaciones que libremente podemos elegir trabajar y pense: No seria bueno terminar esa sincronizacion entre los servidores de mariaDB y HIVE?

Ya habia buscado algo de info al respecto en Enero hasta tenia una PoC montada con unas VM que volvi a encender, pero estaba todo podrido: no arrancaba, no funcionba ni siquiera me acordaba como lo habia hecho y el history de la shell er un galimatias. Decidi que si lo rehacia todo desde cero iba a poder dejarlo escrito en un playbook y ademas, aprenderlo y automatizarlo hasta el limite de poder desplegar de forma automatica on …

[Read more]
On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov

“Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.”–Nikita Ivanov.

I have interviewed Nikita Ivanov,CTO of GridGain.
Main topics of the interview are Apache Ignite, Apache Spark and MySQL, and how well they perform on big data analytics.

RVZ

Q1. What are the main technical challenges of SaaS development projects?

Nikita Ivanov: SaaS requires that the applications be highly responsive, reliable and web-scale. SaaS development projects face many of the same challenges as …

[Read more]
A roughneck walk down database alley

via GIPHY I was just responding to some Disqus comments on a recent blog post. Admittedly it had a provocative title Will SQL databases just die already. What do you think? Join 34,000 others and follow Sean Hull on twitter @hullsean. A reader pointed out that some No-SQL databases do support joins. Huh? My face … Continue reading A roughneck walk down database alley →

Will SQL just die already?

With tons of new No-SQL database offerings everyday, developers & architects have a lot of options. Cassandra, Mongodb, Couchdb, Dynamodb & Firebase to name a few. Join 33,000 others and follow Sean Hull on twitter @hullsean. What’s more in the data warehouse space, you have Hadoop, which can churn through terabytes of data and get … Continue reading Will SQL just die already? →

HopsFS based on MySQL Cluster 7.5 delivers a scalable HDFS

The swedish research institute, SICS, have worked hard for a few years on
developing a scalable and a highly available Hadoop implementation using
MySQL Cluster to store the metadata. In particular they have focused on the
Hadoop file system (HDFS) and the YARN. Using features of MySQL
Cluster 7.5 they were able to achieve linear scaling in number of name
nodes as well as in number of NDB data nodes to the number of nodes
available for the experiment (72 machines). Read the press release from
SICS here

The existing metadata layer of HDFS is based on a single Java server
that acts as name node in HDFS. There are implementations to ensure
that this metadata layer have HA by using a backup name node and to
use ZooKeeper for heartbeats and a number of …

[Read more]
Designing Euclid to Make Uber Engineering Marketing Savvy

In this article, we take a look at Euclid, Uber Engineering's Hadoop and Spark-based in-house marketing platform.

The post Designing Euclid to Make Uber Engineering Marketing Savvy appeared first on Uber Engineering Blog.

The Uber Engineering Tech Stack, Part II: The Edge and Beyond

The end of a two-part series on the tech stack that Uber Engineering uses to make transportation as reliable as running water, everywhere, for everyone, as of spring 2016.

The post The Uber Engineering Tech Stack, Part II: The Edge and Beyond appeared first on Uber Engineering Blog.

Eight Ways To Ensure Your Applications Are Enterprise-Ready

When it comes to building database applications and solutions, developers, DBAs, engineers and architects have a lot of new and exciting tools and technologies to play with, especially with the Hadoop and NoSQL environments growing so rapidly.

While it’s easy to geek out about these cool and revolutionary new technologies, at some point in the development cycle you’ll need to stop to consider the real-world business implications of the application you’re proposing. After all, you’re bound to face some tough questions, like:

Why did you choose that particular database for our mission-critical application? Can your team provide 24/7 support for the app? Do you have a plan to train people on this new technology? Do we have the right hardware infrastructure to support the app’s deployment? How are you going to ensure there won’t be any bugs or security vulnerabilities?

If you don’t have a plan for …

[Read more]
On Using HPE Vertica. Interview with Eva Donaldson.

“After you have built out your data lake, use it. Ask it questions. You will begin to see patterns where you want to dig deeper. The Hadoop ecosystem doesn’t allow for that digging and not at a speed that is customer facing. For that, you need some sort of analytical database.”– Eva Donaldson.

I have interviewed Eva Donaldson, software engineer and data architect at iContact. Main topic of the interview is her experience in using HPE Vertica.

RVZ

Q1. What is the business of iContact?

Eva Donaldson: iContact is a provider of cloud based email marketing, marketing automation and social media marketing products. We offer expert advice, design services, and an award-winning Salesforce email integration and Google Analytics tracking features specializing in …

[Read more]
Showing entries 1 to 10 of 163
10 Older Entries »