Planet MySQL

Displaying posts with tag: big data

Oct

2014

Posted by Robert Hodges on Mon 06 Oct 2014 15:37 UTC
Tags:

Replication, hadoop, tungsten, big data, mariadb, bigdata, MySQL

Computer science is like an enormous tool box you can rummage through whenever you have a problem to solve. Most of the tools are sturdy and practical, like algorithms for B-trees. Some are also elegant, like consistent hashing in Dynamo. Finally there are some tools that you never quite figure out even after years of reflection. That piece of steel you are looking at could be Excalibur. Or it could be a rusty knife.

The CAP theorem falls into the last category, at least for me. It was a major topic in the blogosphere a few years ago and Google Trends shows steadily increasing interest in the term since 2010. It's not my goal to explain CAP fully--a good informal description is …

Sep

2014

Managing big data? Say ‘hello’ to HP Vertica

Posted by MySQL Performance Blog on Thu 18 Sep 2014 14:29 UTC
Tags:

big data, primary, Insight for DBAs, MySQL, Mike Benshoof, HP Vertica

Over the past few months, I’ve seen an increase in the following use case while working on performance and schema review engagements:

I need to store exponentially increasing amounts of data and analyze all of it in real-time.

This is also known simply as: “We have big data.” Typically, this data is used for user interaction analysis, ad tracking, or other common click stream applications. However, it can also be seen in threat assessment (DDoS mitigation, etc.), financial forecasting, and other applications. While MySQL (and other OLTP systems) can handle this to a degree, it is by no means their forte. Some of the pain points include:

Cost of rapidly increasing, expensive disk storage (OLTP disks need to be fast == $$)
Performance decrease as the data size increases …

Aug

2014

Resources for Database Clusters: Performance Tuning for HAProxy, Support for MariaDB 10, Technical Blogs & More

Posted by Severalnines on Thu 28 Aug 2014 07:28 UTC
Tags:

Tools, Other, ha, Nginx, High Availability, webinar, ETL, analytics, hadoop, performance tuning, big data, mariadb, mongodb, haproxy, MySQL, clustercontrol

August 28, 2014 By Severalnines

Check Out Our Latest Resources for MySQL, MariaDB & MongoDB Clusters

Here is a summary of resources & tools that we’ve made available to you in the past weeks. If you have any questions on these, feel free to contact us!

New Technical Webinars

Performance Tuning of HAProxy for Database Load Balancing

09 September 2014 - with Baptiste Assmann of HAProxy Technologies

Do you know what HAProxy can tell you about your application and database instances? Do you know the difference between …

Jul

2014

Hadoop BoF Session at OSCON

Posted by MC Brown on Fri 18 Jul 2014 10:26 UTC
Tags:

oscon, hadoop, continuent, cloudera, big data, MySQL, Presentations and Conferences, oscon2014

I have a BoF session at OSCON next week:

Migrating Data from MySQL and Oracle into Hadoop

The session is at 7pm Tuesday night – look for rooms D135 and/or D137/138.

Correction: We are now in E144 on Tuesday with the Hadoop get together first at 7pm, and the Data Migration to follow at 8pm.

I’m actually going to be joined by Gwen Shapira from Cloudera, who has a BoF session on Hadoop next door at the same time, along with Eric Herman from Booking.com. We’ll use the opportunity to talk all things Hadoop, but particularly the ingestion of data from MySQL and other databases into the Hadoop datastore.

As always, it’d be great to meet anybody interested in Hadoop at the BoF. Please come along and introduce yourselves, and …

Jul

2014

Making Real-Time Analytics a Reality — TDWI -The Data Warehousing Institute

Posted by MC Brown on Tue 15 Jul 2014 13:52 UTC
Tags:

Oracle, Articles, Databases, analytics, hadoop, data migration, big data, MySQL

My article on how to make the real-time processing of information from traditional transactional stores into Hadoop a reality has been published over at TDWI:

Making Real-Time Analytics a Reality — TDWI -The Data Warehousing Institute.

Filed under: Articles Tagged: analytics, big data, data migration, databases, hadoop, mysql, …

Jun

2014

Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics

Posted by Severalnines on Mon 16 Jun 2014 08:15 UTC
Tags:

Other, Data Integration, ETL, Migration, analytics, hadoop, talend, data migration, big data, mongodb, MySQL, hdfs, tokumx, clickstream

June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes, and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then load them into HDFS. We will also show you how to schedule this job to be executed every 5 minutes.
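The extract, transform and load steps described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the Talend job from the post: the sample documents, their field names (`site`, `url`, `ts`), and the function names are all assumptions, an in-memory list stands in for the MongoDB collection, and any file-like object stands in for the HDFS file.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical clickstream documents standing in for a MongoDB collection.
SAMPLE_DOCS = [
    {"_id": 1, "site": "example.com", "url": "/home", "ts": 1402906500},
    {"_id": 2, "site": "example.org", "url": "/cart", "ts": 1402906560},
]


def extract(docs, since_ts):
    """Yield only documents newer than the previous run (incremental extract)."""
    return (d for d in docs if d["ts"] > since_ts)


def transform(doc):
    """Flatten one document into a row with an ISO-8601 UTC timestamp."""
    when = datetime.fromtimestamp(doc["ts"], tz=timezone.utc)
    return [doc["_id"], doc["site"], doc["url"], when.isoformat()]


def load(rows, sink):
    """Append rows as CSV to a file-like sink (stand-in for an HDFS file)."""
    writer = csv.writer(sink, lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def run_job(docs, sink, since_ts=0):
    """Run one extract -> transform -> load pass and return the row count."""
    return load((transform(d) for d in extract(docs, since_ts)), sink)


if __name__ == "__main__":
    out = io.StringIO()
    written = run_job(SAMPLE_DOCS, out)
    print(written, "rows written")
    print(out.getvalue(), end="")
```

A scheduler would call `run_job` every 5 minutes, passing the newest timestamp seen in the previous run as `since_ts` so that each pass picks up only new documents.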

Test Case

We have an application …

May

2014

Continuent at Hadoop Summit

Posted by MC Brown on Fri 30 May 2014 09:04 UTC
Tags:

Oracle, hadoop, continuent, big data, MySQL, Presentations and Conferences

I’m pleased to say that Continuent will be at the Hadoop Summit in San Jose next week (3-5 June). Sadly I will not be attending as I’m taking an exam next week, but my colleagues Robert Hodges, Eero Teerikorpi and Petri Virsunen will be there to answer any questions you have about Continuent products, and, of course, the Hadoop replication support built into Tungsten Replicator 3.0.

If you are at the conference, please go along and say hi to the team. And, as always, if there are any questions please let them or me know.

Filed under: Presentations and Conferences Tagged: big data, continuent, …

May

2014

Webinar-on-demand: Set up & operate real-time data loading into Hadoop

Posted by Petri Virsunen of Continuent on Thu 29 May 2014 19:32 UTC
Tags:

Oracle, hadoop, mysql replication, big data, MySQL, Continuent Tungsten, Continuent Tungsten Replicator

Getting data into Hadoop is not difficult, but it is complex if you want to load 'live' or semi-live data into your Hadoop cluster from your Oracle and MySQL databases. There are plenty of solutions available, from manually dumping and loading to using a tool like Sqoop, with its good and bad sides. Neither is easy, and both are prone to the problems of lag between the moment you perform the dump and …

May

2014

Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop

Posted by MC Brown on Thu 22 May 2014 20:40 UTC
Tags:

Oracle, Articles, Databases, hadoop, big data, MySQL

An article about moving data into Hadoop in real-time has just been published over at DBTA, written by me and my CEO Robert Hodges.

In the article I talk about one of the major issues for all people deploying databases in the modern heterogeneous world: how do we move and migrate data effectively between entirely different database systems in a way that is efficient and usable? How do you get the data you need to the database you need it in? If your source is a transactional database, how does that data get moved into Hadoop in a way that makes it usable for querying by Hive, Impala or HBase?

You can read the full article here: Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop

Filed under: …

May

2014

Cross your Fingers for Tech14, see you at OSCON

Posted by MC Brown on Thu 15 May 2014 21:09 UTC
Tags:

Oracle, Conferences, Databases, hadoop, continuent, big data, UKOUG, MySQL, Presentations and Conferences

So I’ve submitted my talks for the Tech14 UK Oracle User Group conference which is in Liverpool this year. I’m not going to give away the topics, but you can imagine they are going to be about data translation and movement and how to get your various databases talking together.

I can also say, after having seen other submissions for talks this year (as I’m helping to judge), that the conference is shaping up to be very interesting. There’s a good spread of different topics this year, but I know from having talked to the organisers that they are looking for more submissions in the areas of Operating Systems, Engineered Systems and Development (mobile and cloud).

If you’ve got a paper, presentation, or idea for one that you think would be useful, …
