Showing entries 31 to 60 of 169

Displaying posts with tag: big data

SQL to Hadoop and back again, Part 2: Leveraging HBase and Hive

The second article in a series covering Big Data and SQL interaction is available now:

“Big data” is a term that has been used regularly for almost a decade now, and it, along with technologies like NoSQL, is seen as the replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. Here in Part 2, we will concentrate on how to use HBase and Hive for exchanging data with your SQL data stores. From the outside, the two systems seem largely similar, but they have very different goals. Let’s start by looking at how the two systems differ and how we can take advantage of that in our big data requirements.

  [Read more...]
Data Analytics at NBCUniversal. Interview with Matthew Eric Bassett.
“The most valuable thing I’ve learned in this role is that judicious use of a little bit of knowledge can go a long way. I’ve seen colleagues and other companies get caught up in the ‘Big Data’ craze by spending hundreds of thousands of pounds sterling on a Hadoop cluster that sees a few megabytes [...]
Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 1

Ask ten DBAs for a definition of ‘Big Data’ and you will get more than ten replies. And the majority of those replies will lead you to Hadoop. Hadoop has been the most prominent of the big data frameworks in the open source world. Over 80% of the Hadoop instances in the world are fed their data from MySQL¹. But Hadoop is made up of many parts, some confusing and many that do not play nicely with each other. It is analogous to being given a pile of automotive parts from different models and trying to come up with a car at the end of the day. So what do you do if you want to copy some of your relational data into Hadoop and avoid the equivalent of scraped knuckles? The answer is Bigtop, and what follows is a way to get a single-node, does-it-all system running so you can experiment with Hadoop, Map/Reduce, Hive, and all

  [Read more...]
Big Data.. So what? Part 2
Sorry for this delay in providing part 2 of this series, but stuff happened that had really high priority, and in addition I was on vacation. But now I'm back in business!

So, last time I left you with some open thoughts on why Big Data can be useful, but also that we need new analysis tools as well as new ways of visualizing data for it to be truly useful. As for analysis, let's have a look at text, which should be simple enough, right? And sometimes it is simple. One useful analysis tool that is often overlooked is Google. Let's give it a shot, just for fun, with two fierce competitors that we can compare: say, Oracle and MySQL. Oracle is much older, both as a technology and as a company, and in addition owns the MySQL brand these days. On the other hand, the Web is where MySQL has its sweet spot. Just Googling for MySQL and Oracle shows

  [Read more...]
Big Data with MySQL and Hadoop at MySQL Connect 2013

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco, as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution, and nowadays we can easily integrate it with MySQL. I will start with a brief introduction to Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Next I will show two major Big Data scenarios:

  • From file to Hadoop to MySQL. This is an example of “ELT” process: Extract data from external source; Load data into Hadoop; Transform
  [Read more...]
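The excerpt names the ELT order (Extract, Load, then Transform) without showing it. As a rough illustration only, here is a minimal Python sketch in which an in-memory SQLite database stands in for the Hadoop side; the log format, table names, and the `extract`/`load`/`transform` helpers are all hypothetical. The point is the ordering: raw records are loaded untouched, and the transformation runs afterwards inside the target store.

```python
import sqlite3

# Hypothetical raw input: "day,path,status" access-log lines from an external file.
RAW_LINES = [
    "2013-09-12,/index.html,200",
    "2013-09-12,/missing,404",
    "2013-09-13,/index.html,200",
]


def extract(lines):
    # Extract: split each raw line into fields, without cleaning or filtering.
    return [tuple(line.split(",")) for line in lines]


def load(db, records):
    # Load: copy the raw records as-is into a staging table.
    db.execute("CREATE TABLE raw_hits (day TEXT, path TEXT, status TEXT)")
    db.executemany("INSERT INTO raw_hits VALUES (?, ?, ?)", records)


def transform(db):
    # Transform: filter and aggregate inside the store, after the load.
    db.execute(
        "CREATE TABLE daily_ok AS "
        "SELECT day, COUNT(*) AS hits FROM raw_hits "
        "WHERE CAST(status AS INTEGER) = 200 GROUP BY day"
    )
    return db.execute("SELECT day, hits FROM daily_ok ORDER BY day").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, extract(RAW_LINES))
    print(transform(conn))
```

In the ETL alternative, the filtering and counting would happen before anything reached the store; ELT keeps the raw data around so it can be transformed again in other ways later.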
Big Data.. So what? Part 1
This is the first blog post in a series where I hope to rise a bit above the technical stuff and instead focus on how we can put Big Data to effective use. I recently ran a SkySQL webinar on the subject that you might also want to watch; a recording is available here: http://bit.ly/17TTQnJ

Yes, so what? Why do you need or want all that data? All data you need from your customers you have in your Data Warehouse, and all data you need on the market you are in, you can get from some analyst? Right?

Well, yes, that is one source of data, but there is more to it than that. The deal with data is that once you have enough of it, you can start to see things you haven't seen before. Trend analysis is only relevant when you have enough data, and the more you have, the more accurate it gets. Big Data is

  [Read more...]
Big Data from Space: the “Herschel” telescope.
“One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on” –Jon Brumfitt. On May 14, 2009, the European Space Agency launched an Ariane 5 rocket [...]
Big data processing with Disco

Those who deal with big data probably know about Disco, a distributed computing framework that aims to provide a MapReduce platform for big data processing in Python applications. We are proud to say that we are one of the largest users of Disco in the Netherlands. As an owner of multiple high-traffic portals with lots of […]

The post Big data processing with Disco appeared first on Spil Games Engineering.
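Disco's own job API is not shown in the excerpt, so the following is not Disco code. It is a plain-Python sketch of the MapReduce model that such frameworks implement: a map function emits key/value pairs, the framework groups them by key, and a reduce function combines each group. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group emitted values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values that share one key.
    return key, sum(values)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big games data"]))
    # {'big': 2, 'data': 2, 'games': 1}
```

In a distributed framework the map calls run on many machines in parallel and the shuffle moves data over the network; here everything runs in one process.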

On Oracle NoSQL Database. Interview with Dave Segleau.
“We went down the path of building Oracle NoSQL Database because of explicit requests from some of our largest Oracle Berkeley DB installations that wanted to move away from maintaining home-grown sharding implementations and very much wanted an out-of-box technology that can replicate the robustness of what they had built “out of [...]
A new big data structure for streaming counters - bit length encoding
One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps, which are very expensive, have to be taken to work with the data. This is why a BIGINT is 64 bits. In MySQL, DECIMAL can store more than 64 bits of data using fixed precision. Large numbers can use FLOAT or DECIMAL, but FLOAT is lossy and DECIMAL has a fixed maximum precision.

DECIMAL is an expensive encoding. Fixed-precision math is expensive, and you eventually run out of precision, at which point you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL? FLOAT is lossy. What if you need an /exact/ count of a very big number without using very much space?

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes

  [Read more...]
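The post is cut off before it describes the bit length encoding itself, so the sketch below should not be read as the author's method. It shows a simpler, generic idea in the same spirit, assuming a one-byte length prefix followed by the counter's big-endian bytes: an exact count of any size occupies only as many bytes as its bit length requires. The helper names `encode_counter`, `decode_counter`, and `increment` are hypothetical.

```python
def encode_counter(n: int) -> bytes:
    """Encode a non-negative integer as a 1-byte length prefix plus big-endian bytes."""
    if n < 0:
        raise ValueError("counters must be non-negative")
    nbytes = max(1, (n.bit_length() + 7) // 8)  # fewest whole bytes that hold n exactly
    if nbytes > 255:
        raise OverflowError("value needs more than 255 bytes")
    return bytes([nbytes]) + n.to_bytes(nbytes, "big")


def decode_counter(data: bytes) -> int:
    """Read the length prefix, then rebuild the exact integer from that many bytes."""
    nbytes = data[0]
    return int.from_bytes(data[1:1 + nbytes], "big")


def increment(data: bytes, delta: int = 1) -> bytes:
    """Exact increment: decode, add with arbitrary-precision ints, re-encode."""
    return encode_counter(decode_counter(data) + delta)


if __name__ == "__main__":
    big = encode_counter(2**64)
    print(len(big), decode_counter(increment(big)))
```

With this layout a counter of 2**64 takes 10 bytes (one length byte plus nine value bytes), and the one-byte prefix allows values up to 255 bytes (2040 bits) long, well beyond DECIMAL's 65-digit maximum. Python's integers make the arithmetic trivial; a C implementation would need a bignum routine for the increment.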
On PostgreSQL. Interview with Tom Kincaid.
“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it’s not possible to create a product with [...]
The Needle in Big Data Noise

Read the original article at The Needle in Big Data Noise

Who the heck is Bayes? Thomas Bayes was a scientist and thinker, a Fellow of the Royal Society, and, back in 1763, the author of “An Essay toward Solving a Problem in the Doctrine [...]

  [Read more...]
    MySQL Applier For Hadoop: Implementation

    This is a follow-up post describing the implementation details of the Hadoop Applier and the steps to configure and install it. The Hadoop Applier integrates MySQL with Hadoop, providing real-time replication of INSERTs to HDFS so that they can be consumed by the data stores working on top of Hadoop. You can learn more about the design rationale and prerequisites in the previous post.

    Design and Implementation:

    Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library for manipulating files in HDFS.

    The library comes pre-compiled with Hadoop distributions. It

      [Read more...]
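The Hadoop Applier itself writes through libhdfs. As an illustration only, the Python sketch below shows the basic idea of turning each replicated INSERT into delimited text lines appended to a per-table data file. It writes to the local filesystem instead of HDFS, and the `<db>.db/<table>/datafile1.txt` layout, the comma separator, and the `row_to_line`/`apply_insert` helpers are assumptions made for the example.

```python
import os
import tempfile


def row_to_line(row, sep=","):
    # Serialize one inserted row as a delimited text line (NULL becomes an empty field).
    return sep.join("" if v is None else str(v) for v in row) + "\n"


def apply_insert(base_dir, db, table, rows, sep=","):
    """Append the rows of one INSERT event to a per-table data file (hypothetical layout)."""
    table_dir = os.path.join(base_dir, f"{db}.db", table)
    os.makedirs(table_dir, exist_ok=True)
    path = os.path.join(table_dir, "datafile1.txt")
    # Append mode: each replicated event only adds lines, mirroring insert-only replication.
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(row_to_line(r, sep) for r in rows)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        path = apply_insert(base, "shop", "orders", [(1, "book", 9.5), (2, None, 3)])
        apply_insert(base, "shop", "orders", [(3, "pen", 1)])
        with open(path, encoding="utf-8") as f:
            print(f.read(), end="")
```

Hive's default field delimiter for text tables is Ctrl-A (`\x01`), not a comma, so a Hive table reading files like these would need a matching `FIELDS TERMINATED BY` clause, or the `sep` argument would have to change.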
    May 2nd Webinar: Introduction to TokuDB v7 Community & Enterprise Editions

    With this version, the source code is now freely available under the GPL v2. For more details, see our blog here. Open source pioneer Mozilla has been using TokuDB to manage its MySQL-driven Datazilla Data cluster, an open-source system for managing and visualizing performance data.

    Date: May 2nd
    Time: 2 PM EST / 11 AM PST

    In the past, TokuDB was free only for evaluation; the new TokuDB Community Edition extends free use to deployed environments. With this release, Tokutek is also planning to make available a TokuDB Enterprise Edition, which includes technical support,

      [Read more...]
    From Oracle to 10gen, The MongoDB Company
    Those who are familiar with me know I have a dream.

    Five years ago I decided to leave a systems integrator where I was doing great. Why? I wanted to be in a company with the same growth prospects that Oracle had in the '80s. I dreamed of being in the Oracle of 30 years ago and, as time travel wasn't affordable, I decided to join MySQL AB to help expand the business in Europe, the Middle East and Africa.
    A few years later my dream came true, but in a slightly different sense. Sun acquired MySQL and was later swallowed by

      [Read more...]
    Thanks to Community for Selecting Tokutek for Prestigious MySQL Award

    We wanted to thank everyone for naming Tokutek the Corporate Contributor of the Year 2013 for ongoing contribution to the MySQL community.

    The MySQL Community Awards are given annually to the people and companies that support the MySQL ecosystem. The MySQL Community Award for Corporate Contributor of the Year recognizes a company or other organization or entity that has made valuable contributions to the MySQL ecosystem either in terms of open source code, knowledge,

      [Read more...]
    Percona Live - Keynote: How MySQL can thrive in the world of massive data hype
    Continuent CEO Robert Hodges says that NoSQL solutions are oversold, but this is no reason for MySQL fans to become complacent. He kicked off Day 2 of the Percona Live MySQL Conference and Expo with his keynote, "How MySQL can thrive in the world of massive data hype." He said there are new challenges in data management, and relational databases must solve them or risk becoming irrelevant. This
    Open Source TokuDB Resources

    Since we announced that TokuDB is now open source, there has been a lot of positive feedback (thanks!) and also some questions about the details. I want to take this opportunity to give a quick, high-level guide to what our repositories on GitHub are.

    Here are the repositories:

    • ft-index. This repository is the “magic”. It contains the Fractal Tree data structures we have been talking about for years. This is also the main piece that was previously closed source. Here are some interesting directories:
      • src: This directory is a layer that implements an API that is similar to the BDB API.
      • locktree: an in-memory data structure that maintains transactions’ row-level locks.
      [Read more...]
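To make the locktree idea concrete, here is a toy Python sketch, not code from ft-index: each transaction holds exclusive locks, a request that overlaps another transaction's lock is refused, and all of a transaction's locks are released together at commit or abort. Row locks are generalized here to closed key ranges (a single row is a range whose endpoints are equal). The `LockTree` class and its methods are hypothetical, and a more serious implementation would index the ranges (for example, in an interval tree) instead of scanning a list.

```python
class LockTree:
    """Toy in-memory lock table: exclusive locks on closed key ranges, per transaction."""

    def __init__(self):
        self._locks = []  # list of (lo, hi, txn_id) tuples

    def acquire(self, txn_id, lo, hi=None):
        """Try to lock [lo, hi] (a single row when hi is omitted); return True on success."""
        hi = lo if hi is None else hi
        for l, h, owner in self._locks:
            # Two closed ranges overlap when each one starts before the other ends.
            if owner != txn_id and l <= hi and lo <= h:
                return False  # conflicts with another transaction's lock
        self._locks.append((lo, hi, txn_id))
        return True

    def release_all(self, txn_id):
        """Drop every lock held by txn_id, as happens at commit or abort."""
        self._locks = [entry for entry in self._locks if entry[2] != txn_id]


if __name__ == "__main__":
    locks = LockTree()
    print(locks.acquire(1, 10, 20))  # txn 1 locks keys 10..20
    print(locks.acquire(2, 15))      # txn 2 wants row 15: conflict
    locks.release_all(1)
    print(locks.acquire(2, 15))      # after txn 1 finishes, row 15 is free
```

A real engine would typically make the refused transaction wait or time out rather than just returning `False`.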
    MySQL Applier For Hadoop: Real time data export from MySQL to HDFS

    MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases that could be served if the slave (to which data is replicated) weren't restricted to being a MySQL server, but could be any other database server or platform with replication events applied in real time!
    This is what the new Hadoop Applier empowers you to do.
    An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive

      [Read more...]
    Announcing TokuDB v7: Open Source and More

    Every few months, I get the fun job of announcing what’s new in TokuDB®, but this time is special. With Version 7, TokuDB for MySQL and MariaDB is going open source.

    The free Community Edition is fully functional and fully performant. It has all the compression you’ve come to expect from TokuDB. It has hot schema changes: no-down-time column insertion, deletion, renaming, etc., as well as index creation. It has clustering secondary keys. We are also announcing an Enterprise Edition (coming soon) with additional benefits, such as a support package and advanced backup and recovery tools.

    Making TokuDB open source is a natural next step for Tokutek’s involvement in the MySQL community. So far, Tokutek has been involved in the community in many ways:

    • We’ve
      [Read more...]
    April is the Coolest Month

    If T.S. Eliot were a MySQL DBA, I think he would have been more upbeat about April.

    We are gearing up for an incredible second half of April. We will be presenting three separate sessions at the Percona Live: MySQL Conference and Expo 2013, April 22-25, in Santa Clara, CA. In addition, we will be presenting at SkySQL’s MySQL & Cloud Database Solutions Day on Friday, April 26 at the same location.

    Come by to see us in Booth #114, or stop by one of our sessions:

      [Read more...]
    MySQL thread pool and scalability examples
    Nice article about the SimCity outages and ways to defend databases: http://www.mysqlperformanceblog.com/2013/03/16/simcity-outages-traffic-control-and-thread-pool-for-mysql/

    The graphs showing throughput with and without the thread pool come from a benchmark performed by Oracle, available here:
    http://www.mysql.com/products/enterprise/scalability.html

    The main takeaway is this graph (all rights reserved to Oracle; original image URL: http://www.mysql.com/common/images/enterprise/MySQL_Threadpool_Benchmark_RW.png):

    Scalability is

      [Read more...]
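The benchmark graph did not survive here, but the mechanism behind it is easy to sketch. The Python example below is a generic illustration, not MySQL's thread pool plugin: many client requests are submitted, yet a fixed-size pool caps how many execute at once, and the rest wait in a queue instead of all competing for CPU and locks. The `run_query` body only sleeps as a stand-in for statement execution, and `ConcurrencyProbe` and `serve` are hypothetical helpers.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyProbe:
    """Tracks how many simulated queries are running at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def run_query(self, request_id):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for executing one statement
        with self._lock:
            self.active -= 1
        return request_id


def serve(n_requests, pool_size):
    """Submit n_requests at once; at most pool_size run concurrently, the rest queue."""
    probe = ConcurrencyProbe()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(probe.run_query, range(n_requests)))
    return results, probe.peak


if __name__ == "__main__":
    results, peak = serve(n_requests=50, pool_size=4)
    print(len(results), "requests served, peak concurrency", peak)
```

Capping concurrency trades a little queueing delay per request for steadier throughput once the number of connections far exceeds what the server can usefully run in parallel, which is the effect the graph illustrates.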
    Deploying Cloudera Impala on EC2 with Example Live Demo

    A little while ago I blogged about (and open sourced) an Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance. You can try the visualization; we’ve also opened up the Impala web interface, where you can see query profiles and performance numbers, and Hue (username and password are both ‘test’), where you can run your own queries on the dataset.

    Deploying  [Read more...]

    They say: "Relational Databases Aren't Dead"
    This is a good read, claiming: "Relational Databases Aren't Dead. Heck, They're Not Even Sleeping", http://readwrite.com/2013/03/26/relational-databases-far-from-dead. A key quote:
    "While not comprehensive, the uses for NoSQL databases center around the acquisition of fast-growing data or data that does not easily fit within uniform structures."

    There are two parts to the statement about NoSQL's uses. I'll start with the latter:

    "data that does not easily fit within uniform structures": NoSQL is probably the right choice, although I always encourage thinking and architecting in advance. Also, online structure changes do exist in the RDBMS world, and recently in MySQL:

      [Read more...]
    Big Data for Genomic Sequencing. Interview with Thibault de Malliard.
    “Working with empirical genomic data and modern computational models, the laboratory addresses questions relevant to how genetics and the environment influence the frequency and severity of diseases in human populations” –Thibault de Malliard. Big Data for Genomic Sequencing. On this subject, I have interviewed Thibault de Malliard, researcher at the University of Montreal’s Philip Awadalla [...]
    The Last Mile for Big Data – Strata Overview with Jeff Kelly of Wikibon (Part 2)

    During the second half of our CUBE discussion with Wikibon analyst Jeff Kelly at this year’s Strata Conference in Santa Clara, we talked about the tipping point for Big Data. Strata veterans could see at a glance that this year’s conference was markedly different. No longer the exclusive domain of geeks and database administrators, this year’s Strata featured some of the biggest enterprise vendors around. With heavyweight enterprise players Intel and EMC Greenplum announcing their own Hadoop distributions, big data is clearly going mainstream. Now that we know how to capture, store, access and analyze big data, what’s the next step? Listen in to hear my conversation with Jeff Kelly about taking big data

      [Read more...]
    MySQL and MongoDB – Strata Discussion with Jeff Kelly of Wikibon (Part 1)

    We had the opportunity to do a CUBE interview with Wikibon analyst Jeff Kelly at last week’s Strata Conference in Santa Clara. In the first part of our conversation, we discuss how our success in integrating Tokutek’s Fractal Tree® technology into MySQL has led us to another popular database, MongoDB. We explain the results of our recent benchmarking tests with MongoDB, which indicate that adding indexing can also improve performance for this popular NoSQL database with faster insertion rates, lower query latency and

      [Read more...]
    Fast Updates with TokuDB

    With TokuDB v6.6 out now, I’m excited to present one of my favorite enhancements: fast updates with TokuDB. Update intensive applications can have their throughput limited by the random read capacity of the storage system. The cause of the throughput limit is the read-modify-write algorithm that MySQL uses when processing update statements. MySQL reads a row from the storage engine, applies the updates to it, and then writes the new row to the storage engine. To address this throughput limit, TokuDB uses a different update algorithm that simply encodes the update expressions of the SQL statement into tiny programs that are stored in an update Fractal Tree® message. This update message is

      [Read more...]
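The post is cut off mid-explanation, so the following is only a toy model of the contrast it draws, not TokuDB's message format: `update_rmw` reads the row before writing it back, while `update_message` just records a tiny "add delta to column" program and defers the work until the row is next read. The `MessageStore` class, its methods, and the `reads` counter are hypothetical.

```python
class MessageStore:
    """Toy key-value store contrasting read-modify-write with deferred update messages."""

    def __init__(self):
        self._rows = {}     # key -> dict of column values (the base data)
        self._pending = {}  # key -> list of update messages not yet applied
        self.reads = 0      # counts base-row reads, the cost that fast updates avoid

    def insert(self, key, row):
        self._rows[key] = dict(row)

    def update_rmw(self, key, column, delta):
        # Classic path: read the row, apply the change, write the whole row back.
        self.reads += 1
        row = dict(self._rows[key])
        row[column] += delta
        self._rows[key] = row

    def update_message(self, key, column, delta):
        # Fast path: record an "add delta to column" program without reading anything.
        self._pending.setdefault(key, []).append(("add", column, delta))

    def get(self, key):
        # Pending messages are applied to the base row when it is eventually read.
        self.reads += 1
        row = self._rows[key]
        for op, column, delta in self._pending.pop(key, []):
            if op == "add":
                row[column] += delta
        return dict(row)


if __name__ == "__main__":
    store = MessageStore()
    store.insert("page", {"hits": 0})
    for _ in range(1000):
        store.update_message("page", "hits", 1)
    print(store.reads, store.get("page"), store.reads)
```

After 1,000 message-based increments the counter has cost no row reads, and a single `get` then applies them all; with `update_rmw` the same increments would cost 1,000 reads. In a Fractal Tree, messages are buffered in tree nodes and flushed in batches; the per-key list here is a simplification.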
    MySQL-State of the Union. Interview with Tomas Ulin.
    “With MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements” –Tomas Ulin. On February 5, 2013, Oracle announced the general availability of MySQL 5.6. I have interviewed Tomas Ulin, Vice President for the MySQL Engineering team [...]
    Introducing Data Fabric Design for Commodity SQL Databases
    Extract from THE SCALE-OUT BLOG by Robert Hodges (CEO, Continuent), http://scale-out-blog.blogspot.com: Data management is undergoing a revolution. Many businesses now depend on data sets that vastly exceed the capacity of DBMS servers. Applications operate 24x7 in complex cloud environments using small and relatively unreliable VMs. Managers need to act on new information from those systems in

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.