Showing entries 61 to 70 of 197
Displaying posts with tag: big data
Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 1

Ask ten DBAs for a definition of ‘Big Data’ and you will get more than ten replies. And the majority of those replies will lead you to Hadoop. Hadoop has been the most prominent of the big data frameworks in the open source world. Over 80% of the Hadoop instances in the world are fed their data from MySQL[1]. But Hadoop is made up of many parts, some confusing and many that do not play nicely with each other. It is analogous to being given a pile of automotive parts from different models and trying to come up with a car at the end of the day. So what do you do if you want to copy some of your relational data into Hadoop and avoid the equivalent of scraped knuckles? The answer is Bigtop, and what follows is a way to get a one-node-does-all system running so you can experiment with Hadoop, Map/Reduce, Hive, and all the other parts.

Bigtop is an Apache Project self …

[Read more]
Big Data.. So what? Part 2

Sorry for this delay in providing part 2 of this series, but stuff happened that had really high priority, and in addition I was on vacation. But now I'm back in business!

So, last time I left you with some open thoughts on why Big Data can be useful, but also on why we need new analysis tools and new ways of visualizing data for it to be truly useful. As for analysis, let's have a look at text, which should be simple enough, right? And sometimes it is simple. One useful analysis tool that is often overlooked is Google. Let's give it a shot, just for fun, with two fierce competitors, of a sort, that we can compare: say, Oracle and MySQL. Oracle is much older, both as a technology and as a company, and in addition owns the MySQL brand these days. But on the other hand, the Web is where MySQL has its sweet spot. Just Googling for MySQL and Oracle shows that MySQL seems to be much more discussed (and no, I haven't turned …

[Read more]
Big Data with MySQL and Hadoop at MySQL Connect 2013

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution, and nowadays we can easily integrate it with MySQL. I will start with a brief introduction to Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Next I will show two major Big Data scenarios:

  • From file to Hadoop to MySQL. This is an example of an “ELT” process: extract data from an external source; load data into Hadoop; transform/analyze data; extract results to MySQL. It is similar to the original Data Warehouse ETL …
[Read more]
Big Data.. So what? Part 1

This is the first blog post in a series where I hope to rise a bit above the technical stuff and instead focus on how we can put Big Data to effective use. I ran a SkySQL Webinar on the subject recently that you might also want to watch, and a recording is available here: http://bit.ly/17TTQnJ

Yes, so what? Why do you need or want all that data? All data you need from your customers you have in your Data Warehouse, and all data you need on the market you are in, you can get from some analyst? Right?

Well, yes, that is one source of data, but there is more to it than that. The deal with Data is that once you have enough of it, you can start to see things you haven't seen before. Trend analysis is only relevant when you have enough data, and the more you have, the more accurate it gets. Big Data is different from the data you already have in that it is Bigger, …

[Read more]
Big Data from Space: the “Herschel” telescope.

“One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on” – Jon Brumfitt. On May 14, 2009, the European Space Agency launched an Ariane 5 rocket [...]

Big data processing with Disco

Those who deal with big data probably know about Disco – a distributed computing framework that aims to provide a MapReduce platform for Python applications that process big data. We are proud to say that we are one of the largest users of Disco in the Netherlands. As an owner of multiple high-traffic portals with lots of […]

The post Big data processing with Disco appeared first on Spil Games Engineering.
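
For readers new to the MapReduce model that Disco is described as providing, here is a minimal single-process sketch in plain Python. It does not use Disco's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. A real framework would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data about data"])
print(counts)
```

The point of the split is that `map_phase` and `reduce_phase` each look at only one line or one key at a time, which is what lets a framework run them in parallel.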

On Oracle NoSQL Database – Interview with Dave Segleau.

“We went down the path of building Oracle NoSQL database because of explicit request from some of our largest Oracle Berkeley DB installations that wanted to move away from maintaining home grown sharding implementations and very much wanted an out of box technology that can replicate the robustness of what they had built “out of [...]

A new big data structure for streaming counters - bit length encoding

One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps have to be taken to work with the data, which is very expensive. This is why a BIGINT is 64 bits. In MySQL, DECIMAL can store more than 64 bits of data using fixed precision. Large numbers can use FLOAT or DECIMAL, but FLOAT is lossy and DECIMAL is exact only up to its maximum precision (65 digits in MySQL).

DECIMAL is an expensive encoding. Fixed-precision math is expensive, and you eventually run out of precision, at which point you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL?  FLOAT is lossy.  What if you need an /exact/ count of a very big number without using very much space?
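
The 64-bit limit and the lossiness of floating point can be seen in a few lines of Python. This is only an illustration of the problem, not the encoding the post goes on to describe. It assumes Python's `float`, which is a 64-bit double (comparable to MySQL's DOUBLE); MySQL's single-precision FLOAT loses exactness even earlier. The helper names are hypothetical.

```python
# Largest value an unsigned 64-bit integer (e.g. MySQL BIGINT UNSIGNED) can hold.
BIGINT_UNSIGNED_MAX = 2**64 - 1


def fits_in_bigint_unsigned(n: int) -> bool:
    """True if n can be stored in an unsigned 64-bit integer column."""
    return 0 <= n <= BIGINT_UNSIGNED_MAX


def survives_float_round_trip(n: int) -> bool:
    """True if n is unchanged after conversion to a 64-bit double and back."""
    return int(float(n)) == n


# A double has a 53-bit significand, so integers above 2**53 are not all exact.
print(survives_float_round_trip(2**53))       # exactly representable
print(survives_float_round_trip(2**53 + 1))   # rounded to a neighbouring value
print(fits_in_bigint_unsigned(2**64))         # one past the 64-bit maximum
```

A counter stored as a double therefore stops being able to count by one once it passes 2**53, long before it overflows.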

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters …

[Read more]
On PostgreSQL. Interview with Tom Kincaid.

“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it's not possible to create a product with [...]

The Needle in Big Data Noise

Read the original article at The Needle in Big Data Noise

Join 5500 others and follow Sean Hull on Twitter @hullsean. Also take a look at: I hacked Disqus Digests to discover new blogs

Who the heck is Bayes

Thomas Bayes was a scientist & thinker, Fellow of the Royal Society, and back in 1763 author of “An Essay towards Solving a Problem in the Doctrine [...]

For more articles like these go to Sean Hull's Scalable Startups

Related posts:

  1. Big Data – What is it and why is it important?
  2. NYC Tech Firms Are Hiring – Map
[Read more]