Showing entries 91 to 120 of 120

Displaying posts with tag: big data (reset)

NSA, Accumulo & Hadoop

I read yesterday that the NSA has submitted a proposal to Apache to incubate its Accumulo platform.  According to the description, this is a key/value store built over Hadoop that appears to provide similar functionality to HBase, except that it provides “cell level access labels” to allow fine-grained access control.  This is something you would expect as a requirement for many applications built at government agencies like the NSA.  But it is also very important for organizations in health care, law enforcement and similar fields, where strict control over large volumes of privacy-sensitive data is required.
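To make the idea of “cell level access labels” concrete, here is a minimal sketch in Python. It is not Accumulo's API: the `LabeledStore` class, its `put` and `scan` methods, and the sample labels are all hypothetical, and the rule used here (a reader must hold every label attached to a cell) is a simplifying assumption. A real system may support richer label expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    labels: frozenset  # assumption: a reader must hold ALL of these labels


class LabeledStore:
    """Toy in-memory key/value store with an access label set on each cell."""

    def __init__(self):
        self._cells = {}

    def put(self, row, column, value, labels=()):
        # Each (row, column) pair addresses one cell with its own labels.
        self._cells[(row, column)] = Cell(row, column, value, frozenset(labels))

    def scan(self, row, authorizations):
        """Return the columns of `row` visible to a reader holding `authorizations`."""
        auths = frozenset(authorizations)
        return {
            cell.column: cell.value
            for (r, _), cell in sorted(self._cells.items())
            if r == row and cell.labels <= auths  # subset test: all labels held
        }


store = LabeledStore()
store.put("patient42", "name", "J. Doe", labels={"clinical"})
store.put("patient42", "diagnosis", "asthma", labels={"clinical", "physician"})
store.put("patient42", "room", "12B")  # no labels: visible to every reader

nurse_view = store.scan("patient42", {"clinical"})
physician_view = store.scan("patient42", {"clinical", "physician"})
public_view = store.scan("patient42", set())
```

The point of the sketch is that filtering happens per cell rather than per table or per row, so two readers scanning the same row can legitimately see different columns.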

An interesting part of this is how it highlights the acceptance of Hadoop.

  [Read more...]
Ask What Your Database Can Do for Your Country

How many in your household again?

One of President John Kennedy’s most memorable phrases is “ask not what your country can do for you – ask what you can do for your country”.  I got to thinking about this over lunch with a colleague in the big data space. After comparing named customers for a while, we realized we had forgotten one of the biggest “big data” customers whom we both have in common – the government.

Whether you believe in small or big government, one thing is

  [Read more...]
NoSQL Now 2011: Review of AdHoc Analytic Architectures

For those who weren’t able to attend the fantastic NoSQL Now Conference in San Jose last week, but are still interested in the slides about how people are doing Ad Hoc analytics on top of NoSQL data systems, here are the slides from my presentation.

We obviously continue to hear from our community that LucidDB is a great solution sitting in front of a Big Data/NoSQL  [Read more...]
Database Insights from Archimedes to the Houston Rockets

Archimedes, the first DBA

According to a recent MIT Sloan Management Review study, top performing organizations use analytics 5 times more than lower performers. That’s pretty astounding. And while we all know about the ocean/lake/waves/(your favorite water analogy) of Big Data we struggle with every day, information is not knowledge. So how can we get insight from data? Recent articles from

  [Read more...]
Reply to The Future of the NoSQL, SQL, and RDBMS Markets

Conor O'Mahony over at IBM wrote a good post on a favorite topic of mine “The Future of the NoSQL, SQL, and RDBMS Markets”.  If this is of interest to you then I suggest you read his original post.  I replied in the comments but thought I would also repost my reply here.

-----------------------------------------------------------------------------------------------

Hi Conor, I wish it was as simple as SQL & RDBMS is good for this and NoSQL is good for that.  For me at least, the waters are much muddier than that.

The benefit of SQL & RDBMS is

  [Read more...]
IA Ventures - Jobs shout out

My friends over at IA Ventures are looking for both an Analyst and an Associate to join their team.  If Big Data, New York and start-ups are in your blood then I can’t think of a better VC to be involved in.

From the IA blog:

"IA Ventures funds early-stage Big Data companies creating competitive advantage through data and we’re looking for two start-up junkies to join our team – one full-time associate / community manager and one full time analyst. Because there are only four of us (we’re a start-up ourselves, in fact), we’ll need you to help us investigate companies, learn about industries, develop investment theses, perform internal operations, organize

  [Read more...]
Realtime Data Pipelines

In life there are really two major types of data analytics.  Firstly, we don’t know what we want to know – so we need analytics to tell us what is interesting.  This is broadly called discovery.  Secondly, we already know what we want to know – we just need analytics to tell us this information, often repeatedly and as quickly as possible.  This is called anything from reporting or dashboarding to more general data transformation and so on.

Typically we use the same techniques to achieve both.  We shove lots of data into a repository of some form (SQL, MPP SQL, NoSQL, HDFS etc) then run queries/jobs/processes across that data to retrieve the information we care about.

Now this makes sense for data discovery.  If we don’t know what we want to know, having lots of data in a big pile that we can slice and dice

  [Read more...]
What Scales Best?

It is a constant, yet interesting debate in the world of big data.  What scales best?  OldSQL, NoSQL, NewSQL?

I have a longer post coming on this soon.  But for now, let me make the following comments.  Generally, most data technologies can be made to scale - somehow.  Scaling up tends not to be too much of an issue, scaling out is where the difficulties begin.  Yet, most data technologies can be scaled in one form or another to meet a data challenge even if the result isn’t pretty. 

What is best?  Well that comes down to the resulting complexity, cost, performance and other trade-offs.  Trade-offs are key as there are almost always significant concessions to be made as you scale up.

As a recent example, I was looking at scalability aspects of MySQL.  In particular, MySQL Cluster

  [Read more...]
This Weekend in Japan

We were happy to see a lot of folks from Japan on Twitter this weekend having a discussion about MySQL and Tokutek. While we always endeavor to explain ourselves as simply as possible, hearing what users and peers have to say and ask in their native language is very helpful. Here is a sampling of several of the 30+ tweets and re-tweets (translations courtesy of a colleague I know from frequent past visits to Tokyo and Yokohama):

First, @frsyuki provided a general overview:

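“TokuDB” 新種のMySQLストレージエンジン。INSERTが20〜80倍ほど速い、パーティションなしで数TBのデータを突っ込める、MVCCサポートなど。Fractal Treeというアルゴリズムを実装しているらしい。

(Translation: “TokuDB” is a new kind of MySQL storage engine. INSERTs are about 20 to 80 times faster, you can load several TB of data without partitioning, it supports MVCC, and so on. It apparently implements an algorithm called Fractal Tree.)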

  [Read more...]
Don’t Thrash: How to Cache your Hash on Flash

Last week I gave a talk entitled “Don’t Thrash: How to Cache your Hash.” The talk took place at the Workshop on Algorithms and Data Structures (ADS) in a medieval castle turned conference center in Bertinoro, Italy. An earlier version of this work (with the same title) appeared at the HotStorage conference in Portland, OR. Tokutek co-founders Bradley, Martin, and I are coauthors on the work, along with students and other faculty at Stony Brook University.

The talk title is colorful and doggerel-y. Here’s what the title means. “Cache your hash”—the so-called Bloom Filter type data structure. A Bloom filter acts like

  [Read more...]
SQL access to CouchDB views : Easy Reporting

Following up on my previous blog about enabling SQL Access to CouchDB Views, I thought I’d share what I think the single biggest advantage is: the ability to connect run-of-the-mill, commodity BI tools to your big data system.

While the video below doesn’t show a PRPT it does show Pentaho doing Ad Hoc, drag and drop reporting on top of CouchDB with LucidDB in the middle, providing the connectivity and FULL SQL access to CouchDB. Once again, the overview:

BI Tools are commoditized; consider all the great alternatives available

  [Read more...]
0.9.4 did not hit the 1 year mark!

Our last LucidDB release was just over 12 months ago, on June 16, 2010. We were really, really trying to beat the 1 year mark for our 0.9.4 release but we just couldn’t. A tenet of good open source development is to release early and often, and we need to do better. Since the 0.9.3 release we’ve:

  • Built out an entire Web Services infrastructure
  • Developed a wicked cool Admin user interface
  • Developed cool connectors to Hive, CouchDB
  • Built a whole ton of extensions (auto indexing, DDL generation, improved load routines)
  • Scriptable
  [Read more...]
HPCC vs Hadoop at a glance

Update

Since this article was written, HPCC has undergone a number of significant changes and updates. These address some of the criticisms voiced in this blog post, such as the license (updated from AGPL to Apache 2.0) and integration with other tools. For more information, refer to the comments placed by Flavio Villanustre and Azana Baksh.

The original article can be read unaltered below:

Yesterday I noticed this tweet by Andrei Savu. This prompted me to read the related GigaOM article and then check out the  [Read more...]
SQL access to CouchDB views

Following up on my first post on an alternative, more SQL-eee metadata driven approach to doing BI on Big Data, I’d like to share an example of how we can enable easy reporting on top of Big Data immediately for CouchDB users. We’re very keen to discuss Ad Hoc and BI needs with CouchDB/Hive/other Big Data users; please visit the forum thread about the connector.

We’ve been working with some new potential customers on how to leverage their investment in Big Data (specifically Big Couch provided by the fine folks at Cloudant). In

  [Read more...]
A different vision for the value of Big Data

UPDATE: Think we’re right? Think we’re wrong? Desperate to figure out a more elegant solution to self service BI on top of CouchDB, Hive, etc? Get in touch, and please let us know!

There’s a ton of talk swirling about Hadoop, Big Data, and NoSQL. In short, these systems have relaxed the relational model into schema(less/minimal) to do a few things:

  • Achieve massive scalability, resiliency and redundancy on commodity hardware for data processing
  • Allow for flexible evolution and disparity in content of data, at scale, over time
  • Process semi-structured data and algorithms on these (token frequencies, social graphs, etc)
  • Provide analytics and
  [Read more...]
MySQL for Big Data
An excerpt from an article on MySQL for big data published in Dow Jones Venture Wire by Scott Denne.

There is one possible solution to the problem that doesn't include companies having to buy new software tools or even an all-new database: With the right expertise, MySQL can be engineered to handle almost any data-intensive application. The only problem is that there's a shortage of people who have the expertise to make it work.

"There's a big time gap until we, as an industry, think we have data under control," said Frank Mashraqi, chief technology officer at MyLawsuit.com and former database chief at Fotolog Inc., a photo blogging site. "The roadmap to getting that expertise is very difficult and time doesn't allow for it."
Elephants on a Trapeze: Keeping Big Data Agile

On April 1st, the Department of Computer Science at Rutgers University, where I am a professor, held an open house. I gave a talk called “Elephants on a Trapeze: Keeping Big Data Agile”.

The talk is an introduction to performance issues related to big data without getting too technical. You’ll have to decide if I succeeded with the “not too technical” part. My take is on how to keep big data indexed — not surprising since the work in this talk is the basis for TokuDB®, Tokutek’s MySQL storage engine for keeping large data indexed. A video of my talk can be found here.

Big Data is how big exactly?

I see that “Big Data” has become the new buzzword with a spike of hype around it. Everyone’s jumping on it. Companies are eager to promote their products as “Big Data,” just as they were eager to be associated with Web 2.0, Service-Oriented Architectures, and all the rest. Predictably, there’s basically zero agreement on what it means.

I’ve seen “Big Data” mentioned in the context of 1TB, which I think is rather moderate sized. But worse yet, I’ve seen 100GB labeled Big Data. I’ve even seen 5GB labeled Big Data. No links — I don’t want to draw attention to them.

I don’t know what Big Data is, but the stick-of-gum-sized flash drive in my pocket holds 16GB. It’s pretty Small. I mean, I forget it’s even there — it’s definitely not Big. I don’t

  [Read more...]
Outliers and coexistence are the new normal for big data

Letting data speak for itself through analysis of entire data sets is eclipsing modeling from subsets. In the past, all too often what were once disregarded as "outliers" on the far edges of a data model turned out to be the telltale signs of a micro-trend that became a major event. To enable these advanced analytics and integrate in real-time with operational processes, companies and public sector organizations are evolving their enterprise architectures to incorporate new tools and approaches.

Whether you prefer "big," "very large," "extremely large," "extreme," "total," or another adjective for the "X" in the "X Data" umbrella term, what's important is accelerated growth in three dimensions: volume, complexity and speed.

Big data is not without its limitations. Many organizations need to revisit business processes, solve data silo

  [Read more...]
Who/What to acquire next

Well as predicted, with Aster Data recently being picked up by Teradata most of the key new generation MPP distributed analytics vendors have been acquired (Aster Data, Vertica, Netezza & Greenplum).  This had to happen and was expected to happen.  The MPP Analytics startup “revolution” is over and these technologies will now be integrated into the mainstream.

So what’s next?  As we know, if you are a massive multi-national software company it is a lot less risky to incrementally innovate and leave the development of “game changing” technologies to startups that can be acquired after

  [Read more...]
What’s hot in Big Data startups?

There are so, so many big data platforms in play at the moment it can be confusing for developers to know where to start.  For startups it used to be simple, MySQL, but dust clouds were created when all the NoSQL platforms started to crash the party 18 months or so ago.  But I do see the dust begin to settle and we are starting to see some market “leaders” appear.  A very unscientific approach is to list the technologies I hear about in the “big data startup” world on a daily basis.  These are, in no particular order:

  • MySQL - yes it is still very much hanging in there despite the Oracle acquisition.  MySQL has been helped by technologies such as AWS RDS and Xeround making it more digestible for big data startups who want
  [Read more...]
Q&A with Stephen Baker of "Final Jeopardy"

IBM's Watson natural language Question & Answer system made headlines recently with its primetime debut on Jeopardy.  Despite a few embarrassing answers, Watson trounced top Jeopardy players Brad Rutter and Ken Jennings.  Watson is built from 90 IBM Power 750 Linux servers with 16 terabytes of memory providing 80 Teraflops of processing power.  Watson is perhaps the most famous "Big Data" system out there.  Watson's knowledge base

  [Read more...]
Some NoSQL Myths

I have been busy travelling recently but thought I would jot down a couple of NoSQL myths that are fresh in my head from my recent discussions.

  • Twitter use Cassandra internally but have not migrated their tweet store, despite their earlier plans to.  For now tweets are still stored in MySQL.
  • Despite the widely accepted view that the use of Cassandra led to Digg's issues, a couple of Digg engineers have apparently discounted this.
  • Despite the widely accepted view that NoSQL databases all use eventual consistency this is not so.  HBase, for example, offers full consistency.
  • Despite the widely accepted view that NoSQL is only
  [Read more...]
How Real is the Data Deluge?

It seems obvious that given the decreasing cost of storage and computation, there's going to be a significant increase in the volume of data that organizations accumulate over the next 10 years.  But the type of data being accumulated may be different from the areas where traditional DBMSs dominated.  It's not just about transactions; it's search patterns, on-line behavior, click-thru data, events fired off by smartphones, messages over Twitter & Facebook, log data of various kinds.

If an organization can figure out a better way to identify prospects, or deliver more targeted ads, or optimize pricing decisions by analyzing terabytes of data, they'd be crazy not to. Over the long term, companies

  [Read more...]
The problem with a full box of big data tools

“NoSQL”, for lack of a better name, is a generic term that describes any data management system that does not use SQL as a query interface.  Generally this means any data management system that is non-relational, but the term has also been stretched so far as to include systems at the boundaries of what constitutes a data management system at all (such as Hadoop).

Early on (a couple of years back in NoSQL time) when the term was coined I think the positioning was much more aggressive, but more recently this has been softened so now NoSQL is commonly quoted as meaning “Not only SQL” or “next generation databases” (whatever that means).  The common message you get now is something along the lines of NoSQL systems are

  [Read more...]
The SMAQ stack for big data

SMAQ report sections: MapReduce, Storage, Query, Conclusion

"Big data" is data that becomes large enough that it cannot be processed using conventional methods. Creators of web search engines were among the first to confront this problem. Today, social networks, mobile phones, sensors and science contribute to petabytes of data created daily.

To meet the challenge of processing such large data sets, Google created MapReduce. Google's work and Yahoo's creation of the Hadoop MapReduce implementation have spawned an ecosystem of big data processing tools.
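For readers unfamiliar with the model, the following is a minimal single-process sketch of the MapReduce idea in Python, using the classic word-count example. It does not use Hadoop or any Google API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Run the map step over every document, then group and reduce.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data about data"])
```

Because each map call looks at only one document and each reduce call at only one key, both steps can in principle be spread over many workers, which is what makes the model attractive for very large data sets.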

As MapReduce has grown in  [Read more...]

Big Data innovation marches on

With IBM intending to acquire Netezza, the predicted consolidation in the distributed analytics market is well underway.  Recent deals include EMC/Greenplum, Teradata/Kickfire and now IBM/Netezza.  A good breakdown of this deal is on Curt’s blog.  There is still more to go of course, with one of the crown jewels, Vertica, still ripe for the picking.

What this indicates is that MPP analytics has moved from the innovative edge into the mainstream market and now the more risk

  [Read more...]
Was Stonebraker right?

Back in 2008 Stonebraker & DeWitt published a paper and associated blog post titled “MapReduce: A major step backwards”.  Their key points were that MapReduce is:

  • A giant step backward in the programming paradigm for large-scale data intensive applications
  • A sub-optimal implementation, in that it uses brute force instead of indexing
  • Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago
  • Missing most of the features that are routinely included in current DBMS
  • Incompatible with all of the tools DBMS
  [Read more...]
VLDB 2010

I will be at VLDB 2010 next week.  If anyone on this blog is attending and wants to catch up to discuss startups and innovation in DB, NoSQL, Big Data etc, drop me a line and I will try to meet up.

The number of Hadoop jobs continues to rise

While still a small fraction (1) of data management job postings, the number of job posts that mention "hadoop" continues to grow steadily. Year-over-year, there were 300% more such job posts (2) in the first seven months of 2010 compared to the same period in 2009.

The fraction of "hadoop" jobs posted by California companies remains high, but is definitely lower than what it was last year.

(1) Over the last three months, job posts that mention "hadoop" were

  [Read more...]

Planet MySQL © 1995, 2013, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.