Showing entries 81 to 90 of 211
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: big data (reset)
A new big data structure for streaming counters - bit length encoding

One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps have to be taken to work with the data which is very expensive. This is why a BIGINT is 64 bits.  In MySQL DECIMAL can store more than 64 bits of data using fixed precision.  Large numbers can use FLOAT or DECIMAL but those data types are lossy.

DECIMAL is an expensive encoding. Fixed precision math is expensive and you eventually run out of precision at which point you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL?  FLOAT is lossy.  What if you need an /exact/ count of a very big number without using very much space?

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters …

[Read more]
A new big data structure for streaming counters - bit length encoding

One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps have to be taken to work with the data which is very expensive. This is why a BIGINT is 64 bits.  In MySQL DECIMAL can store more than 64 bits of data using fixed precision.  Large numbers can use FLOAT or DECIMAL but those data types are lossy.

DECIMAL is an expensive encoding. Fixed precision math is expensive and you eventually run out of precision at which point you can't store any more data, right?

What happens when you want to store a counter that is bigger than the maximum DECIMAL?  FLOAT is lossy.  What if you need an /exact/ count of a very big number without using very much space?

I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes advantage of the fact that counters …

[Read more]
On PostgreSQL. Interview with Tom Kincaid.

“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it´s not possible to create a product with [...]

The Needle in Big Data Noise

Read the original article at The Needle in Big Data Noise

Join 5500 others and follow Sean Hull on twitter @hullsean. Also take a look at: I hacked Disqus Digests to discover new blogs Who the heck is Bayes Thomas Bayes was a scientist & thinker, Fellow of the Royal Society, and back in 1763 author of “An Essay toward Solving a Problem in the Doctrine [...]

For more articles like these go to Sean Hull's Scalable Startups

Related posts:

  1. Big Data – What is it and why is it important?
  2. NYC Tech Firms Are Hiring – Map
[Read more]
May 2nd Webinar: Introduction to TokuDB v7 Community & Enterprise Editions

With this version, the source code is now freely available under the GPL License v2. For more details, see our blog here. Open source pioneer Mozilla has been using TokuDB to manage its MySQL-driven Datazilla Data cluster, an open-source system for managing and visualizing performance data.

Date: May 2nd
Time: 2 PM EST / 11 AM PST
REGISTER TODAY

In the past TokuDB has been free for evaluation; the new TokuDB Community Edition extends free use to deployed environments. With this release Tokutek is also planning on making available a TokuDB Enterprise Edition, which includes technical support, initial customer onboarding services, and advanced tools for backup and recovery.

We …

[Read more]
From Oracle to 10gen, The MongoDB Company

Those who are familiar with me know I've a dream.

5 years ago I decided to leave a systems integrator where I was doing great. Why? I wanted to be in a company with the same growth prospects that Oracle had in the 80s. I dreamed to be in the Oracle of 30 years ago and, as time travel wasn't affordable, I decided to join MySQL AB to help expand the business in Europe, the Middle East and Africa.
A few years later my dream came true, but in a slightly different sense. Sun acquired MySQL and was later swallowed by Oracle giving me the opportunity to join the company I wished I could have helped build.

Oracle is an amazing …

[Read more]
Thanks to Community for Selecting Tokutek for Prestigious MySQL Award

We wanted to thank everyone for naming Tokutek the Corporate Contributor of the Year 2013 for ongoing contribution to the MySQL community.

The MySQL Community Awards are given annually to the people and companies that support the MySQL ecosystem. The MySQL Community Award for Corporate Contributor of the Year recognizes a company or other organization or entity that has made valuable contributions to the MySQL ecosystem either in terms of open source code, knowledge, funding or other resources or sponsorship. The winners are selected by an independent community panel.

“Open Source is about collaborating and contributing to build …

[Read more]
Percona Live - Keynote: How MySQL can thrive in the world of massive data hype

  Continuent CEO Robert Hodges says that NoSQL solutions are oversold, but this is no reason for MySQL fans to become complacent. He kicked off Day 2 of the Percona Live MySQL Conference and Expo with his keynote, "How MySQL can thrive in the world of massive data hype."He said there are new challenges in data management, and relational databases must solve them or risk becoming irrelevant. This

Open Source TokuDB Resources

Since we announced that TokuDB is now open source, there has been a lot of positive feedback (thanks!) and also some questions about the details. I want to take this opportunity to give a quick high level guide to describe what our repositories on Github are.

Here are the repositories:

  • ft-index. This repository is the “magic”. It contains the Fractal Tree data structures we have been talking about for years. This is also the main piece that was previously closed source. Here are some interesting directories:
    • src: This directory is a layer that implements an API that is similar to the BDB API.
    • locktree: an in-memory data structure that maintains transactions’ row-level locks. …
[Read more]
MySQL Applier For Hadoop: Real time data export from MySQL to HDFS


MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases being served if the slave (to which data is replicated) isn't restricted to be a MySQL server; but it can be any other database server or platform with replication events applied in real-time! 
This is what the new Hadoop Applier empowers you to do.
An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS(Hadoop Distributed File System), the Hadoop Applier can populate Hive tables in real time. Data is …

[Read more]
Showing entries 81 to 90 of 211
« 10 Newer Entries | 10 Older Entries »