Previous 30 Newer Entries Showing entries 91 to 120 of 168 Next 30 Older Entries

Displaying posts with tag: big data

Webinar: Understanding Indexing

Three rules on making indexes around queries to provide good performance

Application performance often depends on how fast a query can respond, and query performance almost always depends on good indexing. So one of the quickest and least expensive ways to increase application performance is to optimize the indexes. This talk presents three simple and effective rules on how to construct indexes around queries that result in good performance.


Time: 2PM EDT / 11AM PDT

This webinar is a general discussion applicable to all databases using indexes and is not specific to any particular MySQL® storage engine.
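The excerpt does not list the three rules, so the following is only a generic illustration of building an index around a query, not the webinar's method. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `orders` table, its columns, and the index name are hypothetical. It compares the query plan before and after adding a composite index whose columns match the query's WHERE clause and selected column.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN description for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each row.
    return " | ".join(str(row[-1]) for row in rows)


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table, used only for illustration.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "status TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
        [(i % 100, "open" if i % 3 else "closed", i * 1.5) for i in range(1000)],
    )
    sql = "SELECT total FROM orders WHERE customer_id = 42 AND status = 'open'"

    before = query_plan(conn, sql)  # no usable index: full table scan

    # Index built around the query: equality columns first, then the selected
    # column, so the query can be answered from the index alone.
    conn.execute(
        "CREATE INDEX idx_cust_status_total ON orders (customer_id, status, total)"
    )
    after = query_plan(conn, sql)
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)
    print("after: ", after)
```

The first plan should report a scan of the whole table, and the second a search using the new index as a covering index. MySQL's EXPLAIN output looks different, but the idea of matching index columns to the query carries over.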


  [Read more...]
So now Hadoop's days are numbered?
Earlier this week we all read GigaOM's article with this title:
"Why the days are numbered for Hadoop as we know it"
I know GigaOM likes to provoke controversy sometimes (we all remember another unforgettable piece), but there is something behind it...

Hadoop today (after SOA not so long ago) is one of the worst cases of an abused buzzword ever known to man. It's everything, everywhere, and can cure illnesses and do "big-data" at the same time! Wow! Actually, Hadoop is a software framework that supports data-intensive distributed applications, derived from Google's MapReduce and Google File System (GFS) papers.
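To make the MapReduce model concrete, here is a single-process sketch of its phases (map, shuffle/group-by-key, reduce) applied to word counting. It is a toy illustration of the programming model described in the MapReduce paper, not Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework does this in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a real cluster, map tasks run in parallel close to the data blocks
    # and reducers run on many machines; here everything is sequential.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is data"]))
```

The value of the framework is that the map and reduce functions stay this simple while the distribution, grouping, and fault tolerance are handled for you.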

My take from the article is




  [Read more...]
My Talks at MySQL Connect and Percona Live NYC


Solving the Challenges of Big Databases with MySQL

When you’re using MySQL for big data (more than ten times as large as main memory), these challenges often arise: loading data fast; maintaining indexes under insertions, deletions, and updates; adding and removing columns online; adding indexes online; preventing slave lag; and compressing data effectively.

This session shows why some of these challenges are difficult to solve with storage engines based on B-trees, how Fractal Tree® data structures work, and why they can help solve these problems. Tokutek sells a transaction-safe Fractal Tree storage engine for MySQL, but the presentation is primarily about the underlying technology. It includes a discussion of both the theoretical and practical aspects of Fractal Tree indexes.

I have the privilege of being able to give


  [Read more...]
ARM based data center. Inspiring.
In a previous post I wrote about ARM-based servers. Since then, and thanks to all the comments and responses I got, I have looked more into this ARM thing, and it's absolutely fascinating...

Look at this beauty (taken from the site of Calxeda, the manufacturer):

What is it? A chip? A server? No, it's a cluster of 4 servers...

And this:







  [Read more...]
Hot Table Optimization with MySQL

Table optimization is a necessary evil; tables sometimes need to be optimized to reclaim space or to improve query performance.  Unfortunately, MySQL blocks writes to a table while it is being optimized.  Because optimization time is proportional to the table size, writes can be blocked for a long time.  Fractal Tree indexes support online optimization; however, the MySQL metadata lock gets in the way of writing while optimizing.  We will describe a simple patch to MySQL that enables online optimization of TokuDB tables.

Why do tables need to be optimized?  Here are some reasons.

  • Insertions with random keys can result in a tree with underutilized leaf blocks.  Many tree algorithms split nodes in half when they become full.  If these nodes are stored in fixed-size blocks, as many B-trees do, then there can be a lot of wasted space.  Table optimization of
  [Read more...]
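The bullet's point about half-splitting can be checked with a toy simulation. The sketch below models only the leaf level of a B-tree-like structure: each leaf holds at most `capacity` keys, and a leaf that overflows is split in half. It reports the average leaf fill for random versus ascending insertion order. This is a simplified model for illustration only (no internal nodes, no deletions, no on-disk blocks), not TokuDB's or InnoDB's code, and the function name is hypothetical.

```python
import bisect
import random
from typing import Iterable, List


def leaf_utilization(keys: Iterable[int], capacity: int = 64) -> float:
    """Insert distinct keys into sorted leaves that split in half on overflow.

    Returns the average fraction of leaf capacity actually in use.
    """
    leaves: List[List[int]] = [[]]
    first_keys: List[int] = []  # smallest key of leaves[1:], used for routing
    for key in keys:
        # Route the key to the leaf whose key range contains it.
        i = bisect.bisect_right(first_keys, key)
        leaf = leaves[i]
        bisect.insort(leaf, key)
        if len(leaf) > capacity:
            # Split the full leaf into two half-full leaves.
            mid = len(leaf) // 2
            left, right = leaf[:mid], leaf[mid:]
            leaves[i:i + 1] = [left, right]
            first_keys.insert(i, right[0])
    total_keys = sum(len(leaf) for leaf in leaves)
    return total_keys / (len(leaves) * capacity)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 20_000
    random_keys = rng.sample(range(10 * n), n)  # distinct keys, random order
    ascending_keys = range(n)                    # distinct keys, sorted order
    print(f"random inserts:    {leaf_utilization(random_keys):.0%} full")
    print(f"ascending inserts: {leaf_utilization(ascending_keys):.0%} full")
```

In this model, random insertion should settle at roughly 69% fill (the classic ln 2 result for split-in-half trees), while ascending insertion leaves nearly every leaf half empty. Either way, a sizable fraction of each block is wasted space that an optimization pass could reclaim.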
The catch-22 of read/write splitting
In my previous post I covered the shared-disk paradigm's pros and cons, but the conclusion is that it cannot really qualify as a scale-out solution when it comes to massive OLTP, big data, large session counts, and a mixture of reads and writes.

Read/write splitting is achieved by using numerous replicated database servers for reads. This way the system can scale to cope with increases in concurrent load. This qualifies as a scale-out solution because it allows expansion beyond the boundaries of one DB: the DB machines are shared-nothing, and more can be added as slaves to the replication "group" when required.
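A read/write-splitting layer can be sketched as a router that sends writes to the master and spreads reads across slaves. The sketch below is hypothetical: the class name, the server names, and the rule of classifying statements by their first SQL keyword are simplifying assumptions. A real proxy would also have to handle transactions, session state, and asynchronous replication, where a read sent to a slave right after a write may not see that write yet.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route SQL statements: writes to the master, reads round-robin to slaves."""

    # Naive classification by first keyword; e.g. SELECT ... FOR UPDATE
    # would really need the master, which this sketch ignores.
    READ_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}

    def __init__(self, master: str, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("need at least one read slave")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        words = sql.lstrip().split(None, 1)
        keyword = words[0].upper() if words else ""
        if keyword in self.READ_KEYWORDS:
            # Reads can go to any slave; adding slaves adds read capacity.
            return next(self._slaves)
        # Everything else (INSERT, UPDATE, DELETE, DDL, ...) goes to the
        # master, so write throughput is still bounded by a single server.
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
    for stmt in ["SELECT * FROM t", "UPDATE t SET x = 1", "select 1", "SELECT 2"]:
        print(f"{stmt!r:24} -> {router.route(stmt)}")
```

Round-robin is the simplest balancing policy; any policy works as long as every write still lands on the one master.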



  [Read more...]
Why shared-storage DB clusters don't scale
Yesterday a customer asked me why he had failed to achieve scale with a state-of-the-art "shared-storage" cluster. "It's a scale-out to 4 servers, but with a shared disk. And after tons of work and effort, I got 130% throughput, not even close to the expected 400%," he said.

Well, scale-out cannot be achieved with shared storage, and the word "shared" is the key. Scale-out is done with absolutely nothing shared, that is, with a "shared-nothing" architecture. This is what makes it linear and unlimited. Any shared resource creates a tremendous burden on each and every database server in the cluster.
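One way to put numbers on the customer's result is Amdahl's law. This is our own choice of model, not something the post uses. If a fraction s of the work is serialized on the shared resource, N servers give a speedup of at most 1 / (s + (1 - s) / N). The sketch below solves for the s implied by 4 servers and 1.3x throughput. It assumes that "130% throughput" means 1.3 times the throughput of one server.

```python
def amdahl_speedup(serial_fraction: float, servers: int) -> float:
    """Amdahl's law: best-case speedup when `serial_fraction` cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / servers)


def implied_serial_fraction(speedup: float, servers: int) -> float:
    """Invert Amdahl's law: which serial fraction yields `speedup` on `servers`?"""
    # speedup = 1 / (s + (1 - s)/N)  =>  s = (N/speedup - 1) / (N - 1)
    return (servers / speedup - 1.0) / (servers - 1)


if __name__ == "__main__":
    s = implied_serial_fraction(1.3, 4)
    print(f"implied serialized fraction: {s:.0%}")
    for n in (4, 8, 16, 1000):
        print(f"{n:>5} servers -> at most {amdahl_speedup(s, n):.2f}x")
```

Under this model, about 69% of the work would be serialized, and the cluster could never exceed roughly 1/s, about 1.44x, however many servers were added. A shared-nothing design corresponds to s close to 0, where the speedup approaches N.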

In a previous post, I identified database engine activities such as buffer management, locking, thread locks/semaphores,



  [Read more...]
Basement Nodes: Turning Big Writes into Small Reads

Executive Summary

Fast indexing requires the leaves of a Fractal Tree® Index to be big. But some queries require the leaves to be small in order to get any reasonable performance. Basement nodes are our way to achieve these conflicting goals, and here I’ll explain how.

Big Leaves

On many occasions, we at Tokutek have pointed out that TokuDB is write optimized, which means TokuDB indexes data much faster than a B-tree solution such as InnoDB. As with any write-optimized data structure, Fractal Tree indexes need to bundle up lots of small writes into a few big writes. Otherwise, there’d be no way to beat a B-tree. So the question is, how big do the writes have to be?

Consider how long it takes to write k bytes to a disk. First, there is the seek time s, which we can assume to be independent of k.
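The excerpt stops right after introducing the seek time s. A common way to continue is to add a transfer time k/b for a disk bandwidth b, so writing k bytes takes s + k/b, and only the k/b part moves data. We assume that model here because the rest of the post is cut off. The sketch below computes that useful fraction and the write size needed to reach a target efficiency. The 10 ms seek and 100 MB/s bandwidth are illustrative values, not figures from the post.

```python
def write_time(k_bytes: float, seek_s: float, bandwidth_bps: float) -> float:
    """Time to write k bytes: a fixed seek plus transfer at `bandwidth_bps` bytes/s."""
    return seek_s + k_bytes / bandwidth_bps


def transfer_efficiency(k_bytes: float, seek_s: float, bandwidth_bps: float) -> float:
    """Fraction of the total write time spent actually transferring data."""
    return (k_bytes / bandwidth_bps) / write_time(k_bytes, seek_s, bandwidth_bps)


def bytes_for_efficiency(target: float, seek_s: float, bandwidth_bps: float) -> float:
    """Write size whose transfer time is `target` of the total (0 <= target < 1)."""
    # t / (s + t) = e  =>  t = e * s / (1 - e), and k = t * b
    return target * seek_s / (1.0 - target) * bandwidth_bps


if __name__ == "__main__":
    seek, bw = 0.010, 100e6  # illustrative: 10 ms seek, 100 MB/s
    for k in (4 * 1024, 64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
        eff = transfer_efficiency(k, seek, bw)
        print(f"{k // 1024:>5} KiB write: {eff:.1%} of time transferring")
    print(f"bytes needed for 90% efficiency: {bytes_for_efficiency(0.9, seek, bw):,.0f}")
```

With these numbers, a 4 KiB write spends well under 1% of its time moving data, and about 9 MB is needed before transfer accounts for 90% of the time. This is consistent with the post's point that the writes, and therefore the leaves, need to be big.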

  [Read more...]
The Sound and the NoSQL Fury

The signal-to-noise ratio in the NoSQL world has made it hard to figure out what’s going on, or even who has something new. For all the talk of performance in the NoSQL world, much of the most exciting part of what’s new is really not about performance at all.

Take, for example, MongoDB, which has a really great data model, and MapReduce, which has a very handy scripting language. These are genuine and probably long-lasting contributions. Their innovation is all about finding a new language to use for interacting with data. They are about NoSQL.

The confusion comes, for me, when we get to the performance side of the equation. I explore this in detail in an article I did for Datanami recently – http://www.datanami.com/datanami/2012-05-22/the_sound_and_the_nosql_fury.html.

Review of the Tutorial on Algorithms for Memory Sensitive Computing at STOC

Martin Farach-Colton and I ran a Tutorial on Algorithms for Memory Sensitive Computing on May 18th at the 44th ACM Symposium on Theory of Computing (STOC) at NYU. Here is the program for the tutorial.

Erik Demaine (MIT) spoke on the History of I/O Models. Throughout the years, a remarkable variety of computational models have been proposed to explain the effects of caching, data locality, prefetching, and single- and multi-level memory hierarchies. Erik traced the intellectual history and connections between these models. Most approaches

  [Read more...]
Scale-out your DB on ARM-based servers
Today, I think we witnessed a small sign for a big revolution...

http://www.pcworld.com/businesscenter/article/256383/dell_reaches_for_the_cloud_with_new_prototype_arm_server.html
"Dell announced a prototype low-power server with ARM processors, following a growing demand by Web companies for custom-built servers that can scale performance while reducing financial overhead on data centers"
In short, ARM (see Wikipedia definition here) is an architecture standard for processors. ARM processors are slower than good old x86 processors from Intel and AMD, but have power-efficiency, density and price attributes that intrigue


  [Read more...]
Webinar: TokuDB v6 Replication Performance

TokuDB® is a proven solution that scales MySQL® and MariaDB® from GBs to TBs with unmatched insert and query speed, compression, and online schema flexibility.

Tokutek’s recently launched TokuDB v6 delivers all of these features and more, with the introduction of high performance replication for MySQL and MariaDB. TokuDB v6 eliminates the common and persistent problem of “slave lag” in which a replication server is unable to keep up with the query load borne by the master server. TokuDB v6 solves this by offering high ingestion rates at the slave.

Time: 2PM EDT / 11AM PDT

REGISTER TODAY

  [Read more...]
Challenges of Big Databases with MySQL – IOUG Presentation

Many database management tasks become difficult as you move from millions of rows and gigabytes of data to billions of rows and terabytes of data. Such tasks include ingesting data while maintaining indexes; changing schemas without downtime; and supporting connections, replication, and backup. For some scaling problems (connections and replication), MySQL® is better than most of the competition. For others, such as indexing, schema changes, and backup, MySQL has typically been harder to use. Fortunately, the tasks MySQL does well are in its core, whereas the tasks that are more difficult can be solved with storage engine plug-ins.

I recently gave a talk at

  [Read more...]
SwRI Chooses TokuDB to Tackle Machine Data for an 800M+ Record Database

Tackling machine data on the ground to ensure successful operations for NASA in space

Issues addressed:

  • Scaling MySQL to multi-terabytes
  • Insertion rates as InnoDB hit a performance wall
  • Schema flexibility to handle an evolving data model

The Company:  Southwest Research Institute (SwRI) is an independent, nonprofit applied research and development organization. The staff of more than 3,000 specializes in the creation and transfer of technology in engineering and the physical sciences. Currently, SwRI is part of an international team working on the NASA

  [Read more...]
Percona Live Slides and Video Available: The Right Read Optimization is Actually Write Optimization

In April, I got to give a talk at Percona Live, about why The Right Read Optimization is Actually Write Optimization. It was my first industry talk, so I was delighted when someone in the audience said “I feel like I just earned a college credit.”

Box offered to host everyone’s slides from the conference here (mine is here). A big thanks from me to Sheeri Cabral, for

  [Read more...]
Tokutek and PalominoDB Partner to Bring Scale, Performance to Database Deployments

MySQL storage engine provider joins forces with leading database consultants to deliver support for growing number of MySQL and MariaDB customers

Lexington, MA – (May 2, 2012) – Tokutek, the leader in high-performance and agile database storage engines, today announced a strategic partnership with PalominoDB, a premier database operations and engineering consultancy, to provide database services and support to joint customers. Tokutek’s storage engine will be complemented with PalominoDB’s operational excellence, 24×7 on-call support and access to the company’s skilled team of

  [Read more...]
TokuDB v6.0: Download Available

TokuDB v6.0 is full of great improvements, like getting rid of slave lag, better compression, improved checkpointing, and support for XA.

I’m happy to announce that TokuDB v6.0 is now generally available and can be downloaded here.

Sysbench Performance

I wanted to take this time to talk about one more under-the-hood goody we’ve added to v6.0. In

  [Read more...]
My Talk on Tuesday at IOUG COLLABORATE 12

Challenges of Big Databases with MySQL

Many database management tasks become difficult as you move from millions of rows and gigabytes of data to billions of rows and terabytes of data. Such tasks include ingesting data while maintaining indexes; changing schemas without downtime; and supporting connections, replication, and backup. For some scaling problems (connections and replication), MySQL is better than most of the competition. For others, such as indexing, schema changes, and backup, MySQL has typically been harder to use. Fortunately, the tasks MySQL does well are in its core, whereas the tasks that are more difficult can be solved with storage engine

  [Read more...]
TokuDB v6.0: Even Better Compression

A key feature of our new TokuDB v6.0 release, which I have been blogging about this week, is compression. Compression is always on in TokuDB, and the compression we’ve achieved in the past has been quite good. See a previous post on the 18x compression achieved by TokuDB v5.0 on one benchmark. In our latest release, we’ve updated the way compression works and achieved a 50% improvement in compression.

I decided to present numbers on the same set of data as the old post, so see that post for experimental details.

But first, what are the changes? TokuDB compresses large blocks

  [Read more...]
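The excerpt breaks off at "TokuDB compresses large blocks". As a general illustration of why block size matters (not TokuDB's actual on-disk format or codec), the sketch below uses Python's zlib to compress the same synthetic rows in independent blocks of different sizes. Each block is compressed on its own, as a storage engine compressing individual blocks would, so larger blocks give the compressor more repeated context and spread the fixed per-block overhead over more data. The row format is hypothetical.

```python
import zlib


def synthetic_rows(n: int) -> bytes:
    """Hypothetical table data: many similar, partly repetitive rows."""
    rows = (
        f"{i},user_{i % 500},2012-05-{i % 28 + 1:02d},status={'ok' if i % 7 else 'retry'}\n"
        for i in range(n)
    )
    return "".join(rows).encode("ascii")


def compressed_size(data: bytes, block_size: int) -> int:
    """Compress `data` in independent blocks of `block_size` bytes; return total size."""
    return sum(
        len(zlib.compress(data[off:off + block_size], 6))
        for off in range(0, len(data), block_size)
    )


if __name__ == "__main__":
    data = synthetic_rows(50_000)
    for block in (4 * 1024, 64 * 1024, 1024 * 1024):
        size = compressed_size(data, block)
        print(f"{block // 1024:>5} KiB blocks: {len(data) / size:.1f}x compression")
```

The exact ratios depend on the data and codec. The general trend is that small independent blocks compress noticeably worse than large ones.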
TokuDB v6.0: Getting Rid of Slave Lag

Master/slave replication is an important tool that gets used in many ways: distributing read loads among many slaves for performance, using a slave for backups so the master can handle live load, geographically distributed disaster recovery, etc. The Achilles’ heel of slave performance is that slave workloads are single-threaded. The master can have many clients inserting, updating, and querying, whereas the slave has only one insertion client: the master. InnoDB single-client performance is much slower than its multi-client performance, which means that the bottleneck in a master/slave system is often the rate at which a slave can keep up.

If the master has an average transactions per second (tps) that is higher than what the slave can handle, the slave will fall further and further behind. If the slaves are being used to distribute read workload, for example, the

  [Read more...]
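The arithmetic behind "the slave will fall further and further behind" is simple. If the master commits transactions faster than the single-threaded slave can apply them, the backlog grows by the difference every second and never shrinks on its own. The sketch below simulates that backlog. The rates are made-up inputs, the function names are hypothetical, and the model ignores bursts, batching, and parallel apply.

```python
from typing import List


def replication_backlog(master_tps: List[float], slave_capacity_tps: float) -> List[float]:
    """Per-second backlog of transactions the slave has not applied yet.

    master_tps[i] is the number of transactions the master commits in second i;
    the slave can apply at most slave_capacity_tps per second.
    """
    backlog = 0.0
    history = []
    for committed in master_tps:
        backlog += committed
        backlog -= min(backlog, slave_capacity_tps)  # slave drains what it can
        history.append(backlog)
    return history


def seconds_behind(backlog: float, slave_capacity_tps: float) -> float:
    """Rough lag: time the slave needs to apply the current backlog."""
    return backlog / slave_capacity_tps


if __name__ == "__main__":
    # One hour with the master at 1,200 tps against a slave that applies 1,000 tps.
    hist = replication_backlog([1200.0] * 3600, 1000.0)
    print(f"backlog after 1 h: {hist[-1]:,.0f} transactions, "
          f"about {seconds_behind(hist[-1], 1000.0) / 60:.0f} min behind")
```

Here the backlog grows by 200 transactions per second, reaching 720,000 after an hour, or about 12 minutes of lag. The only real fixes are a lower master rate or a faster slave.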
Announcing TokuDB v6.0: Less Slave Lag and More Compression

We are excited to announce TokuDB® v6.0, the latest version of Tokutek’s flagship storage engine for MySQL and MariaDB.

This version offers feature and performance enhancements over previous releases, support for XA (two-phase transactional commits), better compression, and reduced performance variability associated with checkpointing. This release also brings TokuDB support up to date on MySQL v5.1, MySQL v5.5 and MariaDB v5.2. There’s a lot of great technical stuff under the hood in this release and I’ll be reviewing the improvements one-by-one over the course of this week.
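XA support means TokuDB can take part in two-phase commit. The sketch below is a generic reminder of how that protocol works, not TokuDB's or MySQL's implementation, and its class and method names are hypothetical. A coordinator first asks every participant to prepare, and commits only if all of them vote yes; otherwise it rolls everyone back. Durable logging of the decision and crash recovery, which real implementations need, are omitted.

```python
from enum import Enum
from typing import List


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"


class Participant:
    """A resource manager (e.g. a storage engine) in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = State.ACTIVE

    def prepare(self) -> bool:
        # Phase 1: get ready to commit (durably, in a real system), then vote.
        if self.can_commit:
            self.state = State.PREPARED
        return self.can_commit

    def commit(self) -> None:
        self.state = State.COMMITTED

    def rollback(self) -> None:
        self.state = State.ROLLED_BACK


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant prepares."""
    # Phase 1 (prepare/vote); all() stops at the first "no" vote.
    if all(p.prepare() for p in participants):
        # Phase 2: every participant promised it can commit.
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


if __name__ == "__main__":
    rms = [Participant("rm1"), Participant("rm2", can_commit=False)]
    print(two_phase_commit(rms), [p.state.value for p in rms])
```

The key property is atomicity across participants: either every resource manager commits or none does.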

I’ll be posting more details about the new features and performance, so here’s an overview of what’s in store.

Replication Slave Lag

One of the things TokuDB does well is single-threaded insertions, which translates directly into  [Read more...]
Looking for Global Collisions

On Monday, I took a break from planning for the upcoming Percona Live MySQL Conference (where we have a session, lightning talk, booth, and other misc activities planned) to attend the UK-Massachusetts Innovation Economies Conference at the MIT Media Lab. The event featured Gov. Deval Patrick, MIT Media Lab Director Joi Ito, industry experts such as

  [Read more...]
SkySQL is Coming to a City Near You!

Now that the snow is melting and spring is in the air, the SkySQL Team is hitting the road and making the rounds of key industry events, trade shows, and meetups around the globe.  Come meet the team, pick up a few tips and tricks for using the MySQL database, network with your peers, and learn more about SkySQL’s products and services.  Here are some of the events we’ll be at this spring:

BIG Data, A New Horizon for Data Analysis
March 20 - 21, 2012
Cité Internationale Universitaire de Paris, Paris, France

POSSCON 2012
March 28-29, 2012
Columbia Metropolitan Convention Center, Columbia, South Carolina





  [Read more...]
O’Reilly Strata 2012: The Year of the Data Scientist

We had the privilege this past week to be invited to be part of the 2012 O’Reilly Strata “Making Data Work” Conference. Some of our photos from the event are here. At the event, we were excited to have Tokutek described in front of the approximately 2,500 attendees during the keynote sessions.

Overall, the diversity of topics discussed at the conference was impressive, spanning databases, developer tools, data visualization techniques, customer stories, and business implications. The full agenda is

  [Read more...]
Evidenzia Upgrades to TokuDB v5.2 to Address Storage Growth and Scale Performance

Ensuring sufficient disk I/O to catch copyright violations at network speed.

Evidenzia GmbH & Co. KG

Issues addressed:

  • Storage growth, including maxed-out disk I/O utilization
  • Performance issues and business impact due to slow selects
  • Inability to revise data schema on the fly

The Company: Evidenzia GmbH & Co. KG is one of the leading partners of the software, movie and music industry when it comes to tracing copyright infringements

  [Read more...]
A super-set of MySQL for Big Data. Interview with John Busch, Schooner.
“Legacy MySQL does not scale well on a single node, which forces granular sharding and explicit application code changes to make them sharding-aware and results in low utilization of servers” – Dr. John Busch, Schooner Information Technology

A super-set of MySQL suitable for Big Data? On this subject, I have interviewed Dr. John Busch, Founder, Chairman, [...]
Tokutek Selected as a Finalist for O’Reilly Strata Conference

We are excited to announce that we’ve been named as one of ten finalists selected for the startup showcase at the O’Reilly Strata “Making Data Work” Conference at the end of this month in Santa Clara, California. The startup showcase will be held on February 29th, starting at 6:30 pm.

The conference offers a great overview of the big data space, with tracks on Data Science, Business and Industry,

  [Read more...]
New England’s Victory (for Big Data)

While it might not have been New England’s weekend on the Big Gridiron, it was certainly New England’s day for Big Data at the New England Database Summit on Friday at MIT.

The summit was well attended, with 350 registrants and keynotes from prominent MySQL users such as Mark Callaghan. The coverage was quite broad, with presentations running the gamut from grad students (complete with bodyguards and intimidating academic

  [Read more...]
MySQL Conference and Expo Talk on Benchmarking

I’ll be speaking on April 11th at 4:30 pm in Room 4 at the Percona Conference and Expo. The topic will be “Creating a Benchmark Infrastructure That Just Works.”

Throughout my career I’ve been involved with maintaining the performance of database applications and therefore created many benchmark frameworks. At Tokutek, an important part of my role is measuring the performance of our storage engine over time and versus competing solutions. There is nothing proprietary about

  [Read more...]
Big Kettle News

Dear Kettle fans,

Today I’m really excited to be able to announce a few really important changes to the Pentaho Data Integration landscape. To me, the changes that are being announced today compare favorably to reaching Kettle version 1.0 some 9 years ago, or reaching version 2.0 with plugin support or even open sourcing Kettle itself…

First of all…

Pentaho is again open sourcing an important piece of software.  Today we’re bringing all big data related software to you as open source software.  This includes all currently available capabilities to access HDFS, MongoDB, Cassandra, HBase, the specific VFS drivers we created as well as the ability to execute work inside of Hadoop (MapReduce), Amazon EMR, Pig and so

  [Read more...]

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.