Integrating MySQL and Hadoop - or - A different approach to using CSV files in MySQL

We use both MySQL and Hadoop a lot. If you utilize each system to its strengths, this is a powerful combination. One problem we constantly face is making data extracted from our Hadoop cluster available in MySQL.

The problem

Look at this simple example: Let’s say we have a table customer:

CREATE TABLE customer (

    id INT UNSIGNED NOT NULL,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL,

    PRIMARY KEY (id)
);

In addition to that, we store the orders customers place in Hadoop. An order includes: customerId, date, itemId, price. Note that these structures serve as a very simplified example.
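
If the orders also lived in MySQL, a table matching those fields might look like this (a hypothetical sketch; the post only names the fields, so the types here are assumptions):

CREATE TABLE orders (

    customerId INT UNSIGNED NOT NULL,
    date DATETIME NOT NULL,          -- assumed type
    itemId INT UNSIGNED NOT NULL,
    price DECIMAL(10,2) NOT NULL,    -- assumed type

    KEY (customerId)
);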

Let’s say we want to find the first 50 customers that placed at least one order, sorted by firstname ascending. If both tables …
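
With both tables in MySQL, a minimal sketch of that query against the hypothetical orders table above could be:

SELECT DISTINCT c.id, c.firstname, c.lastname, c.city
FROM customer c
JOIN orders o ON o.customerId = c.id
ORDER BY c.firstname ASC
LIMIT 50;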

[Read more]
Micro-benchmarking pthread_cond_broadcast()

In my work on group commit for MariaDB, I have the following situation:

A group of threads are going to participate in group commit. This means that one of the threads, called the group leader, will run an fsync() for all of them, while the other threads wait. Once the group leader is done, it needs to wake up all of the other threads.

The obvious way to do this is to have the group leader call pthread_cond_broadcast() on a condition that the other threads are waiting for with pthread_cond_wait():

  bool wakeup= false;
  pthread_cond_t wakeup_cond;
  pthread_mutex_t wakeup_mutex;

Waiter:

  pthread_mutex_lock(&wakeup_mutex);
  while (!wakeup)
    pthread_cond_wait(&wakeup_cond, &wakeup_mutex);
  pthread_mutex_unlock(&wakeup_mutex);
  // Continue processing after group commit …
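
For completeness, a minimal sketch of the matching leader side (my reconstruction from the description above, not code from the post):

Leader:

  pthread_mutex_lock(&wakeup_mutex);
  wakeup= true;                          /* flag the waiters are checking */
  pthread_cond_broadcast(&wakeup_cond);  /* wake all waiting threads */
  pthread_mutex_unlock(&wakeup_mutex);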
[Read more]
My Opinion on NoSQL DBs

I'll let the following images express my opinion about NoSQL:

[image]

and…

[image]

Why MySQL replication is better than mysqlbinlog for recovery

You have a backup, and you have the binary logs between that backup and now. You need to do point-in-time recovery (PITR) for some reason. What do you do? The traditional answer is “restore the backup and then use mysqlbinlog to apply the binary logs.” But there’s a much better way to do it.

The better way is to set up a server instance with no data, and load the binary logs into it. I call this a “binlog server.” Then restore your backup and start the server as a replication slave of the binlog server. Let the roll-forward of the binlogs happen through replication, not through the mysqlbinlog tool.

Why is this better? Because replication is a more tested way of applying binary logs to a server. The results are much more likely to be correct, in my opinion. Plus, replication is easier and more convenient to use. You can do nice things like START SLAVE UNTIL, skip statements, stop and restart without having to figure out …
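
As a sketch, pointing the restored server at the binlog server and rolling forward to a chosen stopping point might look like this (host name and log coordinates are placeholders):

CHANGE MASTER TO
    MASTER_HOST='binlog-server.example.com',
    MASTER_USER='repl',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;

START SLAVE UNTIL
    MASTER_LOG_FILE='mysql-bin.000099',
    MASTER_LOG_POS=4;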

[Read more]
Cassandra and Ganglia



[graph]

I finally got some time to do some house cleaning. One of my nagging low-hanging-fruit jobs was to stop using jconsole as my monitor, so I created a Ganglia script to produce the graph shown above. It shows all the Cassandra servers and their total row read stages completed in the last hour, as a gauge. In essence I am graphing the delta of the change between Ganglia script runs.

Here is how I have it set up:

All data exposed by JMX to produce tpstats and cfstats is graphed via Ganglia. The pattern for each graph name is as follows:

cass_{stat_class}_{key}

stat_class - tpc, tpp, or tpa, meaning completed, pending, or active respectively
key - the stage name; message deserialization, for instance

For column family stats I graph the keyspace stats as well as the specific column family …
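
A minimal sketch of that delta-of-counters idea, assuming a hypothetical jmx_read() in place of the real JMX plumbing (which the post doesn't show), pushing values with Ganglia's standard gmetric tool:

import json
import os
import subprocess

STATE_FILE = '/var/tmp/cass_ganglia_state.json'  # previous run's counters

def jmx_read():
    # Hypothetical stand-in: fetch the tpstats/cfstats counters over JMX,
    # keyed by the cass_{stat_class}_{key} naming pattern described above.
    return {'cass_tpc_row-read-stage': 123456}

def main():
    current = jmx_read()
    previous = {}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            previous = json.load(f)
    for name, value in current.items():
        delta = value - previous.get(name, value)  # 0 on the first run
        subprocess.call(['gmetric', '--name', name, '--value', str(delta),
                         '--type', 'uint32', '--units', 'ops'])
    with open(STATE_FILE, 'w') as f:
        json.dump(current, f)

if __name__ == '__main__':
    main()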

[Read more]
MySQL Cluster: 5 Steps to Getting Started, then 5 More to Scale for the Web

Join us for a live and interactive webinar session where we will demonstrate how to start an evaluation of the MySQL Cluster database in 5 easy steps, and then how to expand your deployment for web & telecoms-scale services.

Just register here:

http://www.mysql.com/news-and-events/web-seminars/display-566.html


Getting Started will describe how to:

  • Get the software
  • Install it
  • Configure it
  • Run it
  • Test it

Scaling for HA and the web will describe how to:

  • Review the requirements for a HA configuration
  • Install the software on more servers
  • Update & extend the configuration from a single host to 4
  • Roll out the changes
  • On-line scaling to add further …
[Read more]
dbbenchmark.com – configuring OpenBSD for MySQL benchmarking

Here are some quick commands for installing the proper packages and requirements for the MySQL dbbenchmark program.

export PKG_PATH="ftp://openbsd.mirrors.tds.net/pub/OpenBSD/4.7/packages/amd64/"
pkg_add -i -v wget
wget http://dbbenchmark.googlecode.com/files/dbbenchmark-version-0.1.beta_rev26.tar.gz
pkg_add -i -v python
Ambiguous: choose package for python
         0:
         1: python-2.4.6p2
         2: python-2.5.4p3
         3: python-2.6.3p1
Your choice: 2

pkg_add -i -v py-mysql
pkg_add -i -v mysql
pkg_add -i -v mysql-server
ln -s /usr/local/bin/python2.5 /usr/bin/python
gzip -d dbbenchmark-version-0.1.beta_rev26.tar.gz
tar -xvf dbbenchmark-version-0.1.beta_rev26.tar
cd dbbenchmark-version-0.1.beta_rev26
./dbbenchmark.py --print-sql
# log in to MySQL and execute the SQL it prints
./dbbenchmark.py
Replication and “the lost binlog”

Unless you set sync_binlog = 1, a system crash on the master will likely fail any slave with a “Client requested master to start replication from impossible position” error. Generally, this kind of situation requires manual intervention. When we see this, we make sure things indeed failed “past the end” of a binlog (i.e. the bit that didn’t get to the physical platter before the crash), reposition the slave to the next binlog, and use the Maatkit tools to ensure the slave is properly synced.
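
Repositioning the slave to the start of the next binlog is short (the file name is a placeholder; position 4 is where events start in every binlog file):

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000124', MASTER_LOG_POS=4;
START SLAVE;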

sync_binlog=1 is a problem in itself, because it makes the server do not just one fsync per commit but several, and that’s serious overhead. sync_binlog is actually not a boolean but an “fsync the binlog every N commits” setting, where 0 means “never”. So you could set it to 10 (fsync every 10 commits) and thus reduce the potential loss a little while not doing too much harm to performance. But it’s not ideal and won’t always prevent the above …
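
For example, in my.cnf, the compromise described above would be:

[mysqld]
# fsync the binary log every 10 commits; 1 = every commit, 0 = never
sync_binlog = 10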

[Read more]