Integrating MySQL and Hadoop - or - A different approach to using CSV files in MySQL

We use both MySQL and Hadoop a lot. If you utilize each system to its strengths, this is a powerful combination. One problem we constantly face is making data extracted from our Hadoop cluster available in MySQL.

The problem

Look at this simple example: Let’s say we have a table customer:

CREATE TABLE customer (

    id INT UNSIGNED NOT NULL,
    firstname VARCHAR(100) NOT NULL,
    lastname VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL,

    PRIMARY KEY (id)
);

In addition to that, we store the orders customers place in Hadoop. An order includes: customerId, date, itemId, price. Note that these structures serve as a very simplified example.
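
If the orders also lived in MySQL, a table matching those fields might look like this (a hypothetical sketch; the post only names the fields, so the types here are assumptions):

CREATE TABLE orders (

    customerId INT UNSIGNED NOT NULL,
    date DATETIME NOT NULL,          -- assumed type
    itemId INT UNSIGNED NOT NULL,
    price DECIMAL(10,2) NOT NULL,    -- assumed type

    KEY (customerId)
);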

Let’s say we want to find the first 50 customers that placed at least one order, sorted by firstname ascending. If both tables …
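
With both tables in MySQL, a minimal sketch of that query against the hypothetical orders table above could be:

SELECT DISTINCT c.id, c.firstname, c.lastname, c.city
FROM customer c
JOIN orders o ON o.customerId = c.id
ORDER BY c.firstname ASC
LIMIT 50;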

[Read more]
Micro-benchmarking pthread_cond_broadcast()

In my work on group commit for MariaDB, I have the following situation:

A group of threads are going to participate in group commit. This means that one of the threads, called the group leader, will run an fsync() for all of them, while the other threads wait. Once the group leader is done, it needs to wake up all of the other threads.

The obvious way to do this is to have the group leader call pthread_cond_broadcast() on a condition that the other threads are waiting for with pthread_cond_wait():

  bool wakeup= false;
  pthread_cond_t wakeup_cond;
  pthread_mutex_t wakeup_mutex;

Waiter:

  pthread_mutex_lock(&wakeup_mutex);
  while (!wakeup)
    pthread_cond_wait(&wakeup_cond, &wakeup_mutex);
  pthread_mutex_unlock(&wakeup_mutex);
  // Continue processing after group commit …
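
For completeness, a minimal sketch of the matching leader side (my reconstruction from the description above, not code from the post):

Leader:

  pthread_mutex_lock(&wakeup_mutex);
  wakeup= true;                          /* flag the waiters are checking */
  pthread_cond_broadcast(&wakeup_cond);  /* wake all waiting threads */
  pthread_mutex_unlock(&wakeup_mutex);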
[Read more]
My Opinion on NoSQL DBs

I'll let the following images express my opinion about NoSQL:

[image]

and…

[image]

Why MySQL replication is better than mysqlbinlog for recovery

You have a backup, and you have the binary logs between that backup and now. You need to do point-in-time recovery (PITR) for some reason. What do you do? The traditional answer is “restore the backup and then use mysqlbinlog to apply the binary logs.” But there’s a much better way to do it.

The better way is to set up a server instance with no data, and load the binary logs into it. I call this a “binlog server.” Then restore your backup and start the server as a replication slave of the binlog server. Let the roll-forward of the binlogs happen through replication, not through the mysqlbinlog tool.

Why is this better? Because replication is a more tested way of applying binary logs to a server. The results are much more likely to be correct, in my opinion. Plus, replication is easier and more convenient to use. You can do nice things like START SLAVE UNTIL, skip statements, stop and restart without having to figure out …
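
As a sketch, pointing the restored server at the binlog server and rolling forward to a chosen stopping point might look like this (host name and log coordinates are placeholders):

CHANGE MASTER TO
    MASTER_HOST='binlog-server.example.com',
    MASTER_USER='repl',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=4;

START SLAVE UNTIL
    MASTER_LOG_FILE='mysql-bin.000099',
    MASTER_LOG_POS=4;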

[Read more]
Cassandra and Ganglia



[graph]

I finally got some time to do some house cleaning. One of my nagging low-hanging-fruit jobs was to stop using jconsole as my monitor, so I created a Ganglia script to produce the graph shown above. It shows all the Cassandra servers and their total row read stages completed in the last hour, as a gauge. In essence I am graphing the delta of the change between Ganglia script runs.

Here is how I have it set up:

All data exposed by JMX to produce tpstats and cfstats is graphed via Ganglia. The pattern for each graph name is as follows:

cass_{stat_class}_{key}

stat_class - tpc, tpp, or tpa, meaning completed, pending, or active respectively
key - the stage name; message deserialization, for instance

For column family stats I graph the keyspace stats as well as the specific column family …
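
A minimal sketch of that delta-of-counters idea, assuming a hypothetical jmx_read() in place of the real JMX plumbing (which the post doesn't show), pushing values with Ganglia's standard gmetric tool:

import json
import os
import subprocess

STATE_FILE = '/var/tmp/cass_ganglia_state.json'  # previous run's counters

def jmx_read():
    # Hypothetical stand-in: fetch the tpstats/cfstats counters over JMX,
    # keyed by the cass_{stat_class}_{key} naming pattern described above.
    return {'cass_tpc_row-read-stage': 123456}

def main():
    current = jmx_read()
    previous = {}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            previous = json.load(f)
    for name, value in current.items():
        delta = value - previous.get(name, value)  # 0 on the first run
        subprocess.call(['gmetric', '--name', name, '--value', str(delta),
                         '--type', 'uint32', '--units', 'ops'])
    with open(STATE_FILE, 'w') as f:
        json.dump(current, f)

if __name__ == '__main__':
    main()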

[Read more]
MySQL Cluster: 5 Steps to Getting Started, then 5 More to Scale for the Web

Join us for a live and interactive webinar session where we will demonstrate how to start an evaluation of the MySQL Cluster database in 5 easy steps, and then how to expand your deployment for web & telecoms-scale services.

Just register here:

http://www.mysql.com/news-and-events/web-seminars/display-566.html


Getting Started will describe how to:

  • Get the software
  • Install it
  • Configure it
  • Run it
  • Test it

Scaling for HA and the web will describe how to:

  • Review the requirements for a HA configuration
  • Install the software on more servers
  • Update & extend the configuration from a single host to 4
  • Roll out the changes
  • On-line scaling to add further …
[Read more]
dbbenchmark.com – configuring OpenBSD for MySQL benchmarking

Here are some quick commands for installing the proper packages and requirements for the MySQL dbbenchmark program.

export PKG_PATH="ftp://openbsd.mirrors.tds.net/pub/OpenBSD/4.7/packages/amd64/"
pkg_add -i -v wget
wget http://dbbenchmark.googlecode.com/files/dbbenchmark-version-0.1.beta_rev26.tar.gz
pkg_add -i -v python
Ambiguous: choose package for python
         0:
         1: python-2.4.6p2
         2: python-2.5.4p3
         3: python-2.6.3p1
Your choice: 2

pkg_add -i -v py-mysql
pkg_add -i -v mysql
pkg_add -i -v mysql-server
ln -s /usr/local/bin/python2.5 /usr/bin/python
gzip -d dbbenchmark-version-0.1.beta_rev26.tar.gz
tar -xvf dbbenchmark-version-0.1.beta_rev26.tar
cd dbbenchmark-version-0.1.beta_rev26
./dbbenchmark.py --print-sql
# log in to MySQL and execute the SQL it prints
./dbbenchmark.py
Replication and “the lost binlog”

Unless you set sync_binlog = 1, a system crash on the master will likely fail any slave with a “Client requested master to start replication from impossible position” error. Generally, this kind of situation requires manual intervention. When we see this, we make sure things indeed failed “past the end” of a binlog (i.e. the bit that didn’t get to the physical platter before the crash), reposition the slave to the next binlog, and use the Maatkit tools to ensure the slave is properly synced.
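
Repositioning the slave to the start of the next binlog is short (the file name is a placeholder; position 4 is where events start in every binlog file):

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000124', MASTER_LOG_POS=4;
START SLAVE;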

sync_binlog=1 is a problem in itself, because it makes the server do not just one fsync per commit but several, and that’s serious overhead. sync_binlog is actually not a boolean but an “fsync the binlog every N commits” setting, where 0 means “never”. So you could set it to 10 (fsync every 10 commits) and thus reduce the potential loss a little while not doing too much harm to performance. But it’s not ideal and won’t always prevent the above …
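
For example, in my.cnf, the compromise described above would be:

[mysqld]
# fsync the binary log every 10 commits; 1 = every commit, 0 = never
sync_binlog = 10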

[Read more]