Showing entries 1 to 4
Displaying posts with tag: mixed
The Fast Way to Import CSV Data Into a Tungsten Cluster

The Question

Recently, a customer asked us:

After importing a new section of user data into our Tungsten cluster, we are seeing perpetually rising replication lag. We are sitting at 8.5hrs estimated convergence time after importing around 50 million rows and this lag is climbing continuously. We are currently migrating some of our users from a NoSQL database into our Tungsten cluster. We have a procedure to write out a bunch of CSV files after translating our old data into columns and then we recursively send them to the write master using the mysql client. Specifically our import SQL is doing LOAD DATA LOCAL INFILE and then reading in a large CSV file to do the import. We have 20k records per CSV file and we have 12 workers which insert them in parallel.

Simple Overview

The Skinny

In cases like this, the slaves have trouble because the database is unable to keep up with the apply stage …

Temporary Tables and Replication

I recently wrote about non-deterministic queries in the replication stream. That’s resolved by using either MIXED or ROW based replication rather than STATEMENT based.

Another thing that’s not fully handled by STATEMENT based replication is temporary tables. Imagine the following:

  1. Master: CREATE TEMPORARY TABLE rpltmpbreak (i INT);
  2. Wait for slave to replicate this statement, then stop and start mysqld (not just STOP/START SLAVE)
  3. Master: INSERT INTO rpltmpbreak VALUES (1);
  4. Slave: SHOW SLAVE STATUS \G

If for any reason a slave server shuts down and restarts after the temp table creation, replication will break because the temporary table will no longer exist on the restarted slave server. It’s obvious when you think about it, but nevertheless it’s quite …
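
The four steps above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: `ToyServer` and `replay_with_restart` are hypothetical names, nothing here talks to a real MySQL server, and the error text only approximates what the slave's SQL thread would report.

```python
# Toy model only: hypothetical classes, not a MySQL client or API.

class ToyServer:
    """Minimal stand-in for a mysqld instance that tracks temporary tables."""

    def __init__(self):
        # table name -> list of rows; temporary tables do not survive a restart
        self.temp_tables = {}

    def apply(self, statement):
        """Apply one statement, given as a (verb, table, value) tuple."""
        verb, table, value = statement
        if verb == "CREATE TEMPORARY TABLE":
            self.temp_tables[table] = []
        elif verb == "INSERT":
            if table not in self.temp_tables:
                # Stands in for the error that stops replication on the slave.
                raise LookupError(f"Table '{table}' doesn't exist")
            self.temp_tables[table].append(value)

    def restart(self):
        """Stopping and starting the server discards all temporary tables."""
        self.temp_tables.clear()


def replay_with_restart():
    """Replay the numbered steps under statement-based replication."""
    master, slave = ToyServer(), ToyServer()
    create = ("CREATE TEMPORARY TABLE", "rpltmpbreak", None)
    insert = ("INSERT", "rpltmpbreak", 1)

    # Step 1: the master creates the temp table and the slave replicates it.
    master.apply(create)
    slave.apply(create)

    # Step 2: the slave's mysqld is stopped and started again.
    slave.restart()

    # Step 3: the master inserts; the slave re-executes the same statement.
    master.apply(insert)
    try:
        slave.apply(insert)
    except LookupError as exc:
        # Step 4: roughly what SHOW SLAVE STATUS would surface.
        return f"replication stopped: {exc}"
    return "replication OK"


print(replay_with_restart())
```

The master still holds its temporary table, so its INSERT succeeds; only the restarted slave fails, which is why the breakage shows up on the replication side.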

Temporary files, binlog_cache_size, and row-based binary logging

Even when the output of EXPLAIN doesn’t show “using temporary”, a temporary file may still be used in certain cases.

That’s not to say the query needs the temporary file to actually resolve the query (like what you’d see from the need for a derived table). But rather, the temporary file I’m speaking of is due to binary logging.

In particular, you can see this easily if using InnoDB, (most commonly) row-based binary logging, and you issue a large transaction, say a large UPDATE (large meaning something larger than the size of binlog_cache_size). In this case, you’ll notice a temporary file being …
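
As a rough illustration of the idea, the Python sketch below models a per-transaction cache that stays in memory until it would exceed a size limit and then moves to a temporary file. It is a toy under stated assumptions: `ToyBinlogCache` and `transaction_spills` are hypothetical names, the byte counts are made up, and MySQL's actual caching logic is not reproduced here.

```python
import tempfile


class ToyBinlogCache:
    """Hypothetical transaction cache: memory up to a limit, then a temp file."""

    def __init__(self, cache_size):
        self.cache_size = cache_size  # plays the role of binlog_cache_size, in bytes
        self.buffer = bytearray()
        self.spill_file = None  # created only once the buffer would overflow

    def write(self, event):
        """Append one row event, spilling to disk if the limit is exceeded."""
        overflow = len(self.buffer) + len(event) > self.cache_size
        if self.spill_file is None and overflow:
            # Move what is buffered so far into a temporary file.
            self.spill_file = tempfile.TemporaryFile()
            self.spill_file.write(self.buffer)
            self.buffer.clear()
        if self.spill_file is not None:
            self.spill_file.write(event)
        else:
            self.buffer.extend(event)

    @property
    def used_disk(self):
        return self.spill_file is not None

    def close(self):
        if self.spill_file is not None:
            self.spill_file.close()


def transaction_spills(row_count, row_size, cache_size):
    """Return True if a transaction of row_count events needs a temp file."""
    cache = ToyBinlogCache(cache_size)
    for _ in range(row_count):
        cache.write(b"x" * row_size)
    spilled = cache.used_disk
    cache.close()
    return spilled


# A small transaction fits in memory; a large UPDATE does not.
print(transaction_spills(10, 100, 32768))
print(transaction_spills(1000, 100, 32768))
```

On a real server, the `Binlog_cache_use` and `Binlog_cache_disk_use` status variables count how many transactions used the cache and how many of those had to use a temporary file.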

Will you use row-based replication by default?



MySQL 5.1 introduces row-based replication, a way of replicating data that fixes many inconsistencies of statement-based replication, the standard method used by MySQL so far.


The good: row-based replication solves some problems when replicating the result of non-deterministic functions, such as UUID() or NOW().
The bad: row-based replication may break existing applications, where you count on the quirks of statement-based replication to execute conditionally (updates based on @@server_id, for example), and may perform badly on updates applied to very large tables.
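
The UUID() case can be illustrated with a small Python sketch. It is a toy under stated assumptions: the function names are hypothetical, `uuid.uuid4()` merely stands in for MySQL's UUID(), and no real replication is involved.

```python
import uuid


def replicate_statement(master_rows):
    """Statement-based: the replica re-runs the expression, so each
    UUID is generated again and will differ from the master's value."""
    return [str(uuid.uuid4()) for _ in master_rows]


def replicate_rows(master_rows):
    """Row-based: the replica receives the values the master produced."""
    return list(master_rows)


def compare_modes(n=3):
    """Return (statement_matches, row_matches) for n inserted rows."""
    master_rows = [str(uuid.uuid4()) for _ in range(n)]
    statement_matches = replicate_statement(master_rows) == master_rows
    row_matches = replicate_rows(master_rows) == master_rows
    return statement_matches, row_matches


print(compare_modes())
```

Re-evaluating the function on the replica yields different data, while shipping the resulting rows keeps both copies identical.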
