Parallel replication and GTID - A tale of two implementations

MySQL 5.6 is probably the MySQL release with the biggest bundle of new features. Now that it is GA, you may want to try it soon, and I would like to offer some practical experience on how to use some of its advanced features.

Since replication is my main interest, I will focus on some of the new features in this field, and I will compare what's available in MySQL 5.6 with Tungsten Replicator.

The focus of the comparison is usability, manageability, and some hidden functionality. Parallel replication has been available in Tungsten Replicator for almost two years, and Global Transaction Identifiers for much longer than that. With MySQL 5.6, it seems that the MySQL team wants to close the gap. While the main feature (parallel execution threads) is available and performing well, there are striking differences in ease of use, administrative controls, and capabilities, which suggest, at the very least, poor integration with the rest of the server.

To test replication, I enabled parallel workers and GTIDs on the slaves, and I ran five concurrent load processes against separate databases. I repeated the operation with MySQL 5.6 and with Tungsten Replicator. Let's see what I found.
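For reference, the slave configuration for such a test looks roughly like this. This is a minimal sketch with illustrative values (the worker count matches the five test schemas); the option names are the MySQL 5.6 ones:

```ini
# my.cnf fragment for a MySQL 5.6 slave -- illustrative values
[mysqld]
gtid-mode                 = ON
enforce-gtid-consistency  = ON     # required when gtid-mode is ON
log-bin                   = mysql-bin
log-slave-updates                  # GTID mode also requires this on slaves
slave-parallel-workers    = 5      # one applier thread per schema under test
master-info-repository    = TABLE  # crash-safe repositories
relay-log-info-repository = TABLE
```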

Monitoring

First of all, there is "SHOW SLAVE STATUS," which has been enhanced to include GTIDs. The trouble is that this command was designed for a single replication channel, and when the GTIDs from several parallel threads are listed, the result is puzzling.


show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 18675
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 386003338
Relay_Log_File: mysql_sandbox18676-relay-bin.000006
Relay_Log_Pos: 315698722
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 315698559
Relay_Log_Space: 386003928
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 646
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: f6ea8d94-7147-11e2-a036-6c626da07446
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Reading event from the relay log
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: f6ea8d94-7147-11e2-a036-6c626da07446:1-1089548
Executed_Gtid_Set: f6ea8d94-7147-11e2-a036-6c626da07446:1-891950:891952-891953:891955-891963:891965-891969:891971-891973:891976-891992:891994-891995:891997:891999:892001-892002:892005-892016:892019-892023:892025-892026:892028:892030-892032:892034-892039:892041-892044:892046:892048-892051:892053-892054:892056-892063:892065-892072:892074-892084:892086-892089:892091:892094:892096-892111:892113-892122:892124-892137:892139-892142:892144-892146:892148-892150:892152:892154-892159:892161-892171:892173-892182:892184-892185:892187:892189-892193:892196-892198:892200-892206:892208:892210-892211:892213-892216:892218-892219:892222:892224-892226:892228:892230-892249:892251:892255:892257:892259-892260:892262-892263:892265:892268:892270-892288:892290:892292:892295-892298:892300-892307:892309:892312:892314-892317:892319-892320:892324-892328:892332-892336:892342:892344-892349:892352-892355:892359-892360:892362-892363:892369-892372:892376:892378-892384:892387:892390:892392:892394:892396-892397:892399:892401-892403:892405-892407:892409:892413-892415:892417:892421:892423-892426:892429-892431:892433:892435:892437-892439:892442:892445:892448
Auto_Position: 0

After this abundant display of GTIDs (alas, without any reference to the transactions being processed), I was expecting something from the tables dedicated to replication monitoring. However, what is missing in these tables is any reference to GTIDs, and any context about the transactions. There is no indication of the schema in which the changes are happening, how well or badly they are distributed, whether serialization was needed, or other details.


select * from mysql.slave_worker_info\G
*************************** 1. row ***************************
Id: 1
Relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Relay_log_pos: 91550519
Master_log_name: mysql-bin.000002
Master_log_pos: 91550356
Checkpoint_relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Checkpoint_relay_log_pos: 91371802
Checkpoint_master_log_name: mysql-bin.000002
Checkpoint_master_log_pos: 91371639
Checkpoint_seqno: 508
Checkpoint_group_size: 64
Checkpoint_group_bitmap: #####(unprintable characters replaced)#####
*************************** 2. row ***************************
Id: 2
Relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Relay_log_pos: 91497101
Master_log_name: mysql-bin.000002
Master_log_pos: 91496938
Checkpoint_relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Checkpoint_relay_log_pos: 91319665
Checkpoint_master_log_name: mysql-bin.000002
Checkpoint_master_log_pos: 91319502
Checkpoint_seqno: 501
Checkpoint_group_size: 64
Checkpoint_group_bitmap: #####(unprintable characters replaced)#####
*************************** 3. row ***************************
Id: 3
Relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Relay_log_pos: 91540831
Master_log_name: mysql-bin.000002
Master_log_pos: 91540668
Checkpoint_relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Checkpoint_relay_log_pos: 91360379
Checkpoint_master_log_name: mysql-bin.000002
Checkpoint_master_log_pos: 91360216
Checkpoint_seqno: 511
Checkpoint_group_size: 64
Checkpoint_group_bitmap: #####(unprintable characters replaced)#####
*************************** 4. row ***************************
Id: 4
Relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Relay_log_pos: 91461999
Master_log_name: mysql-bin.000002
Master_log_pos: 91461836
Checkpoint_relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Checkpoint_relay_log_pos: 91281068
Checkpoint_master_log_name: mysql-bin.000002
Checkpoint_master_log_pos: 91280905
Checkpoint_seqno: 510
Checkpoint_group_size: 64
Checkpoint_group_bitmap: #####(unprintable characters replaced)#####
*************************** 5. row ***************************
Id: 5
Relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Relay_log_pos: 91373959
Master_log_name: mysql-bin.000002
Master_log_pos: 91373796
Checkpoint_relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Checkpoint_relay_log_pos: 91192982
Checkpoint_master_log_name: mysql-bin.000002
Checkpoint_master_log_pos: 91192819
Checkpoint_seqno: 508
Checkpoint_group_size: 64
Checkpoint_group_bitmap: #####(unprintable characters replaced)#####

The table dedicated to parallel replication doesn't give us any information about GTID. And it doesn't say in which schema the transaction is happening. This lack of information, for a feature that is based on schema isolation, is quite startling.

So, we have a look at the crash-safe slave tables.


select * from mysql.slave_master_info\G
*************************** 1. row ***************************
Number_of_lines: 23
Master_log_name: mysql-bin.000002
Master_log_pos: 385282815
Host: 127.0.0.1
User_name: rsandbox
User_password: rsandbox
Port: 18675
Connect_retry: 60
Enabled_ssl: 0
Ssl_ca:
Ssl_capath:
Ssl_cert:
Ssl_cipher:
Ssl_key:
Ssl_verify_server_cert: 0
Heartbeat: 1800
Bind:
Ignored_server_ids: 0
Uuid: f6ea8d94-7147-11e2-a036-6c626da07446
Retry_count: 86400
Ssl_crl:
Ssl_crlpath:
Enabled_auto_position: 0

select * from mysql.slave_relay_log_info\G
*************************** 1. row ***************************
Number_of_lines: 7
Relay_log_name: ./mysql_sandbox18676-relay-bin.000006
Relay_log_pos: 235324325
Master_log_name: mysql-bin.000002
Master_log_pos: 235324162
Sql_delay: 0
Number_of_workers: 5
Id: 1

What's wrong with this picture? Two main things:

  • There is no mention of GTID in these tables either. So, if you are using GTID, your only option is to look at the binary logs, and see if you get any solace when a failover occurs. But the server itself does not offer any help.
  • There is only one row available for the master and relay log info. With parallel replication, this means that either every commit must lock the same row, or these tables are useless for keeping the slave crash-safe. If the crash-safe information comes from the parallel workers table, then the crash-safe tables are redundant. But as you can see on your own, the parallel workers table carries less information than master_info and relay_log_info.

Now, let's have a look at how GTIDs and parallel threads are implemented in Tungsten.

Even limiting the information to the tables found in the server, we find several differences:

  • The GTID is displayed in every row (the column named seqno). The original binlog name and position are also preserved and associated with the GTID (the column named eventid).
  • Every row shows the schema in which its transaction is being processed (the column named shard_id).
  • There is no separate table for crash-safe info and parallel threads. When Tungsten uses a single thread, the table has only one row; when it uses parallel appliers, there are multiple rows, one for each applier. This way there is no possibility of deadlock, as every thread locks only its own row.


select * from tungsten_tsandbox.trep_commit_seqno
+---------+---------+--------+-----------+-----------+--------------+-------------------------------------+-----------------+---------------------+------------+---------------------+
| task_id | seqno | fragno | last_frag | source_id | epoch_number | eventid | applied_latency | update_timestamp | shard_id | extract_timestamp |
+---------+---------+--------+-----------+-----------+--------------+-------------------------------------+-----------------+---------------------+------------+---------------------+
| 0 | 2218566 | 0 | 1 | 127.0.0.1 | 0 | mysql-bin.000002:0000000330476211;0 | 1 | 2013-02-07 23:12:07 | evaluator1 | 2013-02-07 23:12:06 |
| 1 | 2218521 | 0 | 1 | 127.0.0.1 | 0 | mysql-bin.000002:0000000330469587;0 | 1 | 2013-02-07 23:12:07 | evaluator2 | 2013-02-07 23:12:06 |
| 2 | 2220380 | 0 | 1 | 127.0.0.1 | 0 | mysql-bin.000002:0000000330747639;0 | 1 | 2013-02-07 23:12:07 | evaluator3 | 2013-02-07 23:12:06 |
| 3 | 2219360 | 0 | 1 | 127.0.0.1 | 0 | mysql-bin.000002:0000000330594617;0 | 1 | 2013-02-07 23:12:07 | evaluator4 | 2013-02-07 23:12:06 |
| 4 | 2221220 | 0 | 1 | 127.0.0.1 | 0 | mysql-bin.000002:0000000330873205;0 | 0 | 2013-02-07 23:12:07 | evaluator5 | 2013-02-07 23:12:07 |
+---------+---------+--------+-----------+-----------+--------------+-------------------------------------+-----------------+---------------------+------------+---------------------+

You can see the same information in vertical output, to appreciate the level of detail.


select * from tungsten_tsandbox.trep_commit_seqno\G
*************************** 1. row ***************************
task_id: 0
seqno: 2257934
fragno: 0
last_frag: 1
source_id: 127.0.0.1
epoch_number: 0
eventid: mysql-bin.000002:0000000336350320;0
applied_latency: 9
update_timestamp: 2013-02-07 23:12:26
shard_id: evaluator1
extract_timestamp: 2013-02-07 23:12:17
*************************** 2. row ***************************
task_id: 1
seqno: 2257938
fragno: 0
last_frag: 1
source_id: 127.0.0.1
epoch_number: 0
eventid: mysql-bin.000002:0000000336350875;0
applied_latency: 9
update_timestamp: 2013-02-07 23:12:26
shard_id: evaluator2
extract_timestamp: 2013-02-07 23:12:17
*************************** 3. row ***************************
task_id: 2
seqno: 2257936
fragno: 0
last_frag: 1
source_id: 127.0.0.1
epoch_number: 0
eventid: mysql-bin.000002:0000000336350648;0
applied_latency: 9
update_timestamp: 2013-02-07 23:12:26
shard_id: evaluator3
extract_timestamp: 2013-02-07 23:12:17
*************************** 4. row ***************************
task_id: 3
seqno: 2257916
fragno: 0
last_frag: 1
source_id: 127.0.0.1
epoch_number: 0
eventid: mysql-bin.000002:0000000336347768;0
applied_latency: 9
update_timestamp: 2013-02-07 23:12:26
shard_id: evaluator4
extract_timestamp: 2013-02-07 23:12:17
*************************** 5. row ***************************
task_id: 4
seqno: 2257933
fragno: 0
last_frag: 1
source_id: 127.0.0.1
epoch_number: 0
eventid: mysql-bin.000002:0000000336350156;0
applied_latency: 9
update_timestamp: 2013-02-07 23:12:26
shard_id: evaluator5
extract_timestamp: 2013-02-07 23:12:17

In addition to the information that you can get from the database server, Tungsten has its own tools that provide information about the process. For example, trepctl status -name stores provides, among other things, detailed information about how many transactions were processed, accepted, and rejected by each channel, and if there was any collision that caused serialization.


trepctl status -name stores
Processing status command (stores)...
NAME VALUE
---- -----
activeSeqno : 2267132
doChecksum : false
flushIntervalMillis : 0
fsyncOnFlush : false
logConnectionTimeout : 28800
logDir : /home/tungsten/tsb2/db3/tlogs/tsandbox
logFileRetainMillis : 604800000
logFileSize : 100000000
maximumStoredSeqNo : 2267230
minimumStoredSeqNo : 0
name : thl
readOnly : false
storeClass : com.continuent.tungsten.replicator.thl.THL
timeoutMillis : 2147483647
NAME VALUE
---- -----
criticalPartition : -1
discardCount : 0
estimatedOfflineInterval: 0.0
eventCount : 2267140
headSeqno : 2267140
maxOfflineInterval : 5
maxSize : 10
name : parallel-queue
queues : 5
serializationCount : 0
serialized : false
stopRequested : false
store.0 : THLParallelReadTask task_id=0 thread_name=store-thl-0 hi_seqno=2267140 lo_seqno=1 read=2267140 accepted=460813 discarded=1806327 events=0
store.1 : THLParallelReadTask task_id=1 thread_name=store-thl-1 hi_seqno=2267140 lo_seqno=1 read=2267140 accepted=452812 discarded=1814328 events=0
store.2 : THLParallelReadTask task_id=2 thread_name=store-thl-2 hi_seqno=2267140 lo_seqno=1 read=2267140 accepted=450812 discarded=1816328 events=0
store.3 : THLParallelReadTask task_id=3 thread_name=store-thl-3 hi_seqno=2267140 lo_seqno=1 read=2267140 accepted=450397 discarded=1816743 events=0
store.4 : THLParallelReadTask task_id=4 thread_name=store-thl-4 hi_seqno=2267140 lo_seqno=1 read=2267141 accepted=452306 discarded=1814834 events=0
storeClass : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval : 10000
Finished status command (stores)...

The same tool can show what is happening in each shard. When you are dealing with many channels, you need to know who is doing what, sometimes with a high level of detail. This is especially useful when things go wrong. Knowing what happened to your transaction, and at which stage it stopped working, can be a great help in fixing problems. You can follow every transaction using either the GTID (appliedLastSeqno) or the binlog+position (appliedLastEventId), combined with the schema where it is being applied (shardId).

trepctl status -name shards
Processing status command (shards)...
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338943258;0
appliedLastSeqno : 2275303
appliedLatency : 18.942
eventCount : 243189
shardId : evaluator1
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338940934;0
appliedLastSeqno : 2275288
appliedLatency : 18.935
eventCount : 231993
shardId : evaluator10
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338939386;0
appliedLastSeqno : 2275277
appliedLatency : 18.931
eventCount : 234687
shardId : evaluator2
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338938221;0
appliedLastSeqno : 2275269
appliedLatency : 18.928
eventCount : 231045
shardId : evaluator3
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338942040;0
appliedLastSeqno : 2275295
appliedLatency : 18.933
eventCount : 225755
shardId : evaluator4
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338942486;0
appliedLastSeqno : 2275298
appliedLatency : 18.937
eventCount : 221864
shardId : evaluator5
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338942930;0
appliedLastSeqno : 2275301
appliedLatency : 18.94
eventCount : 219574
shardId : evaluator6
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338943372;0
appliedLastSeqno : 2275304
appliedLatency : 18.931
eventCount : 220078
shardId : evaluator7
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338942600;0
appliedLastSeqno : 2275299
appliedLatency : 18.929
eventCount : 221647
shardId : evaluator8
stage : q-to-dbms
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000338940109;0
appliedLastSeqno : 2275282
appliedLatency : 18.931
eventCount : 226607
shardId : evaluator9
stage : q-to-dbms
Finished status command (shards)...

And if the per-shard info is not enough, we can also examine replication in terms of tasks. For the transactions currently in flight, we can check what is happening at every stage, with a level of detail that the MySQL server cannot provide.


trepctl status -name tasks
Processing status command (tasks)...
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339749071;0
appliedLastSeqno : 2280695
appliedLatency : 22.267
applyTime : 53.543
averageBlockSize : 1.000
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339749071;0
currentLastFragno : 0
currentLastSeqno : 2280695
eventCount : 2280695
extractTime : 3053.902
filterTime : 0.194
otherTime : 2.864
stage : remote-to-thl
state : extract
taskId : 0
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339747579;0
appliedLastSeqno : 2280685
appliedLatency : 22.233
applyTime : 135.079
averageBlockSize : 9.983
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339747579;0
currentLastFragno : 0
currentLastSeqno : 2280685
eventCount : 2280685
extractTime : 2972.509
filterTime : 0.167
otherTime : 2.714
stage : thl-to-q
state : extract
taskId : 0
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339747579;0
appliedLastSeqno : 2280685
appliedLatency : 22.243
applyTime : 174.149
averageBlockSize : 9.813
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339747579;0
currentLastFragno : 0
currentLastSeqno : 2280685
eventCount : 463769
extractTime : 2930.461
filterTime : 4.553
otherTime : 1.317
stage : q-to-dbms
state : extract
taskId : 0
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339746639;0
appliedLastSeqno : 2280679
appliedLatency : 22.247
applyTime : 173.182
averageBlockSize : 9.790
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339746639;0
currentLastFragno : 0
currentLastSeqno : 2280679
eventCount : 455841
extractTime : 2931.707
filterTime : 4.374
otherTime : 1.219
stage : q-to-dbms
state : extract
taskId : 1
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339747249;0
appliedLastSeqno : 2280683
appliedLatency : 22.246
applyTime : 171.505
averageBlockSize : 9.819
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339747249;0
currentLastFragno : 0
currentLastSeqno : 2280683
eventCount : 453846
extractTime : 2933.467
filterTime : 4.28
otherTime : 1.23
stage : q-to-dbms
state : extract
taskId : 2
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339747415;0
appliedLastSeqno : 2280684
appliedLatency : 22.252
applyTime : 168.773
averageBlockSize : 9.792
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339747415;0
currentLastFragno : 0
currentLastSeqno : 2280684
eventCount : 453447
extractTime : 2936.219
filterTime : 4.283
otherTime : 1.213
stage : q-to-dbms
state : extract
taskId : 3
NAME VALUE
---- -----
appliedLastEventId: mysql-bin.000002:0000000339746311;0
appliedLastSeqno : 2280677
appliedLatency : 22.25
applyTime : 167.426
averageBlockSize : 9.804
cancelled : false
currentLastEventId: mysql-bin.000002:0000000339746311;0
currentLastFragno : 0
currentLastSeqno : 2280677
eventCount : 454922
extractTime : 2937.622
filterTime : 4.188
otherTime : 1.25
stage : q-to-dbms
state : extract
taskId : 4
Finished status command (tasks)...

Fixing an error

Let's see what happens with a simple, classic replication error. We will insert a record in the slave, and then insert the same record in the master. We will assume full knowledge of the facts, and try to repair the damage.
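A minimal way to provoke this error is the following sketch. The table name and value match the error message reported in the slave status; the exact table definition (a single integer primary-key column) is my assumption:

```sql
-- On the slave: insert a row directly, bypassing replication
INSERT INTO test.t1 VALUES (1);

-- On the master: insert the same row; when the event reaches the slave,
-- the applier stops with error 1062 (duplicate entry for key PRIMARY)
INSERT INTO test.t1 VALUES (1);
```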

In MySQL 5.6, when the slave stops replication because of the duplicate key error, we get this:


show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 127.0.0.1
Master_User: rsandbox
Master_Port: 18675
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 136120191
Relay_Log_File: mysql_sandbox18676-relay-bin.000006
Relay_Log_Pos: 136120097
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t1 values (1)'
Skip_Counter: 0
Exec_Master_Log_Pos: 136119934
Relay_Log_Space: 136120781
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '1' for key 'PRIMARY'' on query. Default database: 'test'. Query: 'insert into t1 values (1)'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: c3054160-7b4a-11e2-a17e-6c626da07446
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 130220 11:49:56
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: c3054160-7b4a-11e2-a17e-6c626da07446:1-386581
Executed_Gtid_Set: c3054160-7b4a-11e2-a17e-6c626da07446:1-386580,
c90226d2-7b4a-11e2-a17e-6c626da07446:1
Auto_Position: 0

In this situation, we try the old remedy of skipping a transaction.


set global sql_slave_skip_counter=1;
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

Oops! Now I have two problems:

  • How do I generate an empty transaction with a given GTID?
  • Which is the given GTID? The server reports several:
    show global variables like '%gtid_executed%'\G
    *************************** 1. row ***************************
    Variable_name: gtid_executed
    Value: c3054160-7b4a-11e2-a17e-6c626da07446:1-386580,
    c90226d2-7b4a-11e2-a17e-6c626da07446:1

I found this manual page about skipping transactions, but it is not clear where these instructions should be run. I tried running them on the slave, but the replication error remained in place. I guess we need a clearer example that explains how to fix the error, because this surely does not look like a solution.
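For the record, my reading of the manual's recipe is to inject an empty transaction on the slave carrying the offending GTID. A sketch of what that should look like, with the GTID inferred from the gap between Retrieved_Gtid_Set (1-386581) and Executed_Gtid_Set (1-386580) in the status above; as noted, in my test this alone did not get replication going again:

```sql
STOP SLAVE;
-- Impersonate the failing transaction: source UUID + transaction number
SET GTID_NEXT = 'c3054160-7b4a-11e2-a17e-6c626da07446:386581';
BEGIN; COMMIT;                 -- empty transaction consumes that GTID
SET GTID_NEXT = 'AUTOMATIC';   -- back to automatic GTID assignment
START SLAVE;
```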

Let's try the same issue with Tungsten.


$ trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : NONE
appliedLastSeqno : -1
appliedLatency : -1.0
channels : -1
clusterName : default
currentEventId : NONE
currentTimeMillis : 1361356976587
dataServerHost : 127.0.0.1
extensions :
latestEpochNumber : -1
masterConnectUri : thl://127.0.0.1:12110/
masterListenUri : thl://127.0.0.1:12111/
maximumStoredSeqNo : -1
minimumStoredSeqNo : -1
offlineRequests : NONE
pendingError : Event application failed: seqno=444898 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
pendingErrorCode : NONE
pendingErrorEventId : mysql-bin.000002:0000000116727315;0
pendingErrorSeqno : 444898
pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
insert into t1 values (1) /* ___SERVICE___ = [tsandbox] */
pipelineSource : UNKNOWN
relativeLatency : -1.0
resourcePrecedence : 99
rmiPort : 10102
role : slave
seqnoType : java.lang.Long
serviceName : tsandbox
serviceType : unknown
simpleServiceName : tsandbox
siteName : default
sourceId : 127.0.0.1
state : OFFLINE:ERROR
timeInStateSeconds : 18.747
uptimeSeconds : 923.323
version : Tungsten Replicator 2.0.7 build 278
Finished status command...

The error tells me that GTID 444898 is at fault. The remedy I would apply here is the same one I would use anyway, regardless of parallel replication.


$ trepctl online -skip-seqno 444898

$ trepctl status
Processing status command...
NAME VALUE
---- -----
appliedLastEventId : mysql-bin.000002:0000000116726621;0
appliedLastSeqno : 444893
appliedLatency : 0.0
channels : 5
clusterName : default
currentEventId : NONE
currentTimeMillis : 1361358074668
dataServerHost : 127.0.0.1
extensions :
latestEpochNumber : 0
masterConnectUri : thl://127.0.0.1:12110/
masterListenUri : thl://127.0.0.1:12111/
maximumStoredSeqNo : 444898
minimumStoredSeqNo : 0
offlineRequests : NONE
pendingError : NONE
pendingErrorCode : NONE
pendingErrorEventId : NONE
pendingErrorSeqno : -1
pendingExceptionMessage: NONE
pipelineSource : thl://127.0.0.1:12110/
relativeLatency : 1599.668
resourcePrecedence : 99
rmiPort : 10102
role : slave
seqnoType : java.lang.Long
serviceName : tsandbox
serviceType : local
simpleServiceName : tsandbox
siteName : default
sourceId : 127.0.0.1
state : ONLINE
timeInStateSeconds : 2.076
uptimeSeconds : 2021.404
version : Tungsten Replicator 2.0.7 build 278
Finished status command...

And we are back in business. Since GTIDs in Tungsten are well integrated into every aspect of operations, I can use them to fix problems.

More loose ends

What we've seen above shows that the integration between the existing replication features and the new ones is less than desirable. There are more issues that may come to the user's attention. One that struck me as very odd is Replicate_Ignore_Server_Ids.

This value is found in 'SHOW SLAVE STATUS', and it lists the servers excluded from replication. It is associated with the 'IGNORE_SERVER_IDS' option of CHANGE MASTER TO. What's odd about it? It accepts only server IDs, not server UUIDs. If you try to use a server UUID in CHANGE MASTER TO, you get an error.

However, the reason for mentioning this variable is that, just by chance, I was able to "fix" the above problem, instead of using an empty transaction. I did the following:


stop slave;
change master to ignore_server_ids=(1);
start slave;
show slave status\G
# no errors
stop slave;
change master to ignore_server_ids=();
start slave;

And we have skipped a transaction in an undocumented way. Bug or feature?

