max_allowed_packet and binary log corruption in MySQL

The combination of the max_allowed_packet variable and replication in MySQL is a common source of headaches. In a nutshell, max_allowed_packet is the maximum size of a MySQL network protocol packet that the server can create or read. It has a default value of 1MB (<= 5.6.5) or 4MB (>= 5.6.6) and a maximum value of 1GB. This adds some constraints to our replication environment:

  • The master server shouldn’t write events to the binary log larger than max_allowed_packet
  • All the slaves in the replication chain should have the same max_allowed_packet as the master server
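As a rough illustration, the two rules above can be checked with a short Python sketch. It assumes you have already collected each server's max_allowed_packet value (for example with SELECT @@max_allowed_packet) and the size of the largest event written by the master. The function names are hypothetical, and the version helper only encodes the 5.x defaults quoted above.

```python
from __future__ import annotations

MB = 1024 * 1024


def default_max_allowed_packet(version: str) -> int:
    """Default max_allowed_packet for a MySQL 5.x version such as '5.6.10-log'."""
    major, minor, patch = (int(part) for part in version.split("-")[0].split("."))
    if major != 5:
        raise ValueError("this sketch only covers the MySQL 5.x defaults")
    # 1MB up to and including 5.6.5, 4MB from 5.6.6 onwards
    return 4 * MB if (minor, patch) >= (6, 6) else MB


def check_chain(master: int, slaves: dict[str, int], largest_event: int) -> list[str]:
    """Report violations of the two rules; an empty list means none were found."""
    problems = []
    # Rule 1: no binary log event larger than the master's max_allowed_packet
    if largest_event > master:
        problems.append(f"master wrote a {largest_event}-byte event, limit is {master}")
    # Rule 2: every slave uses the same max_allowed_packet as the master
    for name, value in slaves.items():
        if value != master:
            problems.append(f"{name}: max_allowed_packet={value}, master has {master}")
    return problems
```

For example, check_chain(16 * MB, {"slave1": 16 * MB, "slave2": 4 * MB}, 8 * MB) reports only slave2, whose value differs from the master's.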

Sometimes, even when following those two basic rules, we can still have problems.

For example, there are situations (also called bugs) where the master writes an event larger than the max_allowed_packet limit, causing the slaves to stop working. To fix this, Oracle created a new variable called slave_max_allowed_packet. This configuration variable, available from 5.1.64, 5.5.26 and 5.6.6, overrides the max_allowed_packet value for the slave threads. Therefore, regardless of the max_allowed_packet value, the slave threads get a 1GB limit, the default value of slave_max_allowed_packet. A nice trick that works as expected.
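A minimal Python sketch of which limit the slave threads end up enforcing follows. The version thresholds and the 1GB default come from the paragraph above. Treating every series after 5.6 as also having slave_max_allowed_packet, and every other unlisted series as lacking it, is an assumption of the sketch, and the function names are hypothetical.

```python
GB = 1024 ** 3  # default value of slave_max_allowed_packet

# First release in each series that ships slave_max_allowed_packet.
FIRST_RELEASE_WITH_OVERRIDE = {(5, 1): 64, (5, 5): 26, (5, 6): 6}


def has_override(version: str) -> bool:
    """True if this MySQL version has slave_max_allowed_packet."""
    major, minor, patch = (int(p) for p in version.split("-")[0].split("."))
    first = FIRST_RELEASE_WITH_OVERRIDE.get((major, minor))
    if first is not None:
        return patch >= first
    # Assumption: later series keep the variable, earlier ones never had it.
    return (major, minor) > (5, 6)


def slave_thread_limit(version: str, max_allowed_packet: int,
                       slave_max_allowed_packet: int = GB) -> int:
    """Packet limit applied by the slave I/O and SQL threads."""
    if has_override(version):
        return slave_max_allowed_packet
    return max_allowed_packet
```

With the defaults, a 5.5.25 slave is still bound by its own max_allowed_packet, while a 5.5.26 slave gets the 1GB limit.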

Even with that workaround, we can sometimes still get the max_allowed_packet error on the slave servers. With slave_max_allowed_packet at its 1GB default, that means there is a packet larger than 1GB, something that shouldn’t happen in a normal situation. Why? It is usually caused by binary log corruption. Let’s look at the following example:

The slave stops working with the following message:

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master'

The important part is “got fatal error 1236 from master”: the error comes from the master, which cannot read an event it wrote to the binary log only seconds ago. To investigate the problem we can:

  • Use mysqlbinlog with --start-position to read the binary log from the position where replication failed.
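A small Python helper for that step is sketched below. It assumes you saved the slave’s SHOW SLAVE STATUS\G output as text, and that for an I/O thread error like this one the failing coordinates are Master_Log_File and Read_Master_Log_Pos. The function names and the sample path and position are hypothetical. The helper only builds the command; you would run it on the master, where the binary log file lives.

```python
from __future__ import annotations

import re


def failed_coordinates(slave_status: str) -> tuple[str, int]:
    """Return (binlog file, position) where the slave I/O thread stopped reading."""
    # Each "Field: value" line of SHOW SLAVE STATUS\G becomes a dict entry.
    fields = dict(re.findall(r"^[ \t]*(\w+):[ \t]*(.*?)[ \t]*$", slave_status, re.MULTILINE))
    return fields["Master_Log_File"], int(fields["Read_Master_Log_Pos"])


def mysqlbinlog_command(binlog_path: str, start_position: int) -> list[str]:
    """Argument list for reading the binary log from the failing position."""
    return ["mysqlbinlog", f"--start-position={start_position}", binlog_path]
```

The resulting list can be passed to subprocess.run on the master host.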

This is an example taken from our Percona Forums:

#121003 5:22:26 server id 1 end_log_pos 398528
# Unknown event
# at 398528
#960218 6:48:44 server id 1813111337 end_log_pos 1835008
# Unknown event
ERROR: Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8
DELIMITER ;
# End of log file

Look at the size of the event: 1953066613 bytes, nearly twice the 1GB maximum. Note also the “Unknown event” messages and the timestamp #960218, which would date the event to February 1996. Something is clearly wrong there. Another thing worth checking is the server id, which sometimes doesn’t match the real value. In this example, the person who posted the binary log confirmed that the server id was wrong.
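Those checks can be automated by decoding the 19-byte common header of a binlog format v4 event (timestamp, type code, server id, event length, end_log_pos and flags, all little-endian). The Python sketch below assumes you already read the raw header bytes at the failing offset (for example with seek() and read(19) on the master’s binlog file), and that in the master’s own binary log end_log_pos equals the event’s offset plus its length. The function names and the set of known server ids are hypothetical. The last check is only a heuristic: it flags a length whose four bytes are all printable ASCII, which hints that text data is being read as a header.

```python
from __future__ import annotations

import struct

# Binlog format v4 common header: timestamp, type code, server id,
# event length, end_log_pos (next position) and flags; 19 bytes, little-endian.
EVENT_HEADER = struct.Struct("<IBIIIH")
MAX_PACKET = 1024 ** 3  # 1GB


def decode_header(raw: bytes) -> dict[str, int]:
    """Decode the common header stored at the start of an event."""
    fields = EVENT_HEADER.unpack(raw[: EVENT_HEADER.size])
    names = ("timestamp", "type_code", "server_id", "event_length", "end_log_pos", "flags")
    return dict(zip(names, fields))


def corruption_signs(header: dict[str, int], offset: int, known_server_ids: set[int]) -> list[str]:
    """List red flags for an event read at byte `offset` of the master's binary log."""
    signs = []
    length = header["event_length"]
    if length > MAX_PACKET:
        signs.append(f"event length {length} is above 1GB")
    if header["server_id"] not in known_server_ids:
        signs.append(f"unknown server id {header['server_id']}")
    # In the master's own binary log, an event ends where the next one starts.
    if header["end_log_pos"] != offset + length:
        signs.append(f"end_log_pos {header['end_log_pos']} != {offset} + {length}")
    # Heuristic: a length whose bytes are all printable ASCII suggests that
    # text is being parsed as if it were an event header.
    raw_length = length.to_bytes(4, "little")
    if all(32 <= byte < 127 for byte in raw_length):
        signs.append(f"event length bytes read as text: {raw_length.decode('ascii')!r}")
    return signs
```

Given a header built from the forum values (offset 398528, length 1953066613, end_log_pos 1835008, server id 1813111337) and a known server id of 1, it reports all four signs; the length field decodes to the ASCII bytes 'unit'.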

  • Check the master’s error log.
[ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8
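Scanning the error log for these messages is easy to script. In the Python sketch below, the regular expression matches the message format shown above and the function name is hypothetical. The 1GB cut-off follows the reasoning earlier in the post: an “Event too big” error with a data_len below 1GB may simply mean max_allowed_packet is set too low, while anything above 1GB points to corruption.

```python
from __future__ import annotations

import re

MAX_PACKET = 1024 ** 3  # 1GB, the largest max_allowed_packet MySQL accepts

# Matches lines such as:
# [ERROR] Error in Log_event::read_log_event(): 'Event too big', data_len: 1953066613, event_type: 8
EVENT_TOO_BIG = re.compile(r"'Event too big', data_len: (\d+), event_type: (\d+)")


def oversized_events(log_text: str) -> list[tuple[int, int]]:
    """Return (data_len, event_type) for every 'Event too big' error above 1GB."""
    hits = []
    for match in EVENT_TOO_BIG.finditer(log_text):
        data_len, event_type = int(match.group(1)), int(match.group(2))
        # Below 1GB the limit may just be set too low; above it, suspect corruption.
        if data_len > MAX_PACKET:
            hits.append((data_len, event_type))
    return hits
```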

Again, the event is far bigger than the 1GB maximum. There is no way the master can send it or the slave can read it, so the solution is to skip that event on the slave and rotate the logs on the master. Then, use pt-table-checksum to check data consistency.
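The slave side of that procedure can be sketched in Python as below. It assumes you have already rotated the logs on the master with FLUSH LOGS and read the new file name from SHOW MASTER STATUS, and that the slave status is available as a dict of SHOW SLAVE STATUS fields. The function name is hypothetical. Because CHANGE MASTER TO with new coordinates discards the existing relay logs, the sketch refuses to continue until the SQL thread has applied everything the I/O thread already fetched. Pointing the slave at position 4 of the new file also skips whatever is left in the corrupted file after the failure point, not only the bad event, which is why the pt-table-checksum step matters.

```python
from __future__ import annotations


def recovery_statements(slave_status: dict[str, str], new_master_file: str) -> list[str]:
    """SQL to run on the slave once the master has rotated to new_master_file."""
    caught_up = (
        slave_status["Relay_Master_Log_File"] == slave_status["Master_Log_File"]
        and int(slave_status["Exec_Master_Log_Pos"]) == int(slave_status["Read_Master_Log_Pos"])
    )
    if not caught_up:
        # CHANGE MASTER TO would discard relay log events not yet applied.
        raise RuntimeError("SQL thread has not applied the relay log yet; wait and retry")
    return [
        "STOP SLAVE;",
        # Position 4 is the first event, right after the 4-byte magic header.
        f"CHANGE MASTER TO MASTER_LOG_FILE='{new_master_file}', MASTER_LOG_POS=4;",
        "START SLAVE;",
    ]
```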

MySQL 5.6 includes replication event checksums to detect binary log corruption. You can read more about them in Stephan’s blog post.
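The idea behind those checksums can be illustrated with Python’s zlib.crc32. With binlog_checksum=CRC32 the master appends a CRC32 to every event, and master_verify_checksum and slave_sql_verify_checksum roughly control whether it is verified when events are read back. The sketch below is simplified and its function names are hypothetical: it assumes a little-endian CRC32 over the event bytes and ignores the fact that the real event length field also counts the 4 checksum bytes.

```python
import struct
import zlib


def append_checksum(event: bytes) -> bytes:
    """Append a little-endian CRC32 of the event bytes (simplified)."""
    return event + struct.pack("<I", zlib.crc32(event))


def checksum_ok(event_with_crc: bytes) -> bool:
    """Recompute the CRC32 and compare it with the stored trailing 4 bytes."""
    body, stored = event_with_crc[:-4], event_with_crc[-4:]
    return struct.unpack("<I", stored)[0] == zlib.crc32(body)
```

Flipping any single byte of a stamped event makes checksum_ok return False, because CRC32 detects every burst error of up to 32 bits.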

Conclusion

Errors about max_allowed_packet on slave servers can have very different causes. Although binary log corruption is not a common one, it is worth checking when you have run out of ideas.

The post max_allowed_packet and binary log corruption in MySQL appeared first on MySQL Performance Blog.