SQL Zone is brought to you in partnership with:

I am an architect at Kayak.com working mainly with Java. I focus on performance problems at work and occasionally on computer graphics as a hobby when I find the time. Derek has posted 5 posts at DZone. View Full User Profile

A Better MySQL Replication Heartbeat

02.05.2009
| 19597 views |
  • submit to reddit

If you’ve used MySQL replication you’ve probably discovered that slave machines can lag behind the master. Replication can also break completely, requiring hours (or days) for the slave hours to catch up. Monitoring is required to catch issues before the slaves get too far behind.

Jeremy Zawodny has suggested a heartbeat mechanism to monitor the delay between the master and the slave. (I’m not sure if he came up with this solution). His suggestion is to periodically insert a row into a heartbeat table on the master. Then you poll the table on the slave, waiting for the row to appear. The length of time you spend polling is a rough estimate for how far behind the slave is at that moment.

There are a few problems with this solution. Your have to write code to poll the slave. If you poll very frequently (every second) you’ll be polling too often if replication is actually hours behind. When do you stop polling? If you poll less frequently (every minute) your estimate gets that much less accurate. You also have to poll every slave if there are more than one.

A new solution

You can get MySQL to do the hard work for use by taking advantage of the difference in behavior between SYSDATE and CURRENT_TIMESTAMP. In almost all cases when a slave runs a SQL statement it temporarily sets the “current time” to the time the statement was executed on the master. If you insert NOW at 12:00:04 on the master the row will hold exactly 12:00:04 on the slave, not matter when it’s run. However, the SYSDATE function does not follow this behavior. It always uses the value of the slave’s system clock.

If you insert a row with one column holding the value of NOW or CURRENT_TIMESTAMP and the other holding the value of SYSDATE into the master, you can use the difference between the two values on the slave to see how far behind it is. If the slave is in sync the two values will be identical. If the slave is one second behind the column holding SYSDATE will be one second ahead of the column holding NOW. No polling is required to determine the current lag.

Implementation

First, create the heartbeat table on the master. master_time wil hold the time the row was inserted on the master. slave_time will hold the time was inserted on the slave.

create table heartbeat( master_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP, slave_time TIMESTAMP NOT NULL ) ENGINE=MyISAM;

Periodically (I do it every minute), insert a row into the heartbeat table on the master.

insert into heartbeat(slave_time) values(SYSDATE());

To see the current replication lag, at any time calculate the difference between the current time and the time the most recent row was inserted on the master. (This estimate can be off by up to one heartbeat period). This query is run on a slave.

select timediff(NOW(), max(master_time)) from heartbeat;

You can see how the replication delay changed over time by selecting all rows within a range. This example shows delay for every minute of the current day. The delays are accurate to within 1 second (the max resolution of MySQL).

select master_time, timediff(slave_time, master_time) from heartbeat where DATE(master_time) = DATE(NOW()) order by master_time;

From http://dmy999.com/

Published at DZone with permission of its author, Derek Young.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Dino VV replied on Thu, 2009/02/05 - 1:40pm

Why not use "show slave status" on the slave and extract Seconds_Behind_Master? (Actually there are other fields in this command to figure out problems too). Is "show slave status" unreliable?

Derek Young replied on Thu, 2009/02/05 - 3:02pm in response to: Dino VV

Yes - it can be unreliable.  It shows the difference in time between MySQL's I/O thread and its SQL thread.  If there is a network problem that makes it take a long time for the I/O thread to get the replication log then the Seconds_Behind_Master can be artificially low.  The statement takes a long time to get from the master to the slave but then is quickly written after it gets to the slave.  Searching for Seconds_Behind_Master on google there are a bunch of articles pointing out situations where it's not reliable.  The other benefit of using a heartbeat is that you get an automatic history of the delay.

Dino VV replied on Thu, 2009/02/05 - 3:20pm

From the manual pages:

http://dev.mysql.com/doc/refman/5.0/en/show-slave-status.html
http://dev.mysql.com/doc/refman/5.0/en/replication-administration-status.html

 Says that the  Seconds_Behind_Master is reliable for fast networks. 

We never had problems using "show slave status" since our slaves are on the fast network. However, the above is interesting if we run into issues like this. Actually we check the position of the relay files as well which is more accurate for us. 

 

Derek Young replied on Thu, 2009/02/05 - 3:29pm in response to: Dino VV

 We have a mix of fast and slow network connections.  In our production environment the networks are fast.  However we have some slaves in our office as well that connect over OpenVPN through the Internet to the production databases.  In this case they are actually slaves of slaves of the master.  With the heartbeat row you can see the total delay between the master and the second slave.  We used to use log positions to do estimation too, but small slow statements (alter table) mixed with large fast statements (inserts) makes the log position hard to use.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.