Tuesday, 20 December 2016

Traveling in time(lines) with pg_rewind

At 9.5 pg_rewind was introduced, with this feature it was possible to make a server that is no longer master to follow a promoted standby that has a new timeline. 
There are at least a few use cases that could benefit from this feature, to name a few :

Testing Failovers : Its pretty common for one of customers to ask for a failover simulation just to make sure that everything works, by using pg_rewind this is much easier and much faster because we don't have to re-sync the whole database that sometimes is multi-TB sized. 

Minor version upgrades : during an upgrade you have to put the new binaries in place and restart the database. By using pg_rewind if the application is flexible enough , you could upgrade the slave , switchover , upgrade the master and pg_rewind the old master to follow the new slave, possibly minimising even more the downtime.


pg_rewind has a couple of requirements :

1. data page checksums enabled (initdb -k)
2. parameter wal_log_hints has to be enabled 
3. The old-master (the one we want to re-sync) to be properly shutdown or you'll get: 
target server must be shut down cleanly 
Failure, exiting

How does it work ?

pg_rewind searches the old-master’s data directory, finds the data blocks changed during the switchover and then copies only the nesessary blocks from the promoted slave. Keep in mind that the configuration files will also be copied from the new master so take a look for mis-configurations before you start it ! Also you have to have all the wal files since the last checkpoint  from the old master. Changes are identified by comparing the state of the data blocks present in the data directory with the changes logged in the wal files. When the deferrable blocks are identified, the wals are replayed.
For more info refer to the documentation here

How to do it

Recently i've posted about how to use Docker to setup a lab environment, the link can be found here. I'm going to use two containers for this blogpost using the image created from the dockerfile that i have. Keep in mind that in my Dockerfile, i make sure that initdb is run with -k option, that all the parameters are properly set for replication and that archiving is enabled but not doing anything : 

psql -c "show archive_command";
(1 row)

before i setup my slave, i create a wal_archive directory and i change the archive_command to simply archive there :
mkdir ~/wal_archive
psql -c "alter system set archive_command to 'cp %p /home/postgres/wal_archive/%f' ;" postgres
psql -c "select pg_reload_conf() " postgres
(1 row)
psql -c "select pg_switch_xlog() ;" postgres
(1 row)
psql -c "checkpoint;" postgres

now , from the slave :

mkdir ~/wal_archive ; ~/mk_replica.sh
waiting for server to shut down.... done
server stopped
29284/29284 kB (100%), 1/1 tablespace
server starting
LOG:  redirecting log output to logging collector process
HINT:  Future log output will appear in directory "pg_log".
-[ RECORD 1 ]----+--------
lag_bytes        | 0
pid              | 106
application_name | a_slave

Next , we promote the slave :
pg_ctl promote ; sleep 5 ; psql -c "select pg_is_in_recovery();" postgres
server promoting
(1 row)

And now, time to pg_rewind, from the old master :

pg_ctl stop
waiting for server to shut down.... done
server stopped
pg_rewind --target-pgdata=/home/postgres/pgdata --source-server="host= port=5432 user=postgres dbname=postgres"

servers diverged at WAL position 0/F000098 on timeline 1
rewinding from last common checkpoint at 0/F000028 on timeline 1

put a recovery.conf on the rewinded master (now slave) and start it up, check that it got connected to the new master and you should be all good.

postgres=# SELECT pg_xlog_location_diff(pg_current_xlog_insert_location(), flush_location) AS lag_bytes,pid, application_name FROM pg_stat_replication;
 lag_bytes | pid | application_name
         0 | 139 | old_master

Most of the production database clusters that i've come across don't use database checksums which is understandable , because of the performance penalty that checksums have, (as an alternative to checksums, in OmniTI we use ZFS on OmniOS A LOT, and we love it) 
 still pg_rewind is a nice feature , maybe its not for every database but for those who will use it, it might save a ton of time of waiting to resync an old master.

Thanks for reading
-- Vasilis Ventirozos


  1. This comment has been removed by a blog administrator.

  2. Most people traveling to the country for the first time are unaware of details regarding visa and other legal documents. But now it is absolutely easy to extract Vietnam visa information from the internet. Homepage