Archive · Oleg Dulin

Configuring Master-Slave Replication With PostgreSQL

January 31, 2015

[caption id="attachment_222" align="aligncenter" width="300"]

Master/slave light rail[/caption]

Having configured PostgreSQL 9.3 master/slave replication from bits and pieces of documentation on the Internet, I feel like a post is in order documenting my experience for others.

Installing PostgreSQL

I am not going to get into too much detail here. Installation instructions are abundant and are slightly different by operating system. I am assuming you are already past this stage.

Prepare Master

Before we can enable replication, we need to lay some ground work. First, we need a replication user. On the “master” host, log on as the user running postgres (typically postgres) and do this:

psql -c "CREATE USER replication REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'replication';"

You may select a more secure password if you wish. Now edit pg_hba.conf:

vi pg_hba.conf

Add the following lines:

host replication replication 1.2.3.4/32 md5 # This is your master CIDR/IP host replication replication 1.2.3.5/32 md5 # This is your slave CIDR/IP

Next edit postgresql.conf:

vi /var/lib/pgsql/9.3/data/postgresql.conf

Modify the following lines to enable replication:

wal_level = hot_standby max_wal_senders = 5         # Recommended to allocate 5 per slave hot_standby = on wal_keep_segments = 1000    # Each segment is 16Megs.

Some explanation is in order. When I first configured replication, I set max_wal_senders=1 and wal_keep_segments=8. That was woefully inadequate and needless to say the slave could not keep up with the replication. After some investigation, I decided to raise that to the maximum I can afford in terms of storage and then monitor the backlog.

After you enable replication on master, it needs to be restarted.

Enabling Archiving

As per this Stack Overflow post, you may also want to configure the following:

When the slave can't pull the WAL segment directly from the master, it will attempt to use the restorecommand to load it. You can configure the slave to automatically remove segments using the archivecleanup_command setting.

# on master archive_mode = on archive_command = 'cp %p /path_to/archive/%f'   # on slave restore_command = 'cp /path_to/archive/%f "%p"'

I only learned of this feature recently and I have not explored it in great detail. Fortunately, the size of my master data set is reasonable and the network connection between master and slave is fast enough that I can rebuild a failed master or slave in a matter of minutes. For now, I am leaving exploration of archiving for later and I will update this post.

Prepare Slave

Before you proceed, the slave host must be able to connect to the master. You may want to use telnet command to make sure you can connect to the right port. I am leaving that as an exercise for the reader.

Before the slave is operational you need to replication the initial database. Make sure the PostgreSQL process is not running before you proceed as it may result in a corrupted initial backup. Log on to the slave machine as postgres user, go into the postgres data directory and execute the following:

pg_basebackup -h 1.2.3.4 -D .  --username=replication --password

This may run for some time depending on the size of your data set. Rember how we agreed above that 1.2.3.4 is your master IP, so replace it with the right numbers.

Now, let's enable replication:

vi  recovery.conf

This file probably does not exist yet, so you will create a blank file that should like following:

standby_mode = 'on' primary_conninfo = 'host=1.2.3.4 port=5432 user=replication password=replication' trigger_file = '/tmp/postgresql.trigger.5432'

Remember to replace 1.2.3.4 and replication with the IP of your master and the password you've actually chosen. trigger_file is just a file the slave will watch for. Should your master fail and you need to activate the slave as a new master, you simply need to create that file. Immediately, the slave will assume the master is dead and activate itself. We will discuss that topic in a moment.

Test replication

As postgres user on master using psql command:

CREATE TABLE TEST (test VARCHAR(40)); INSERT INTO test VALUES ('bar'); INSERT INTO test VALUES ('baz'); INSERT INTO test VALUES ('bat');

As postgres user on slave using psql command:

SELECT * FROM TEST;

You should see bar, baz, and bat just like you inserted them on the master.

Monitoring replication

You can tell how much data is pending to be sent to the slave by running this query against the master:

SELECT application_name, client_addr, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), sent_location)) FROM pg_stat_replication;

You can tell how far behind the slave is by running this query against the slave:

select now() - pg_last_xact_replay_timestamp() AS replication_delay;

How the failover and recovery work

The way failover works is this:

Master failure is detected.

Trigger file on the slave is created. This can be automated, but there appears to be no standard mechanism for this.

The slave is now the new master.

Your application needs to be aware that the configuration has changed and must switch to use the new master. Unfortunately, standard JDBC driver does not support this. I personally think that failover is a sensitive activity and there is no generic scenario for this.

Your old failed master is now the new slave. You must rebuild it by performing pg_basebackup as if you are configuring a new slave.

Resources

PostgreSQL 9.3 Replication

How do I fix a PostgreSQL 9.3 slave that cannot keep up with its master?

What is a process for recovering a failed master from a slave with PostgreSQL?

The Dulin Report

Results (54)