The Dulin Report


Configuring Master-Slave Replication With PostgreSQL

January 31, 2015

Figure: Master/slave light rail

Having pieced together PostgreSQL 9.3 master/slave replication from bits and pieces of documentation on the Internet, I feel a post documenting my experience for others is in order.

Installing PostgreSQL


I am not going to get into too much detail here. Installation instructions are abundant and differ slightly by operating system. I am assuming you are already past this stage.

Prepare Master


Before we can enable replication, we need to lay some groundwork. First, we need a replication user. On the “master” host, log on as the user running postgres (typically postgres) and do this:
psql -c "CREATE USER replication REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'replication';" 

The password above is only a placeholder, so pick a more secure one. Now edit pg_hba.conf:
vi pg_hba.conf 

Add the following lines:
host replication replication 1.2.3.4/32 md5 # This is your master CIDR/IP
host replication replication 1.2.3.5/32 md5 # This is your slave CIDR/IP
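
The /32 suffix means each entry matches exactly one address. As a quick sanity check, a small sketch like the one below can confirm that a given host falls inside the CIDR you wrote. It uses only the Python standard library; the helper name hba_allows is hypothetical, and it models only the address match, not the rest of PostgreSQL's pg_hba.conf rule evaluation.

```python
import ipaddress

def hba_allows(cidr: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside the CIDR block of a pg_hba.conf entry."""
    # strict=False tolerates blocks written with host bits set, e.g. 1.2.3.5/24
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network

# The placeholder addresses from the pg_hba.conf lines above
print(hba_allows("1.2.3.5/32", "1.2.3.5"))  # True: the slave itself
print(hba_allows("1.2.3.5/32", "1.2.3.6"))  # False: a /32 matches one host only
print(hba_allows("1.2.3.0/24", "1.2.3.6"))  # True: a wider block would admit it
```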

Next edit postgresql.conf:
vi /var/lib/pgsql/9.3/data/postgresql.conf 

Modify the following lines to enable replication:
wal_level = hot_standby
max_wal_senders = 5         # Recommended to allocate 5 per slave
hot_standby = on
wal_keep_segments = 1000    # Each segment is 16 MB.

Some explanation is in order. When I first configured replication, I set max_wal_senders=1 and wal_keep_segments=8. That was woefully inadequate and, needless to say, the slave could not keep up with the replication. After some investigation, I decided to raise wal_keep_segments to the maximum I could afford in terms of storage and then monitor the backlog.
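
To put numbers on that storage trade-off, here is a rough sketch of the arithmetic, assuming the 16 MB segment size mentioned in the config comment. The function names are hypothetical, and the result is only the approximate amount of WAL kept for the slave; the master's WAL directory can hold more than this.

```python
SEGMENT_SIZE_MB = 16  # WAL segment size assumed from the config comment above

def wal_retention_mb(wal_keep_segments: int, segment_mb: int = SEGMENT_SIZE_MB) -> int:
    """Approximate disk space (MB) retained for a given wal_keep_segments."""
    return wal_keep_segments * segment_mb

def max_segments_for_budget(budget_mb: int, segment_mb: int = SEGMENT_SIZE_MB) -> int:
    """Largest wal_keep_segments that fits into a disk budget given in MB."""
    return budget_mb // segment_mb

print(wal_retention_mb(8))              # 128 MB: my original, inadequate setting
print(wal_retention_mb(1000))           # 16000 MB: roughly 15.6 GiB
print(max_segments_for_budget(50_000))  # 3125 segments fit in a 50,000 MB budget
```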

After you enable replication on the master, you need to restart it.

Enabling Archiving


As per this Stack Overflow post, you may also want to configure the following:
When the slave can't pull the WAL segment directly from the master, it will attempt to use the restore_command to load it. You can configure the slave to automatically remove segments using the archive_cleanup_command setting.

# on master
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'

# on slave
restore_command = 'cp /path_to/archive/%f "%p"'

I only learned of this feature recently and I have not explored it in great detail. Fortunately, the size of my master data set is reasonable and the network connection between master and slave is fast enough that I can rebuild a failed master or slave in a matter of minutes. For now, I am leaving exploration of archiving for later and I will update this post.

Prepare Slave


Before you proceed, the slave host must be able to connect to the master. You may want to use the telnet command to make sure you can reach the right port. I am leaving that as an exercise for the reader.
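
If telnet is not installed on the slave, a few lines of Python do the same job. This is a minimal sketch: the function name can_connect is hypothetical, port 5432 is assumed to be the PostgreSQL port, and a successful TCP connection only shows that the port is reachable, not that replication is allowed.

```python
import socket

def can_connect(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

Run it on the slave with your master's address, for example can_connect("1.2.3.4"), and fix firewall or listen_addresses settings if it returns False.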

Before the slave is operational you need to replicate the initial database. Make sure the PostgreSQL process on the slave is not running before you proceed, as it may result in a corrupted initial backup. Log on to the slave machine as the postgres user, go into the postgres data directory (pg_basebackup expects it to be empty) and execute the following:
pg_basebackup -h 1.2.3.4 -D .  --username=replication --password 

This may run for some time depending on the size of your data set. Remember that 1.2.3.4 stands for your master IP, so replace it with your master's actual address.

Now, let's enable replication:
vi  recovery.conf 

This file probably does not exist yet, so you will create a new file that should look like the following:
standby_mode = 'on'
primary_conninfo = 'host=1.2.3.4 port=5432 user=replication password=replication'
trigger_file = '/tmp/postgresql.trigger.5432'

Remember to replace 1.2.3.4 with the IP of your master and the password value replication with the password you've actually chosen. trigger_file is just a file the slave will watch for. Should your master fail and you need to activate the slave as a new master, you simply need to create that file. As soon as the slave notices the file, it will assume the master is dead and activate itself. We will discuss that topic in a moment.

Test replication


As the postgres user on the master, run the following in psql:
CREATE TABLE TEST (test VARCHAR(40));
INSERT INTO test VALUES ('bar');
INSERT INTO test VALUES ('baz');
INSERT INTO test VALUES ('bat');

As the postgres user on the slave, run the following in psql:
SELECT * FROM TEST; 

You should see bar, baz, and bat just like you inserted them on the master.

Monitoring replication


You can tell how much data is pending to be sent to the slave by running this query against the master:
SELECT application_name, client_addr, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), sent_location)) FROM pg_stat_replication; 
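
For intuition about what pg_xlog_location_diff returns, here is a small sketch of the same subtraction done by hand. It assumes the PostgreSQL 9.3 representation, where a WAL location such as 0/3000060 is a 64-bit byte position printed as two hexadecimal halves; the function names are hypothetical and this is not how the server itself is implemented.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a WAL location like '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lsn_diff_bytes(current: str, sent: str) -> int:
    """Bytes of WAL written on the master but not yet sent to the slave."""
    return lsn_to_int(current) - lsn_to_int(sent)

print(lsn_diff_bytes("0/3000060", "0/3000000"))  # 96 bytes pending
print(lsn_diff_bytes("1/0", "0/FFFFFFFF"))       # 1 byte across the 4 GB boundary
```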

You can tell how far behind the slave is by running this query against the slave:
select now() - pg_last_xact_replay_timestamp() AS replication_delay; 

How the failover and recovery work


The way failover works is this:

  1. Master failure is detected.

  2. Trigger file on the slave is created. This can be automated, but there appears to be no standard mechanism for this.

  3. The slave is now the new master.

  4. Your application needs to be aware that the configuration has changed and must switch to the new master. Unfortunately, the standard JDBC driver does not support this. I personally think that failover is a sensitive activity and there is no generic scenario for it.

  5. Your old failed master must become the new slave. You must rebuild it by performing pg_basebackup as if you were configuring a new slave.
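
Steps 1 and 2 above can be automated with a small watchdog running on the slave. The following is only a sketch under stated assumptions, not a production failover tool: the function watch_and_promote and its parameters are hypothetical, "failure" is whatever the check callable you pass in reports, and a simple reachability check cannot distinguish a dead master from a network partition, so treat automatic promotion with care.

```python
import time
from pathlib import Path
from typing import Callable

def watch_and_promote(check: Callable[[], bool],
                      trigger_file: str,
                      max_failures: int = 3,
                      interval: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll check() until it fails max_failures times in a row, then create trigger_file.

    check        -- returns True while the master looks healthy
    trigger_file -- the path configured as trigger_file in recovery.conf
    """
    failures = 0
    while failures < max_failures:
        if check():
            failures = 0      # any success resets the streak
        else:
            failures += 1
        if failures < max_failures:
            sleep(interval)
    # Creating the file tells the slave to leave recovery and become the master
    Path(trigger_file).touch()

# Example (blocks until the master is considered dead):
# watch_and_promote(lambda: my_master_check("1.2.3.4"), "/tmp/postgresql.trigger.5432")
```

Requiring several consecutive failures is a deliberate choice so that one dropped packet does not promote the slave. The my_master_check name in the usage comment stands for whatever health check you trust, such as a TCP connect or a trivial query against the master.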


Resources