The Dulin Report


Configuring Master-Slave Replication With PostgreSQL

January 31, 2015

Image: Master/slave light rail

Having configured PostgreSQL 9.3 master/slave replication from bits and pieces of documentation on the Internet, I feel like a post is in order documenting my experience for others.

Installing PostgreSQL


I am not going to get into too much detail here. Installation instructions are abundant and differ slightly by operating system. I am assuming you are already past this stage.

Prepare Master


Before we can enable replication, we need to lay some groundwork. First, we need a replication user. On the “master” host, log on as the user running postgres (typically postgres) and do this:
psql -c "CREATE USER replication REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'replication';" 

You may select a more secure password if you wish. Now edit pg_hba.conf:
vi pg_hba.conf 

Add the following lines:
host replication replication 1.2.3.4/32 md5 # This is your master CIDR/IP
host replication replication 1.2.3.5/32 md5 # This is your slave CIDR/IP
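If you expect to add more slaves later, these entries can be generated instead of typed by hand. The following is only a sketch: hba_replication_line is a hypothetical helper name, and it assumes md5 authentication as in the lines above.

```shell
# Hypothetical helper: print one pg_hba.conf entry that lets a role
# open replication connections from a single address or CIDR block.
hba_replication_line() {
    role="$1"   # replication role, e.g. "replication"
    cidr="$2"   # client address in CIDR form, e.g. "1.2.3.5/32"
    printf 'host replication %s %s md5\n' "$role" "$cidr"
}

# Example: entries for the placeholder master and slave used in this post.
# hba_replication_line replication 1.2.3.4/32 >> pg_hba.conf
# hba_replication_line replication 1.2.3.5/32 >> pg_hba.conf
```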

Next edit postgresql.conf:
vi /var/lib/pgsql/9.3/data/postgresql.conf 

Modify the following lines to enable replication:
wal_level = hot_standby
max_wal_senders = 5         # Recommended to allocate 5 per slave
hot_standby = on
wal_keep_segments = 1000    # Each segment is 16 MB.

Some explanation is in order. When I first configured replication, I set max_wal_senders=1 and wal_keep_segments=8. That was woefully inadequate, and the slave could not keep up with replication. After some investigation, I raised both, setting wal_keep_segments to the maximum I could afford in terms of storage, and then monitored the backlog.
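To see what a given wal_keep_segments value costs in disk space, multiply it by the segment size. This is a rough sketch that assumes the default 16 MB segment size; wal_retention_mb is a hypothetical helper name.

```shell
# Rough disk budget for wal_keep_segments, assuming the default
# 16 MB WAL segment size.
wal_retention_mb() {
    segments="$1"
    echo $(( segments * 16 ))
}

wal_retention_mb 8      # prints 128
wal_retention_mb 1000   # prints 16000
```

In other words, my original setting retained only 128 MB of WAL, while 1000 segments retains roughly 16 GB.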

After you enable replication on master, it needs to be restarted.

Enabling Archiving


As per this Stack Overflow post, you may also want to configure the following:
When the slave can't pull the WAL segment directly from the master, it will attempt to use the restore_command to load it. You can configure the slave to automatically remove segments using the archive_cleanup_command setting.

# on master
archive_mode = on
archive_command = 'cp %p /path_to/archive/%f'

# on slave
restore_command = 'cp /path_to/archive/%f "%p"'

I only learned of this feature recently and I have not explored it in great detail. Fortunately, the size of my master data set is reasonable and the network connection between master and slave is fast enough that I can rebuild a failed master or slave in a matter of minutes. For now, I am leaving exploration of archiving for later and I will update this post.
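One thing worth guarding against is a plain cp silently overwriting a segment that is already in the archive. The PostgreSQL documentation advises that the archive command should refuse to overwrite an existing file and return a non-zero status in that case. Below is a minimal sketch of that idea; archive_wal is a hypothetical function name and I have not run it in production.

```shell
# Hypothetical wrapper for archive_command: copy one WAL segment into the
# archive, but refuse to overwrite a file that is already there.
# PostgreSQL treats a non-zero exit status as a failed archive attempt
# and retries the segment later.
archive_wal() {
    src="$1"        # %p: path of the segment to archive
    dest_dir="$2"   # archive directory (placeholder: /path_to/archive)
    dest="$dest_dir/$(basename "$src")"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    cp "$src" "$dest"
}
```

To use something like this, you would save it as a script that calls the function with "$@" and point archive_command at that script, passing %p and the archive directory as arguments.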

Prepare Slave


Before you proceed, the slave host must be able to connect to the master. You may want to use the telnet command to make sure you can reach the PostgreSQL port on the master (5432 by default). I am leaving that as an exercise for the reader.
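If telnet is not installed, the same check can be done from the shell. This sketch assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available; port_open is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device and the coreutils timeout command.
port_open() {
    host="$1"
    port="$2"
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (1.2.3.4 is the placeholder master address from this post):
# port_open 1.2.3.4 5432 && echo "master reachable" || echo "cannot reach master"
```

A successful connection only shows that the port is open. It does not prove that pg_hba.conf will accept the replication user.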

Before the slave is operational, you need to replicate the initial database. Make sure the PostgreSQL process on the slave is not running before you proceed, as it may result in a corrupted initial backup. Log on to the slave machine as the postgres user, go into the postgres data directory, and execute the following:
pg_basebackup -h 1.2.3.4 -D .  --username=replication --password 

This may run for some time depending on the size of your data set. Remember that 1.2.3.4 stands in for your master IP, so replace it with the real address.
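Note that pg_basebackup refuses to write into a directory that already contains files, so the data directory on the slave has to be empty first. A small pre-flight check can be sketched like this; dir_is_empty is a hypothetical helper name.

```shell
# Hypothetical pre-flight check before running pg_basebackup with -D:
# succeed only if the target exists and contains no files.
dir_is_empty() {
    dir="$1"
    [ -d "$dir" ] && [ -z "$(ls -A "$dir")" ]
}

# Example, run from inside the slave's data directory:
# dir_is_empty . || echo "move the old data directory contents aside first" >&2
```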

Now, let's enable replication:
vi  recovery.conf 

This file probably does not exist yet, so create it with the following contents:
standby_mode = 'on'
primary_conninfo = 'host=1.2.3.4 port=5432 user=replication password=replication'
trigger_file = '/tmp/postgresql.trigger.5432'

Remember to replace 1.2.3.4 and replication with the IP of your master and the password you've actually chosen. trigger_file is just a file the slave will watch for. Should your master fail and you need to activate the slave as a new master, you simply need to create that file. Immediately, the slave will assume the master is dead and activate itself. We will discuss that topic in a moment.
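If you rebuild slaves often, the file can be generated from parameters rather than edited by hand. This is a sketch under assumptions: print_recovery_conf is a hypothetical helper, it hard-codes port 5432 and the replication user from this post, and it does not escape quotes or spaces in the password.

```shell
# Hypothetical helper: print a recovery.conf for a 9.3 standby.
# Arguments: master host, replication password, trigger file path.
print_recovery_conf() {
    master_host="$1"
    repl_password="$2"
    trigger="$3"
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=$master_host port=5432 user=replication password=$repl_password'
trigger_file = '$trigger'
EOF
}

# Example:
# print_recovery_conf 1.2.3.4 replication /tmp/postgresql.trigger.5432 > recovery.conf
# chmod 600 recovery.conf   # the file contains a password
```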

Test replication


As the postgres user on the master, run the following in psql:
CREATE TABLE TEST (test VARCHAR(40));
INSERT INTO test VALUES ('bar');
INSERT INTO test VALUES ('baz');
INSERT INTO test VALUES ('bat');

As the postgres user on the slave, run the following in psql:
SELECT * FROM TEST; 

You should see bar, baz, and bat just like you inserted them on the master.
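Another quick sanity check is to ask each server which role it thinks it has. pg_is_in_recovery() returns true on a server that is replaying WAL, which is what a hot standby does. The sketch below assumes psql can connect locally as the current user; server_role is a hypothetical helper name.

```shell
# Hypothetical helper: report whether the local server is a standby.
# pg_is_in_recovery() returns true while a server is replaying WAL.
server_role() {
    in_recovery=$(psql -At -c "SELECT pg_is_in_recovery();") || return 1
    case "$in_recovery" in
        t) echo "standby" ;;
        f) echo "master" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example: run on each host.
# server_role
```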

Monitoring replication


You can tell how much data is pending to be sent to the slave by running this query against the master:
SELECT application_name, client_addr,
       pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), sent_location))
FROM pg_stat_replication;

You can tell how far behind the slave is by running this query against the slave:
select now() - pg_last_xact_replay_timestamp() AS replication_delay; 
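The slave-side query can be turned into a simple check for a cron job or a monitoring agent. This is a sketch, not something I have deployed: replication_lag_ok is a hypothetical helper, the threshold is up to you, and it assumes psql can connect locally on the slave.

```shell
# Hypothetical check for the slave: succeed when replay lag is within
# max_seconds, fail with status 1 when it is not, and with status 2 when
# the query itself could not be run.
replication_lag_ok() {
    max_seconds="$1"
    lag=$(psql -At -c "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()), 0);") || return 2
    # awk handles the fractional seconds that plain shell arithmetic cannot.
    awk -v lag="$lag" -v max="$max_seconds" 'BEGIN { exit !(lag <= max) }'
}

# Example: complain if the slave is more than five minutes behind.
# replication_lag_ok 300 || echo "slave is lagging" >&2
```

Keep in mind that this measures the time since the last replayed transaction, so on a master that receives no writes the number keeps growing even though the slave is fully caught up.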

How the failover and recovery work


The way failover works is this:

  1. Master failure is detected.

  2. Trigger file on the slave is created. This can be automated, but there appears to be no standard mechanism for this.

  3. The slave is now the new master.

  4. Your application needs to be aware that the configuration has changed and must switch to the new master. Unfortunately, the standard JDBC driver does not support this. I personally think that failover is a sensitive activity and that there is no generic scenario for it.

  5. Your old failed master is now the new slave. You must rebuild it by performing pg_basebackup as if you were configuring a new slave.
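Steps 1 and 2 above could be automated with a small watchdog running on the slave host. The following is only a sketch under assumptions: promote_if_master_down and the variable names are hypothetical, the defaults reuse the placeholder address and trigger file from this post, and it relies on the pg_isready utility that ships with PostgreSQL 9.3.

```shell
# Hypothetical watchdog for the slave host: create the trigger file after
# the master fails several consecutive health checks.
MASTER_HOST="${MASTER_HOST:-1.2.3.4}"
TRIGGER_FILE="${TRIGGER_FILE:-/tmp/postgresql.trigger.5432}"
MAX_FAILURES="${MAX_FAILURES:-3}"
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"   # seconds between checks

promote_if_master_down() {
    failures=0
    while [ "$failures" -lt "$MAX_FAILURES" ]; do
        # pg_isready exits 0 only when the server accepts connections.
        if pg_isready -h "$MASTER_HOST" -p 5432 -q; then
            return 1    # master answered; do not promote
        fi
        failures=$((failures + 1))
        sleep "$CHECK_INTERVAL"
    done
    touch "$TRIGGER_FILE"   # the slave watches for this file and activates itself
}

# Example: run periodically on the slave host.
# promote_if_master_down && echo "slave promoted" >&2
```

A watchdog like this cannot tell a dead master from a network problem between the two hosts, which is one reason I consider failover a sensitive activity. Treat it as a starting point rather than a complete solution.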


Resources