the documentation has gone swimming.
In this section we will orient ourselves. Below is a hand-traced cross-section of the entire tutorial, drawn as a coral-reef wall. Each label is a module; click one to swim there. This schematic is the only navigation on miris.xyz.
Hover a node for a guide-line; click to tide-swing down. — fig. 1, reef cross-section
Before the dive we check our gear. miris.xyz expects four tools on your slate. Each is a creature card — read the name lettered along its dorsal fin.
curl — HTTP transport for fetching the bootstrap script and probing endpoints. v7.74+ — verify with curl --version.
psql — PostgreSQL 15 client & server. The whole tutorial deploys a streaming replica — without psql you are diving without a regulator.
docker — container runtime 24.0+. The replica node runs in a container so we can sink and re-float it at will. docker compose version should answer.
jq — JSON wrangler 1.6+. Needed to read the cluster status payloads in §7. Optional but recommended — jq --version.
← if psql is missing on macOS, brew install libpq, then brew link --force libpq (it is keg-only and will not link itself); do not fight Homebrew about this.
docker desktop counts. so does colima. so does podman with the docker shim. the tutorial does not care which raft you're on.
← this is the part that always bites you: $PRIMARY must be reachable from inside the container, not just from your laptop.
surface tip: run the whole block in a fresh shell. stale env vars are the silt at the bottom of every failed deploy.
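One sweep of the slate before descending — a minimal sketch that checks for the four standard binaries; if your raft is colima or podman, swap in its names:

```shell
# gear check — one pass over the four creature cards
for tool in curl psql docker jq; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "ok      $tool"
  else
    echo "MISSING $tool"   # go back to the cards before diving
  fi
done
```

Run it in that fresh shell; four lines come back, and any MISSING line is a card to revisit.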
In this section we will configure the primary node to accept a replica. We do this once. Everything afterward is just water flowing through the channel we open here.
Open the primary's configuration and widen its listen address. The replica lives in a container; the loopback alone will not reach it.
# /etc/postgresql/15/main/postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 8
wal_keep_size = 512MB
Create a replication role and tell pg_hba.conf to trust the container subnet. The pg_hba.conf edit is reload-only, but listen_addresses and wal_level above only take effect on a full restart — so restart once here, and the channel stays open for good.
# on the primary
sudo -u postgres psql -c "CREATE ROLE replicant WITH REPLICATION LOGIN PASSWORD 'tide';"
echo "host replication replicant 172.28.0.0/16 scram-sha-256" \
| sudo tee -a /etc/postgresql/15/main/pg_hba.conf
sudo systemctl restart postgresql
Confirm the primary is listening and a slot can be created. If this answers, the channel is open. We will swim through it in §6.
psql -h "$PRIMARY" -U replicant -d postgres -c "SELECT pg_create_physical_replication_slot('replica_a');"
# expected output:
# pg_create_physical_replication_slot
# ------------------------------------
# (replica_a,)
Caution. scram-sha-256 in pg_hba.conf requires the role's password to be stored as SCRAM. If you created the role under an older password_encryption, re-run \password replicant after switching the setting.
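A quick way to check which hash the role actually carries — a sketch to run as a superuser on the primary; the want-comments show the healthy answers:

```sql
-- is the stored password SCRAM, or an old md5 relic?
SHOW password_encryption;                       -- want: scram-sha-256
SELECT rolpassword LIKE 'SCRAM-SHA-256%' AS is_scram
  FROM pg_authid WHERE rolname = 'replicant';   -- want: t
-- if not, re-hash under the new setting:
ALTER ROLE replicant WITH PASSWORD 'tide';
```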
slot lag is shown in pg_replication_slots.restart_lsn vs the primary's current LSN — keep that gap small or WAL piles up like driftwood.
In this section we will read a database schema the way a marine biologist reads a tide pool: rows are sand-bars, columns are currents, foreign keys are the channels water takes between pools at the turn of the tide.
← the events table is the deepest pool; everything drains toward it.
The replica we are building does not own these tables — it mirrors them. Still, you should know the floor you are walking. Here is the logbook schema rendered as a plan-view of three connected pools.
The replica subscribes to the WAL stream and replays every insert into events within milliseconds. From the apprentice's perspective the second pool is simply always full.
-- the floor of the logbook schema
CREATE TABLE divers (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  handle text NOT NULL UNIQUE,
  cert_level int NOT NULL DEFAULT 1
);
CREATE TABLE dives (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  diver_id uuid REFERENCES divers(id),
  site text,
  depth_m numeric(5,1)
);
CREATE TABLE events (
  id bigserial PRIMARY KEY,
  dive_id uuid REFERENCES dives(id),
  kind text,
  at timestamptz NOT NULL DEFAULT now()
);
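You can watch the mirroring with a pebble-drop — a sketch; it assumes the schema lives in a database named logbook, so adjust the name to your own reef:

```shell
# drop a pebble in the first pool; watch the second pool ripple
psql -h "$PRIMARY" -d logbook \
  -c "INSERT INTO divers (handle) VALUES ('moray');"
sleep 1   # one heartbeat of tide
psql -h "$REPLICA" -d logbook \
  -c "SELECT handle FROM divers WHERE handle = 'moray';"
```

If the second psql returns the row, the WAL stream is carrying water between the pools.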
a tide pool only has the water the last tide left it. a read replica only has the WAL the primary last shipped it. same patience.
← do this calmly. the page got darker on purpose; so should you.
a promoted replica cannot un-promote. if you fired the trigger by mistake, you're rebuilding from the primary. that's fine. driftwood floats back.
In this section we will recover. Something has gone wrong on the primary — a bad migration, a fat-fingered DELETE. We will promote the replica to take its place, then rebuild a fresh follower. This is the rollback ritual; perform it slowly.
Stop writes to the wounded primary. Fence it so nothing new lands while you decide.
# on the primary host — fence it
sudo systemctl stop postgresql
# confirm it is down
pg_isready -h localhost # > no response; good
Promote the replica. After this it is the primary; the trigger is one-way, like the morphs on this page.
# inside the replica container
docker compose exec replica_a \
su postgres -c "pg_ctl promote -D /var/lib/postgresql/data"
# server logs: "database system is ready to accept connections"
# it is now writable. point your app's $PRIMARY here.
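To confirm the trigger actually fired, ask the ex-replica whether it is still in recovery — a small check, using the same container path as above:

```shell
# confirm the one-way trigger fired — recovery mode should now be off
docker compose exec replica_a \
  su postgres -c "psql -Atc 'SELECT pg_is_in_recovery();'"
# want: f — the ex-replica accepts writes
```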
Re-float a new follower from the new primary using pg_basebackup. The dive continues with a fresh tank.
pg_basebackup -h "$NEW_PRIMARY" -U replicant -D /var/lib/postgresql/data \
-R -X stream -C -S replica_b --checkpoint=fast
# -R writes standby.signal + primary_conninfo for you
# start it; it will catch up from the WAL stream
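Once the fresh follower is started, the new primary can see it drafting in its wake — a sketch of the check:

```sql
-- on the new primary: is the fresh follower attached?
SELECT application_name, state, sync_state
  FROM pg_stat_replication;
-- want a row in state 'streaming' once replica_b has caught up
```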
Caution. Never re-attach the old primary as a follower without pg_rewind — its timeline diverged the moment it accepted any write the replica never saw. Rewind first, or rebuild from base. There is no third option.
the abyss stratum is the only place miris.xyz lets the lights down. it is not a warning color. it is a depth.
In this section we will release a change to production by letting it migrate through the cluster the way a school of fish crosses a thermocline — all together, in one shimmering motion, with no individual left behind.
← the trick of a zero-downtime deploy is that nothing is ever "switched". traffic just drifts.
Run the migration on the primary while both versions of the app can read the schema. Additive changes only here — new columns, new tables, never a DROP yet.
# expand phase — safe with old + new code running
psql -h "$PRIMARY" -f migrations/0042_add_dive_buddy.up.sql
# the replica replays it within a heartbeat:
psql -h "$REPLICA" -c "\d dives" | grep buddy_id # > present
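What might the expand file contain? A hedged sketch — the real migration on your reef may differ, but buddy_id is the column the grep above looks for, and the additive shape is the point:

```sql
-- migrations/0042_add_dive_buddy.up.sql — a sketch, not the real file
ALTER TABLE dives
  ADD COLUMN buddy_id uuid REFERENCES divers(id);
-- strictly additive: nullable, no backfill, no DROP —
-- code that has never heard of buddy_id keeps swimming
```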
Drift traffic to the new version. The load balancer shifts weight gradually — like the school crossing the thermocline, no abrupt cut-over.
# gateway weights — applied every 90s by the deploy script
routes:
  - service: logbook-api
    splits:
      - target: v41
        weight: 90   # → 60 → 30 → 0
      - target: v42
        weight: 10   # → 40 → 70 → 100
Once 100% of traffic is on the new version and stable, run the contract migration that drops the old column. Now — and only now — is it safe.
# contract phase — only after the whole school has crossed
psql -h "$PRIMARY" -f migrations/0042_add_dive_buddy.cleanup.sql
# tag the release; the migration has landed
git tag -a v42 -m "dive buddy column live, old path drained"
expand · migrate · contract. three tides. never one. a deploy that "switches" is a deploy that breaks somebody mid-request.
In this section we will set out buoys. A buoy does not steer the boat — it tells you where the boat is. These are the three signals worth bobbing on the surface for: replication lag, connection saturation, and WAL retention.
← if lag climbs past your statement_timeout, the replica is no longer a useful read source. cut to primary.
-- on the replica
SELECT now() - pg_last_xact_replay_timestamp()
AS replica_lag;
-- both nodes
SELECT count(*) * 100
/ current_setting('max_connections')::int
FROM pg_stat_activity;
-- on the primary
SELECT pg_size_pretty(pg_wal_lsn_diff(
pg_current_wal_lsn(), restart_lsn))
FROM pg_replication_slots;
Wire all three into whatever you already run — Prometheus, a cron that pages you, a sticky note. The buoy does not care. It just needs to bob somewhere you will see it.
three buoys is the right number. one and you're blind to a class of failure. ten and you stop reading them. three you can hold in your head while you dive.
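One way to make a buoy actually bob — a minimal sketch of a lag check; the script name, threshold, and wiring are assumptions, not part of the manual's own tooling:

```shell
# buoy.sh — shout when replica lag drifts past a limit
LAG_LIMIT_S=${LAG_LIMIT_S:-30}

check_lag() {
  # $1: lag in whole seconds — in a cron job, feed it from the
  # replica query above, e.g.:
  #   psql -h "$REPLICA" -Atc "SELECT floor(extract(epoch FROM
  #     now() - pg_last_xact_replay_timestamp()))"
  if [ "$1" -gt "$LAG_LIMIT_S" ]; then
    echo "BUOY: replica lag ${1}s exceeds ${LAG_LIMIT_S}s"
    return 1
  fi
  echo "calm: lag ${1}s"
}

check_lag "${1:-0}"
```

The non-zero return is what your pager hook watches; the echo is for the sticky note.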
A shoal of footnotes — scattered pebbles, each a reference you may want when the tide goes out. Pick up the ones you need; leave the rest on the sand.
1. postgresql.conf WAL settings reference.
2. pg_hba.conf — host-based authentication and the replication pseudo-database.
3. Logical vs. physical replication slots — when each is the right channel.
4. pg_rewind — re-attaching a diverged ex-primary without a full base backup.
5. pg_ctl promote & standby.signal — the one-way trigger.
6. Expand / migrate / contract — the three-phase schema change pattern.
7. pg_basebackup -R — auto-writing primary_conninfo for a fresh follower.
8. pg_stat_replication & pg_replication_slots — the observable surface.
9. pg_last_xact_replay_timestamp() — measuring replica lag in seconds.
10. Container networking for Postgres — keeping $PRIMARY reachable from inside.
surface slowly. you may pop your ears.
— end of the manual. the replica is following; the buoys are bobbing.