# Read‑your‑writes on replicas: PostgreSQL WAIT FOR LSN and MongoDB Causal Consistency
In databases designed for high availability and scalability, secondary nodes can fall behind the primary. Typically, a write is acknowledged once a quorum of nodes holds it durably, while the remaining standbys catch up asynchronously and are only eventually consistent, which lets the system ride out partial failures. To balance availability with performance, even synchronous replicas acknowledge a write as soon as it is durable and recoverable, which may be before it is readable.
As a result, if your application writes data and then immediately queries another node, it may still see stale data.
Here’s a common anomaly: you commit an order on the primary and then try to retrieve it from a reporting system. The order is missing because the read replica has not yet applied the write.
PostgreSQL and MongoDB tackle this problem in different ways:

- PostgreSQL 19 introduces a `WAIT FOR LSN` command, allowing applications to explicitly coordinate reads after writes.
- MongoDB provides causal consistency within sessions using the `afterClusterTime` read concern.
Both approaches track when your write occurred and ensure subsequent reads observe at least that point. Let’s look at how each database does this.
## PostgreSQL: WAIT FOR LSN (PG19)
PostgreSQL records every change in the Write‑Ahead Log (WAL). Each WAL record has a Log Sequence Number (LSN): a 64‑bit position, typically displayed as two hexadecimal halves such as 0/40002A0 (high/low 32 bits).
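Since an LSN is ultimately a single 64‑bit number, two LSNs can be compared by converting the textual form back to an integer. A minimal sketch (the `parseLsn` helper here is hypothetical, not part of any driver):

```javascript
// Parse a PostgreSQL LSN like "0/40002A0" into a 64-bit BigInt.
// The textual form is the high 32 bits, a slash, then the low 32 bits, in hex.
function parseLsn(text) {
  const [hi, lo] = text.split("/");
  return (BigInt(`0x${hi}`) << 32n) | BigInt(`0x${lo}`);
}

// Two LSNs compare as plain integers: a later WAL position is a larger number.
parseLsn("0/40002A0") < parseLsn("1/0"); // the second LSN is further ahead
```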
Streaming replication ships WAL records from the primary to standbys, which then:
- Write WAL records to disk
- Flush them to durable storage
- Replay them, applying changes to data files
The write position determines what survives a crash of the database process; the flush position, what survives a failure of the whole compute instance; and the replay position determines what queries can actually see on a standby.
`WAIT FOR LSN` allows a session to block until one of these points reaches a target LSN:

- `standby_write` → WAL written to disk on the standby (not yet flushed)
- `standby_flush` → WAL flushed to durable storage on the standby
- `standby_replay` (default) → WAL replayed into data files and visible to readers
- `primary_flush` → WAL flushed on the primary (useful when `synchronous_commit = off` and a durability barrier is needed)
A typical flow is to write on the primary, commit, and then fetch the current WAL insert LSN:

```
pg19rw=# BEGIN;
BEGIN
pg19rw=*# INSERT INTO orders VALUES (123, 'widget');
INSERT 0 1
pg19rw=*# COMMIT;
COMMIT
pg19rw=# SELECT pg_current_wal_insert_lsn();
 pg_current_wal_insert_lsn
---------------------------
 0/18724C0
(1 row)
```
That LSN is then used to block reads on a replica until it has caught up:

```
pg19ro=# WAIT FOR LSN '0/18724C0'
         WITH (MODE 'standby_replay', TIMEOUT '2s');
```
This LSN‑based read‑your‑writes pattern in PostgreSQL requires extra round‑trips: capturing the LSN on the primary and explicitly waiting on the standby. For many workloads, reading from the primary is simpler and faster.
The pattern becomes valuable when expensive reads must be offloaded to replicas while still preserving read‑your‑writes semantics, or in event‑driven and CQRS designs where the LSN itself serves as a change marker for downstream consumers.
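The whole flow can be wrapped in a small helper. This is a sketch, not production code: `primaryQuery` and `replicaQuery` are assumed to be async functions that run SQL against the primary and a replica (for example, thin wrappers around a driver's query method):

```javascript
// Hedged sketch of LSN-based read-your-writes on PG19, assuming primaryQuery
// and replicaQuery are async (sql) => rows functions supplied by the caller.
async function readYourWrites(primaryQuery, replicaQuery, writeSql, readSql) {
  // 1. Write and commit on the primary.
  await primaryQuery(writeSql);

  // 2. Capture the primary's current WAL insert position.
  const rows = await primaryQuery("SELECT pg_current_wal_insert_lsn() AS lsn");
  const lsn = rows[0].lsn;

  // 3. Block on the replica until it has replayed up to that LSN.
  await replicaQuery(
    `WAIT FOR LSN '${lsn}' WITH (MODE 'standby_replay', TIMEOUT '2s');`
  );

  // 4. The read now observes at least the effects of the write.
  return replicaQuery(readSql);
}
```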
## MongoDB: Causal Consistency
While PostgreSQL reasons in WAL positions, MongoDB tracks causality using oplog timestamps and a hybrid logical clock.
In a replica set, each write on the primary produces an entry in local.oplog.rs, a capped collection. These entries are rewritten to be idempotent (for example, $inc becomes $set) so they can be safely reapplied. Each entry carries a Hybrid Logical Clock (HLC) timestamp that combines physical time with a logical counter, producing a monotonically increasing cluster time. Replica set members apply oplog entries in timestamp order.
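To make the HLC idea concrete, here is a toy clock, a deliberate simplification rather than MongoDB's actual implementation: each timestamp pairs wall‑clock seconds with an increment that breaks ties within the same second:

```javascript
// Toy hybrid logical clock: (seconds, increment) pairs, like oplog timestamps.
// If the physical clock has not advanced (or moved backwards), bump the
// counter instead, so the generated cluster time is still monotonic.
function makeHlc(now = () => Math.floor(Date.now() / 1000)) {
  let last = { t: 0, i: 0 };
  return function tick() {
    const wall = now();
    last = wall > last.t ? { t: wall, i: 1 } : { t: last.t, i: last.i + 1 };
    return { ...last };
  };
}

const tick = makeHlc();
const a = tick();
const b = tick(); // strictly later than a, even within the same second
```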
Because MongoDB allows concurrent writes, temporary “oplog holes” can appear: a write with a later timestamp may commit before another write with an earlier timestamp. A naïve reader scanning the oplog could skip the earlier operation.
MongoDB prevents this by tracking an oplogReadTimestamp, the highest hole‑free point in the oplog. Secondaries are prevented from reading past this point until all prior operations are visible, ensuring causal consistency even in the presence of concurrent commits.
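The hole‑free boundary can be modeled with a short sketch. This is a simplification (MongoDB derives the boundary from in‑progress storage transactions rather than explicit lists), but the rule is the same: readers may only advance to just before the earliest slot that was reserved but has not yet committed:

```javascript
// Simplified model of the oplog read boundary. Timestamps are plain integers.
// reserved:  all timestamp slots handed out to writers, in order.
// committed: the subset of those slots whose writes have finished.
function oplogReadTimestamp(reserved, committed) {
  const done = new Set(committed);
  let boundary = 0; // 0 = nothing visible yet
  for (const ts of reserved) {
    if (!done.has(ts)) break; // first hole: reserved but not yet committed
    boundary = ts;            // everything up to here is hole-free
  }
  return boundary;
}

// Slot 2 is still uncommitted, so readers stop at 1 even though 3 is done.
oplogReadTimestamp([1, 2, 3], [1, 3]); // → 1
```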
Causal consistency in MongoDB is enforced by attaching an `afterClusterTime` to reads:

- Drivers track the `operationTime` of the last operation in a session.
- When a session is created with `causalConsistency: true`, the driver automatically includes an `afterClusterTime` equal to the highest known cluster time on subsequent reads.
- The server blocks the read until its cluster time has advanced to at least `afterClusterTime`.
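That server‑side check can be modeled as follows (a toy sketch with hypothetical names; a real server blocks on replication progress rather than returning a status):

```javascript
// Toy model of a node serving a causally consistent read. clusterTime is the
// highest oplog time this node has applied; afterClusterTime comes from the
// session's last write.
function serveRead(node, afterClusterTime, readFn) {
  if (node.clusterTime < afterClusterTime) {
    // A real server waits here until replication catches up (or times out);
    // this sketch just reports that the read would have to wait.
    return { status: "waiting", until: afterClusterTime };
  }
  return { status: "ok", result: readFn() };
}

const laggingSecondary = { clusterTime: 5 };
serveRead(laggingSecondary, 7, () => "order 123"); // → { status: "waiting", ... }
laggingSecondary.clusterTime = 7; // replication catches up
serveRead(laggingSecondary, 7, () => "order 123"); // → { status: "ok", ... }
```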
Because the session carries this cluster time, read‑your‑writes behavior holds under any read preference, whether the read is served by the primary or a secondary:
```javascript
// Start a causally consistent session
const session = client.startSession({ causalConsistency: true });
const coll = db.collection("orders");

// Write in this session
await coll.insertOne({ id: 123, product: "widget" }, { session });

// The driver automatically injects afterClusterTime into the read concern
const order = await coll.findOne({ id: 123 }, { session });
```
Causal consistency is not limited to snapshot reads. It applies across read concern levels. The key point is that the session ensures later reads observe at least the effects of earlier writes, regardless of which replica serves the read.
## Conclusion
Here is a simplified comparison:
| Feature | PostgreSQL `WAIT FOR LSN` | MongoDB Causal Consistency |
|---|---|---|
| Clock type | Physical byte offset in the WAL (LSN) | Hybrid Logical Clock (HLC) |
| Mechanism | Block until replay/write/flush LSN reached | Block until `afterClusterTime` is visible |
| Tracking | Application captures LSN | Driver tracks `operationTime` |
| Granularity | WAL record position | Oplog timestamp |
| Replication model | Physical streaming | Logical oplog application |
| Hole handling | N/A (serialized WAL) | `oplogReadTimestamp` |
| Failover handling | Error unless `NO_THROW` | Session continues, bounded by replication state |
Both PostgreSQL’s WAIT FOR LSN and MongoDB’s causal consistency ensure reads can observe prior writes, but at different layers:
- PostgreSQL offers manual, WAL‑level precision.
- MongoDB provides automatic, session‑level guarantees.
If you want read‑your‑writes semantics to “just work” without additional coordination calls, MongoDB’s session‑based model is a strong fit. Despite persistent myths about consistency, MongoDB delivers strong consistency in a horizontally scalable system with a simple developer experience.