A curated list of database news from authoritative sources

July 18, 2025

Sequences in MongoDB

In a previous post about No-gap sequence in PostgreSQL and YugabyteDB, I mentioned that sequences in SQL databases are not transactional, which can lead to gaps. MongoDB does not require a special sequence object: a collection can serve the same purpose, thanks to the incremental update operator ($inc) and the ability to increment and return the updated value in a single atomic operation with findOneAndUpdate().

Here is an example. I create a "sequences" collection to hold sequence numbers and a "demo" collection to insert values with an auto-incremented identifier:

db.createCollection("demo");
db.createCollection("sequences");

An insert can simply fetch the next value while incrementing it:

db.demo.insertOne({
 _id: db.sequences.findOneAndUpdate(         
  { _id: "demo_id" },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" }
),
 value: "I'm the first"
});

db.demo.find();

[ { _id: { _id: 'demo_id', seq: 1 }, value: "I'm the first" } ]

I start two transactions:

sessionA = db.getMongo().startSession();
sessionA.startTransaction();
dbA = sessionA.getDatabase(db.getName());

sessionB = db.getMongo().startSession();
sessionB.startTransaction();
dbB = sessionB.getDatabase(db.getName());

Scalable sequence (accepting gaps on rollback)

The two transactions insert a document into "demo" with an "_id" fetched from the "sequences" collection:

dbA.demo.insertOne({
 _id: db.sequences.findOneAndUpdate(         // non-transactional
  { _id: "demo_id" },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" }
),
 value: "I'll abort"
});

dbB.demo.insertOne({
 _id: db.sequences.findOneAndUpdate(         // non-transactional
  { _id: "demo_id" },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" }
),
 value: "I'll commit"
});

It is important to note that I increment ($inc) and fetch the value (returnDocument: "after") with db, outside the dbA or dbB transactions. The sequence operation is atomic, but non-transactional (not part of a multi-document transaction). This simulates the behavior of sequences in SQL databases.

The first transaction aborts (rollback) and the second one commits:


sessionA.abortTransaction();
sessionA.endSession();

sessionB.commitTransaction();
sessionB.endSession();

I check the result:

db.demo.find()

[
  { _id: { _id: 'demo_id', seq: 1 }, value: "I'm the first" },
  { _id: { _id: 'demo_id', seq: 3 }, value: "I'll commit" }
]

I have a gap in the numbers because seq: 2 was used by a transaction that aborted. The transaction was rolled back, but because I incremented the sequence outside the transaction (using db instead of dbA), the increment was not rolled back.

Note that all updates to a single document are atomic, but I used findOneAndUpdate() so that it returns the updated document in the same atomic operation that updated it. It can return the before or after value and I used returnDocument: "after" to get the next value. I used upsert: true to initialize the sequence if no value exists, and $inc sets the field to the specified value when it doesn't exist.
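The combined effect of upsert: true, $inc, and returnDocument: "after" can be modeled in a few lines of plain JavaScript. This is a toy in-memory sketch of the semantics, not MongoDB's implementation; makeSequences and nextVal are names I made up for the illustration:

```javascript
// Toy in-memory model of findOneAndUpdate with { $inc } and upsert: true.
// Illustrates the semantics only; MongoDB does this server-side, atomically.
function makeSequences() {
  const docs = new Map(); // _id -> document

  // Returns the document *after* the update, like returnDocument: "after".
  function nextVal(id, step = 1) {
    let doc = docs.get(id);
    if (doc === undefined) {
      // upsert: true — $inc on a missing document starts the field from 0,
      // so the first call yields `step`.
      doc = { _id: id, seq: 0 };
      docs.set(id, doc);
    }
    doc.seq += step;
    return { _id: doc._id, seq: doc.seq };
  }
  return { nextVal };
}

const sequences = makeSequences();
console.log(sequences.nextVal("demo_id")); // { _id: 'demo_id', seq: 1 }
console.log(sequences.nextVal("demo_id")); // { _id: 'demo_id', seq: 2 }
```

The key point the model captures is that read-increment-return is a single step: there is no window between fetching the current value and writing the incremented one.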

No-gap Sequences (and optimistic locking)

If you want a no-gap sequence, you can fetch the sequence number in the multi-document transaction:


sessionA = db.getMongo().startSession();
sessionA.startTransaction();
dbA = sessionA.getDatabase(db.getName());

sessionB = db.getMongo().startSession();
sessionB.startTransaction();
dbB = sessionB.getDatabase(db.getName());

dbA.demo.insertOne({
 _id: dbA.sequences.findOneAndUpdate(      // part of the transaction
  { _id: "demo_id" },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" }
),
 value: "I'll abort"
});

dbB.demo.insertOne({
 _id: dbB.sequences.findOneAndUpdate(      // part of the transaction
  { _id: "demo_id" },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" }
),
 value: "I'll commit"
});

When two transactions try to increment the same sequence, an optimistic locking error is raised:


MongoServerError[WriteConflict]: Caused by :: Write conflict during plan execution and yielding is disabled. :: Please retry your operation or multi-document transaction.

This is a retriable error, and the application should implement retry logic:

sessionA.abortTransaction();


// retry the insert
sessionB.abortTransaction();
sessionB.startTransaction();
dbB.demo.insertOne({
 _id: dbB.sequences.findOneAndUpdate(      // part of the transaction
  { _id: "demo_id" },
  { $inc: { seq: 1 } },
  { upsert: true, returnDocument: "after" }
),
 value: "I'll commit"
});

sessionB.commitTransaction();
sessionB.endSession();
sessionA.endSession();
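The manual abort-and-retry above can be generalized into a small wrapper. This is a sketch of the pattern only; runWithRetry and the error-matching regex are my own illustration, not a MongoDB driver API:

```javascript
// Sketch of a retry loop for transient transaction errors such as
// WriteConflict. `txnBody` is assumed to run one full attempt of the
// transaction (start, writes, commit) and throw on conflict.
function runWithRetry(txnBody, maxAttempts = 5) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return txnBody(); // abort/restart of the session happens inside txnBody
    } catch (e) {
      lastError = e;
      // Only retry errors flagged as transient; rethrow everything else.
      if (!/WriteConflict|TransientTransactionError/.test(String(e))) throw e;
    }
  }
  throw lastError; // attempts exhausted
}
```

In mongosh, txnBody would wrap the startTransaction / findOneAndUpdate / insertOne / commitTransaction steps shown above.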

I check the result:

db.demo.find()

[
  { _id: { _id: 'demo_id', seq: 1 }, value: "I'm the first" },
  { _id: { _id: 'demo_id', seq: 3 }, value: "I'll commit" },
  { _id: { _id: 'demo_id', seq: 4 }, value: "I'll commit" }
]


Here, the first session's rollback didn't introduce another gap because the sequence increment was part of the insert transaction. To achieve the same in a SQL database, you must use a table and UPDATE ... SET seq = seq + 1 RETURNING seq, which uses pessimistic locking: it acquires a row lock and waits for the other transaction to complete. To be scalable, SQL databases provide a non-transactional SEQUENCE that does the same without waiting, but with gaps. It still has some scalability issues, and distributed databases may discourage sequences (CockroachDB raises a warning) or not support them at all (like Google Spanner or Amazon Aurora DSQL).

Incrementing identifiers for the primary key

MongoDB provides developers with greater control, allowing them to apply the same logic to a collection while deciding whether to include it in a transaction. It also supports optimized atomic operations. You can also use Atlas Triggers to deploy the logic into the managed database, as demonstrated in MongoDB Auto-Increment.

It's important to note that generating an incremented sequence is typically rare, primarily occurring during migrations from MySQL's AUTO_INCREMENT or PostgreSQL's BIGSERIAL. The default "_id" field is a globally unique and scalable ObjectId. You can also use a UUID generated by the application and choose the format (UUIDv4 or UUIDv7) to distribute or collocate the documents inserted at the same time.


How Can AI Talk to My Database Part Two: MySQL and Gemini

My first experiments creating an MCP Server to provide AI access to a PostgreSQL database using the FastMCP Python framework and Anthropic’s and OpenAI’s APIs highlighted an important requirement: for now, these two APIs can only communicate with an MCP server through HTTPS over a public URL. While researching how to make this work (which […]

July 17, 2025

How Can AI Talk to My (PostgreSQL) Database?

I admittedly have some work to do to catch up with the AI “trend”. It’s been around (as in, easily accessible) for a few years now, but I can probably still count on my fingers the number of times I’ve used a prompt to ask it anything. That is, discounting the mostly frustrating and usually […]

July 16, 2025

July 15, 2025

Volatility classification in PostgreSQL

In this post, we discuss different ways you can use volatility classification with functions in PostgreSQL and provide best practices to help you keep your database optimized and develop efficient and reliable database applications.

July 14, 2025

Amazon Aurora DSQL for gaming use cases

In this post, we show you how Amazon Aurora DSQL powers modern gaming use cases from real-time multiplayer interactions to globally consistent leaderboards by delivering seamless scalability, strong consistency and built-in multi-region availability.

MyDumper Refactors Locking Mechanisms

In my previous blog post, Understanding trx-consistency-only on MyDumper Before Removal, I talked about --trx-consistency-only removal, in which I explained that it acts like a shortcut, reducing the amount of time we have to block the write traffic to the database by skipping to check if we are going to backup any non-transactional tables. Now, […]

Introducing JWT Signing Keys

A new JWT signing keys system based on public key cryptography to improve your project's security and performance.

Supabase UI: Platform Kit

We are launching new components in the Supabase UI Library that makes it incredibly easy to build platforms on top of Supabase.

July 10, 2025

ATC/OSDI’25 Technical Sessions

ATC and OSDI ran in parallel. As is tradition, OSDI was single-track; ATC had two parallel tracks. The schedules and papers are online as linked above.

USENIX is awesome: it has been open access for its conference proceedings since 2008. So you can access all the paper pdfs through the links above now. I believe the presentation videos will be made available soon as well. Kudos to USENIX!

I attended the OSDI opening remarks delivered by the PC chairs, Lidong Zhou (Microsoft) and Yuanyuan Zhou (UCSD). OSDI saw 339 submissions this year, which is up 20% from last year. Of those, 53 were accepted, for an acceptance rate of 16%. The TPC worked through Christmas to keep the publication machine running. We really are a bunch of workaholics. Who needs family time when you have rebuttals to respond to?

OSDI gave two best paper awards:

  • Basilisk: Using Provenance Invariants to Automate Proofs of Undecidable Protocols. Tony Nuda Zhang and Keshav Singh, University of Michigan; Tej Chajed, University of Wisconsin-Madison; Manos Kapritsos, University of Michigan; Bryan Parno, Carnegie Mellon University
  • Building Bridges: Safe Interactions with Foreign (programming) Languages through OmniGlot. Leon Schuermann and Jack Toubes, Princeton University; Tyler Potyondy and Pat Pannuto, University of California San Diego; Mae Milano and Amit Levy, Princeton University

The Distinguished Artifact award went to: PoWER Never Corrupts: Tool-Agnostic Verification of Crash Consistency and Corruption Detection. Hayley LeBlanc, University of Texas at Austin; Jacob R. Lorch and Chris Hawblitzel, Microsoft Research; Cheng Huang and Yiheng Tao, Microsoft; Nickolai Zeldovich, MIT CSAIL and Microsoft Research; Vijay Chidambaram, University of Texas at Austin

OSDI 2026 will be in Seattle, organized by Eddie Kohler (Harvard) and Amar Phanishayee (Meta).


Emery Berger’s Keynote: Accelerating software development, the LLM revolution

Deniz Altinbuken (ATC co-chair) introduced the OSDI/ATC joint keynote speaker: Professor Emery Berger of UMass Amherst, who is also an Amazon Scholar. Emery's Plasma lab is the birthplace of many interesting projects: Scalene, Hoard, DieHard. Emery is sigma. I mean, he has an ACM SIGPLAN Distinguished Service Award, and he is an ACM Fellow. He is also the creator of CSrankings, for which he still gets occasional hate mail every now and then.

Emery gives great talks. I think this was the best presentation across all OSDI and ATC sessions. What is Emery's secret? Like other good speakers, he rehearses a crazy number of times and prepares really well.

His talk was about the coming Cambrian explosion of LLM-augmented developer tools. Here's the gist: "Traditional software development is the dinosaur. LLMs are the asteroid. A Cambrian explosion is coming for tooling, not just apps. So let's evolve in order not to go extinct."

He first introduced the framework he used repeatedly for the new generation of LLM-enhanced tools coming out of his lab.

  • Evolve.
  • Exploit a niche.
  • Ensure fitness.

He argued throughout the talk that AI-assisted tools can deliver the best of both worlds: precision from PL, and flexibility from LLMs.

Scalene: Evolving the Profiler. Emery's group LLM-augmented Scalene, their Python profiler, to not only pinpoint inefficiencies in your code but also explain why the code is slow and suggest how to fix it. Common suggestions include: replace interpreted loops with native libraries, vectorize, use the GPU. LLMs help generate optimization suggestions with surprising performance boosts (sometimes 90x). That was evolve (adopt LLMs) and exploit a niche (the profiler knows where the code is inefficient and why). Ensure fitness comes from running the optimized code against the original.

chatDBG: Evolving a Debugger. Most developers don’t use debuggers. Print statements are more popular. Why? Because debuggers are clunky and stagnant. chatDBG changes this. It turns the debugger into a conversational assistant. You can type "why" at the prompt, and it gives you the root cause and a proposed fix. You can query code slices, symbol declarations, even ask for documentation. The LLM has access to dynamic program state and source context provided by the debugger, and external knowledge provided by huge real world training. Success rates improved with more context: 25% (default stack), 50% (with "why"), 75–80% (with targeted queries). Safety is enforced using containers and command whitelists. Again: evolve the interface, exploit the niche (source + state + knowledge), ensure fitness (bounded functionality and testable fixes).

cwhy: Evolving a Compiler. C++ compiler errors are notoriously incomprehensible. cwhy wraps around Clang++ and gives human-readable explanations with concrete fix suggestions. It uses git diffs to localize the cause and context of the error. The assumption: if it compiled yesterday and not today, something broke in the diff. cwhy figures out what and suggests ways to fix it. It even handles things like regex errors. This tool impressed a library author so much that they adopted it after cwhy fixed a bug in their freshly published code.

CoverUp: Evolving a Coverage Tool. Writing good tests is hard and thankless. Coverage reports only cause guilt trips. LLMs can write tests, but cannot methodically increase coverage. CoverUp is a next-gen testing assistant. It builds on SlipCover (Plasma's coverage analysis tool), then uses LLMs to generate new test cases that increase branch coverage methodically.

flowco: Rethinking Notebooks. Jupyter notebooks are the lingua franca of data science. But they're also a mess: brittle, unstructured, and difficult to maintain. Flowco reimagines notebooks as dataflow graphs. Each step (load, clean, wrangle, visualize) is a node in the graph/pipeline. LLMs guide code generation at each step. Combined with pre/post condition checks, you get a smart notebook that requires no code but ensures correct workflows. The metaphor shift from script to graph is what enables notebooks to evolve.


The talk was a tour de force across many tools, showing how LLMs can help them evolve and improve significantly. Emery is doing a big service building these tools to help all developers. Based on my interactions with talented coders, I have concluded that LLMs actually boost performance way more for experts than for beginners. Still, I think these tools will help all kinds of developers, not just experts.


Using Provenance Invariants to Automate Proofs

This paper was one of the standout papers from OSDI'25 and the winner of a Best Paper Award. The lead author, Tony Zhang, now an engineer at Databricks, was a PhD student at the University of Michigan and a MongoDB PhD Fellow. He delivered a sharp, polished talk. Here is my reconstruction from my notes.

Distributed protocols are notoriously hard to get right. Testing can show the presence of bugs but never their absence. Formal verification can give stronger correctness guarantees, but only if you can manage to prove your protocol satisfies a desired safety property. This involves crafting an inductive invariant: one that holds initially, implies the safety property, and is closed under protocol transitions.

Coming up with a suitable inductive invariant is hard. It's an iterative process: guess, check, fail, refine, repeat. You often start with a property you care about (say, agreement), only to find it isn’t inductive. Then you strengthen it with auxiliary lemmas and conditions. If you are skilled and somewhat lucky, you eventually get there. I wrote a bit about this loop back in 2019, after SOSP’19 on an earlier effort on this problem.

Basilisk’s goal is to short-circuit this painful invariant discovery loop using provenance invariants. Provenance invariants relate a local variable at a node to its provenance: the causal step or message that caused the variable to have its current value. For example, instead of guessing an invariant like "If replica A has voted yes, then replica B must also have voted yes," Basilisk works backwards to discover why A voted yes and then connects that cause to B’s state. Sort of history variables on steroids. By tracing data dependencies across steps and messages, Basilisk derives inter-host invariants that explain not just what the state is, but how it came to be.

Basilisk builds on the authors' prior work Kondo, which had two types of invariants:

  • Protocol invariants (expressive but hard to generate automatically)
  • Regular invariants (mechanically derivable but low-level)

Basilisk generalizes this by introducing host provenance (HP) and network provenance (NP). HP traces a variable's value to a local decision; NP traces it to a message received. Together, these form a causal chain, or witness, which Basilisk uses to justify why a state is safe.

The provenance invariants replace the original inductive invariant. Then Basilisk proves that these provenance invariants imply the desired property. All of this is implemented in a toolchain that extends the Dafny language and verifier. Protocols are modeled as async state machines in Dafny, Basilisk analyzes them, and outputs inductive invariants along with machine-checkable proofs. Basilisk was evaluated on 16 distributed protocols, including heavyweights like Paxos, MultiPaxos, and various consensus and replication variants.


Tigon: A Distributed Database for a CXL Pod

Traditional distributed databases synchronize over the network, which means network overhead, message-exchange complexity, and coordination cost. Tigon replaces that network with a CXL memory pod. Instead of using sockets and RPCs, nodes coordinate using inter-host atomic instructions and hardware cache coherence. The downside is that this is a single-rack-scale database design, which also brings availability disadvantages.

There are still challenges remaining with this architecture. The CXL memory has higher latency (250–400ns) and lower bandwidth than local DRAM. It also provides limited hardware cache coherence capacity. Tigon addresses this with a hybrid architecture: partitioned and shared. All control-plane synchronization is done via CXL-coherent memory. And messages are only exchanged for data movement. 

To minimize coherence pressure, they compact metadata into 8-byte words and piggyback coordination info. They also show how to eliminate two-phase commit by taking advantage of reconstructability during crash recovery. The system is single-rack and does not replicate. It's built on a fail-stop assumption, and scaling (adding/removing nodes) requires restart. Evaluated on TPC-C and a YCSB variant using simulation experiments (since the hardware is not there), Tigon outperforms existing CXL-based shared-nothing DBs by up to 2.5x, and RDMA-based systems by up to 18.5x.


Mako: Speculative Distributed Transactions with Geo-Replication

Geo-replicated transactions are hard. The cost of consensus across data centers kills latency and throughput. Mako sidesteps this by decoupling execution from replication. Transactions execute speculatively using two-phase commit (2PC, Tupac) locally, without waiting for cross-region consensus. Replication happens in the background to achieve fault-tolerance.

The core idea is to allow transactions to commit optimistically and only roll back if replication later fails. This opens the door to cascading aborts, and Mako tries to alleviate the unbounded-rollback problem by tracking transactional dependencies using vector clocks.
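The dependency-tracking idea can be sketched with plain vector clocks. This is my illustration of the general technique, not Mako's actual data structures:

```javascript
// Toy vector clocks for tracking which speculative transactions a new
// transaction causally depends on.
function newClock() { return {}; }

// A node advances its own entry when it executes a transaction.
function tick(clock, node) {
  return { ...clock, [node]: (clock[node] || 0) + 1 };
}

// Observing another transaction's clock merges it in (element-wise max).
function merge(a, b) {
  const out = { ...a };
  for (const [node, t] of Object.entries(b)) {
    out[node] = Math.max(out[node] || 0, t);
  }
  return out;
}

// `a` depends on `b` if `a` dominates `b` element-wise: every event
// recorded in b has been observed by a.
function dependsOn(a, b) {
  return Object.entries(b).every(([node, t]) => (a[node] || 0) >= t);
}
```

With this bookkeeping, if a speculative transaction aborts, every later transaction whose clock dominates the aborted one is a candidate for cascading rollback, and nothing else needs to be touched.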

Ok, but there is another problem: how do you get consensus among nodes on rollback when the speculative/optimistic execution fails? This was not explained during the talk, and I brought it up during Q&A. The answer seems to be using epochs and sealing, applied in a delayed manner. This could open more problems; I haven't read the paper to understand how this works.


Skybridge: Bounded Staleness in Distributed Caches

From Meta (I guess we are not calling it Facebook anymore), this paper addresses the pain of eventual consistency in global cache layers. Eventual is fine for most cases, until it's not. Products like Facebook and Instagram rely on caches being mostly up-to-date. Inconsistencies may cause some occasional weird bugs, user confusion, and degraded experience.

Skybridge is an out-of-band replication stream layered on top of the main async replication pipeline. It adds redundancy without relying on the same code paths, avoiding correlated failures. Skybridge focuses only on timely delivery of updates, not durability.

Skybridge itself is also eventually consistent, but being lightweight, it gives you bounded staleness almost always. By leveraging Bloom filter-based synchronization, Skybridge provides a 2-second bounded-staleness window for 99.99998% of writes (vs. 99.993% with the baseline). All that at just 0.54% of the cache deployment size, thanks to the reduced scope of this superposed/layered system.
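A Bloom filter gives a small, fixed-size digest with one-sided error (false positives are possible, false negatives are not), which is what makes it cheap to ship around for synchronization. A minimal sketch of the data structure itself (my illustration, not Skybridge's design):

```javascript
// Minimal Bloom filter: k hash probes into an m-bit array.
class BloomFilter {
  constructor(mBits = 1024, k = 4) {
    this.m = mBits;
    this.k = k;
    this.bits = new Uint8Array(Math.ceil(mBits / 8));
  }
  // Simple seeded string hash (illustrative, not cryptographic).
  hash(key, seed) {
    let h = 2166136261 ^ seed;
    for (let i = 0; i < key.length; i++) {
      h ^= key.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    return (h >>> 0) % this.m;
  }
  add(key) {
    for (let s = 0; s < this.k; s++) {
      const i = this.hash(key, s);
      this.bits[i >> 3] |= 1 << (i & 7);
    }
  }
  mightContain(key) {
    for (let s = 0; s < this.k; s++) {
      const i = this.hash(key, s);
      if (!(this.bits[i >> 3] & (1 << (i & 7)))) return false;
    }
    return true;
  }
}
```

The synchronization intuition: a replica can publish a filter of recently applied writes; a key that is present locally but absent from the peer's filter is definitely missing on the peer and can be re-sent, while false positives only cost a missed repair, never a wrong one.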


SpecLog: Low End-to-End Latency atop a Speculative Shared Log

Shared logs are foundational in modern distributed systems (e.g., Corfu, Scalog, Boki). They are deployed in Meta (formerly Facebook) as I discussed here earlier. But their latency is often too high for real-time workloads because coordination delays application progress. SpecLog proposes speculative delivery: let applications start processing records before the final global order is fixed.

To make this safe, they introduce Fix-Ante Ordering: a mechanism that deterministically assigns quotas to each shard ahead of time. If each shard sticks to its quota, the global cut is predictable. If not, speculation may fail and need rollback. Their implementation, Belfast, shows 3.5x faster delivery latency and 1.6x improvement in end-to-end latency over existing shared logs.

This is conceptually similar to moving Scalog’s ordering step upfront and letting applications run optimistically. As I noted back in my Scalog post, removing coordination from the critical path is the holy grail. SpecLog pushes this further by betting on speculative execution + predictable quotas. Again, I haven't read the papers to analyze disadvantages. 


Wildcard Indexes ($**)

In SQL databases, we sometimes encounter the Entity-Attribute-Value (EAV) model as a workaround for the rigidity of the relational model when different attributes are used across rows. In MongoDB, you can do the same with the Attribute Pattern and index the attribute name and value, but it is not needed: documents can simply include multiple fields, and wildcard indexes can index each attribute. You can think of it as an UNPIVOT applied only to the index entries.

The YouTube video statistics dataset imported in the first post of this series is a collection of one million videos. Each document embeds an "accessControl" sub-object that stores a list of attributes (like 'comment', 'rate', or 'syndicate') with a permission ('allowed' or 'moderated').

Here is an example document:

{  
  "_id": "---ALs2MJb8",  
  "accessControl": {  
    "comment":       { "permission": "allowed" },  
    "list":          { "permission": "allowed" },  
    "videoRespond":  { "permission": "moderated" },  
    "rate":          { "permission": "allowed" },  
    "syndicate":     { "permission": "allowed" },  
    "embed":         { "permission": "allowed" },  
    "commentVote":   { "permission": "allowed" },  
    "autoPlay":      { "permission": "allowed" }  
  },  
  "category": "Music",  
  "author": "TriumphantPromotions",
  "publishedDate": "2013-06-04T05:14:58Z",
...
}  

In this dataset, all attributes belong to a known list, but for the purpose of this example, we will treat them as unknown and refrain from creating individual indexes. The Attribute Pattern would have transformed the document into the following, with a single field name that can be indexed:

{  
  "_id": "---ALs2MJb8",  
  "accessControl": [  
    { "type": "comment",      "permission": "allowed"    },  
    { "type": "list",         "permission": "allowed"    },  
    { "type": "videoRespond", "permission": "moderated"  },  
    { "type": "rate",         "permission": "allowed"    },  
    { "type": "syndicate",    "permission": "allowed"    },  
    { "type": "embed",        "permission": "allowed"    },  
    { "type": "commentVote",  "permission": "allowed"    },  
    { "type": "autoPlay",     "permission": "allowed"    }  
  ],  
  "category": "Music",  
  "author": "TriumphantPromotions",
  "publishedDate": "2013-06-04T05:14:58Z",
...
}  

Wildcard indexes work similarly, with an index key composed of the field path and the value, without modifying the document itself.
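The (path, value) pairs a wildcard index conceptually stores can be illustrated with a toy flattening function. This is my own sketch of the idea, not the server's actual index format:

```javascript
// Illustrative: enumerate the ($_path, value) pairs a wildcard index
// would conceptually store for the leaf fields under a sub-object.
function wildcardEntries(obj, prefix = "") {
  const entries = [];
  for (const [field, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${field}` : field;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      entries.push(...wildcardEntries(value, path)); // recurse into sub-objects
    } else {
      entries.push({ $_path: path, value });         // one entry per leaf
    }
  }
  return entries;
}

const doc = { accessControl: { rate:  { permission: "allowed" },
                               embed: { permission: "allowed" } } };
console.log(wildcardEntries(doc.accessControl, "accessControl"));
// → one entry each for accessControl.rate.permission and
//   accessControl.embed.permission
```

This is why the explain output below shows a hidden $_path field in the key pattern: the path is part of the key, so one index serves every attribute.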

It is created like a regular index, except that the key can include a $** wildcard:

db.youstats.createIndex(
   { "author": 1, "accessControl.$**" : 1, "category": 1 }
)

In my data set, I have 68 videos from "Paramount Movies", and only 3 of them have the rate permission allowed:

db.youstats.aggregate([  
  { $match: { author: "Paramount Movies" } },  
  {  
    $group: {  
      _id: "$accessControl.rate.permission",  
      count: { $sum: 1 }  
    }  
  }  
])  

[ { _id: 'allowed', count: 3 }, { _id: 'denied', count: 65 } ]

If I want to find only those with the rate permission allowed, I would need an index with "accessControl.rate.permission" in the key. Without it, the query would have to fetch the 68 documents and then filter out 65 of them. Such an index would serve only the "rate" permission; I would have to create one index for each permission I might query, and that might be a lot with a flexible schema.

With my wildcard index, all fields under a path are automatically indexed. This allows queries to access the three relevant documents directly, even without prior knowledge of which permissions will be in the query filter:

db.youstats.find({
  author: "Paramount Movies",
  "accessControl.rate.permission": "allowed"
}).explain("executionStats").executionStats
;

{
  executionSuccess: true,
  nReturned: 3,
  executionTimeMillis: 0,
  totalKeysExamined: 3,
  totalDocsExamined: 3,
  executionStages: {
    isCached: false,
    stage: 'FETCH',
    nReturned: 3,
    executionTimeMillisEstimate: 0,
    works: 5,
    advanced: 3,
...
    docsExamined: 3,
    alreadyHasObj: 0,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 3,
      executionTimeMillisEstimate: 0,
      works: 4,
      advanced: 3,
...
      keyPattern: {
        author: 1,
        '$_path': 1,
        'accessControl.rate.permission': 1,
        category: 1
      },
      indexName: 'author_1_accessControl.$**_1_category_1',
      isMultiKey: false,
      multiKeyPaths: {
        author: [],
        '$_path': [],
        'accessControl.rate.permission': [],
        category: []
      },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        author: [ '["Paramount Movies", "Paramount Movies"]' ],
        '$_path': [
          '["accessControl.rate.permission", "accessControl.rate.permission"]'
        ],
        'accessControl.rate.permission': [ '["allowed", "allowed"]' ],
        category: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 3,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}

The number of index entries read, keysExamined: 3, matches the number of documents returned, nReturned: 3, indicating optimal access.

The index bounds reveal insights about the indexed keys:

  • The value searched in the first field of the key, author, is [ '"Paramount Movies", "Paramount Movies"' ]
  • The value searched in the $_path, which contains the field name, is [ '"accessControl.rate.permission", "accessControl.rate.permission"' ]
  • The value searched in the this field is [ '"allowed", "allowed"' ].
  • The last field of the key category has no filter applied, resulting in a scan of all values, represented as [ '[MinKey, MaxKey]' ].

Examining the index bounds provides valuable insights into the access patterns it can serve efficiently. For instance, if you see category: [ '[MinKey, MaxKey]' ], you can confidently add .sort({category:1}) to your query without increasing costs, as the index entries will already be in the required order.

If you have filters on multiple fields under a wildcard, the index might read more entries, but the filter is still covered before fetching the documents, like these:

db.youstats.find(  
  {  
    author: "Paramount Movies",  
    "accessControl.rate.permission": "allowed",  
    "accessControl.comment.permission": "denied"  
  }
)

db.youstats.find({  
  author: "Paramount Movies",  
  $or: [  
    { "accessControl.rate.permission": "allowed" },  
    { "accessControl.comment.permission": "denied" }  
  ]  
})

Wildcard indexes offer significant flexibility when dealing with documents that have evolving, dynamic, or unpredictable attribute sets. They prove especially valuable in various scenarios beyond access control permissions:

  1. User-Defined Content & Metadata: In applications that allow users to add custom fields—like tagging, profile properties, or annotation systems—there’s no need to anticipate and index every potential custom attribute in advance.

  2. IoT and Telemetry Data: Devices frequently send sensor readings or status fields that may vary over time or between models. Wildcard indexes enable efficient indexing of any combination of measurements or state fields within the same collection, accommodating unforeseen future fields without needing schema changes.

  3. Catalogs and Product Data: E-commerce platforms often manage products with differing attribute sets based on category (e.g., size, color, voltage, brand, material). Wildcard indexes eliminate the necessity for separate indexes for each potential attribute.

  4. Multi-Tenant or Extensible Systems: SaaS platforms and extensible business applications allow tenants or partners to define their own custom fields. Wildcard indexes facilitate efficient querying, regardless of the unique attributes present in each tenant’s data.

  5. Audit Logs and Event Sourcing: Log entries may feature arbitrary keys based on event type or source system. Wildcard indexes permit efficient filtering and retrieval of records, even as event schemas evolve.

Wildcard indexes exist in MongoDB, because flexible documents are native, and not in emulations on top of SQL databases.

I tested on Oracle with the MongoDB compatible API:

oracle> db.youstats.createIndex(
...    { "author": 1, "accessControl.$**" : 1, "category": 1 }
... )
MongoServerError[MONGO-67]: Wildcard indexes are not supported.

I also tested on FerretDB/PostgreSQL which uses the DocumentDB extension that powers CosmosDB in Azure:

ferretdb> db.youstats.createIndex(
...        { "author": 1, "accessControl.$**" : 1, "category": 1 }
...     )
MongoServerError[CannotCreateIndex]: Error in specification { "name" : "author_1_accessControl.$**_1_category_1", "key" : { "author" : 1, "accessControl.$**" : 1, "category" : 1 } } 
:: caused by 
:: wildcard indexes do not allow compounding

Wildcard indexes significantly simplify operational complexity for developers and DBAs. Instead of managing an expanding array of single-field indexes or restructuring data into rigid attribute patterns, a single wildcard index can adapt to accommodate various query patterns as requirements evolve.

Wildcard Indexes: MongoDB flexibility joins query performance

In SQL databases, we sometimes encounter an Entity-Attribute-Value (EAV) model to work around the rigidity of the relational model when different attributes are used among documents. In MongoDB, you can do the same with the Attribute Pattern, and index the attribute name and value, but it is not needed as documents can simply include multiple fields, and wildcard indexes can index each attribute. You can think of it like an UNPIVOT but applied only to the index entries.
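The unpivot analogy can be sketched in a few lines of plain JavaScript (an illustration of the concept, not MongoDB internals; the helper name is made up):

```javascript
// Conceptual sketch: a wildcard index "unpivots" a sub-document into
// ($_path, value) pairs, one index entry per leaf field.
function wildcardEntries(doc, prefix) {
  const entries = [];
  for (const [key, value] of Object.entries(doc)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      entries.push(...wildcardEntries(value, path)); // recurse into sub-objects
    } else {
      entries.push({ $_path: path, value });         // one entry per leaf value
    }
  }
  return entries;
}

const accessControl = {
  comment: { permission: "allowed" },
  rate:    { permission: "denied"  },
};
console.log(wildcardEntries(accessControl, "accessControl"));
// -> one entry for accessControl.comment.permission ('allowed')
//    and one for accessControl.rate.permission ('denied')
```

This is why the explain output shows a $_path component in the key pattern: the field name itself becomes part of each index entry, next to its value.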

The Youtube video statistics dataset imported in the first post of this series is a collection of one million videos. They embed an "accessControl" sub-object that stores a list of attributes (like 'comment', 'rate', or 'syndicate') with a permission ('allowed', or 'moderated'):

Here is an example of document:

{  
  "_id": "---ALs2MJb8",  
  "accessControl": {  
    "comment":       { "permission": "allowed" },  
    "list":          { "permission": "allowed" },  
    "videoRespond":  { "permission": "moderated" },  
    "rate":          { "permission": "allowed" },  
    "syndicate":     { "permission": "allowed" },  
    "embed":         { "permission": "allowed" },  
    "commentVote":   { "permission": "allowed" },  
    "autoPlay":      { "permission": "allowed" }  
  },  
  "category": "Music",  
  "author": "TriumphantPromotions",
  "publishedDate": "2013-06-04T05:14:58Z",
...
}  

In this dataset, all attributes belong to a known list, but for the purpose of this example, we will treat them as unknown, refraining from creating individual indexes. The attribute pattern would have transformed it to the following, with a single field name that can be indexed:

{  
  "_id": "---ALs2MJb8",  
  "accessControl": [  
    { "type": "comment",      "permission": "allowed"    },  
    { "type": "list",         "permission": "allowed"    },  
    { "type": "videoRespond", "permission": "moderated"  },  
    { "type": "rate",         "permission": "allowed"    },  
    { "type": "syndicate",    "permission": "allowed"    },  
    { "type": "embed",        "permission": "allowed"    },  
    { "type": "commentVote",  "permission": "allowed"    },  
    { "type": "autoPlay",     "permission": "allowed"    }  
  ],  
  "category": "Music",  
  "author": "TriumphantPromotions",
  "publishedDate": "2013-06-04T05:14:58Z",
...
}  

Wildcard indexes work similarly: the index key compounds the field name with its value, without modifying the document itself.

It is created like a regular index except that it can include a $** wildcard:

db.youstats.createIndex(
   { "author": 1, "accessControl.$**" : 1, "category": 1 }
)

In my data set, I have 68 videos from "Paramount Movies", and only 3 of them have rate permission allowed:

db.youstats.aggregate([  
  { $match: { author: "Paramount Movies" } },  
  {  
    $group: {  
      _id: "$accessControl.rate.permission",  
      count: { $sum: 1 }  
    }  
  }  
])  

[ { _id: 'allowed', count: 3 }, { _id: 'denied', count: 65 } ]

If I want to find only those with rate permission allowed, I would have to create an index with "accessControl.rate.permission" in the key. Without it, the query would have to fetch all 68 documents and then filter out 65 of them. Such an index would serve only the "rate" permission; I would have to create one index for each permission I might query, and that could be a lot with a flexible schema.

With my wildcard index, all fields under a path are automatically indexed. This allows queries to access the three relevant documents directly, even without prior knowledge of which permissions will be in the query filter:

db.youstats.find({
  author: "Paramount Movies",
  "accessControl.rate.permission": "allowed"
}).explain("executionStats").executionStats
;

{
  executionSuccess: true,
  nReturned: 3,
  executionTimeMillis: 0,
  totalKeysExamined: 3,
  totalDocsExamined: 3,
  executionStages: {
    isCached: false,
    stage: 'FETCH',
    nReturned: 3,
    executionTimeMillisEstimate: 0,
    works: 5,
    advanced: 3,
...
    docsExamined: 3,
    alreadyHasObj: 0,
    inputStage: {
      stage: 'IXSCAN',
      nReturned: 3,
      executionTimeMillisEstimate: 0,
      works: 4,
      advanced: 3,
...
      keyPattern: {
        author: 1,
        '$_path': 1,
        'accessControl.rate.permission': 1,
        category: 1
      },
      indexName: 'author_1_accessControl.$**_1_category_1',
      isMultiKey: false,
      multiKeyPaths: {
        author: [],
        '$_path': [],
        'accessControl.rate.permission': [],
        category: []
      },
      isUnique: false,
      isSparse: false,
      isPartial: false,
      indexVersion: 2,
      direction: 'forward',
      indexBounds: {
        author: [ '["Paramount Movies", "Paramount Movies"]' ],
        '$_path': [
          '["accessControl.rate.permission", "accessControl.rate.permission"]'
        ],
        'accessControl.rate.permission': [ '["allowed", "allowed"]' ],
        category: [ '[MinKey, MaxKey]' ]
      },
      keysExamined: 3,
      seeks: 1,
      dupsTested: 0,
      dupsDropped: 0
    }
  }
}

The number of index entries read, keysExamined: 3, matches the number of documents returned, nReturned: 3, indicating optimal access.

The index bounds reveal insights about the indexed keys:

  • The bound on the first field of the key, author, is '["Paramount Movies", "Paramount Movies"]'.
  • The bound on $_path, which holds the indexed field name, is '["accessControl.rate.permission", "accessControl.rate.permission"]'.
  • The bound on the value of that field is '["allowed", "allowed"]'.
  • The last field of the key, category, has no filter applied, resulting in a scan of all its values, represented as '[MinKey, MaxKey]'.

Examining the index bounds provides valuable insights into the access patterns it can serve efficiently. For instance, if you see category: [ '[MinKey, MaxKey]' ], you can confidently add .sort({category:1}) to your query without increasing costs, as the index entries will already be in the required order.

If you have filters on multiple fields under a wildcard, the index might read more entries, but the filter is still covered before fetching the documents, like these:

db.youstats.find(  
  {  
    author: "Paramount Movies",  
    "accessControl.rate.permission": "allowed",  
    "accessControl.comment.permission": "denied"  
  }
)

db.youstats.find({  
  author: "Paramount Movies",  
  $or: [  
    { "accessControl.rate.permission": "allowed" },  
    { "accessControl.comment.permission": "denied" }  
  ]  
})

Wildcard indexes offer significant flexibility when dealing with documents that have evolving, dynamic, or unpredictable attribute sets. They prove especially valuable in various scenarios beyond access control permissions:

  1. User-Defined Content & Metadata: In applications that allow users to add custom fields—like tagging, profile properties, or annotation systems—there’s no need to anticipate and index every potential custom attribute in advance.

  2. IoT and Telemetry Data: Devices frequently send sensor readings or status fields that may vary over time or between models. Wildcard indexes enable efficient indexing of any combination of measurements or state fields within the same collection, accommodating unforeseen future fields without needing schema changes.

  3. Catalogs and Product Data: E-commerce platforms often manage products with differing attribute sets based on category (e.g., size, color, voltage, brand, material). Wildcard indexes eliminate the necessity for separate indexes for each potential attribute.

  4. Multi-Tenant or Extensible Systems: SaaS platforms and extensible business applications allow tenants or partners to define their own custom fields. Wildcard indexes facilitate efficient querying, regardless of the unique attributes present in each tenant’s data.

  5. Audit Logs and Event Sourcing: Log entries may feature arbitrary keys based on event type or source system. Wildcard indexes permit efficient filtering and retrieval of records, even as event schemas evolve.

The wildcard index is a feature of MongoDB where the flexibility of documents is native. In contrast, MongoDB emulations built on top of SQL databases struggle to offer the same level of flexibility, as they are constrained by the limitations of RDBMS engines.

I tested on Oracle with the MongoDB compatible API:

oracle> db.youstats.createIndex(
...    { "author": 1, "accessControl.$**" : 1, "category": 1 }
... )
MongoServerError[MONGO-67]: Wildcard indexes are not supported.

I also tested on FerretDB, which uses PostgreSQL with the DocumentDB extension that powers CosmosDB in Azure:

ferretdb> db.youstats.createIndex(
...        { "author": 1, "accessControl.$**" : 1, "category": 1 }
...     )
MongoServerError[CannotCreateIndex]: Error in specification { "name" : "author_1_accessControl.$**_1_category_1", "key" : { "author" : 1, "accessControl.$**" : 1, "category" : 1 } } 
:: caused by 
:: wildcard indexes do not allow compounding

AWS has a service compatible with older versions of MongoDB, Amazon DocumentDB, and it fails too:

docdb> db.youstats.createIndex(
...        { "author": 1, "accessControl.$**" : 1, "category": 1 }
...     )
MongoServerError: Index type not supported : $**

Google Firestore also provides a MongoDB emulation, which doesn't support wildcard indexes either. If you try other databases that claim MongoDB compatibility, please tell me whether it works.

Wildcard indexes significantly simplify operational complexity for developers and DBAs. Instead of creating more indexes or restructuring documents with the attribute pattern, a single wildcard index can adapt to accommodate various query patterns as requirements evolve.

ATC/OSDI 2025 impressions

This week I was in Boston for ATC/OSDI’25. Downtown Boston is a unique place where two/three-hundred-year-old homes and cobblestone streets are mixed with sleek buildings and biotech towers. The people here look wicked smart and ambitious (although lacking the optimism/cheer of Bay area people). It’s a sharp contrast from Buffalo, where the ambition is more about not standing out.

Boston was burning: 90°F and humid. I made the mistake of booking late, so I got the DoubleTree Boston-Downtown instead of the conference hotel. The mile-long walk to the Sheraton felt like a hike through a sauna. By the time I got there, my undershirt was soaked, and it stuck cold to my back under the conference hall’s AC. Jane Street's fitted t-shirt swag saved the day.

The Sheraton looked ragged from the outside, aged on the inside, but it was functional. The conference felt underfilled, with many empty seats. Later, I learned that the total ATC+OSDI attendance was under 500. That's a big drop from even ATC/OSDI 2022 attendance, which I discussed here.  

The conference was also low energy. Few questions after talks. People seemed tired. Where were the 20+ strong MIT systems profs? ATC/OSDI happened in their backyard, but only one of them showed up, and only for the first morning.

The presentation quality was disappointing. A couple of speakers looked like they were seeing their slides for the first time. Very demoralizing. Many ATC talks were just low-quality recorded videos. Visa issues accounted for many of the no-shows, which sucks. I can’t believe we are dealing with this in 2025. Apparently, some European faculty are skipping U.S. conferences altogether now because of the political climate.

Missing speakers were somehow 10x more prevalent at ATC than at OSDI. Two out of five talks were prerecorded (with no Q&A after) in the ATC session I chaired, as in several other sessions. I suspect in a couple of cases some US-based co-authors didn’t even bother to show up and present. I saw only one prerecorded OSDI talk, and that chair simply told people to watch the recording later rather than playing it, which honestly felt like the right call. At the end of the day, I can still understand the prerecorded talks, but the low presentation quality of many talks is inexcusable. Boring talks meant empty seats, and the people in the room checked email on their laptops rather than listening.

The conference attendees this year had a striking bimodal distribution. On one end, there were very young PhD students, and even some undergraduates. On the other: old-timers from the first USENIX days (70+ or 80+ years old), in town for USENIX's 50th anniversary and what felt like ATC’s last rites, as the USENIX Annual Technical Conference was discontinued this year.

Conferences live and die by the community around them. When that community weakens, the quality and energy degrade. ATC has ended, and it seems OSDI needs some TLC (tender loving care) to rebuild the community around it. This is a very hard thing to do, and I suspect there are no quick/easy hacks.

I don't know. Maybe people are tired and overwhelmed. Conference submissions keep going up by about 30% each year, and program committee reviewing is hard and thankless. It is getting harder and harder to get papers accepted. Maybe people are getting fed up with the paper publishing game, and as a result don't find conferences as useful or sincere. Oh, well...

Coincidentally, both the NSDI and SOSP PC meetings start this Thursday, and there was a flurry of online discussion about the papers on Monday and Tuesday. Monday night I had to look back at the papers I reviewed to respond to the rebuttals and discussion comments. I stayed up past midnight, with the Squid Game final season playing on the hotel TV for background noise. Let's just say there are a lot of parallels between Squid Game and the publishing game.



I ran into many interesting folks in the hallway track: folks working on ML infrastructure, teaching LLMs to code, running AI infrastructure at OpenAI, researching new hardware for distributed systems. I met Amplify VC people, Sunil and Arjun, both very smart and technical. They fund distributed systems work and infrastructure for AI.

One pattern I noticed was there were lots of young folks skipping the PhD pipeline entirely. They went straight from school (with some undergraduate research work under their belt, and I presume good coding skills) to Anthropic or OpenAI. 

Hallway conversations should be easier to start. I always enjoy them once they get going. It’s the starting that's hard. But it’s worth pushing through the awkwardness.


Swag Rankings

I will talk about some interesting papers in a later blog post. Now, let’s get to the real reason we’re all here: the swag.

Databricks: A sticker. I swear that was it. They came with 20 Databricks stickers to pass around. Are you serious?

Amazon: A shopping bag. Thanks, but no thanks.

Google: Stickers. And hats, but they only display the hats on the table and don't give them to you. When I asked for one, they said the hats are for students only (how did they know I am not a student?). And I don't know if they make students sign their souls away just to hand them a hat.

Meta: Three ballpoint pens, packaged/branded neatly. You spend $100 million a year to poach AI researchers, but you only pass around ballpoint pens at ATC/OSDI? (Well, on testing, the pens write really smoothly, and my kids like them. Still beats the shopping bag, which I didn't bother picking up.)

Why send two staff to conferences just to hand out crap? You burned four days of salary and travel just to say, "Apply on our website" to people approaching the booth. This is not outreach. If you’re not going to hand out decent swag, don’t put up a booth. At least have some dignity.

Jane Street, on the other hand, was pure class. Their t-shirts are so soft and form-fitting that they feel like they were spun from distilled 401K pensions. You know what? I no longer feel bad about them recruiting top talent from research.

Working with Geospatial Data? PostGIS Makes PostgreSQL Enterprise-Ready

Do you find yourself struggling with geospatial data in your database? You know the feeling: you need quick answers about locations, distances, and relationships between points on a map, but your database just wasn’t built for these questions. The problem? While fantastic for traditional data, PostgreSQL on its own doesn’t natively handle the complexities of […]

July 08, 2025

How Aqua Security automates fast clone orchestration on Amazon Aurora at scale

Aqua Security is a leading provider of cloud-based security solutions, trusted by global enterprises to secure their applications from development to production. In this post, we explore how Aqua Security automates the use of Amazon Aurora fast clones to support read-heavy operations at scale, simplify their data workflows, and maintain operational efficiency.

How TalentNeuron optimized data operations and cut costs and modernized with Amazon Aurora I/O-Optimized

For years, TalentNeuron, a leader in talent intelligence and workforce planning, has been empowering organizations with data-driven insights by collecting and processing vast amounts of job board data. In this post, we share three key benefits that TalentNeuron realized by using Amazon Aurora I/O-Optimized as part of their new data platform: reduced monthly database costs by 29%, improved data validation performance, and accelerated innovation through modernization.

Transparent Data Encryption: The Best Way to Secure Your Data in PostgreSQL

Welcome to the open source implementation of PostgreSQL Transparent Data Encryption! This question was posed on the PostgreSQL forum, and the good news is that it’s actually pretty easy to do! Q: Is it possible to automate the steps to enable pg_tde for all new databases? A: Yes! Here’s the routine: Part I: Download Percona Distribution […]

ALTER TABLE ... ADD COLUMN

MongoDB’s flexible schema allows each document within a collection to have a unique structure, a feature known as polymorphism. This contrasts with SQL databases, where every row in a table must adhere to a predefined set of columns. To support polymorphic data models without multiplying tables, SQL schemas often include many optional columns, which are frequently left null. These null values usually signify "not applicable" rather than "unknown". Furthermore, unused columns must be explicitly defined in the database schema, and modifying this structure typically requires locking the table, complicating maintenance.

This rigidity is often a point of comparison favoring MongoDB, where applications can introduce new schemas without affecting existing objects. A common example was the ALTER TABLE ADD COLUMN statement, which locks the table because DDL operations must modify the catalog information shared by all table rows. It was often mentioned to illustrate the rigidity of RDBMS. When the operation also updates existing rows, this lock could last a long time, causing significant delays.
However, while many ALTER TABLE operations still require rewriting the table, such as changing a data type, adding a column is no longer one of them. Most SQL databases now optimize adding a column as a metadata-only operation, making it faster and more efficient than before. The main difference with MongoDB is how this change is controlled by the database administrator or the application developer.

This blog post explains why mentioning ALTER TABLE ADD COLUMN to showcase schema flexibility is not ideal, as it has been optimized in many RDBMS. It is also an occasion to explain how the optimization works internally in PostgreSQL, and how it resembles what developers do with MongoDB.

Example on PostgreSQL

Adding a column requires an exclusive lock, but since PostgreSQL 11 the lock is held only briefly (provided the ALTER TABLE does not itself have to wait on another lock).

For example, I created a table with ten million rows:

postgres=# \timing on
Timing is on.

postgres=# create table iloveddl ( id bigint );
CREATE TABLE
Time: 2.026 ms

postgres=# insert into iloveddl select generate_series(1,1e7);
INSERT 0 10000000
Time: 31328.019 ms (00:31.328)

PostgreSQL updates are more costly than inserts, so rewriting a table of this size would take several minutes. I use the timing to verify that my ALTER TABLE statements do not trigger a rewrite of the rows.

Here is the first row in the table:

postgres=# select * from iloveddl where id <2;
 id 
----
  1
(1 row)

Time: 839.856 ms

The column information is stored in the catalog and can be viewed in the pg_attribute table. To retrieve details such as the column name, number, nullability, default flag, and missing value, you can run the following SQL query:

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;                                         

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
(7 rows)

Time: 0.569 ms

In addition to the system columns, there is an entry for the single column I defined in CREATE TABLE: "id". All flags are set to false: this column is nullable, has no default value, and lacks a value to replace a missing one. In SQL databases, a missing value, indicating no value exists, is different from a null, which signifies an unknown value. The missing value flag is used to optimize adding columns with default values without needing to rewrite all rows.

NULL with no DEFAULT

To add a nullable column without a default value in PostgreSQL, the following SQL command adds a new column named "x" of type integer to the "iloveddl" table. Since no default value is specified and the column is nullable by default, it will contain NULL for existing rows:

postgres=# alter table iloveddl add column x int;

ALTER TABLE
Time: 1.760 ms

This was a quick operation, involving only metadata, without rewriting the rows. Existing rows have no information about this new column, but on query, PostgreSQL adds this column to the result with a null value:

postgres=# select * from iloveddl where id <2;
 id | x 
----+---
  1 |  
(1 row)

Time: 206.647 ms

This is a common scenario where the SQL semantics are straightforward: when a new column is added, it logically exists for all rows, but its value is unknown for those created before the column addition. Unknown values are represented as NULL. Prior to the ALTER TABLE command, the column did not exist. Afterward, it exists with an unknown value.

Although the ALTER TABLE ADD COLUMN operation affects all rows logically, it doesn't need to modify physical rows. Instead, the database catalog marks the existence of the new column for all rows, and at runtime, a physical non-existence is interpreted as a logical NULL.
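This runtime fill-in can be sketched as follows (a deliberately simplified model for illustration, not PostgreSQL's actual tuple format):

```javascript
// Sketch: the catalog lists all columns, but heap tuples written before the
// ALTER physically store only the columns that existed at write time.
// The reader pads the missing trailing columns with NULL.
const catalog = ["id", "x"]; // column list after ALTER TABLE ... ADD COLUMN x
const oldTuple = [1];        // a row written before the ALTER: only "id"

function materializeRow(tuple, columns) {
  // any column beyond the tuple's physical length reads as null
  return Object.fromEntries(
    columns.map((name, i) => [name, i < tuple.length ? tuple[i] : null])
  );
}

console.log(materializeRow(oldTuple, catalog)); // { id: 1, x: null }
```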

PostgreSQL has simply added the column in the dictionary, as nullable (attnotnull):

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;  

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
 x        |      2 | f          | f         | f             | 
(8 rows)

Time: 0.510 ms

With a simple insert into the catalog, PostgreSQL changed the definition of all rows, from a nonexistent "x" to one that exists with an unknown value.
For example, in a small business scenario, you may have stored only customers' names and emails. If you then decide to add a 'date of birth' column, this information likely always existed but was previously unrecorded. After the addition, the date of birth for existing customers will appear as NULL, indicating that the value is unknown.

NULL with DEFAULT

SQL allows an insert to omit a column, setting its value to NULL by default, which indicates the value is unknown at insert time. However, SQL developers can specify that omitted columns should instead take a default value. The column remains nullable, meaning it can be explicitly set to NULL in an insert statement, but the absence of a value is different from an explicit NULL assignment, and must produce the default value instead.

When adding a column with a default value in PostgreSQL, existing rows are treated as if they had been inserted with the new column present but left unspecified, so the new column must return the default value for those rows. Before PostgreSQL 11, the ALTER TABLE command had to write this default value into every row, which could be slow. This limitation was often used to illustrate the rigidity of schema changes in SQL databases.

Currently, this particular case is optimized and performs quickly in the latest versions:

postgres=# alter table iloveddl add column y int null default 42;

ALTER TABLE
Time: 2.802 ms

The default value used for future inserts is stored in the pg_attrdef table. You can retrieve this information with the following SQL query:

select * from pg_attrdef 
 where adrelid='iloveddl'::regclass
;

  oid   | adrelid | adnum |                                                                        adbin                                                      

--------+---------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------
 346434 |  346431 |     3 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 42 0 0 0 0 0 0 0 ]}
(1 row)

Time: 0.413 ms

This is used for future inserts, but existing rows show the same value, which was virtually set in the catalog to avoid re-writing all rows during the ALTER TABLE statement:

postgres=# select * from iloveddl where id <2;

 id | x | y  
----+---+----
  1 |   | 42
(1 row)

Time: 189.657 ms

You can change the default value for future inserts without altering the existing data or the current value of the column. Here is an example:

postgres=# alter table iloveddl alter column y set default 100;

ALTER TABLE
Time: 2.039 ms
postgres=#

postgres=# insert into iloveddl(id) values (-1);

INSERT 0 1
Time: 1.516 ms

postgres=# select * from iloveddl where id <2;

 id | x |  y  
----+---+-----
  1 |   |  42
 -1 |   | 100
(2 rows)

Time: 207.727 ms

This change affects only new rows. For example, after setting the default to 100, inserting a row without specifying "y" will automatically assign 100. It does not alter existing rows. For instance, the rows that existed before adding the column will still show their previous "y" values, like 42.

The default value has been changed in pg_attrdef, to 100, which proves that the value for existing rows, 42, is stored elewhere:

postgres=# select * from pg_attrdef where adrelid='iloveddl'::regclass;

  oid   | adrelid | adnum |                                                                         adbin                                                                         
--------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------
 346435 |  346431 |     3 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 100 0 0 0 0 0 0 0 ]}
(1 row)

Time: 0.568 ms

PostgreSQL appears to set columns with default values as if all rows have that value, but this is done logically rather than physically. It stores extra information in pg_attribute to assign a default value for columns that are missing when reading a table row, ensuring consistent behavior without physically altering each row:

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;  

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
 x        |      2 | f          | f         | f             | 
 y        |      3 | f          | t         | t             | {42}
(9 rows)

Time: 0.529 ms

When PostgreSQL reads a row from a table and finds that the column "y" does not exist, it checks the null bitmap. If "y" is absent there as well, PostgreSQL adds the column to the result with the default value stored in the catalog as attmissingval.

This optimization is only applicable when the default value is a constant. For scenarios like adding an expiration date to passwords—such as forcing users to change their password annually—it makes sense to set a default for existing users to the next year from the current date. This works only if there's the same expiration date for all users.

NOT NULL

It is possible to add a non-nullable column, but a default value is required. Otherwise, it would result in a null in a non-nullable column:

alter table iloveddl add column z int not null
;

ERROR:  column "z" of relation "iloveddl" contains null values
Time: 1.024 ms

alter table iloveddl add column z int not null default 42
;

ALTER TABLE
Time: 2.322 ms

This was quick, a metadata-only change. With NOT NULL DEFAULT, either the value is set physically in the row, or it is absent and the value comes from the catalog:

select * from pg_attrdef 
where adrelid='iloveddl'::regclass
;

  oid   | adrelid | adnum |                                                                         adbin                                                                         
--------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------
 346435 |  346431 |     3 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 100 0 0 0 0 0 0 0 ]}
 346436 |  346431 |     4 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 42 0 0 0 0 0 0 0 ]}
(2 rows)

Time: 0.503 ms

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
 x        |      2 | f          | f         | f             | 
 y        |      3 | f          | t         | t             | {42}
 z        |      4 | t          | t         | t             | {42}
(10 rows)

Time: 0.465 ms

If I insert a new row without mentionning the new column, it is set with the current default value from pg_attrdef, and stored with it. If I query a row that was inserted before, it shows the value from pg_attribute.attmissingval:

insert into iloveddl(id,y) values (-2,null)
;
INSERT 0 1
Time: 8.692 ms

select * from iloveddl where id <2
;
 id | x |  y  | z  
----+---+-----+----
  1 |   |  42 | 42
 -1 |   | 100 | 42
 -2 |   |     | 42
(3 rows)

Time: 195.648 ms

Summary of PostgreSQL behavior

The DEFAULT clause used in an ALTER TABLE ADD COLUMN statement serves two different purposes:

  • Schema on write: When new rows are inserted without specifying a value for this column, the DEFAULT value is automatically assigned, functioning similarly to a trigger or a stored generated column.
  • Schema on read: When querying rows that lack a value in this column, the DEFAULT value appears in the result set, similar to a view or a virtual column.

In PostgreSQL, there are three cases when reading a row. First, if a column is present in the null bitmap and flagged as null, the value returned is NULL. Second, if the column is present and flagged as not null, the actual value is returned. Lastly, if the column is not present in the null bitmap, the system returns a predefined value called attmissingval.

Comparison with MongoDB

In MongoDB, some part of the document structure can be defined using indexes and schema validation (schema-on-write). However, the application is free to add new fields to documents without impacting other objects, and interpret it in the application code (schema-on-read).

When a default value is needed, it is the application's responsibility to write it, rather than relying on a rule deployed in the database. If the absence of a field should be interpreted as another value, the application handles this with application code. In general, the logic is more complex than a single constant for all existing data, and may depend on other fields. Aggregation pipeline can code the logic so that it runs efficiently in the database.

I create a collection similar to my PostgreSQL table:

for (let i = 1; i <= 10_000_000; i += 1000)
 db.iloveddl.insertMany(Array.from(
  {length: 1000},
  (_,k) => ({_id: i+k})
 ))
;

Here is how the first document looks like:

db.iloveddl.find({_id:{$lt:2}})

[ { _id: 1 } ]

The equivalent of ADD COLUMN NULL with no default is simply defining the field for new documents when needed, with nothing to do on existing documents because in MongoDB an inexisting field is the same as null.

New documents can set the field and if there is a default value, the application will set it:

db.iloveddl.insertOne({ _id: -1, y: 100 })  

{ acknowledged: true, insertedId: -1 }

db.iloveddl.find({_id:{$lt:2}})

[ { _id: -1, y: 100 }, { _id: 1 } ]

SQL was designed primarily for non-programmers using the command line to manipulate data. In this context, it is important to deploy defaults and check constraints directly into the database. MongoDB is designed for databases where data manipulation is handled through application code, and developers prefer to centralize all logic within the application, where it can be reviewed and tested more easily. Typically, the document inserted into MongoDB comes from an application object, and defaults were set in the constructor.

The equivalent of ADD COLUMN NULL with DEFAULT is like in PostgreSQL, adding the field on read, except that it is done by the application code rather than a declaration in the database catalog:

db.iloveddl.aggregate([
  { $sort: { _id:1 } },
  { $project: {
       _id: 1,
        y: { $ifNull: [ "$y", 42 ] }
  } }
])

[
  { _id: -1, y: 100 }, { _id: 1, y: 42 },
  { _id: 2, y: 42 },   { _id: 3, y: 42 },
  { _id: 4, y: 42 },   { _id: 5, y: 42 },
  { _id: 6, y: 42 },   { _id: 7, y: 42 },
  { _id: 8, y: 42 },   { _id: 9, y: 42 },
  { _id: 10, y: 42 },  { _id: 11, y: 42 },
  { _id: 12, y: 42 },  { _id: 13, y: 42 },
  { _id: 14, y: 42 },  { _id: 15, y: 42 },
  { _id: 16, y: 42 },  { _id: 17, y: 42 },
  { _id: 18, y: 42 },  { _id: 19, y: 42 }
]
Type "it" for more

If the logic for existing documents is more complex, like depending on other fields, it can be done the same way:

db.iloveddl.aggregate([  
  { $sort: { _id: 1 } },  
  { $project: {  
      _id: 1,  
      y: {  
        $ifNull: [  
          "$y",  
          {  
            $cond: [  
              { $eq: [ { $mod: [ "$_id", 2 ] }, 0 ] },   
              42,  // set 42 to "y" when "_id" is even
              99   // set 42 to "y" when "_id" is odd
            ]  
          }  
        ]  
      }  
    }  
  }  
])  

Like in PostgreSQL, such schema-on-read transformation can be implemented in the database as a view:

db.createView(  
  "iloveddl_conditional_y",   // Name of the view  
  "iloveddl",                 // Source collection  
                                    
                                    
                                    
                                    
                                

ALTER TABLE ... ADD COLUMN

MongoDB’s flexible schema allows each document within a collection to have a unique structure, a feature known as polymorphism. This contrasts with SQL databases, where every row in a table must adhere to a predefined set of columns. To support polymorphic data models without multiplying tables, SQL schemas often include many optional columns, which are frequently left null. These null values usually signify "not applicable" rather than "unknown". Furthermore, unused columns must be explicitly defined in the database schema, and modifying this structure typically requires locking the table.

This rigidity is often cited in comparisons favoring MongoDB, where applications can introduce new fields without affecting existing objects. The classic example is the ALTER TABLE ADD COLUMN statement, which locks the table because DDL operations must modify catalog information shared by all rows. It was often mentioned to illustrate the rigidity of RDBMS: when the statement also had to update existing rows, the lock could be held for a long time, causing significant delays.

However, while many ALTER TABLE operations still require rewriting the table, such as changing a data type, adding a column is no longer one of them. Most SQL databases now optimize adding a column as a metadata-only operation, making it fast and cheap. The main difference with MongoDB is whether this change is controlled by the database administrator or by the application developer.

This blog post explains why citing ALTER TABLE ADD COLUMN to showcase schema flexibility is not a good argument anymore, because it has been optimized in many RDBMS. It is also an opportunity to explain how this works internally in PostgreSQL, and how similar it is to what developers do with MongoDB.

Example on PostgreSQL

Adding a column requires an exclusive lock, but since PostgreSQL 11 the lock is held only very briefly (unless the ALTER TABLE itself has to wait for another lock).

For example, I created a table with ten million rows:

\timing on
Timing is on.

create table iloveddl ( id bigint );
CREATE TABLE
Time: 2.026 ms

insert into iloveddl select generate_series(1,1e7);
INSERT 0 10000000
Time: 31328.019 ms (00:31.328)

PostgreSQL updates are more costly than inserts, so rewriting a table of this size can take several minutes. I use this large table to verify that my ALTER TABLE statements do not trigger a rewrite of the rows.

Here is the first row in the table:

select * from iloveddl where id <2;
 id 
----
  1
(1 row)

Time: 839.856 ms

The column information is stored in the catalog and can be viewed in the pg_attribute table. To retrieve details such as the column name, number, nullability, default flag, and missing value, you can run the following SQL query:

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;                                         

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
(7 rows)

Time: 0.569 ms

In addition to the system columns, there is an entry for the single column I defined in CREATE TABLE: "id". All flags are set to false: this column is nullable, has no default value, and has no replacement for a missing value. In SQL databases, a missing value, where no value exists at all, is different from a null, which signifies an unknown value. The missing-value flag is what allows adding columns with default values without rewriting all rows.

NULL with no DEFAULT

To add a nullable column without a default value in PostgreSQL, the following SQL command adds a new column named "x" of type integer to the "iloveddl" table. Since no default value is specified and the column is nullable by default, it will contain NULL for existing rows:

alter table iloveddl add column x int;

ALTER TABLE
Time: 1.760 ms

This was a quick operation, involving only metadata, without rewriting the rows. Existing rows have no information about this new column, but on query, PostgreSQL adds this column to the result with a null value:

select * from iloveddl where id <2;

 id | x 
----+---
  1 |  
(1 row)

Time: 206.647 ms

This is a common scenario where the SQL semantics are straightforward: when a new column is added, it logically exists for all rows, but its value is unknown for those created before the column addition. Unknown values are represented as NULL. Prior to the ALTER TABLE command, the column did not exist. Afterward, it exists with an unknown value.

Although the ALTER TABLE ADD COLUMN operation affects all rows logically, it doesn't need to modify physical rows. Instead, the database catalog marks the existence of the new column for all rows, and at runtime, a physical non-existence is interpreted as a logical NULL.

PostgreSQL has simply added the column in the dictionary, as nullable (attnotnull):

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;  

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
 x        |      2 | f          | f         | f             | 
(8 rows)

Time: 0.510 ms

With a simple insert into the catalog, PostgreSQL changed the definition of all rows, from a nonexistent "x" to an existing column with an unknown value.
For example, in a small business scenario, you may have stored only customers' names and emails. If you then decide to add a 'date of birth' column, this information likely always existed but was previously unrecorded. After the addition, the date of birth for existing customers appears as NULL, indicating that the value is unknown.

NULL with DEFAULT value

SQL allows an insert to omit a column, setting its value to NULL by default, which indicates the value is unknown at insert time. However, SQL developers can specify that omitted columns take a default value instead. The column remains nullable and can still be explicitly set to NULL in an insert statement, but omitting the column is different from assigning NULL: omission triggers the default value.

When adding a column with a default value in PostgreSQL, existing rows are treated as if they had been inserted into the table with the new column present but left unspecified by the insert statement. The new column must therefore return the default value for existing rows. Before PostgreSQL 11, the ALTER TABLE command had to write this default value into every row, which could be slow. This limitation was often used to illustrate the rigidity of schema changes in SQL databases.

Currently, this particular case is optimized and performs quickly in the latest versions:

alter table iloveddl add column y int null default 42;

ALTER TABLE
Time: 2.802 ms

The default value used for future inserts is stored in the pg_attrdef table. You can retrieve this information with the following SQL query:

select * from pg_attrdef 
 where adrelid='iloveddl'::regclass
;

  oid   | adrelid | adnum |                                                                        adbin                                                      

--------+---------+-------+------------------------------------------------------------------------------------------------------------------------------------------------------
 346434 |  346431 |     3 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 42 0 0 0 0 0 0 0 ]}
(1 row)

Time: 0.413 ms

This default is used for future inserts, but existing rows also show the same value: it was set virtually, in the catalog, to avoid rewriting all rows during the ALTER TABLE statement:

select * from iloveddl where id <2;

 id | x | y  
----+---+----
  1 |   | 42
(1 row)

Time: 189.657 ms

You can change the default value for future inserts without altering the existing data or the current value of the column. Here is an example:

alter table iloveddl alter column y set default 100;

ALTER TABLE
Time: 2.039 ms

insert into iloveddl(id) values (-1);

INSERT 0 1
Time: 1.516 ms

select * from iloveddl where id <2;

 id | x |  y  
----+---+-----
  1 |   |  42
 -1 |   | 100
(2 rows)

Time: 207.727 ms

This change affects only new rows. After setting the default to 100, inserting a row without specifying "y" automatically assigns 100. It does not alter existing rows: rows that existed before the change still show their previous "y" values, such as 42.

The default value in pg_attrdef has been changed to 100, which proves that the value for existing rows, 42, is stored elsewhere:

select * from pg_attrdef 
 where adrelid='iloveddl'::regclass
;

  oid   | adrelid | adnum |                                                                         adbin                                                                         
--------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------
 346435 |  346431 |     3 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 100 0 0 0 0 0 0 0 ]}
(1 row)

Time: 0.568 ms

PostgreSQL appears to set columns with default values as if all rows have that value, but this is done logically rather than physically. It stores extra information in pg_attribute to assign a default value for columns that are missing when reading a table row, ensuring consistent behavior without physically altering each row:

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;  

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
 x        |      2 | f          | f         | f             | 
 y        |      3 | f          | t         | t             | {42}
(9 rows)

Time: 0.529 ms

When PostgreSQL reads a row from a table and finds that the column "y" does not exist, it checks the null bitmap. If "y" is absent there as well, PostgreSQL adds the column to the result with the default value stored in the catalog as attmissingval.

This optimization applies only when the default value resolves to a constant. Consider adding an expiration date to passwords, for example to force users to change their password annually: it makes sense to default existing users' expiration to one year from the current date. Since the default expression is evaluated once at ALTER TABLE time and stored as a single attmissingval, this works only if the same expiration date is acceptable for all existing users.

NOT NULL

It is possible to add a non-nullable column, but a default value is required. Otherwise, it would result in a null in a non-nullable column:

alter table iloveddl add column z int not null
;

ERROR:  column "z" of relation "iloveddl" contains null values
Time: 1.024 ms

alter table iloveddl add column z int not null default 42
;

ALTER TABLE
Time: 2.322 ms

This was quick, a metadata-only change. With NOT NULL DEFAULT, either the value is set physically in the row, or it is absent and the value comes from the catalog:

select * from pg_attrdef 
where adrelid='iloveddl'::regclass
;

  oid   | adrelid | adnum |                                                                         adbin                                                                         
--------+---------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------
 346435 |  346431 |     3 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 100 0 0 0 0 0 0 0 ]}
 346436 |  346431 |     4 | {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue 4 [ 42 0 0 0 0 0 0 0 ]}
(2 rows)

Time: 0.503 ms

select attname, attnum, attnotnull, atthasdef, atthasmissing, attmissingval
 from pg_attribute
 where attrelid='iloveddl'::regclass
;

 attname  | attnum | attnotnull | atthasdef | atthasmissing | attmissingval 
----------+--------+------------+-----------+---------------+---------------
 tableoid |     -6 | t          | f         | f             | 
 cmax     |     -5 | t          | f         | f             | 
 xmax     |     -4 | t          | f         | f             | 
 cmin     |     -3 | t          | f         | f             | 
 xmin     |     -2 | t          | f         | f             | 
 ctid     |     -1 | t          | f         | f             | 
 id       |      1 | f          | f         | f             | 
 x        |      2 | f          | f         | f             | 
 y        |      3 | f          | t         | t             | {42}
 z        |      4 | t          | t         | t             | {42}
(10 rows)

Time: 0.465 ms

If I insert a new row without mentioning the new column, it is set to the current default value from pg_attrdef and stored physically with the row. If I query a row that was inserted before the column was added, it shows the value from pg_attribute.attmissingval:

insert into iloveddl(id,y) values (-2,null)
;
INSERT 0 1
Time: 8.692 ms

select * from iloveddl where id <2
;
 id | x |  y  | z  
----+---+-----+----
  1 |   |  42 | 42
 -1 |   | 100 | 42
 -2 |   |     | 42
(3 rows)

Time: 195.648 ms

Summary of PostgreSQL behavior

The DEFAULT clause used in an ALTER TABLE ADD COLUMN statement serves two different purposes:

  • Schema on write: When new rows are inserted without specifying a value for this column, the DEFAULT value is automatically assigned, functioning similarly to a trigger or a stored generated column.
  • Schema on read: When querying rows that lack a value in this column, the DEFAULT value appears in the result set, similar to a view or a virtual column.

In PostgreSQL, there are three cases when reading a row. First, if a column is present in the null bitmap and flagged as null, the value returned is NULL. Second, if the column is present and flagged as not null, the actual value is returned. Lastly, if the column is not present in the null bitmap, the system returns a predefined value called attmissingval.
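The three cases above can be sketched in plain JavaScript (a hypothetical stand-in for the tuple reader, not PostgreSQL source code; the `tuple` and `catalog` shapes are illustrative):

```javascript
// Hypothetical model of the three read cases: a stored tuple knows which
// columns it physically carries; anything else falls back to the catalog's
// attmissingval.
function readColumn(tuple, catalog, name) {
  if (name in tuple.values) {
    // Column physically present: the null bitmap decides NULL vs. the value
    return tuple.nullBitmap.includes(name) ? null : tuple.values[name];
  }
  // Column absent from the stored tuple: use attmissingval when defined
  const att = catalog[name];
  return att && att.atthasmissing ? att.attmissingval : null;
}

const catalog = { y: { atthasmissing: true, attmissingval: 42 } };
const oldRow = { values: { id: 1 }, nullBitmap: [] };              // before ADD COLUMN
const newRow = { values: { id: -2, y: null }, nullBitmap: ["y"] }; // explicit NULL

console.log(readColumn(oldRow, catalog, "y")); // 42 (attmissingval)
console.log(readColumn(newRow, catalog, "y")); // null (stored NULL)
```

This mirrors the behavior observed above: the row inserted before the ALTER TABLE reads as 42, while an explicitly assigned NULL stays NULL.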

Comparison with MongoDB

In MongoDB, part of the document structure can be enforced using indexes and schema validation (schema-on-write). However, the application is free to add new fields to documents without impacting other documents, and to interpret them in application code (schema-on-read).

When a default value is needed, it is the application's responsibility to write it, rather than relying on a rule deployed in the database. If the absence of a field should be interpreted as some other value, the application handles this in code. In general, the logic is more complex than a single constant for all existing data and may depend on other fields. An aggregation pipeline can implement this logic so that it runs efficiently in the database.
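As a minimal sketch of application-owned defaults (the class name and the rule are illustrative, not from the post):

```javascript
// The default lives in the application object, not in a DEFAULT clause.
class IloveddlDoc {
  constructor({ _id, y }) {
    this._id = _id;
    // An omitted field gets the default; an explicit null stays null,
    // mirroring SQL's distinction between omission and NULL assignment.
    this.y = (y === undefined) ? 100 : y;
  }
}

const doc = new IloveddlDoc({ _id: -1 });
// The application would then call db.iloveddl.insertOne(doc)
console.log(doc.y); // 100
```

Note the deliberate use of `y === undefined` rather than `y ?? 100`: the nullish-coalescing operator would also replace an explicit null, losing the omission-vs-null distinction.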

I create a collection similar to my PostgreSQL table:

for (let i = 1; i <= 10_000_000; i += 1000)
 db.iloveddl.insertMany(Array.from(
  {length: 1000},
  (_,k) => ({_id: i+k})
 ))
;

Here is what the first document looks like:

db.iloveddl.find({_id:{$lt:2}})

[ { _id: 1 } ]

The equivalent of ADD COLUMN NULL with no default is simply defining the field on new documents when needed, with nothing to do on existing documents, because in MongoDB querying a nonexistent field behaves like querying a null.
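A plain-JavaScript sketch of that matching semantics (an assumption-level model of how a { y: null } filter behaves on the server, not the server implementation):

```javascript
// A { field: null } filter matches documents where the field is explicitly
// null *or* entirely absent, which is why existing documents need no change
// when a new nullable field is introduced.
function matchesNullFilter(doc, field) {
  return !(field in doc) || doc[field] === null;
}

console.log(matchesNullFilter({ _id: 1 }, "y"));          // true  (field absent)
console.log(matchesNullFilter({ _id: 2, y: null }, "y")); // true  (explicit null)
console.log(matchesNullFilter({ _id: 3, y: 42 }, "y"));   // false
```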

New documents can set the field, and if there is a default value, the application sets it:

db.iloveddl.insertOne({ _id: -1, y: 100 })  

{ acknowledged: true, insertedId: -1 }

db.iloveddl.find({_id:{$lt:2}})

[ { _id: -1, y: 100 }, { _id: 1 } ]

SQL was designed primarily for non-programmers manipulating data from a command line. In that context, it is important to deploy defaults and check constraints directly in the database. MongoDB is designed for databases where data manipulation happens in application code, and developers prefer to centralize the logic within the application, where it can be reviewed and tested more easily. Typically, the document inserted into MongoDB comes from an application object, and defaults are set in its constructor.

The equivalent of ADD COLUMN NULL with DEFAULT is, like in PostgreSQL, adding the field on read, except that it is done by application code rather than by a declaration in the database catalog:

db.iloveddl.aggregate([
  { $sort: { _id:1 } },
  { $project: {
       _id: 1,
        y: { $ifNull: [ "$y", 42 ] }
  } }
])

[
  { _id: -1, y: 100 }, { _id: 1, y: 42 },
  { _id: 2, y: 42 },   { _id: 3, y: 42 },
  { _id: 4, y: 42 },   { _id: 5, y: 42 },
  { _id: 6, y: 42 },   { _id: 7, y: 42 },
  { _id: 8, y: 42 },   { _id: 9, y: 42 },
  { _id: 10, y: 42 },  { _id: 11, y: 42 },
  { _id: 12, y: 42 },  { _id: 13, y: 42 },
  { _id: 14, y: 42 },  { _id: 15, y: 42 },
  { _id: 16, y: 42 },  { _id: 17, y: 42 },
  { _id: 18, y: 42 },  { _id: 19, y: 42 }
]
Type "it" for more
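The same $ifNull projection can be mimicked in plain JavaScript over an in-memory array (a stand-in for the collection), to show that this is ordinary schema-on-read logic:

```javascript
// Schema-on-read in application code: documents missing "y" (or holding null)
// are projected with the default 42, like $ifNull in the pipeline above.
const docs = [{ _id: 1 }, { _id: -1, y: 100 }, { _id: 2 }];

const projected = [...docs]
  .sort((a, b) => a._id - b._id)           // equivalent of { $sort: { _id: 1 } }
  .map(d => ({ _id: d._id, y: d.y ?? 42 })); // ?? treats null and missing alike

console.log(projected);
// [ { _id: -1, y: 100 }, { _id: 1, y: 42 }, { _id: 2, y: 42 } ]
```

The nullish-coalescing operator `??` is a close fit here because, like $ifNull, it substitutes the default for both a missing field and an explicit null.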

If the logic for existing documents is more complex, for example depending on other fields, it can be handled the same way:

db.iloveddl.aggregate([  
  { $sort: { _id: 1 } },  
  { $project: {  
      _id: 1,  
      y: {  
        $ifNull: [  
          "$y",  
          {  
            $cond: [  
              { $eq: [ { $mod: [ "$_id", 2 ] }, 0 ] },   
              42,  // set 42 to "y" when "_id" is even
              99   // set 99 to "y" when "_id" is odd
            ]  
          }  
        ]  
      }  
    }  
  }  
])  

Like in PostgreSQL, such a schema-on-read transformation can be implemented in the database as a view:

db.createView(  
  "iloveddl_conditional_y",   // Name of the view  
  "iloveddl",                 // Source collection  
  [  
    { $sort: { _id: 1 } },  
    { $project: {  
        _id: 1,  
        y: {  
          $ifNull: [  
            "$y",  
            {  
              $cond: [  
                { $eq: [ { $mod: [ "$_id", 2 ] }, 0 ] },  
                42,  // set 42 to "y" when "_id" is even  
                99   // set 99 to "y" when "_id" is odd  
              ]  
            }  
          ]  
        }  
      }  
    }  
  ]  
)

Caching

Every time you use a computer, the cache is working to ensure your experience is fast.