October 31, 2025
Down with template (or not)!
If you are one of the few people in the world who, like me, actively follow the progress of the C++ standardization, you know that the C++ standard evolves significantly from each version to the next. The proposals often even have funny names like “I Stream, You Stream, We All Stream for istream_iterator”, “More trailing commas”, “std::array is a wrapper for an array!”, or “Down with typename!”.
When I first saw the paper “Down with typename!”, I thought that made a lot of sense. This change in C++20 allows you to write static_cast<std::vector<T>::value_type>(...) instead of static_cast<typename std::vector<T>::value_type>(...). Nice!
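To make the change concrete, here is a small sketch (the helper function name is mine, purely for illustration) showing the spelling that was mandatory before C++20:

```cpp
#include <vector>

template <typename T>
auto firstElement(const std::vector<T>& v) {
    // Pre-C++20 spelling; still valid in C++20 and later: "typename" tells
    // the compiler that the dependent name std::vector<T>::value_type is a type.
    return static_cast<typename std::vector<T>::value_type>(v[0]);
    // Since C++20 (P0634), "typename" may be omitted here:
    //   return static_cast<std::vector<T>::value_type>(v[0]);
}
```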
Recently I stumbled upon a similar weird-looking and unnecessary keyword in our code base:
template <typename T, typename F>
struct JoinFilterHasher {
template <typename S>
static uint64_t hashDict(const byte* data, unsigned index, const byte* arg) {
const T* dict = unalignedLoad<const T*>(arg);
const S pos = *reinterpret_cast<const S*>(data + index * sizeof(S));
return F::template hash<S>(dict + pos, nullptr);
// ^^^^^^^^ what is this???
}
};
When I saw this, I immediately thought that maybe the time had come for a new paper titled “Down with template!”. Unfortunately, after I fell into the rabbit hole of better understanding the template keyword, I understood why removing it isn’t (easily) possible.
However, I found an absolutely crazy workaround. Don’t try this at home kids! Read on and follow me through this journey of C++ template madness!
The template Keyword
The template keyword itself is not something that will surprise any C++ programmer. Every time you define a template, you’ll have to use the template keyword. Pretty obvious.
However, there are also cases where you’ll have to tell the compiler whether a name refers to a template by prefixing the name with the template keyword. Usually, you don’t do that and just write std::vector<int> but if you really wanted, you could also write std::template vector<int>!
Why would you ever need that? To explain this, let’s look at this simplified example:
struct Foo {
template <int I>
static int myfunc(int a) { return a + I; }
};
template <typename T>
int genericFoo(int a) {
return T::myfunc<42>(a);
}
int test() {
return genericFoo<Foo>(1);
}
This code snippet does not compile! The reason is that in genericFoo the type T is generic, i.e., T is only known when you instantiate genericFoo with a specific type. So, the compiler can’t know in advance whether T::myfunc ends up referring to a member variable, a member function, a nested type specification, or a template (in C++ standardese this is called a “dependent name”). But it needs to know what T::myfunc is in order to correctly parse the body of genericFoo. So, you have to use the template keyword to make this unambiguous.
Any good compiler will tell you that you should write T::template myfunc to fix the compilation. If the compiler already knows, why can’t we just get rid of the template keyword for this altogether?
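Following the compiler's suggestion, this is the fixed version of the example:

```cpp
struct Foo {
    template <int I>
    static int myfunc(int a) { return a + I; }
};

template <typename T>
int genericFoo(int a) {
    // "template" tells the parser that myfunc names a template, so <42>
    // is parsed as a template argument list instead of operator<.
    return T::template myfunc<42>(a);
}

int test() {
    return genericFoo<Foo>(1);  // 1 + 42 = 43
}
```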
Down with template?
In order to understand why we need the template keyword, let’s take a closer look at the compiler output:
<source>: In function 'int genericFoo(int)':
<source>:8:15: warning: expected 'template' keyword before dependent template name [-Wmissing-template-keyword]
8 | return T::myfunc<42>(a);
| ^~~~~~
| template
Ok, as I said, the compiler knows exactly that we need to add the template keyword. But wait a minute: This is just a compiler warning and not an error. Huh? If it’s just a warning, why doesn’t this code compile anyway? Let’s continue reading the compiler messages:
<source>:8:21: error: invalid operands of types '<unresolved overloaded function type>' and 'int' to binary 'operator<'
8 | return T::myfunc<42>(a);
| ~~~~~~^~~
Here we found the actual compilation error. But it doesn’t say anything about templates at all. Instead, it mentions an unresolved “operator<”? What is going on here?!
It turns out, if you don’t prefix such a dependent name with the template keyword, the compiler treats it as a regular variable. So, what the compiler actually sees is this, broken down by tokens:
return // return keyword
T // name referring to a type
:: // scope resolution operator
// !!! Here's the interesting part !!!
myfunc // name referring to a variable
< // operator<
42 // integer literal
> // operator>
// !!! end of interesting part !!!
( // opening parenthesis (not a function call!)
a // name referring to a variable
) // closing parenthesis
; // semicolon
So, the compiler doesn’t see any template at all, just the variables myfunc and a and the constant 42 that are being compared using < and >. This is an equivalent statement, just formatted and parenthesized differently:
return (T::myfunc < 42) > a;
Since the compiler can’t know whether you just wanted a weirdly formatted comparison or an actual template, you need to specify that you’re actually dealing with a template using the template keyword.
Case closed. Or, is it?
Template Madness
We know that the code without the template keyword is syntactically valid; the compiler just parses it as comparison operators instead of template arguments. So, can we somehow find a way to make this compile without changing the genericFoo function or the Foo class?
The answer is: Yes, but only if you’re willing to give up your sanity, sacrifice all your RAM to your compiler, and generally don’t care about runtime performance at all. Sounds good?
To solve this, we need to write code that can handle being compared using < and > and somehow translates this again into an application of template arguments.
Since we don’t want to change Foo, we’ll create a new class called DownWithTemplate that will contain our evil hacks. Our goal is the following: write DownWithTemplate so that calling genericFoo<DownWithTemplate>(1) is equivalent to a fixed version of genericFoo<Foo>(1) where we add the template keyword as suggested by the compiler.
Operator Overloading
Let’s start with the easy part: overloading operator< and operator>. We know that genericFoo wants to access a value called myfunc, so myfunc will be a static member variable whose type overloads operator<:
struct DownWithTemplate {
/* ... */
struct MyFunc {
OtherHalf operator<(int i) const {
return OtherHalf{i};
}
};
static constexpr MyFunc myfunc{};
};
The value that operator< returns needs to implement the other half of our hack, namely it needs to overload operator>:
struct DownWithTemplate {
struct OtherHalf {
int value;
int operator>(int i) const {
return /* ??? */;
}
};
/* ... */
};
In OtherHalf::operator> we now have the two values of our expression: value contains the “template argument” (42 in our implementation of genericFoo) and i gets the value of a in genericFoo.
Now, how do we call Foo::myfunc? Ideally we would like to just write Foo::myfunc<value>(i). If we do that, we’ll get the following compiler error:
<source>:8:16: note: template argument deduction/substitution failed:
<source>:27:32: error: '*(const DownWithTemplate::OtherHalf*)this' is not a constant expression
27 | return Foo::myfunc<value>(i);
| ^~~~~
Since value is not a compile time constant, we can’t pass it as a template argument which must be known at compile time.
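For contrast, with a value that is a compile-time constant, passing it as a template argument works fine. A minimal sketch (Foo is repeated so the snippet is self-contained; kValue and callWithConstant are my own names):

```cpp
struct Foo {
    template <int I>
    static int myfunc(int a) { return a + I; }
};

constexpr int kValue = 42;  // known at compile time

int callWithConstant(int i) {
    // Compiles: kValue is a constant expression, unlike OtherHalf::value.
    return Foo::myfunc<kValue>(i);
}
```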
Runtime Template Arguments (or: Inner Circle of C++ Template Hell)
How do we bridge the gap between a value only known at runtime and a template that needs to know the value at compile time? There is no easy fix. Conceptually, a template must know its template arguments at compile time but a runtime value obviously is only known at runtime.
So, if we don’t know the template argument, we’ll just have to select the correct template at runtime. For this to work, we’ll have to generate all possible template instantiations with all possible values as template arguments.
This is what we want to implement (I hope you never want to do this):
switch (value) {
case 0: return Foo::myfunc<0>(i);
case 1: return Foo::myfunc<1>(i);
/* ... */
case 2147483647: return Foo::myfunc<2147483647>(i);
case -1: return Foo::myfunc<-1>(i);
case -2: return Foo::myfunc<-2>(i);
/* ... */
case -2147483648: return Foo::myfunc<-2147483648>(i);
}
Obviously, we don’t want to write this code manually. So, let’s see what the tool box of C++ templates gives us to help us out here: We’ll use std::integer_sequence to generate all possible integer values at compile time and we’ll use something called “Fold expressions” that work on “Parameter packs” to generate the code for all cases automatically.
If you’ve never heard of these three C++ features, that’s probably a good sign. Your colleagues will be thankful to never have to review code using them!
Anyway, there’s no easy way to prepare you for this, so I’ll just show you the code in all its C++ template hack ugliness and explain afterwards:
struct DownWithTemplate {
template <int... Is>
static int callMyFunc(std::integer_sequence<int, Is...>, int a, int b) {
return (((Is == a) ? Foo::myfunc<Is>(b) : 0) + ...) +
(((-Is-1 == a) ? Foo::myfunc<-Is-1>(b) : 0) + ...);
}
struct OtherHalf {
int value;
int operator>(int i) const {
return callMyFunc(std::make_integer_sequence<int, std::numeric_limits<int>::max()>{}, value, i);
}
};
/* ... */
};
First, we create all possible positive integers using std::make_integer_sequence. There is no equivalent template that gives you all negative integers, so we’ll just go through the range of all positive integers twice and negate the values once.
Unfortunately, fold expressions are just that (expressions), not statements. So, we can’t write a real switch statement. What we’ll do instead is write a long list of additions like this:
((value == 0) ? Foo::myfunc<0>(i) : 0) +
((value == 1) ? Foo::myfunc<1>(i) : 0) +
/* ... */
So, we add zero for all “cases” that don’t match and only call Foo::myfunc for the correct value. The exact syntax of fold expressions is very weird so you’ll just have to trust me that the code above is equivalent to this sum.
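To see the expansion in isolation, here is a smaller, self-contained version of the same trick (the function name and the squaring are mine, purely for illustration):

```cpp
#include <utility>

template <int... Is>
constexpr int squareOf(std::integer_sequence<int, Is...>, int x) {
    // The unary fold expands over the parameter pack Is to:
    // ((0 == x) ? 0*0 : 0) + ((1 == x) ? 1*1 : 0) + ... + ((N-1 == x) ? ... : 0)
    // Only the matching "case" contributes; all other terms add zero.
    return (((Is == x) ? Is * Is : 0) + ...);
}

// squareOf(std::make_integer_sequence<int, 10>{}, 7) evaluates to 49.
```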
The Fallout
Let us roughly estimate how much work the compiler will have to do: We have 2^32 different possible templates. Each template instantiation contains at least a new Foo::myfunc expression. In Clang, a C++ function template instantiation (the class FunctionTemplateSpecializationInfo) uses at least 32 bytes of memory. So, at a minimum, the compiler would need 2^32*32 bytes = 128 GiB of memory to compile our code!
You can quickly confirm this by trying to compile this program:
clang++ -c -std=c++23 -O0 ./down_with_template.cpp
On my machine, this quickly leads to furious swapping of memory and eventually the OOM killer killing the compiler process (and a bunch of other processes as well). Don’t try this at home!
If you try to compile the same example with a 16-bit integer, Clang will not eat all your RAM but it will refuse to compile the program citing a “maximum nesting level for fold expressions”.
I tried compiling it with gcc on our largest machine as well: After consuming over 300 GiB of RAM, the OOM killer also got to gcc.
For now, our code only compiles if you are using 8-bit integers. You can find the full code at the bottom.
Conclusion
If you see the template keyword in an unexpected location in C++, you now know that it’s there to disambiguate between template arguments and comparison operators!
Still, C++ allows you to employ evil template hacks to work around this. If you are willing to sacrifice all of your sanity and RAM, you’ll be able to get rid of the template keyword!
Appendix
Full Code
#include <limits>
#include <utility>
// DANGER: changing this to any larger type will make the compiler eat all of your RAM!
using Int = signed char;
struct Foo {
template <Int I>
static Int myfunc(Int a) { return a + I; }
};
template <typename T>
Int genericFoo(Int a) {
return T::myfunc<42>(a);
}
struct DownWithTemplate {
template <Int... Is>
static Int callMyFunc(std::integer_sequence<Int, Is...>, Int a, Int b) {
return (((Is == a) ? Foo::myfunc<Is>(b) : 0) + ...) +
(((-Is-1 == a) ? Foo::myfunc<-Is-1>(b) : 0) + ...);
}
struct OtherHalf {
Int value;
Int operator>(Int i) const {
return callMyFunc(std::make_integer_sequence<Int, std::numeric_limits<Int>::max()>{}, value, i);
}
};
struct MyFunc {
OtherHalf operator<(Int i) const {
return OtherHalf{i};
}
};
static constexpr MyFunc myfunc{};
};
Int test() {
return genericFoo<DownWithTemplate>(1);
}
October 23, 2025
How efficient is RocksDB for IO-bound, point-query workloads?
How efficient is RocksDB for workloads that are IO-bound and read-only? One way to answer this is to measure the CPU overhead from RocksDB as this is extra overhead beyond what libc and the kernel require to perform an IO. Here my focus is on KV pairs that are smaller than the typical RocksDB block size that I use -- 8kb.
By IO efficiency I mean: (storage read IOPs from RocksDB benchmark / storage read IOPs from fio)
And I measure this in a setup where RocksDB doesn't get much benefit from RocksDB block cache hits (database size > 400G, block cache size was 16G).
This value will be less than 1.0 in such a setup. But how much less than 1.0 will it be? On my hardware the IO efficiency was ~0.85 at 1 client and 0.88 at 6 clients. Were I to use storage that had a 2X larger storage latency then the IO efficiency would be closer to 0.95.
Note that:
- IO efficiency increases (decreases) when SSD read latency increases (decreases)
- IO efficiency increases (decreases) when the RocksDB CPU overhead decreases (increases)
- RocksDB QPS increases by ~8% for IO-bound workloads when --block_align is enabled
- about 11 microseconds from libc + kernel
- between 6 and 10 microseconds from RocksDB
- between 100 and 150 usecs of IO latency from SSD per iostat
Q and A
A: About 10 microseconds on this CPU.
A: Yes, you can
A: It depends on how many features you need and the opportunity cost in spending time writing that code vs doing something else.
A: That is for them to answer. But all projects have a complexity budget. Code can become too expensive to maintain when that budget is exceeded. There is also the opportunity cost to consider as working on this delays work on other features.
I ran tests on a Beelink SER7 with a Ryzen 7 7840HS CPU that has 8 cores and 32G of RAM. The storage device is a Crucial CT1000P3PSSD8 (Crucial P3, 1TB) using ext-4 with discard enabled. The OS is Ubuntu 24.04.
From fio, the average read latency for the SSD is 102 microseconds using O_DIRECT with io_depth=1 and the sync engine.
CPU frequency management makes it harder to claim that the CPU runs at X GHz, but the details are:
$ cpupower frequency-info
driver: acpi-cpufreq
CPUs which run at the same hardware frequency: 5
CPUs which need to have their frequency coordinated by software: 5
maximum transition latency: Cannot determine or is not supported.
hardware limits: 1.60 GHz - 3.80 GHz
available frequency steps: 3.80 GHz, 2.20 GHz, 1.60 GHz
available cpufreq governors: conservative ... powersave performance schedutil
current policy: frequency should be within 1.60 GHz and 3.80 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware
current CPU frequency: 3.79 GHz (asserted by call to kernel)
boost state support:
Supported: yes
Active: no
Results from fio
I started with fio using a command-line like the following for NJ=1 and NJ=6 to measure average IOPs and the CPU overhead per IO.
fio --name=randread --rw=randread --ioengine=sync --iodepth=1 \
--numjobs=$NJ \
--buffered=0 --direct=1 \
--bs=8k \
--size=400G \
--randrepeat=0 \
--runtime=600s --ramp_time=1s \
--filename=G_1:G_2:G_3:G_4:G_5:G_6:G_7:G_8 \
--group_reporting
Results are:
* iops - average reads/s reported by fio
* usPer, syPer - user, system CPU usecs per read
* cpuPer - total (user + system) CPU usecs per read
* lat.us - average read latency in microseconds
* numjobs - the value for --numjobs with fio
iops usPer syPer cpuPer lat.us numjobs
9884 1.351 9.565 10.916 101.61 1
43782 1.379 10.642 12.022 136.35 6
Results from RocksDB
I used an edited version of my benchmark helper scripts that run db_bench. In this case the sequence of tests was:
- fillseq - loads the LSM tree in key order
- revrange - I ignore the results from this
- overwritesome - overwrites 10% of the KV pairs
- flush_mt_l0 - flushes the memtable, waits, compacts L0 to L1, waits
- readrandom - does random point queries when LSM tree has many levels
- compact - compacts LSM tree into one level
- readrandom2 - does random point queries when LSM tree has one level, bloom filters enabled
- readrandom3 - does random point queries when LSM tree has one level, bloom filters disabled
- IO efficiency is approximately 0.84 at 1 client and 0.88 at 6 clients
- With 1 user RocksDB adds between 6.534 and 8.330 usecs of CPU time per query compared to fio depending on the amount of work it has to do.
- With 6 users RocksDB adds between 7.287 to 9.914 usecs of CPU time per query
- IO latency as reported by RocksDB is ~20 usecs larger than as reported by iostat. But I have to re-read the RocksDB source code to understand where and how it is measured.
- IO latency is ~91 usecs per fio
- libc+kernel CPU overhead is ~11 usecs per fio
- RocksDB CPU overhead is 8.330, 7.607 and 6.534 usecs for readrandom, *2 and *3
- 9063 IOPs for readrandom, when it actually did 8350
- 9124 IOPs for readrandom2, when it actually did 8327
- 9214 IOPs for readrandom3, when it actually did 8400
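Plugging the 1-client numbers from this post into the IO-efficiency definition above (storage read IOPs from the RocksDB benchmark divided by storage read IOPs from fio) confirms the reported ratio:

```python
# IO efficiency = RocksDB read IOPs / fio read IOPs (numbers from this post)
fio_iops = 9884          # fio, numjobs=1
rocksdb_iops = 8350      # readrandom, actual IOPs at 1 client
print(round(rocksdb_iops / fio_iops, 2))  # ~0.84
```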
Barbarians at the Gate: How AI is Upending Systems Research
This recent paper from the Berkeley Sky Computing Lab has been making waves in the systems community. Of course, Aleksey and I did our live blind read of it, which you can watch below. My annotated copy of the paper is also available here.
This is a fascinating and timely paper. It raises deep questions about how LLMs will shape the research process, and what that could look like. Below, I start with a short technical review, then move on to the broader discussion topics.
Technical review
The paper introduces the AI-Driven Research for Systems (ADRS) framework. By leveraging the OpenEvolve framework, ADRS integrates LLMs directly into the systems research workflow to automate much of the solution-tweaking and evaluation process. As shown in Figure 3, ADRS operates as a closed feedback loop in which the LLM ensemble iteratively proposes, tests, and refines solutions to a given systems problem. This automation targets the two most labor-intensive stages of the research cycle, solution tweaking and evaluation, leaving the creative areas (problem formulation, interpreting results, and coming up with insights) untouched.
Within the inner loop, four key components work together. The Prompt Generator creates context-rich prompts that seed the LLM ensemble (Solution Generator), which outputs candidate designs or algorithms. These are then assessed by the Evaluator, a simulator or benchmark written by humans, for gathering quantitative feedback. The Solution Selector identifies the most promising variants, which are stored along with their scores in the Storage module to inform subsequent iterations. This automated loop runs rapidly and at scale, and enables exploration of large design spaces within hours rather than weeks! They applied ADRS to several systems problems, including cloud job scheduling, load balancing, transaction scheduling, and LLM inference optimization. In each case, the AI improved on prior human-designed algorithms, often within a few hours of automated search. Reported gains include up to 5x faster performance or 30–50% cost reductions compared to published baselines, which are achieved in a fraction of the time and cost of traditional research cycles.
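The inner loop described above can be sketched as follows. All names here are mine, not the paper's API: propose stands in for the Prompt Generator plus LLM ensemble, and evaluate for the human-written simulator or benchmark.

```python
def adrs_loop(seed_solution, propose, evaluate, iterations=5, pool_size=3):
    # Storage module: remember candidates along with their scores
    storage = [(evaluate(seed_solution), seed_solution)]
    for _ in range(iterations):
        # Solution Generator: derive new candidates from the best-so-far
        _, best = max(storage, key=lambda t: t[0])
        candidates = propose(best)
        # Evaluator: quantitative feedback for each candidate
        scored = [(evaluate(c), c) for c in candidates]
        # Solution Selector: keep only the most promising variants
        storage = sorted(storage + scored, key=lambda t: t[0], reverse=True)[:pool_size]
    return max(storage, key=lambda t: t[0])
```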
Outside the optimization loop, the creative and difficult work happens. The scientist identifies the research problem, directs the search, and decides which hills are worth climbing. Machines handle the iterative grunt work of tweaking and testing solutions, while humans deal with abstraction, framing, and insight.
There are several other important limitations to the framework's effectiveness as well. The paper's examples mostly involve trivially checkable correctness, with no concurrency, security, or fault-tolerance concerns; those domains require reasoning beyond performance tuning. Another limitation is that these LLMs focus on and update only one component at a time, and can't handle system-wide interactions yet.
Simulator-based evaluation makes this approach feasible, but the systems field undervalues simulation work and this leads to limited infrastructure for automated testing. Similarly, evaluators also pose risks: poorly designed ones invite reward hacking, where LLMs exploit loopholes rather than learn real improvements. If AI-driven research is to scale, we need richer evaluators, stronger specifications, and broader respect for simulation as a first-class research tool.
Discussion topics
Here I wax philosophical on many interesting questions this work raises.
LLMs provide breadth, but research demands depth
LLMs excel at high-throughput mediocrity. By design, they replicate what has already been done, and optimize across the surface of knowledge. Research, however, advances through novelty, depth, and high-value insight.
"Research is to see what everybody else has seen, and to think what nobody else has thought."
-- Albert Szent-Györgyi (Nobel laureate)
In this sense, LLMs are not as dangerous as the "Barbarians" at the gates. They are more like "Barbies" at the gates, with gloss, confidence, and some hollowness. They may dazzle with presentation but they will lack the inner substance/insights/value that mastery, curiosity, and struggle bring.
LLMs address only the tip of the iceberg
LLMs operate on the visible tip of the research iceberg I described earlier. They cannot handle the deep layers that matter: Curiosity, Clarity, Craft, Community, Courage.
Worse, they may even erode those qualities. The danger in the short term is not invasion, but imitation: the replacement of thought with performance, and depth with polish. We risk mistaking synthetic polish for genuine understanding.
In the long term though, I am not worried. In the long term, we are all dead.
I'm kidding, ok. In the long term, we may be screwed as well. The 2004 movie "Idiocracy" rings more true every day. I am worried that due to the inherent laziness of our nature, we may end up leaning more and more on AI to navigate literature, frame questions, or spin hypotheses, that we may not get enough chances to exercise our curiosity or improve our clarity of understanding.
LLMs are bad researchers, but can they still make good collaborators?
In our academic chat follow-up to the iceberg post, I wrote about what makes a bad researcher:
Bad research habits are easy to spot: over-competition, turf-guarding, incremental work, rigidity, and a lack of intellectual flexibility. Bad science follows bad incentives such as benchmarks over ideas, and performance over understanding. These days the pressure to run endless evaluations has distorted the research and publishing process. Too many papers now stage elaborate experiments to impress reviewers instead of illuminating them with insights. Historically, the best work always stood on its own, by its simplicity and clarity.
LLMs are bad researchers. The shoe fits.
But can they still be good collaborators? Is it still worth working with them? The hierarchy is simple:
Good collaborators > No collaborators > Bad collaborators
Used wisely, LLMs can climb high enough to reach the lowest range of the good collaborator category. If you give them bite-sized well defined work, they can reduce friction, preserve your momentum, and speed up parts of your work significantly. In a sense, they can make you technically fearless. I believe that when used for rapid prototyping, LLMs can help improve the design. And, through faster iteration, you may uncover some high-value insights.
But speed cuts both ways, because premature optimization is the root of all evil. If doing evaluations and optimizations becomes very cheap and effortless, we will more readily jump to this step, with nothing forcing us to think harder. Human brains are lazy by design: they don't want to think hard, they take the quick superficial route out, and we never get to go deep.
So, we need to tread carefully here as well.
Can we scale human oversight?
The worst time I ever had as an advisor was when I had to manage 6-7 (six-seveeeen!) PhD students at once. I would much rather work with 2 sharp creative students I support myself than 50 mediocre ones handed to me for free. The former process of working is more productive and it results in deep work and valuable research. Focus is the key, and it does not scale.
The same holds for LLM-augmented research. Validation (via human focus) remains the bottleneck. LLMs can generate endless results, but without someone distilling those results into insight or wisdom, they remain AI slop in abundance.
Can clear insights be distilled without dust, sweat, and tears?
One may argue that with machines handling the grunt work, the researchers would finally get more time for thinking. Our brains are --what?-- yes, lazy. Left idle, they will scroll Reddit/Twitter rather than solve concurrency bugs.
I suspect we need some friction/irritation to nudge us to think in the background. And I suspect this is what happens when we are doing the boring work and working in the trenches. While writing a similar code snippet for the fifth time in our codebase, an optimization opportunity or an abstraction occurs to us. Very hard problems are impossible to tackle head on. By doing the legwork, I suspect we approach the problem sideways, and have a chance to make some headway.
Yes, doing evaluation work sucks. But it is often necessary to generate the friction and space to get you to think about the performance, and more importantly the logic/point, of your system. Through that suffering, you gradually get transformed and enlightened. Working in the trenches, you may even realize your entire setup is flawed, and your measurements are garbage due to using closed loop clients instead of open loop ones.
What happens when we stop getting our hands dirty? We risk distilling nothing at all. Insights don't bubble up while we are sitting in comfort and scrolling cat videos. In an earlier post, Looming Liability Machines (LLMs), I argued that offloading root-cause analysis to AI misses the point. RCA isn't about assigning blame to a component. It is an opportunity to think about the system holistically, to understand it better, and to improve it. Outsourcing this to LLMs strikes me as a very stupid thing to do. We need to keep exercising those muscles; otherwise they atrophy alongside our understanding of the system.
What will happen to the publication process?
In his insightful blog post on this paper, Brooker concludes:
Which leads systems to a tough spot. More bottlenecked than ever on the most difficult things to do. In some sense, this is a great problem to have, because it opens the doors for higher quality with less effort. But it also opens the doors for higher volumes of meaningless hill climbing and less insight (much of which we’re already seeing play out in more directly AI-related research). Conference organizers, program committees, funding bodies, and lab leaders will all be part of setting the tone for the next decade. If that goes well, we could be in for the best decade of systems research ever. If it goes badly, we could be in for 100x more papers and 10x less insight.
Given my firm belief in human laziness, I would bet on the latter. I have been predicting the collapse of the publishing system for a decade, and the flood of LLM-aided research may finally finish the job. That might not be a bad outcome either. We are due for a better model/process anyways.
October 22, 2025
Advanced Query Capabilities 👉🏻 aggregation pipelines
Although MongoDB has supported ACID transactions and sophisticated aggregation features for years, certain publications still promote outdated misconceptions, claiming that only SQL databases provide robust data consistency and powerful querying capabilities. The “Benefits of Migrating” section in a spreadsheet company’s article is a recent example. It's yet another chance to learn from—and correct—misleading claims.
The claims ignore MongoDB’s advanced querying and multi-document transaction support. Written to market migration tools, the article overlooks that MongoDB’s simple CRUD API is efficient for single-document tasks and that, as a general-purpose database, MongoDB also offers explicit transactions and powerful aggregation queries, just like SQL databases.
Enhanced Data Consistency and Reliability
The migration tool company justifies migrating by stating:
PostgreSQL’s ACID compliance ensures that all transactions are processed reliably, maintaining data integrity even in the event of system failures. This is particularly important for applications that require strong consistency, such as financial systems or inventory management.
Yes, PostgreSQL does provide ACID transactions and strong consistency, but this is mainly true for single-node deployments. In high-availability and sharded settings, achieving strong consistency and ACID properties is more complicated (see an example, and another example).
Therefore, highlighting ACID compliance as a reason to migrate from another database—when that alternative also supports ACID transactions—is not correct. For instance, single-node MongoDB has offered ACID compliance for years, and since v4.2, it supports multi-document transactions across replica sets and sharded clusters. Let's provide some syntax examples for the domains they mentioned.
Example: Financial system
Transfer $100 from Alice’s account to Bob’s account
// Initialize data
db.accounts.insertMany([
{ account_id: "A123", name: "Alice", balance: 500 },
{ account_id: "B456", name: "Bob", balance: 300 }
]);
// Start a transaction in a session
const session = db.getMongo().startSession();
try {
const accounts = session.getDatabase(db.getName()).accounts;
session.startTransaction();
// Deduct $100 from Alice
accounts.updateOne(
{ account_id: "A123" },
{ $inc: { balance: -100 } }
);
// Add $100 to Bob
accounts.updateOne(
{ account_id: "B456" },
{ $inc: { balance: 100 } }
);
session.commitTransaction();
} catch (error) {
session.abortTransaction();
console.error("Transaction aborted due to error:", error);
} finally {
session.endSession();
}
Why ACID matters in MongoDB here:
- Atomicity: Deduct and credit, either both happen or neither happens.
- Consistency: The total balance across accounts remains accurate.
- Isolation: Other concurrent transfers won’t interfere mid-flight.
- Durability: Once committed, changes survive crashes.
Example: Inventory management
Selling a product and recording that sale.
const session = db.getMongo().startSession();
try {
const inventory = session.getDatabase(db.getName()).inventory;
const sales = session.getDatabase(db.getName()).sales;
session.startTransaction();
// Reduce inventory count
inventory.updateOne(
{ product_id: "P100" },
{ $inc: { quantity: -1 } }
);
// Add a record of the sale
sales.insertOne(
{ product_id: "P100", sale_date: new Date(), quantity: 1 }
);
session.commitTransaction();
} catch (error) {
session.abortTransaction();
console.error("Transaction aborted due to error:", error);
} finally {
session.endSession();
}
ACID guarantees in MongoDB:
- No partial updates
- Inventory stays synchronized with sales records
- Safe for concurrent orders
- Durable once committed
Advanced Query Capabilities
The migration tool vendor justifies migrating by stating:
PostgreSQL offers powerful querying capabilities, including:
- Complex joins across multiple tables
- Advanced aggregations and window functions
- Full-text search with features like ranking and highlighting
- Support for geospatial data and queries
These allow for more sophisticated data analysis and reporting compared to MongoDB’s more limited querying capabilities.
This completely overlooks MongoDB’s aggregation pipeline.
Complex joins
MongoDB’s $lookup stage joins collections, even multiple times if you want.
Example: Join orders with customers to get customer names.
db.orders.aggregate([
{
$lookup: {
from: "customers",
localField: "customer_id",
foreignField: "_id",
as: "customer_info"
}
},
{ $unwind: "$customer_info" },
{
$project: {
order_id: 1,
product: 1,
"customer_info.name": 1
}
}
]);
Advanced aggregations
Operators like $group, $sum, $avg, $count handle numeric calculations with ease.
Example: Total sales amount per product.
db.sales.aggregate([
{
$group: {
_id: "$product_id",
totalRevenue: { $sum: "$amount" },
avgRevenue: { $avg: "$amount" }
}
},
{ $sort: { totalRevenue: -1 } }
]);
Window-like functions
MongoDB has $setWindowFields for operations akin to SQL window functions.
Running total of sales, sorted by date:
db.sales.aggregate([
{ $sort: { sale_date: 1 } },
{
$setWindowFields: {
sortBy: { sale_date: 1 },
output: {
runningTotal: {
$sum: "$amount",
window: { documents: ["unbounded", "current"] }
}
}
}
}
]);
Full-text search with ranking & highlighting
MongoDB supports both simple text indexes and Atlas Search (powered by Apache Lucene).
Example with Atlas Search: Search in articles and highlight matches.
db.articles.aggregate([
{
$search: {
index: "default",
text: {
query: "machine learning",
path: ["title", "body"]
},
highlight: { path: "body" }
}
},
{
$project: {
title: 1,
score: { $meta: "searchScore" },
highlights: { $meta: "searchHighlights" }
}
}
]);
Geospatial queries
Native geospatial indexing with operators like $near.
Example: Find restaurants within 1 km of a point.
db.restaurants.createIndex({ location: "2dsphere" });
db.restaurants.find({
location: {
$near: {
$geometry: { type: "Point", coordinates: [-73.97, 40.77] },
$maxDistance: 1000
}
}
});
Conclusion
MongoDB and PostgreSQL have equivalent capabilities for ACID transactions and “advanced” queries — the difference lies in syntax and data model.
MongoDB transactions don’t rely on blocking locks. They detect conflicts and let the application wait and retry if necessary.
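That conflict-and-retry pattern is what the drivers' withTransaction helper implements for you. A bare-bones sketch of the idea in plain JavaScript (the function name is mine; real driver errors carry the TransientTransactionError label shown here):

```javascript
// Retry a transactional callback on transient errors (e.g. write conflicts):
// the whole transaction is re-run from the start, as MongoDB drivers do.
function runWithRetry(txnFn, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return txnFn(); // the callback starts, does its work, and commits
    } catch (err) {
      const transient =
        err.errorLabels && err.errorLabels.includes("TransientTransactionError");
      if (!transient || attempt === maxAttempts) throw err;
      // conflict detected by the server: fall through and retry from scratch
    }
  }
}
```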
And instead of SQL in text strings sent to the database server to be interpreted at runtime, MongoDB uses a staged aggregation pipeline, fully integrated in your application language.
Migrating to PostgreSQL doesn’t magically grant you ACID or advanced analytics — if you’re already using MongoDB’s features, you already have them.