November 07, 2025
November 06, 2025
PostgreSQL 13 Is Reaching End of Life. The Time to Upgrade is Now!
Query Compilation Isn't as Hard as You Think
Query compilation based on the produce/consume model has a reputation for delivering high query performance, but also for being more difficult to implement than interpretation-based engines using vectorized execution. While it is true that building a production-grade query compiler with low compilation latency requires substantial infrastructure and tooling effort, what is sometimes overlooked is that the vectorization paradigm is not easy to implement either, but for different reasons.
Vectorization relies on vector-at-a-time rather than tuple-at-a-time processing. High-level operations (e.g., inserting tuples into a relation) must be decomposed into per-attribute vector kernels that are then executed successively. This requires a specific way of thinking and can be quite challenging, depending on the operator. In practice, the consequence is that the range of algorithms that can realistically be implemented in a vectorized engine is limited.
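For illustration, here is a minimal sketch of my own (not taken from any particular engine) of what such a per-attribute kernel looks like: a vectorized selection that produces a selection vector which every downstream kernel must then honor.
#include <cstdint>
#include <vector>
// Returns the positions of all rows in `col` whose value exceeds `bound`;
// downstream kernels operate only on these selected positions.
std::vector<std::uint32_t> selectGreaterThan(const std::vector<std::int64_t>& col,
                                             std::int64_t bound) {
    std::vector<std::uint32_t> sel;
    for (std::uint32_t i = 0; i < col.size(); ++i)
        if (col[i] > bound) sel.push_back(i);
    return sel;
}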
In contrast, compilation is fundamentally simpler and more flexible in terms of the resulting code. It supports tuple-at-a-time processing, which makes it easier to implement complex algorithmic ideas. Developing a query processing algorithm typically involves three steps: First, implement the algorithm manually as a standalone program, i.e., outside the compiling query engine. Second, explore different algorithms, benchmark, and optimize this standalone implementation. Finally, once a good implementation has been found, modify the query compiler to generate code that closely resembles the manually developed version.
At TUM, we use a simple pedagogical query compilation framework called p2c ("plan to C") for teaching query processing. It implements a query compiler that, given a relational operator tree as input, generates the C++ code that computes the query result. The core of the compiler consists of about 600 lines of C++ code and supports the TableScan, Selection, Map, Sort, Aggregation, and Join operators.
The generated code is roughly as fast as single-threaded DuckDB and can be easily inspected and debugged, since it is simply straightforward C++ code. This makes it easy for students to optimize (e.g., by adding faster hash tables or multi-core parallelization) and to extend (e.g., with window functions).
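To give a flavor of this, here is a minimal sketch of my own (not p2c's actual output) of what generated code for a query like SELECT k, count(*) FROM r WHERE v > 10 GROUP BY k could look like, with the TableScan, Selection, and Aggregation operators fused into a single tuple-at-a-time loop:
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>
struct R { std::vector<std::int64_t> k, v; };  // columnar base table
std::unordered_map<std::int64_t, std::int64_t> query(const R& r) {
    std::unordered_map<std::int64_t, std::int64_t> agg;  // Aggregation hash table
    for (std::size_t i = 0; i < r.k.size(); ++i) {       // TableScan produces one tuple at a time
        if (r.v[i] > 10) {                                // Selection consumes it
            ++agg[r.k[i]];                                // Aggregation consumes the surviving tuple
        }
    }
    return agg;
}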
To keep p2c simple, it does not support NULL values or variable-size data types (strings have a fixed maximum length). Compiling to template-heavy C++ results in very high compilation times, making it impractical for production use. At the same time, p2c employs the same core concepts as Hyper and Umbra, making it an excellent starting point for learning about query compilation. Note that this type of compilation to C++ can also serve as an effective prototyping platform for research that explores new query processing ideas.
Find the code for p2c on GitHub: https://github.com/viktorleis/p2c
November 05, 2025
TLA+ Modeling of AWS outage DNS race condition
As I mentioned, the enactor process has three actions. It’s not a single atomic block but three subactions. The receive step is simple: EnactorReceiveStep(self) models an enactor taking a new plan from the shared queue. If the queue isn’t empty and the enactor is idle (pc=0), it assigns itself the plan at the head, advances its program counter to 1 (ready to apply), removes that plan from the queue, and leaves other state unchanged.
The EnactorApplyStep models how an enactor applies a plan. If the plan it is processing hasn’t been deleted, the enactor makes that plan current, updates the highest applied plan, and advances its counter to 2. If the plan was deleted, it resets its state (processing and pc to 0).
The EnactorCleanupStep runs when an enactor finishes applying a plan (pc=2). It updates plan_deleted using Cleanup, by marking plans older than PLAN_AGE_THRESHOLD as deleted. It then resets that enactor’s state (processing and pc to 0), and leaves all other variables unchanged.
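As a rough reading aid, here is my own C++-flavored sketch of these three sub-actions (not the TLA+ spec itself; all names and types are my assumptions):
#include <algorithm>
#include <deque>
#include <optional>
#include <set>
struct SharedState {
    std::deque<int> queue;       // plans issued by the planner, oldest first
    std::set<int> deleted;       // plan ids marked deleted by Cleanup
    int current = -1;            // currently active plan
    int highest_applied = -1;    // highest plan id applied so far
};
struct Enactor {
    int pc = 0;                  // 0 = idle, 1 = ready to apply, 2 = applied
    std::optional<int> processing;
    // EnactorReceiveStep: take the plan at the head of the queue.
    void receive(SharedState& s) {
        if (pc == 0 && !s.queue.empty()) {
            processing = s.queue.front();
            s.queue.pop_front();
            pc = 1;
        }
    }
    // EnactorApplyStep: check (plan not deleted?) and update (make it current).
    void apply(SharedState& s) {
        if (pc != 1) return;
        if (s.deleted.count(*processing) == 0) {
            s.current = *processing;
            s.highest_applied = std::max(s.highest_applied, *processing);
            pc = 2;
        } else {
            processing.reset();
            pc = 0;
        }
    }
    // EnactorCleanupStep: mark plans older than the age threshold as deleted, go idle.
    void cleanup(SharedState& s, int age_threshold) {
        if (pc != 2) return;
        for (int p = 0; p <= s.highest_applied - age_threshold; ++p)
            s.deleted.insert(p);
        processing.reset();
        pc = 0;
    }
};
The race described below arises precisely because other enactors can take steps between one enactor’s receive, apply, and cleanup.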
Next defines the system’s state transitions: either the planner generates a new plan or an enactor executes one of its steps (receive, apply, cleanup).
NeverDeleteActive asserts that the currently active plan must never be marked deleted. This invariant breaks because the enactor’s operation isn’t executed as one atomic step but split into three subactions for performance reasons. Splitting the operation allows parallelism and avoids long blocking while waiting for slow parts—such as applying configuration changes—to complete. This design trades atomicity for throughput and responsiveness.
Anyone familiar with concurrency control or distributed systems can foresee how this race condition unfolds and leads to a NeverDeleteActive violation. The root cause is a classic time-of-check to time-of-update flaw.
The trace goes as follows. The planner issues several consecutive plans. Enactor 1 picks up one but lags on the next two steps. Enactor 2 arrives and moves faster: it takes the next plan, applies it, and performs cleanup. Because the staleness threshold isn’t yet reached, Enactor 1’s plan survives this cleanup. On the next round, Enactor 2 gets another plan and applies it. Only then does Enactor 1 finally execute its Apply step, overwriting Enactor 2’s newly installed plan. When Enactor 2’s cleanup runs again, it deletes that plan (which is now considered stale), violating the NeverDeleteActive invariant.
You can explore this violation trace using a browser-based TLA+ trace explorer that Will Schultz built by following this link. The Spectacle tool loads the TLA+ spec from GitHub, interprets it with a JavaScript interpreter, and visualizes the state changes step by step. You can step backwards and forwards using the buttons, and explore enabled actions. This makes model outputs accessible to engineers unfamiliar with TLA+. You can share a violation trace simply by sending a link, as I did above.
Surprise with innodb_doublewrite_pages in MySQL 8.0.20+
November 04, 2025
Announcing Vitess 23
November 03, 2025
PostgreSQL as a JSON database: Advanced patterns and best practices
$50 PlanetScale Metal
Notes on "Prothean AI"
If this were one person, I wouldn’t write this publicly. Since there are apparently multiple people on board, and they claim to be looking for investors, I think the balance falls in favor of disclosure.
Last week Prothean Systems announced they’d surpassed AGI and called for the research community to “verify, critique, extend, and improve upon” their work. Unfortunately, Prothean Systems has not published the repository they say researchers should use to verify their claims. However, based on their posts, web site, and public GitHub repositories, I can offer a few basic critiques.
ARC-AGI-2
First: Prothean’s core claim, repeated in the white paper, the FAQ, and the press release, is that Prothean has successfully solved all 400 tasks of the ARC-AGI-2 benchmark:
We present Prothean Emergent Intelligence, a novel five-pillar computational architecture that achieved 100% accuracy on all 400 tasks of the ARC-AGI-2 challenge in 0.887 seconds.
This system achieved 100% accuracy on all 400 tasks of the ARC-AGI-2 challenge in 0.887 seconds—a result previously considered impossible for artificial intelligence systems.
ARC-AGI-2 is specifically designed to resist gaming:
- 400 unique, novel tasks
- No training data available
It is not possible to solve “all 400 tasks of the ARC-AGI-2 challenge” because ARC-AGI-2 does not have 400 evaluation tasks to solve. Nor is it true that ARC-AGI-2 has no training data. The ARC-AGI-2 web site and docs are clear about this: the workload consists of “1,000 public training tasks and 120 public evaluation tasks”.
Prothean Systems’ “verification protocol” instructs researchers to download these 400 tasks from a non-existent URL:
Step 3: Obtain ARC-AGI-2 Dataset
# Download official dataset
wget https://github.com/fchollet/ARC-AGI/arc-agi-2-dataset.zip
This isn’t even the right repository. ARC-AGI-2 is at https://github.com/arcprize/ARC-AGI-2.
Local Operations
Prothean Systems repeatedly claims that its operations are purely local. The announcement post states:
🧬 WE SURPASSED AGI. THIS IS EGI. NOT AI. HERE’S THE PROOF.
Welcome to Prothean EGI.
──────────
After years of research and months of physical development, I’ve been building something that shouldn’t exist yet. After months of solo development, I put together a small strike force team—the Prothean Systems team—to bring this vision to life.
Prothean Systems — Emergent General Intelligence (EGI) that lives on your device, remembers everything, and never leaves your hands.
No servers. No uploads. No corporate AI deciding what you can know.
The demo on beprothean.org also advertises “no servers” and “zero uploads”. This is obviously false. Typing almost anything into this box, such as “what are the five pillars”, triggers two queries to Wikipedia and ten messages, including a mix of reads and writes, to the Firebase hosted database service:
Compression
Prothean’s launch video explains that Prothean uses a multi-tier compression system called “Memory DNA”. This system “actually looks at the data, understands its characteristics, and then applies the absolute best compression technique for it”. The white paper lists nine compression tiers:
Compression Cascade:
1. Semantic Extraction (40% reduction) - Core meaning preservation
2. Pattern Recognition (25% reduction) - Recurring structure identification
3. Fibonacci Sequencing (15% reduction) - φ-ratio temporal organization
4. Harmonic Resonance (10% reduction) - Frequency-domain optimization
5. Contextual Pruning (8% reduction) - Relevance-based filtering
6. Recursive Abstraction (6% reduction) - Meta-pattern extraction
7. Golden Ratio Weighting (4% reduction) - φ-importance scaling
8. Temporal Decay (3% reduction) - Age-based compression
9. Neural Synthesis (2% reduction) - Final integration
The demo on beprothean.org does nothing of the sort. Instead, it makes a single call to the open-source lz-string library, which is based on LZW, a compression algorithm from 1984.
function demoMemoryDNA(){
const text = $('mdna-input').value.trim();
...
const compressed = LZString.compressToUTF16(text);
Guardian
Prothean advertises a system called “Guardian”: an “integrity firewall that validates every operation” and “detects sensitive data, prevents drift, enforces alignment at runtime”. The demo does not appear to address drift or alignment in any way. Instead, it matches strings against three regular expressions, looking for things shaped like e-mail addresses, credit cards, and API keys.
// Guardian: Pattern detection & integrity validation
function runGuardian(text){
...
if(/\b\w+@\w+\.\w+/.test(text)){ patterns.push('email'); threatLevel += 1; }
if(/\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}/.test(text)){ patterns.push('card'); threatLevel += 3; }
if(/(password|secret|api[_-]?key|token)/i.test(text)){ patterns.push('secret'); threatLevel += 2; }
...
return { patterns, threatLevel, integrity, safe: threatLevel < 5 };
}
Semantic Bridging
Prothean’s “Universal Pattern Engine” invites users to “connect two concepts through semantic bridging”. This semantic bridging system has no awareness of semantics. Instead, it is based entirely on the number of letters in the two concepts and on picking some pre-determined words out of a list:
const a = $('upe-a').value.trim();
const b = $('upe-b').value.trim();
...
// Semantic bridge building
const bridges = ['system', 'design', 'architecture', 'implementation', 'protection',
'control', 'ownership', 'agency', 'autonomy', 'dignity', 'privacy'];
// Calculate semantic distance (simulated)
const similarity = (a.length + b.length) % 10 / 10;
...
const selectedBridges = bridges.slice(0, 3 + Math.floor(similarity * 3));
Tree Height
The white paper describes a “Radiant Data Tree” structure, which uses a “Fibonacci-branching tree structure with φ-ratio organization”. The depth of this tree is given as:
Depth(n) = φ^n / √5 (rounded to nearest Fibonacci number)
Depth is generally understood to be a property of a specific node, not of a tree as a whole, so I assume they mean Height(n). In either case this is impossible. The height of a tree (and the maximum depth of any node in it) is bounded by the number of nodes: a tree with n nodes has height at most n - 1. Trees cannot have a height that grows as φ^n, because this would imply that the depth increases faster than the number of nodes. Per this equation, a “Radiant Data Tree” containing just six nodes must have a height of eight, and must therefore contain at least nine nodes, not six.
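For concreteness, plugging n = 6 into the formula (my own arithmetic, taking the equation at face value):
Depth(6) = φ^6 / √5 ≈ 17.94 / 2.24 ≈ 8.02, which rounds to 8, the nearest Fibonacci number
which is where the height of eight for a six-node tree comes from.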
Transcendence Score
The white paper also introduces a transcendence score T which measures “mathematical beauty and emergent capability”:
T = (0.25×C + 0.25×(1-M) + 0.3×N + 0.2×P) × φ mod 1.0
Where:
- C = Complexity handling (0 to 1)
- M = Memory compression ratio (0 to 1)
- N = Neural emergence score (0 to 1)
- P = Pattern recognition quality (0 to 1)
- φ = golden ratio (1.618…)
Speaking broadly, a quality metric should increase as its inputs improve. This score does not. As each metric improves, the transcendence score rises—until roughly halfway through it reaches one, and immediately wraps around to zero again. Consequently, small improvements in (e.g.) “pattern recognition quality” can cause T to fall dramatically. The white paper goes on to claim specific values from their work which are subject to this modular wrapping.
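To make the wraparound concrete, here is my own worked example (not numbers from the paper). Write S = 0.25×C + 0.25×(1-M) + 0.3×N + 0.2×P; since the weights sum to 1, S lies between 0 and 1, and T wraps as soon as S exceeds 1/φ ≈ 0.618:
S = 0.61 → T = (0.61 × 1.618) mod 1.0 ≈ 0.99
S = 0.63 → T = (0.63 × 1.618) mod 1.0 ≈ 0.02
A two-point improvement in the underlying metrics drops the “transcendence” score from near-perfect to near-zero.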
There’s a lot more here, but I don’t want to burn too much time on this.
Large Language Model Hazards
Based on the commit history and prose style, I believe Prothean is largely the product of Large Language Models (LLMs). LLMs are very good at producing plausible text which is not connected to reality. People also like talking to LLMs: the architecture, training processes, and financial incentives around chatbots bias them towards emitting agreeable, engaging text. The technical term for this is “sycophancy”.
The LLM propensity for confabulation and reward hacking has led to some weird things in the past few years. People—even experts in a field—who engage heavily with LLMs can convince themselves that they have “awoken” an LLM partner, or discovered a novel form of intelligence. They may also become convinced that they have made a scientific breakthrough, especially in the field of AI.
Please be careful when talking to LLMs.
Report on our investigation of the 2025-10-20 incident in AWS us-east-1
Announcing Vitess 23.0.0
November 01, 2025
Covering index for $group/$sum in MongoDB aggregation (with hint)
MongoDB indexes enable quick equality and range filtering (with $match) and ordering (with $sort). For aggregation operations like $group with $first or $last accumulators, indexes can dramatically improve performance through the DISTINCT_SCAN optimization. However, for hash-based aggregations such as $group with $sum or $sortByCount, there is no such optimization. While indexes can provide significant performance benefits by creating covered query plans (avoiding document fetches), the query planner won't automatically select them. You must explicitly specify the index using a hint.
Setup
I created a test collection with 50,000 documents. Each document has a "groupme" field with 1000 distinct values, and a filler field (a random 50KB base64 string).
base64 -w 51200 /dev/urandom |
awk 'NR>50000{exit}{print int(1000*rand())"\t"$0}' |
mongoimport -c demo -f groupme,filler --type=tsv /dev/stdin --drop
I continue with mongosh and use an aggregation pipeline to group by "groupme" and count the documents:
db.demo.countDocuments()
50000
db.demo.aggregate([ { $sortByCount: "$groupme" } ])
[
{ _id: 937, count: 78 }, { _id: 517, count: 71 },
{ _id: 798, count: 70 }, { _id: 182, count: 68 },
{ _id: 812, count: 68 }, { _id: 158, count: 67 },
{ _id: 60, count: 67 }, { _id: 157, count: 66 },
{ _id: 653, count: 66 }, { _id: 901, count: 66 },
{ _id: 450, count: 66 }, { _id: 587, count: 66 },
{ _id: 701, count: 66 }, { _id: 403, count: 66 },
{ _id: 110, count: 65 }, { _id: 757, count: 65 },
{ _id: 461, count: 65 }, { _id: 593, count: 65 },
{ _id: 418, count: 65 }, { _id: 39, count: 64 }
]
Type "it" for more
Collection Scan
I executed it again to get the execution plan with execution statistics:
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
]).explain("executionStats")
In my small lab, the aggregation ran for about 590 milliseconds:
print(x.stages[0].executionTimeMillisEstimate)
Long('590')
The execution plan shows a collection scan.
print(x.stages[0].$cursor.queryPlanner.winningPlan.queryPlan)
{
stage: 'GROUP',
planNodeId: 3,
inputStage: {
stage: 'COLLSCAN',
planNodeId: 1,
filter: {},
direction: 'forward'
}
}
According to execution statistics, the elapsed time is primarily caused by the scan operation, which took 549 milliseconds, and the aggregation step:
print(x.stages[0].$cursor.executionStats.executionStages.inputStage.inputStage)
{
stage: 'group',
planNodeId: 3,
nReturned: 1000,
executionTimeMillisEstimate: 579,
opens: 1,
closes: 1,
saveState: 37,
restoreState: 36,
isEOF: 1,
groupBySlots: [ Long('5') ],
expressions: { '6': 'count() ', initExprs: { '6': null } },
mergingExprs: { '4': 'sum(s4) ' },
usedDisk: false,
spills: 0,
spilledBytes: 0,
spilledRecords: 0,
spilledDataStorageSize: 0,
inputStage: {
stage: 'project',
planNodeId: 3,
nReturned: 50000,
executionTimeMillisEstimate: 569,
opens: 1,
closes: 1,
saveState: 37,
restoreState: 36,
isEOF: 1,
projections: { '5': '(s3 ?: null) ' },
inputStage: {
stage: 'scan',
planNodeId: 1,
nReturned: 50000,
executionTimeMillisEstimate: 549,
opens: 1,
closes: 1,
saveState: 37,
restoreState: 36,
isEOF: 1,
numReads: 50000,
recordSlot: 1,
recordIdSlot: 2,
scanFieldNames: [ 'groupme' ],
scanFieldSlots: [ Long('3') ]
}
}
}
Index on the grouping key
To demonstrate the benefit of a covering index, I create an index on the grouping key:
db.demo.createIndex(
{ groupme: 1 }
);
The query planner doesn't use the index and the query still uses a collection scan:
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
]).explain("executionStats")
print(x.stages[0].$cursor.queryPlanner.winningPlan.queryPlan)
{
stage: 'GROUP',
planNodeId: 3,
inputStage: {
stage: 'COLLSCAN',
planNodeId: 1,
filter: {},
direction: 'forward'
}
}
MongoDB's query planner doesn't automatically choose covering indexes, but I can force it with a hint:
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
],
{ hint: { groupme: 1 } } ).explain("executionStats")
print(x.stages[0].$cursor.queryPlanner.winningPlan.queryPlan)
{
stage: 'GROUP',
planNodeId: 3,
inputStage: {
stage: 'PROJECTION_COVERED',
planNodeId: 2,
transformBy: { groupme: true, _id: false },
inputStage: {
stage: 'IXSCAN',
planNodeId: 1,
keyPattern: { groupme: 1 },
indexName: 'groupme_1',
isMultiKey: false,
multiKeyPaths: { groupme: [] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: { groupme: [ '[MinKey, MaxKey]' ] }
}
}
}
The execution plan uses an index scan with a projection that covers the grouping key. Because the query accesses only the small index entries, not the large documents, execution time dropped to about 20 milliseconds:
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
],
{ hint: { groupme: 1 } } ).explain("executionStats")
print(x.stages[0].executionTimeMillisEstimate)
Long('22')
print(x.stages[0].$cursor.executionStats.executionStages.inputStage.inputStage)
{
stage: 'group',
planNodeId: 3,
nReturned: 1000,
executionTimeMillisEstimate: 20,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
groupBySlots: [ Long('4') ],
expressions: { '5': 'count() ', initExprs: { '5': null } },
mergingExprs: { '3': 'sum(s3) ' },
usedDisk: false,
spills: 0,
spilledBytes: 0,
spilledRecords: 0,
spilledDataStorageSize: 0,
inputStage: {
stage: 'project',
planNodeId: 3,
nReturned: 50000,
executionTimeMillisEstimate: 10,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
projections: { '4': '(s2 ?: null) ' },
inputStage: {
stage: 'ixseek',
planNodeId: 1,
nReturned: 50000,
executionTimeMillisEstimate: 10,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
indexName: 'groupme_1',
keysExamined: 50000,
seeks: 1,
numReads: 50001,
recordIdSlot: 1,
outputSlots: [ Long('2') ],
indexKeysToInclude: '00000000000000000000000000000001',
seekKeyLow: 'KS(0A01) ',
seekKeyHigh: 'KS(F0FE) '
}
}
}
While some databases use different aggregation algorithms when the input data is ordered, MongoDB always relies on a hash-based algorithm. I next test two indexes to determine whether one whose prefix matches the grouping key is more efficient.
Index not starting with the grouping key
I create an index that is not ordered on the grouping key but only covers it:
db.demo.createIndex(
{ _id:1, groupme: 1 }
);
The index is slightly larger, and the scan takes a few additional milliseconds:
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
],
{ hint: { _id:1, groupme: 1 } } ).explain("executionStats")
print(x.stages[0].executionTimeMillisEstimate)
Long('24')
print(x.stages[0].$cursor.executionStats.executionStages.inputStage.inputStage)
{
stage: 'group',
planNodeId: 3,
nReturned: 1000,
executionTimeMillisEstimate: 19,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
groupBySlots: [ Long('4') ],
expressions: { '5': 'count() ', initExprs: { '5': null } },
mergingExprs: { '3': 'sum(s3) ' },
usedDisk: false,
spills: 0,
spilledBytes: 0,
spilledRecords: 0,
spilledDataStorageSize: 0,
inputStage: {
stage: 'project',
planNodeId: 3,
nReturned: 50000,
executionTimeMillisEstimate: 19,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
projections: { '4': '(s2 ?: null) ' },
inputStage: {
stage: 'ixseek',
planNodeId: 1,
nReturned: 50000,
executionTimeMillisEstimate: 19,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
indexName: '_id_1_groupme_1',
keysExamined: 50000,
seeks: 1,
numReads: 50001,
recordIdSlot: 1,
outputSlots: [ Long('2') ],
indexKeysToInclude: '00000000000000000000000000000010',
seekKeyLow: 'KS(0A0A01) ',
seekKeyHigh: 'KS(F0F0FE) '
}
}
}
For comparison, I created another index with the same fields in its key but in a different order, so that it matches the size of the previous one while starting with the grouping key:
db.demo.createIndex(
{ groupme: 1, _id:1 }
);
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
],
{ hint: { groupme: 1, _id:1 } } ).explain("executionStats")
print(x.stages[0].executionTimeMillisEstimate)
Long('22')
print(x.stages[0].$cursor.executionStats.executionStages.inputStage.inputStage)
{
stage: 'group',
planNodeId: 3,
nReturned: 1000,
executionTimeMillisEstimate: 20,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
groupBySlots: [ Long('4') ],
expressions: { '5': 'count() ', initExprs: { '5': null } },
mergingExprs: { '3': 'sum(s3) ' },
usedDisk: false,
spills: 0,
spilledBytes: 0,
spilledRecords: 0,
spilledDataStorageSize: 0,
inputStage: {
stage: 'project',
planNodeId: 3,
nReturned: 50000,
executionTimeMillisEstimate: 20,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
projections: { '4': '(s2 ?: null) ' },
inputStage: {
stage: 'ixseek',
planNodeId: 1,
nReturned: 50000,
executionTimeMillisEstimate: 0,
opens: 1,
closes: 1,
saveState: 8,
restoreState: 7,
isEOF: 1,
indexName: 'groupme_1__id_1',
keysExamined: 50000,
seeks: 1,
numReads: 50001,
recordIdSlot: 1,
outputSlots: [ Long('2') ],
indexKeysToInclude: '00000000000000000000000000000001',
seekKeyLow: 'KS(0A0A01) ',
seekKeyHigh: 'KS(F0F0FE) '
}
}
}
The execution time remains similar, and the main benefit of the index is realized when it includes the fields used in the $group stage, no matter their order.
Forcing disk spill
To compare the work done with the two indexes, I force a disk spill by setting the hash aggregation memory limit to a value smaller than what the aggregation needs:
db.adminCommand({
setParameter: 1,
internalQuerySlotBasedExecutionHashAggApproxMemoryUseInBytesBeforeSpill: 100
}).was
Long('104857600')
With this tiny memory limit, both plans must spill to disk while grouping the 50,000 entries, but the amount of spilling differs between the two indexes:
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
],
{ hint: { groupme: 1, _id:1 } } ).explain("executionStats")
print(x.stages[0].executionTimeMillisEstimate)
Long('59')
print(x.stages[0].$cursor.executionStats.executionStages.inputStage.inputStage.spills)
996
x=db.demo.aggregate([
{ $sortByCount: "$groupme" }
],
{ hint: { _id:1, groupme: 1 } } ).explain("executionStats")
print(x.stages[0].executionTimeMillisEstimate)
Long('617')
print(x.stages[0].$cursor.executionStats.executionStages.inputStage.inputStage.spills)
16667
When the aggregation needs to spill to disk due to many distinct values for the grouping key or reduced hash aggregation memory, the index starting with the grouping key offers an advantage.
I reset the configuration to the default of 100MB:
db.adminCommand({
setParameter: 1,
internalQuerySlotBasedExecutionHashAggApproxMemoryUseInBytesBeforeSpill: 104857600
}).was
Long('1')
Conclusion
For aggregations, especially when documents are large, using a covering index can greatly improve performance by allowing MongoDB to read compact index entries instead of full documents. Because MongoDB’s query planner does not automatically select covering indexes for these workloads, you typically need to use a ... (truncated)
October 31, 2025
How to Configure pgBackRest Backups and Restores in PostgreSQL (Local/k8s) Using a MinIO Object Store
Down with template (or not)!
If you are one of the few people in the world who, like me, actively follow the progress of the C++ standardization, you know that the C++ standard evolves significantly from each version to the next. The proposals often even have funny names like “I Stream, You Stream, We All Stream for istream_iterator”, “More trailing commas”, “std::array is a wrapper for an array!”, or “Down with typename!”.
When I first saw the paper “Down with typename!”, I thought that made a lot of sense. This change in C++20 allows you to write static_cast<std::vector<T>::value_type>(...) instead of static_cast<typename std::vector<T>::value_type>(...). Nice!
Recently I stumbled upon a similar weird-looking and unnecessary keyword in our code base:
template <typename T, typename F>
struct JoinFilterHasher {
template <typename S>
static uint64_t hashDict(const byte* data, unsigned index, const byte* arg) {
const T* dict = unalignedLoad<const T*>(arg);
const S pos = *reinterpret_cast<const S*>(data + index * sizeof(S));
return F::template hash<type>(dict + pos, nullptr);
// ^^^^^^^^ what is this???
}
};
When I saw this, I immediately thought that maybe the time had come for a new paper titled “Down with template!”. Unfortunately, after I fell into the rabbit hole of better understanding the template keyword, I understood why removing it isn’t (easily) possible.
However, I found an absolutely crazy workaround. Don’t try this at home kids! Read on and follow me through this journey of C++ template madness!
The template Keyword
The template keyword itself is not something that will surprise any C++ programmer. Every time you define a template, you’ll have to use the template keyword. Pretty obvious.
However, there are also cases where you’ll have to tell the compiler whether a name refers to a template by prefixing the name with the template keyword. Usually, you don’t do that and just write std::vector<int> but if you really wanted, you could also write std::template vector<int>!
Why would you ever need that? To explain this, let’s look at this simplified example:
struct Foo {
template <int I>
static int myfunc(int a) { return a + I; }
};
template <typename T>
int genericFoo(int a) {
return T::myfunc<42>(a);
}
int test() {
return genericFoo<Foo>(1);
}
This code snippet does not compile! The reason is that in genericFoo the type T is generic, i.e., T is only known when you instantiate genericFoo with a specific type. So, the compiler can’t know in advance whether T::myfunc ends up referring to a member variable, a member function, a nested type specification, or a template (in C++ standardese this is called a “dependent name”). But it needs to know what T::myfunc is in order to correctly parse the body of genericFoo. So, you have to use the template keyword to make this unambiguous.
Any good compiler will tell you that you should write T::template myfunc to fix the compilation. If the compiler already knows, why can’t we just get rid of the template keyword for this altogether?
Down with template?
In order to understand why we need the template keyword, let’s take a closer look at the compiler output:
<source>: In function 'int genericFoo(int)':
<source>:8:15: warning: expected 'template' keyword before dependent template name [-Wmissing-template-keyword]
8 | return T::myfunc<42>(a);
| ^~~~~~
| template
Ok, as I said, the compiler knows exactly that we need to add the template keyword. But wait a minute: This is just a compiler warning and not an error. Huh? If it’s just a warning, why doesn’t this code compile anyway? Let’s continue reading the compiler messages:
<source>:8:21: error: invalid operands of types '<unresolved overloaded function type>' and 'int' to binary 'operator<'
8 | return T::myfunc<42>(a);
| ~~~~~~^~~
Here we found the actual compilation error. But it doesn’t say anything about templates at all. Instead, it complains about an “unresolved overloaded function type” being compared with a binary operator<? What is going on here?!
It turns out, if you don’t prefix such a dependent name with the template keyword, the compiler treats it as a regular variable. So, what the compiler actually sees is this, broken down by tokens:
return // return keyword
T // name referring to a type
:: // scope resolution operator
// !!! Here's the interesting part !!!
myfunc // name referring to a variable
< // operator<
42 // integer literal
> // operator>
// !!! end of interesting part !!!
( // opening parenthesis (not a function call!)
a // name referring to a variable
) // closing parenthesis
; // semicolon
So, the compiler doesn’t see any template at all, just the variables myfunc and a and the constant 42 that are being compared using < and >. This is an equivalent statement, just formatted and parenthesized differently:
return (T::myfunc < 42) > a;
Since the compiler can’t know whether you just wanted a weirdly formatted comparison or an actual template, you need to specify that you’re actually dealing with a template using the template keyword.
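For completeness, the fixed call inside genericFoo, with the keyword the compiler asked for, is simply:
return T::template myfunc<42>(a);  // now unambiguously a call to a function template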
Case closed. Or, is it?
Template Madness
We know that syntactically the code without the template keyword is correct. It just parses it as comparison operators instead of template arguments. So, can we somehow find a way to make this compile without changing the genericFoo function or the Foo class?
The answer is: Yes, but only if you’re willing to give up your sanity, sacrifice all your RAM to your compiler, and generally don’t care about runtime performance at all. Sounds good?
To solve this, we need to write code that can handle being compared using < and > and somehow translates this again into an application of template arguments.
Since we don’t want to change Foo, we’ll create a new class called DownWithTemplate that will contain our evil hacks. Our goal is the following: Write DownWithTemplate so that calling genericFoo<DownWithTemplate>(1) is equivalent to a fixed version of genericFoo<Foo>(1) where we add the template keyword as suggested by the compiler.
Operator Overloading
Let’s start with the easy part: overloading operator< and operator>. We know that genericFoo wants to access a value called myfunc, so myfunc will be a static member variable whose type overloads operator<:
struct DownWithTemplate {
/* ... */
struct MyFunc {
OtherHalf operator<(int i) const {
return OtherHalf{i};
}
};
static constexpr MyFunc myfunc{};
};
The value that operator< returns needs to implement the other half of our hack, namely it needs to overload operator>:
struct DownWithTemplate {
struct OtherHalf {
int value;
int operator>(int i) const {
return /* ??? */;
}
};
/* ... */
};
In OtherHalf::operator> we now have the two values of our expression: value contains the “template argument” (42 in our implementation of genericFoo) and i gets the value of a in genericFoo.
Now, how do we call Foo::myfunc? Ideally we would like to just write Foo::myfunc<value>(i). If we do that, we’ll get the following compiler error:
<source>:8:16: note: template argument deduction/substitution failed:
<source>:27:32: error: '*(const DownWithTemplate::OtherHalf*)this' is not a constant expression
27 | return Foo::myfunc<value>(i);
| ^~~~~
Since value is not a compile time constant, we can’t pass it as a template argument which must be known at compile time.
Runtime Template Arguments (or: Inner Circle of C++ Template Hell)
How do we bridge the gap between a value only known at runtime and a template that needs to know the value at compile time? There is no easy fix. Conceptually, a template must know its template arguments at compile time but a runtime value obviously is only known at runtime.
So, if we don’t know the template argument, we’ll just have to select the correct template at runtime. For this to work, we’ll have to generate all possible template instantiations with all possible values as template arguments.
This is what we want to implement (I hope you never want to do this):
switch (value) {
case 0: return Foo::myfunc<0>(i);
case 1: return Foo::myfunc<1>(i);
/* ... */
case 2147483647: return Foo::myfunc<2147483647>(i);
case -1: return Foo::myfunc<-1>(i);
case -2: return Foo::myfunc<-2>(i);
/* ... */
case -2147483648: return Foo::myfunc<-2147483648>(i);
}
Obviously, we don’t want to write this code manually. So, let’s see what the toolbox of C++ templates gives us to help us out here: we’ll use std::integer_sequence to generate all possible integer values at compile time, and we’ll use something called “fold expressions” that work on “parameter packs” to generate the code for all cases automatically.
If you’ve never heard of these three C++ features, that’s probably a good sign. Your colleagues will be thankful to never have to review code using them!
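If they are new to you, here is a tiny warm-up example of my own (unrelated to our hack): a fold expression that sums the parameter pack of an integer_sequence.
#include <utility>
template <int... Is>
constexpr int sum(std::integer_sequence<int, Is...>) {
    return (Is + ... + 0);  // binary right fold over the pack Is, starting at 0
}
static_assert(sum(std::make_integer_sequence<int, 5>{}) == 10);  // 0+1+2+3+4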
Anyway, there’s no easy way to prepare you for this, so I’ll just show you the code in all its C++ template-hack ugliness and explain afterwards:
struct DownWithTemplate {
template <int... Is>
static int callMyFunc(std::integer_sequence<int, Is...>, int a, int b) {
return (((Is == a) ? Foo::myfunc<Is>(b) : 0) + ...) +
(((-Is-1 == a) ? Foo::myfunc<-Is-1>(b) : 0) + ...);
}
struct OtherHalf {
int value;
int operator>(int i) const {
return callMyFunc(std::make_integer_sequence<int, std::numeric_limits<int>::max()>{}, value, i);
}
};
/* ... */
};
First, we create all possible positive integers using std::make_integer_sequence. There is no equivalent template that gives you all negative integers, so we’ll just go through the range of all positive integers twice and negate the values once.
Unfortunately, fold expressions are just that, expressions, and not statements. So, we can’t write a real switch statement. What we’ll do instead is to write a long list of additions like this:
((value == 0) ? Foo::myfunc<0>(i) : 0) +
((value == 1) ? Foo::myfunc<1>(i) : 0) +
/* ... */
So, we add zero for all “cases” that don’t match and only call Foo::myfunc for the correct value. The exact syntax of fold expressions is very weird so you’ll just have to trust me that the code above is equivalent to this sum.
The Fallout
Let us roughly estimate how much work the compiler will have to do: We have 2^32 different possible templates. Each template instantiation contains at least a new Foo::myfunc expression. In Clang, a C++ function template instantiation (the class FunctionTemplateSpecializationInfo) uses at least 32 bytes of memory. So, at a minimum, the compiler would need 2^32*32 bytes = 128 GiB of memory to compile our code!
You can quickly confirm this by trying to compile this program:
clang++ -c -std=c++23 -O0 ./down_with_template.cpp
On my machine, this quickly leads to furious swapping of memory and eventually the OOM killer killing the compiler process (and a bunch of other processes as well). Don’t try this at home!
If you try to compile the same example with a 16-bit integer, Clang will not eat all your RAM but it will refuse to compile the program citing a “maximum nesting level for fold expressions”.
I tried compiling it with gcc on our largest machine as well: After consuming over 300 GiB of RAM, the OOM killer also got to gcc.
For now, our code only compiles if you are using 8-bit integers. You can find the full code at the bottom.
Conclusion
If you see the template keyword in an unexpected location in C++, you now know that it’s there to disambiguate between template arguments and comparison operators!
Still, C++ allows you to employ evil template hacks to work around this. If you are willing to sacrifice all of your sanity and RAM, you’ll be able to get rid of the template keyword!
Appendix
Full Code
#include <limits>
#include <utility>
// DANGER: changing this to any larger type will make the compiler eat all of your RAM!
using Int = signed char;
struct Foo {
template <Int I>
static Int myfunc(Int a) { return a + I; }
};
template <typename T>
Int genericFoo(Int a) {
return T::myfunc<42>(a);
}
struct DownWithTemplate {
template <Int... Is>
static Int callMyFunc(std::integer_sequence<Int, Is...>, Int a, Int b) {
return (((Is == a) ? Foo::myfunc<Is>(b) : 0) + ...) +
(((-Is-1 == a) ? Foo::myfunc<-Is-1>(b) : 0) + ...);
}
struct OtherHalf {
Int value;
Int operator>(Int i) const {
return callMyFunc(std::make_integer_sequence<Int, std::numeric_limits<Int>::max()>{}, value, i);
}
};
struct MyFunc {
OtherHalf operator<(Int i) const {
return OtherHalf{i};
}
};
static constexpr MyFunc myfunc{};
};
Int test() {
return genericFoo<DownWithTemplate>(1);
}