HammerDB tproc-c on a large server, Postgres and MySQL
This has results for HammerDB tproc-c on a large server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results, so I will keep this simple and just share graphs without explaining the results.
The comparison might favor Postgres for the IO-bound workloads because I used smaller buffer pools than normal to avoid OOM. I have to do this because RSS for the HammerDB client grows over time as it buffers more response-time stats. Also, while I used buffered IO for Postgres, I used O_DIRECT for InnoDB, so Postgres might have avoided some read IO thanks to the OS page cache while InnoDB did not.
tl;dr for MySQL
- With vu=40 MySQL 8.4.8 uses about 2X more CPU per transaction and does more than 2X more context switches per transaction compared to Postgres 18.1. I will get CPU profiles soon.
- Modern MySQL brings us great improvements to concurrency and too many new CPU overheads
- MySQL 5.6 and 8.4 have similar throughput at the lowest concurrency (vu=10)
- MySQL 8.4 is a lot faster than 5.6 at the highest concurrency (vu=40)
- Modern Postgres has regressions relative to old Postgres
- The regressions increase with the warehouse count, at wh=4000 the NOPM drops between 3% and 13% depending on the virtual user count (vu).
- Postgres and MySQL have similar throughput for the largest warehouse count (wh=4000)
- Otherwise Postgres gets between 1.4X and 2X more throughput (NOPM)
- an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
- 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
- 128G RAM
- Ubuntu 22.04 running the non-HWE kernel (5.15.0-118-generic)
- prior to 9.6 the config file is named my.cnf.cz12a50g_c32r128 (z12a50g_c32r128 or z12a50g) and is here for versions 5.6, 5.7, 8.0 and 8.4
- for 9.6 it is named my.cnf.cz13a50g_c32r128 (z13a50g_c32r128 or z13a50g) and is here
The benchmark was run for several workloads:
- vu=10, wh=1000 - 10 virtual users, 1000 warehouses
- vu=20, wh=1000 - 20 virtual users, 1000 warehouses
- vu=40, wh=1000 - 40 virtual users, 1000 warehouses
- vu=10, wh=2000 - 10 virtual users, 2000 warehouses
- vu=20, wh=2000 - 20 virtual users, 2000 warehouses
- vu=40, wh=2000 - 40 virtual users, 2000 warehouses
- vu=10, wh=4000 - 10 virtual users, 4000 warehouses
- vu=20, wh=4000 - 20 virtual users, 4000 warehouses
- vu=40, wh=4000 - 40 virtual users, 4000 warehouses
- stored procedures are enabled
- partitioning is used because the warehouse count is >= 1000
- a 5 minute rampup is used
- then performance is measured for 60 minutes
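The workload grid above can be enumerated programmatically; a trivial sketch (the helper below is hypothetical, not part of the benchmark scripts):

```python
from itertools import product

# The benchmark sweeps 3 virtual-user counts across 3 warehouse counts,
# giving the 9 workloads listed above.
virtual_users = [10, 20, 40]
warehouses = [1000, 2000, 4000]

workloads = [{"vu": vu, "wh": wh} for vu, wh in product(virtual_users, warehouses)]
print(len(workloads))  # 9 workloads
```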
- average wMB/s increases with the warehouse count for Postgres but not for MySQL
- r/s increases with the warehouse count for Postgres and MySQL
- CPU utilization is almost 2X larger for MySQL
- Context switch rates are more than 2X larger for MySQL
- In the future I hope to learn why MySQL uses almost 2X more CPU per transaction and has more than 2X more context switches per transaction relative to Postgres
(NOPM for some-version / NOPM for base-version)
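As a concrete sketch of that ratio, with invented NOPM values (not measured results):

```python
# Relative throughput = NOPM(some-version) / NOPM(base-version).
# A value below 1.0 suggests a possible regression; above 1.0, an improvement.
def relative_throughput(nopm_some: float, nopm_base: float) -> float:
    return nopm_some / nopm_base

base = 500_000  # hypothetical NOPM for the base version
some = 450_000  # hypothetical NOPM for the version under test
print(relative_throughput(some, base))  # 0.9 -> possible regression
```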
I provide three charts below:
- only MySQL - base-version is MySQL 5.6.51
- only Postgres - base-version is Postgres 12.22
- Postgres vs MySQL - base-version is Postgres 18.1, some-version is MySQL 8.4.8
Results: MySQL 5.6 to 8.4
Legend:
- my5651.z12a is MySQL 5.6.51 with the z12a50g config
- my5744.z12a is MySQL 5.7.44 with the z12a50g config
- my8045.z12a is MySQL 8.0.45 with the z12a50g config
- my8408.z12a is MySQL 8.4.8 with the z12a50g config
- my9600.z13a is MySQL 9.6.0 with the z13a50g config
Summary
- At the lowest concurrency (vu=10) MySQL 8.4.8 has similar throughput as 5.6.51 because CPU regressions in modern MySQL offset the concurrency improvements.
- At the highest concurrency (vu=40) MySQL 8.4.8 is much faster than 5.6.51 and the regressions after 5.7 are small. This matches what I have seen elsewhere -- while modern MySQL suffers from CPU regressions it benefits from concurrency improvements. Imagine if we could get those concurrency improvements without the CPU regressions.
And the absolute NOPM values are here:
Results: Postgres 12 to 18
Legend:
- pg1222 is Postgres 12.22 with the x10a50g config
- pg1323 is Postgres 13.23 with the x10a50g config
- pg1420 is Postgres 14.20 with the x10a50g config
- pg1515 is Postgres 15.15 with the x10a50g config
- pg1611 is Postgres 16.11 with the x10a50g config
- pg177 is Postgres 17.7 with the x10a50g config
- pg181 is Postgres 18.1 with the x10b50g config
Summary
- Modern Postgres has regressions relative to old Postgres
- The regressions increase with the warehouse count, at wh=4000 the NOPM drops between 3% and 13% depending on the virtual user count (vu).
by Mark Callaghan (noreply@blogger.com)
Butlers or Architects?
In a recent viral post, Matt Shumer declares dramatically that we've crossed an irreversible threshold. He asserts that the latest AI models now exercise independent judgment, that he simply gives an AI plain-English instructions, steps away for a few hours, and returns to a flawlessly finished product that surpasses his own capabilities. In the near future, he claims, AI will autonomously handle all knowledge work and even build the next generation of AI itself, leaving human creators completely blindsided by the exponential curve.
This was a depressing read. The dramatic tone lands well. And by extrapolating from progress in the last six years, it's hard to argue against what AI might achieve in the next six. I forwarded this to a friend of mine, who had the misfortune of reading it before bed. He told me he had a nightmare about it, dreaming of himself as an Uber driver, completely displaced from his high-tech career.
Someone on Twitter had a comeback: "The thing I don't get is: Claude Code is writing 100% of Claude's code now. But Anthropic has 100+ open dev positions on their jobs page?" Boris Cherny of Anthropic replied: "The reality is that someone has to prompt the Claudes, talk to customers, coordinate with other teams, and decide what to build next. Engineering is changing, and great engineers are more important than ever."
This is strongly reminiscent of the Shell Game podcast I wrote about recently. And it connects to my arguments in "Agentic AI and The Mythical Agent-Month" about the mathematical laws of scaling coordination. Throwing thousands of AI agents at a project does not magically bypass Brooks' Law. Agents can dramatically scale the volume of code generated, but they do not scale insight. Coordination complexity and verification bottlenecks remain firmly in place. Until you solve the epistemic gap of distributed knowledge, adding more agents simply produces a faster, more expensive way to generate merge conflicts.
Design, at its core, is still very human. Trung Phan's recent piece on how Docusign still employs 7,000 people in the age of AI provides useful context as well. Complex organizations don't dissolve overnight. Societal constructs, institutional inertia, regulatory frameworks, and the deeply human texture of business relationships all act as buffers. The world changes slower than the benchmarks suggest. So we are nowhere near a fully autonomous AI that sweeps up all knowledge work and solves everything. When we step back, two ways of reading the situation come into view. The first is that we are all becoming butlers for LLMs: priming the model, feeding it context in careful portions, adding constraints, nudging tone, coaxing the trajectory. Then stepping back to watch it cook. We do the setup and it does the real work. But as a perennial optimist, I think we are becoming architects. Deep work will not disappear, rather it will become the only work that matters. We get to design the blueprint, break down logic in high-level parts, set the vision, dictate strategy, and chart trajectory. We do the real thinking, and then we make the model grind.
Either way, this shift brings a real danger. If we delegate execution, it becomes tempting to gradually delegate thought. LLMs make thinking feel optional. People were already reluctant to think; now they can bypass it entirely. It is unsettling to watch a statistical prediction machine stand in for reasoning. Humbling, too. Maybe we're not as special as we assumed. This reminds me of Ted Chiang's story "Catching Crumbs from the Table" where humanity is reduced to interpreting the outputs of a vastly superior intellect. Human scientists no longer produce breakthroughs themselves; they spend their careers reverse-engineering discoveries made by "metahumans". The tragedy is that humans are no longer the source of the insight; they are merely trying to explain the metahumans' genius. The title captures the feeling really well. We're not at the table anymore. We're just gathering what falls from it. Even if things come to that, I know I'll keep thinking, keep learning, keep striving to build things. As I reflected in an earlier post on finding one's true calling, this pursuit of knowledge and creation is my dharma. That basic human drive to understand things and build things is not something an LLM can automate away. This I believe. I recently launched a free email newsletter for the blog. Subscribe here to get these essays delivered to your inbox, along with behind-the-scenes commentary and curated links on distributed systems, technology, and other curiosities.
February 14, 2026
HammerDB tproc-c on a small server, Postgres and MySQL
This has results for HammerDB tproc-c on a small server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results so I will keep this simple and just share graphs without explaining the results. tl;dr
Builds, configuration and hardware
I compiled Postgres versions from source: 12.22, 13.23, 14.20, 15.15, 16.11, 17.7 and 18.1. I compiled MySQL versions from source: 5.6.51, 5.7.44, 8.0.44, 8.4.7, 9.4.0 and 9.5.0. The server is an ASUS ExpertCenter PN53 with an AMD Ryzen 7 7735HS CPU, 8 cores, SMT disabled, and 32G of RAM. Storage is one NVMe device for the database using ext4 with discard enabled. The OS is Ubuntu 24.04. More details on it are here. For MySQL the config files are named my.cnf.cz12a_c8r32 and are here: 5.6.51, 5.7.44, 8.0.4x, 8.4.x, 9.x.0. For both Postgres and MySQL fsync on commit is disabled to avoid turning this into an fsync benchmark. The server has an SSD with high fsync latency.
Benchmark
The benchmark is tproc-c from HammerDB. The tproc-c benchmark is derived from TPC-C. The benchmark was run for several workloads:
The w=100 workloads are lighter on IO; the w=1000 and w=2000 workloads are heavier on IO. The benchmark for Postgres is run by this script, which depends on scripts here. The MySQL scripts are similar.
Results
My analysis at this point is simple -- I only consider average throughput. Eventually I will examine throughput over time and efficiency (CPU and IO). On the charts that follow the y-axis does not start at 0 to improve readability at the risk of overstating the differences. The y-axis shows relative throughput. There might be a regression when the relative throughput is less than 1.0. There might be an improvement when it is > 1.0. The relative throughput is: (NOPM for some-version / NOPM for base-version)
I provide three charts below:
Results: MySQL 5.6 to 8.4
Legend:
Summary
Results: Postgres 12 to 18
Legend:
Summary
Results: MySQL vs Postgres
Legend:
Summary
Cross join in MongoDB
Relational database joins are, conceptually, a cartesian product followed by a filter (the join condition). Without that condition, you get a cross join that returns every possible combination. In MongoDB, you can model the same behavior at read time using $lookup.
Example
Define two collections: one for clothing sizes and one for gender-specific fits:
Each collection stores independent characteristics, and every size applies to every fit. The goal is to generate all valid product variants.
Cross join on read: $lookup + $unwind
In order to add all sizes to each body shape, use a $lookup without a filter condition and, as it adds them as an embedded array, use $unwind to get one document per combination:
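A sketch of such a pipeline in pymongo-style Python; the collection and field names here are assumptions, not necessarily the post's actual schema:

```python
# An uncorrelated $lookup (empty sub-pipeline) attaches every document of the
# "sizes" collection to every "fits" document; $unwind then yields one
# document per (fit, size) combination -- i.e., a cross join.
pipeline = [
    {"$lookup": {"from": "sizes", "pipeline": [], "as": "size"}},
    {"$unwind": "$size"},
]

# Pure-Python emulation of the same semantics, for illustration:
fits = [{"fit": "men"}, {"fit": "women"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]
variants = [{**f, **s} for f in fits for s in sizes]
print(len(variants))  # 6 = 2 fits x 3 sizes
```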
Application-side
For such small static reference collections, the application may simply read both and join with loops:
While it's good to keep the reference data in a database, such static data can stay cached in the application.
Cross join on write: embed the many-to-many
Because sizes are inherently tied to body shapes (no size exists without a body shape), embedding them in the
Here is the new shape of the single collection: Once embedded, the query becomes straightforward, simply unwind the embedded array:
You may embed only the fields required, like the size code, or all fields like I did here with the neck size, and then remove the size collection:
Although this may duplicate the values for each body shape, it only requires using
Duplication has the advantage of returning all required information in a single read, without joins or multiple queries, and it is not problematic for updates since it can be handled with a single bulk update operation. Unlike relational databases, where data can be modified through ad-hoc SQL and business rules must therefore be enforced at the database level, MongoDB applications are typically domain-driven, with clear ownership of data and a single responsibility for performing updates. In that context, consistency is maintained by the application's service rather than by cross-table constraints. This approach also lets business rules evolve, such as defining different sizes for men and women, without changing the data model.
Conclusion
In a fully normalized relational model, all relationships use the same pattern: a one-to-many relationship between two tables, enforced by a primary (or unique) key on one side and a foreign key on the other. This holds regardless of cardinality (many can be three or one million), lifecycle rules (cascade deletes or updates), ownership (shared or exclusive parent), and navigation direction (and access patterns). Even many-to-many relationships are just two one-to-many relationships via a junction table. MongoDB exposes these same concepts as modeling choices, handled at read time with $lookup or at write time by embedding.
February 13, 2026
Supabase incident on February 12, 2026
A detailed account of the February 12 outage in us-east-2, what caused it, and the steps we are taking to prevent it from happening again.
February 12, 2026
Achieve near-zero downtime database maintenance by using blue/green deployments with AWS JDBC Driver
In this post we introduce the blue/green deployment plugin for the AWS JDBC Driver, a built-in plugin that automatically handles connection routing, traffic management, and switchover detection during blue/green deployment switchovers. We show you how to configure and use the plugin to minimize downtime during database maintenance operations.
Do You Think I Am a Goldfish?
Academic writing has long been criticized for its formulaic nature. As I wrote about earlier, research papers are unfortunately written to please 3 specific expert reviewers who are overwhelmingly from academia. Given this twisted incentive structure (looking impressive for peer-review), the papers end up becoming formulaic, defensive, and often impenetrable. Ironically, this very uniformity makes it trivially easy for LLMs to replicate academic writing. It is easy to spot LLM use in personal essays, but I dare you to do it successfully in academic writing.
Aside: Ok, I baited myself with my own dare. In general, it is very hard to detect LLM usage at the paragraph level in a research paper. But LLM usage in research papers becomes obvious when you see the same definition repeated 3-4 times across consecutive pages. The memoryless nature of LLMs causes them to recycle the same terms and phrases, and I find myself thinking "you already explained this to me four times, do you think I am a goldfish?" I have been reviewing a lot of papers recently, and this is the number one tell-tale sign. A careful read by the authors would clean this up easily, making LLM usage nearly undetectable. To be clear, I am talking about LLM assistance in polishing writing, not wholesale generation. A paper with no original ideas is a different beast entirely. They are vacuous and easy to spot.
Anyway, as LLM use becomes ubiquitous, conference/journal reviewing is facing a big crisis. There are simply too many articles being submitted, as it is easy to generate text and rush half-baked ideas to press. I am, of course, unhappy about this. Writing that feels effortless because an LLM smooths every step deprives you of the strain that produces "actual understanding". That strain in writing is not a defect; it creates the very impetus for discovering what you actually think, rather than faking/imitating thought. But here we are.
We are at an inflection point in academic publishing. I recently came across this post, which documents an experiment where an LLM replicated and extended a published empirical political science paper with near-human fidelity, at a fraction of the time and cost. I have been predicting the collapse of the publishing system for a decade. The flood of LLM-aided research might finally break its back. And here is where I want to take you in this post. I want to imagine how academic writing may change in this new publishing regime. Call it a 5-10 year outlook, because at this day and age, who can predict anything beyond that. I claim that costly signals of genuine intelligence will become the currency of survival in this new environment. Costly signals work because they are expensive to fake, like a peacock’s tail or an elk’s antlers. And I claim academic writing will increasingly demand features that are expensive to fake. Therefore, a distinctive voice becomes more valuable precisely because it cannot be generated without genuine intellectual engagement. Personal narratives, peculiar perspectives, unexpected conceptual leaps, and field-specific cultural fluency are things that require deep immersion and creative investment that LLMs lack. These are the costly signals that will make a paper worth publishing. Literature reviews are cheap to automate, so they will shrink --as we are already seeing. But reviews with distinctive voice and genuine insight, ones that reflect on the author's own learning and thought process, will survive. Work that builds creative frameworks and surprising connections, which are expensive to produce, will flourish. When anyone can generate competent prose, only writing that screams "a specific human spent serious time thinking about this" will cut through. So, LLMs may accidentally force academia toward what it always claimed to value: original thinking and clear communication. 
The costliest signal of all is having something genuinely new to say, and saying it well. I am an optimist, as you can easily tell if you are a long-time reader of this blog. "Simplicity and elegance are unpopular because they require hard work and discipline to achieve and education to be appreciated." -- Edsger W. Dijkstra
February 11, 2026
Migrate relational-style data from NoSQL to Amazon Aurora DSQL
In this post, we demonstrate how to efficiently migrate relational-style data from NoSQL to Aurora DSQL, using Kiro CLI as our generative AI tool to optimize schema design and streamline the migration process.
Prisma + MongoDB "Hello World"
Prisma is an ORM (Object-Relational Mapper). With MongoDB, it acts as an Object Document Mapper, mapping collections to TypeScript models and providing a consistent, type-safe query API. MongoDB is a document database with a flexible schema. Prisma does not provide schema migrations for MongoDB, but it supports nested documents and embedded types to take advantage of MongoDB's data locality. This article walks through a minimal "Hello World" setup in a Docker environment:
Start MongoDB as a replica set
Prisma requires MongoDB to run as a replica set. While MongoDB supports many operations without transactions, Prisma relies on MongoDB sessions and transactional behavior internally, which are only available on replica sets. Start MongoDB in a Docker container with replica set support enabled:
Initialize the replica set (a single‑node replica set is sufficient for local development and testing):
Start a Node.js container
Start a Node.js container that can access MongoDB using the hostname
Prepare the Node.js environment
Update the package manager, install an editor, update npm, disable funding messages, and move to the working directory:
Install Prisma Client and enable ES modules
Install Prisma Client and enable ES modules by adding
Using ES modules enables standard
Install Prisma CLI and TypeScript tooling
Install the Prisma CLI and supporting tooling, and generate the initial Prisma configuration:
Configure the Prisma schema
Edit
Prisma maps MongoDB's
Configure the database connection
Define the MongoDB connection string in
Prisma reads
Generate the Prisma client
Generate the Prisma client from the schema:
This produces TypeScript client files in
Write the "Hello World" program
Create
This program connects to MongoDB, inserts a "Hello World" document, and prints all stored messages.
Run the program
For running TypeScript directly in modern Node.js projects,
Execute the TypeScript file:
Output:
Conclusion and final note on schemas in MongoDB
This example shows a minimal Prisma + MongoDB setup:
From here, you can add schema evolution, indexes, and more complex queries while keeping the same core configuration. MongoDB is often called schemaless, but that's misleading in practice, as we started to declare the database schema in
Unlike relational databases, where the schema is enforced in the database and then mapped into the application, MongoDB uses the same document structure across all layers: in-memory cache, on-disk storage, and application models. This preserves data locality, avoids ORM overhead and migration scripts, and simplifies development. Prisma makes this explicit by defining the schema in code, providing type safety and consistency while keeping MongoDB's document model flexible as your application evolves.
OSTEP Chapter 8
The crux of this chapter is how to schedule tasks without perfect knowledge. If you remember from the previous chapter, the core tension in CPU scheduling is these two conflicting goals:
Unfortunately, the OS does not have a crystal ball. It doesn't know if a process is a short interactive job or a massive number-crunching batch job. The Multi-Level Feedback Queue (MLFQ) solves this by capturing information from the job's history, and assumes that if a job has been CPU-intensive in the past, it likely will be in the future. As we'll see below, it also gives jobs a chance to redeem themselves through the boosting process. I really enjoyed this chapter. MLFQ, invented by Corbato in 1962, is a brilliant scheduling algorithm. This elegant solution served as the base scheduler for many systems, including BSD UNIX derivatives, Solaris, and Windows NT and subsequent Windows operating systems. (This is part of our series going through OSTEP book chapters.)
How MLFQ Works: The Basic Rules
The chapter constructs the MLFQ algorithm iteratively, starting with a basic structure involving distinct queues, each with a different priority level.
But how does a job get its priority?
This setup cleverly approximates Shortest Job First. Because the scheduler assumes every new job is short (giving it high priority), true short jobs finish quickly. Long jobs eventually exhaust their time slices and sink to the bottom queues, where they run only when the system isn't busy with interactive tasks.
Patching the initial MLFQ rules
However, this basic version has fatal flaws.
To fix these issues, the chapter introduces two crucial modifications. The Priority Boost: To prevent low-priority jobs from starving, the scheduler employs Rule 5: After a set time period (S), all jobs are moved back to the topmost queue. This "boost" ensures that CPU-bound jobs get at least some processing time and allows jobs that have become interactive to return to a high-priority state. Better Accounting: To stop users from gaming the system, the scheduler rewrites Rule 4 regarding how it tracks time. Rule 4: Instead of resetting the allotment every time a job yields the CPU, the scheduler tracks the total time a job uses at a given priority level. Once the allotment is used up (regardless of how many times the job yielded the CPU) it is demoted.
Tuning MLFQ
The remaining piece of the puzzle is parameterization. An MLFQ requires choosing the number of queues, the time slice length for each, and the frequency of the priority boost. There are no easy answers to these questions, and finding a satisfactory balance often requires deep experience with specific workloads. For example, most implementations employ varying time-slice lengths, assigning short slices (e.g., 10 ms) to high-priority queues for responsiveness and longer slices (e.g., 100s of ms) to low-priority queues for efficiency. Furthermore, the priority boost interval is often referred to as a "voodoo constant" because it requires magic to set correctly; if the value is too high, jobs starve, but if it is too low, interactive performance suffers. MLFQ is a milestone in operating systems design. It delivers strong performance for interactive jobs without prior knowledge of job length, while remaining fair to long-running tasks. As noted earlier, it became the base scheduler for many operating systems, with several variants refining the core idea. One notable variant is the decay-usage approach used in FreeBSD 4.3.
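The rules above can be sketched as a toy simulator in Python. The queue count, allotments, and boost period below are arbitrary illustrative choices, not values from the chapter:

```python
from collections import deque

NUM_QUEUES = 3
ALLOTMENT = [2, 4, 8]  # ticks a job may use at each level before demotion
BOOST = 20             # priority-boost period S (the "voodoo constant")

def run_mlfq(jobs):
    """jobs: {name: remaining_ticks}; returns names in completion order."""
    queues = [deque() for _ in range(NUM_QUEUES)]
    used = {name: 0 for name in jobs}  # time used at the current level
    remaining = dict(jobs)
    for name in jobs:                  # new jobs enter the top queue
        queues[0].append(name)
    finished, tick = [], 0
    while any(queues):
        if tick and tick % BOOST == 0:        # Rule 5: periodic boost
            for q in queues[1:]:
                while q:
                    job = q.popleft()
                    used[job] = 0
                    queues[0].append(job)
        level = next(i for i, q in enumerate(queues) if q)
        job = queues[level].popleft()         # run highest-priority job
        remaining[job] -= 1                   # ...for one tick
        used[job] += 1
        tick += 1
        if remaining[job] == 0:
            finished.append(job)
        elif used[job] >= ALLOTMENT[level]:   # Rule 4: allotment spent -> demote
            used[job] = 0
            queues[min(level + 1, NUM_QUEUES - 1)].append(job)
        else:
            queues[level].append(job)
    return finished

print(run_mlfq({"short": 2, "long": 30}))  # ['short', 'long']
```

The short job finishes first because it completes within its top-queue allotment, while the long job is demoted and finishes later, which is the Shortest-Job-First approximation the chapter describes.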
The decay-usage variant, instead of using fixed priority tables (as in Solaris), computes priority using a mathematical function of recent CPU usage. Running increases a job's usage counter and lowers its priority, while the passage of time decays this counter. Decay plays the same role as periodic priority boosts. As usage fades, priority rises, ensuring long-running jobs eventually run and allowing jobs that shift from CPU-bound to interactive to regain high priority.
TLA+ model
I used Gemini to write a TLA+ model of the MLFQ algorithm here. To run this MLFQ TLA+ model at Spectacle for visualization, click this link and it will open the model in your browser, no installation or plugin required. What you will see is the initial state. Click on any enabled action to take it, and you can go back and forward on the right pane to explore the execution. And you can share a URL back with anyone to point to an interesting state or trace, just like I did here.
February 10, 2026
Build a custom solution to migrate SQL Server HierarchyID to PostgreSQL LTREE with AWS DMS
In this post, we discuss configuring AWS DMS tasks to migrate HierarchyID columns from SQL Server to Aurora PostgreSQL-Compatible efficiently.
{ w: 1 } Asynchronous Writes and Conflict Resolution in MongoDB
MongoDB guarantees durability (the D in ACID) over the network with strong consistency (the C in the CAP theorem) by default. It still maintains high availability: in the event of a network partition, the majority of nodes continue to serve consistent reads and writes transparently, without raising errors to the application. A consensus protocol based on Raft is used to achieve this at two levels:
It's important to distinguish the two types of consensus involved: one for controlling replica roles and one for the replication of data itself. By comparison, failover automation around monolithic databases like PostgreSQL can use a consensus protocol to elect a primary (as Patroni does), but replication itself is built into PostgreSQL and does not rely on a consensus protocol, so a failure in the middle may leave inconsistency between replicas.
Trade-offs between performance and consistency
Consensus on writes increases latency, especially in multi-region deployments, because it requires synchronous replication and waiting on the network, but it guarantees no data loss in disaster recovery scenarios (RPO = 0). Some workloads may prefer lower latency and accept limited data loss (for example, a couple of seconds of RPO after a datacenter burns). If you ingest data from IoT devices, you may favor fast ingestion at the risk of losing some data in such a disaster. Similarly, when migrating from another database, you might prefer fast synchronization and, in case of infrastructure failure, simply restart the migration from before the failure point. In such cases, you can use { w: 1 }.
Most failures are not full-scale disasters where an entire data center is lost, but transient issues with short network disconnections. With
Because the failure is transient, when the old primary rejoins, no data is physically lost: writes from both sides still exist. However, these writes may conflict, resulting in a diverging database state with two branches. As with any asynchronous replication, this requires conflict resolution. MongoDB handles this as follows:
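As an illustration of the general idea (a pure-Python sketch of recovering to the common oplog point, not MongoDB's actual implementation):

```python
# Hypothetical oplogs after a transient partition with w:1 writes.
# Entries are (timestamp, operation); both logs share a common prefix.
old_primary = [(1, "insert XXX-10"), (2, "insert XXX-11"), (3, "insert XXX-12")]
new_primary = [(1, "insert XXX-10"), (2, "insert XXX-11"), (3, "insert XXX-13")]

def rollback_to_common_point(local, remote):
    """Return (converged log, rolled-back suffix) for the rejoining node."""
    common = 0
    while common < min(len(local), len(remote)) and local[common] == remote[common]:
        common += 1
    rolled_back = local[common:]   # set aside, like MongoDB's rollback files
    converged = local[:common] + remote[common:]
    return converged, rolled_back

log, saved = rollback_to_common_point(old_primary, new_primary)
print(saved)  # [(3, 'insert XXX-12')] -- preserved for audit, not applied
```

The rejoining node keeps the prefix up to the last common timestamp, adopts the new primary's suffix, and saves its own divergent writes aside rather than discarding them.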
Thus, when you use
This conflict resolution is a Recover To a Timestamp (RTT).
Demo on a Docker lab
Let's try it. I start 3 containers as a replica set:
The last command waits until I insert "XXX-10" when connected to
I disconnect the secondary
With a replication factor of 3, the cluster is resilient to one failure and I insert "XXX-11", when connected to the primary:
I disconnect
Here,
Two seconds later, a similar write fails because the primary stepped down:
I wait until
No nodes are down, it's only a network partition, and I can read from all nodes as long as I don't connect through the network. I query the collection on each side:
The inconsistency is visible: "XXX-12" is only in
I reconnect
I query again and all nodes show the same values: "XXX-12" has disappeared and all nodes are now synchronized to the current state. When it rejoined,
The rolled back operations are not lost; MongoDB logged them in a
I read and decode all BSON in the rollback directory:
The deleted document is in
The deleted oplog for the related insert is in
Conclusion: beyond Raft
By default, MongoDB favors strong consistency and durability: writes use
MongoDB also lets you explicitly relax that guarantee by choosing a weaker write concern such as
This rollback behavior is where MongoDB intentionally diverges from vanilla Raft. In classic Raft, the replicated log is the source of truth, and committed log entries are never rolled back. Raft assumes a linearizable, strongly consistent state machine where the application does not expect divergence. MongoDB, by contrast, comes from a NoSQL and event-driven background, where asynchronous replication, eventual consistency, and application-level reconciliation are sometimes acceptable trade-offs. As a result:
In short, MongoDB replication is based on Raft, but adds rollback semantics to support real-world distributed application patterns. Rollbacks happen only when you explicitly allow them, never with majority writes, and they are fully auditable and recoverable.
Hydra joins Supabase
The Hydra team, maintainers of pg_duckdb, is joining Supabase to focus on Postgres + Analytics and Open Warehouse Architecture.
February 09, 2026
Towards a Standard for JSON Document Databases
Despite the ubiquity of the MongoDB aggregation framework, it has been lacking a formal mathematical framework/specification. This paper aims to fix this gap by providing a theoretical foundation, and proposes MQuery. The formalization in MQuery is largely based on a paper published at ICDT 2018 (in which the first author was involved), extending it to include more pipeline operators, relax the assumption that the JSON documents stored in the database comply with a predefined schema, and allow objects that are either ordered or unordered sets of key-value pairs.
Motivation
For decades, SQL proponents have flaunted the rigorous mathematical foundation of relational algebra (courtesy of Edgar Codd). The world of JSON document databases, however, has remained a bit of a Wild West in comparison. The analogy is apt because, like the frontier, there is immense opportunity here. JSON is the undisputed king of data exchange, and the MongoDB aggregation framework has emerged as the widely adopted query language for JSON collections. Thanks to its expressive pipeline model, massive developer base, and popularity, the MongoDB aggregation framework has effectively become the de facto standard for querying JSON. The fact that major vendors (including Amazon, Microsoft, Oracle, and Google) seek to provide compatibility with the MongoDB API further underscores its recognition as a common lingua franca. (The authors' words, not mine, so don't think I'm bragging on behalf of MongoDB.) To further motivate the need for a rigorous mathematical framework, the authors highlight current challenges. They argue that MongoDB's semantics are procedural rather than declarative. While the aggregation pipeline is pragmatic and powerful, its documentation often overlooks edge cases, leading to ambiguity. The paper illustrates this with an example about query predicates.
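The paper's predicate example can be emulated in a few lines of Python to make the transitivity violation concrete (an illustration of the described semantics, not MongoDB's code):

```python
# Emulation of MongoDB's loose equality (simplified): a query value matches
# a field if it equals the field itself, or equals any element of an
# array-valued field.
def loose_match(field_value, query_value):
    if field_value == query_value:
        return True
    return isinstance(field_value, list) and query_value in field_value

print(loose_match(["UK"], "UK"))      # True: "UK" is an element of ["UK"]
print(loose_match([["UK"]], ["UK"]))  # True: ["UK"] is an element of [["UK"]]
print(loose_match([["UK"]], "UK"))    # False: transitivity is violated
```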
In MongoDB, the query origin: "UK" matches a document where origin is the string "UK". However, it also matches a document where origin is the array ["UK", "Japan"]. While this loose equality is convenient for developers, it is bad for mathematical logic, as it violates the property of transitivity: [["UK"]] matches ["UK"], and ["UK"] matches "UK", yet [["UK"]] does not match "UK". Furthermore, the paper argues MongoDB suffers from path polysemy. A path like origin.country is inherently ambiguous. Does it refer to a nested field in a single object? Or, if origin is an array, does it refer to the country field of every object inside that array? This leads to data-dependent behavior, where a valid query might throw a runtime error simply because a new document with a different structure was inserted into the collection.
MQuery
MQuery (which, admittedly, looks a lot like McQuery, and now that I've said this, you won't be able to read it any other way) serves as a formalized abstraction of the MongoDB language. MQuery formalizes the data model using "d-values" (document values), which encompass literals, arrays, and objects. It also defines 7 core pipeline stages that mirror the MongoDB aggregation framework: $match, $unwind, $project, $group, $lookup, $graphLookup, and $union. By formalizing these stages, the authors confirm in Section 4 that "the MongoDB aggregation framework is very expressive: at least as expressive as full relational algebra (RA)". They mention:
They demonstrate that $match corresponds to selection, $project to projection, and $lookup (or a combination of $unwind and $group) to joins. This confirms that document databases can theoretically perform every operation relational databases can, including complex joins and set operations. They also note the MongoDB aggregation framework goes beyond RA by handling Nested Relational Algebra and linear recursion via $graphLookup.
The Payoff: Algebraic Optimization
Why does all this math matter? The formal definition can help us safely optimize queries. The final section of the paper demonstrates algebraic rewriting rules. Thanks to the formal definitions, the authors can prove when it is safe to reorder pipeline stages without altering the result. They provide rules for filter anticipation (moving $match earlier to reduce data volume), unnesting postponement (moving $unwind later to save memory), and join optimization.
How We Built Time Series: Configuration-Driven Visualization in Tinybird Forward
How we rebuilt Time Series: the SQL problems we hit, the ClickHouse patterns we used, and how we use it to debug our own alerts.