Continuing with notes from the BugBash talks. Yes, all of this goodness, including Will Wilson's keynote, happened before lunch on the first day.
Where all the ladders start
Peter Alvaro, Associate Professor of Computer Science @ UC Santa Cruz
In this talk, Peter reflects on his 20 years of distributed systems work. The cover image is Don Quixote (Peter) attacking the windmill (robust distributed systems) with a spear (some singular solution, often borrowed from databases).
The first attack was through the use of arcane algebras. This is a purist approach of getting it right the first time. This was during Peter's PhD at UC Berkeley, where Neil Conway was also a peer and collaborator.
By his own admission, this was incited by a naive framing of what makes distributed systems difficult. The target was uncertainty regarding order and timing, which causes distributed consistency problems and requires coordination. But distributed coordination comes at the cost of latency, performance variability, and decreased availability. Peter said they were influenced by James Hamilton (LADIS'08 talk) and by Pat Helland's work on avoiding coordination.
Peter then talked about two similar-sounding problems in a partitioned, replicated distributed system: deadlock detection and garbage collection. While both have similar formulations around strongly connected components, the first is a monotonic problem and the second is not! When new information arrives, it can only add to a deadlock verdict, whereas it can retract a garbage verdict.
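To make the monotonic versus non-monotonic distinction concrete, here is a minimal Python sketch. It is my own illustration, not code from the talk: the graph encoding and the function names (reachable, deadlocked, garbage) are assumptions. A deadlock found on partial information survives the arrival of more edges, while a garbage verdict can be overturned by a late-arriving reference.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def reachable(edges: Iterable[Edge], start: str) -> Set[str]:
    """Nodes reachable from `start` by following one or more edges."""
    adj: Dict[str, Set[str]] = {}
    for src, dst in edges:
        adj.setdefault(src, set()).add(dst)
    seen: Set[str] = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen


def deadlocked(waits_for: Set[Edge]) -> Set[str]:
    """Transactions that sit on a cycle of the wait-for graph."""
    nodes = {n for edge in waits_for for n in edge}
    return {n for n in nodes if n in reachable(waits_for, n)}


def garbage(refs: Set[Edge], roots: Set[str], objects: Set[str]) -> Set[str]:
    """Objects not reachable from any root in the reference graph."""
    live = set(roots)
    for root in roots:
        live |= reachable(refs, root)
    return objects - live


# Deadlock detection is monotonic: more edges never shrink the answer.
partial_waits = {("T1", "T2"), ("T2", "T1")}
full_waits = partial_waits | {("T3", "T1")}
early_deadlocks = deadlocked(partial_waits)
late_deadlocks = deadlocked(full_waits)

# Garbage collection is not: a late reference retracts the earlier verdict.
objects = {"a", "b", "c"}
partial_refs = {("a", "b")}
full_refs = partial_refs | {("b", "c")}
early_garbage = garbage(partial_refs, {"a"}, objects)
late_garbage = garbage(full_refs, {"a"}, objects)
```

Acting on early_deadlocks without waiting for the rest of the graph is safe, since the late answer contains it. Acting on early_garbage is not: "c" looked collectable only because a reference had not arrived yet, which is why that problem needs coordination.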
They formulated the CALM theorem to capture how monotonicity composes into properties of programs. This sounds grand and very promising, but it did not come close to reaching the target: solving the robust distributed systems problem.
The first problem is that disorder is not the whole story! The partial failure problem, rather than just binary failures, throws a wrench in the works. The second problem is that nobody wants to buy declarative languages.
The next attack was through lineage-driven fault injection. This starts from the premise that fault tolerance is just redundancy. This was a practical and useful approach. It found real bugs in real systems at Netflix, eBay, and Uber. But it relied on sufficient observability and whitebox instrumentation/testing. A second limitation was that redundancy is not always a good thing, as it comes with costs, both performance and monetary.
Recently, he has been working on yet another attack, Descartes: deterministic testing with Rupak Majumdar. The idea here is to feel out safety margins, and do stability testing.
Is this Sisyphus at work? No, this doesn't feel like punishment. This is fun. And we are making progress with each attack. Yes, there is no silver bullet, but that won't stop us from prodding and searching for unifying approaches.
Peter was also good at making people laugh with inside jokes about "postmortem party", and mentioning some unhelpful misguided suggestions offered at postmortems. We have a long way to go as a discipline.
From dams to data: how to think about infrastructure
Deb Chachra, Professor of Engineering @ Olin College
Here are some out-of-context quotes to summarize the talk. Piecing them together is an exercise left to the reader.
"Technology is the active human interface with the material world." "Infrastructure is relational, it is based on relationships." "Infrastructure has a trajectory." "The gifts of nature are for the public." "Niagara Falls: 1903 triscuit, baked by electricity, Sir Henry Pellet the financier." "Robert R. Moses, boo! Utilitarianist ethics." "We are going to build a system: a small number of people will be harmed, but a lot of people will benefit." "Unfortunately, we are spectacularly bad at that decision making." "Renewable energy is incredibly abundant: Earth is not a closed system for energy." "Renewables are inherently decentralized and distributed." "Infrastructure becomes a political right." "If you can't read in a world where 80% cannot, you are probably fine. But if you can't read in a world where 80% can... you are not fine."
Yes, this was not a software talk. Yes, there are many technology and AI parallels here, but the speaker did not go into these.
Lightning Talks
What 20 years of kernel bugs taught us about finding the next one
Jenny Qu, AI Researcher @ Pebblebed
30% of kernel bugs hide for 5+ years. These share the same pattern: the control and data paths are implicit. There is a shared channel, assumed sequencing, no enforcement. In other words, mostly concurrency bugs, which are hard to reproduce. LLMs are going to make this a lot worse! (Maybe the understatement of the year.) Every data channel is also a channel that can reach the control path. There is no control path anymore. What does Jepsen look like for LLM-generated systems?
Formal verification in the web dev workflow
Fernanda Graciolli, Co-Founder @ Midspiral
This talk was about proofs. Thanks to AI, we can do it, and because of AI, we must do it! Claude can write a Dafny proof, now what? She talked about a Dafny-to-React-components workflow. Pure logic is proved before compiling. Logic lives in specs: iterate on specs, not code. spec.yaml: structured spec
Old Tom Bombadil is a merry fuzzer!
Oskar Wickström, Senior Software Engineer @ Antithesis
I attended the BugBash 2026 these last two days, and had a blast. Here are my notes from the first keynote. I will try to find time to publish my notes from the other talks in the coming days.
Keynote: We won, what now?
Will Wilson, Co-founder & CEO @ Antithesis
The Antithesis team opened with a great animation/teaser clip, then Will took the stage. Here is the summary of his talk.
This is not a software testing conference. This is about building reliable software by any means: testing, observability, formal methods, people/culture, better languages. He shows a meme of fantastic five or something using their rings, they invoke this giant warrior.
Time to acknowledge the elephant in the room. A new contender emerges: AI!!
We are now taking a fundamentally unreliable system (AI) to make the systems we are developing reliable. And it is working somehow?! There is a vibe quality to it. When the cost of software generation goes down drastically, you can do a whole lot of it as per Jevons' paradox.
At this point Will starts talking about this hypothetical band: Quaternion dysfunction, and its noncommutative album. This is a niche band, following them early on makes us feel very special, part of a small in-group.
Now imagine that this niche band suddenly becomes freakishly popular. You feel many things. First you feel a great validation. But now more people start following the band, and you lose the in-group identity. You need to get a new personality. But who knows, maybe you can cash in on it, as an expert or talent lead.
Other copycat bands enter the scene, say Helvetica scenario. And now there are also many faker fans. People just follow these bands because they are popular. This happens in the real world a lot, and in the technology world as well. Jon Evans wrote a TechCrunch article on this in 2015: Beware the pretty people. Lawyers, financiers, business people. He overstates the effects. Silicon Valley getting popular was mostly good: these other people arriving brought diversity. But yes, there also come scammers and bad actors.
Here is a comparison of niche fields versus popular fields.
elite vs. energizing
elitist vs. lots of BS
cozy vs. innovative
defiant vs. ridiculous
Transition between these two worlds can be traumatic. And when all is said and done, Michael Lewis will come and write a book about it. (Will's zinger, not mine!)
In case you still haven't caught on to the analogy: that band is "software correctness": formal methods, property-based testing, observability. I.e., the people/community including BugBash.
As Will put it bluntly, you belong to a cult. The vast majority of engineers don't care much about correctness. They are not bad people; they act this way because of other pressures/priorities. This is a fact of life: most people didn't care about correctness much.
Well, that is until something strange happened, which made people care about correctness! Check the Google Trends for property based testing. It shot up from zero to millions in 2025-26. Same for formal methods.
Everybody just started caring about this because of AI. But how has AI caused this to happen? The conventional story is that AI agents don't write correct software, but take this story at face value. You mean software is written by unreliable agents? Always has been meme! People have been writing bad software for decades, and nobody batted an eyelash about verification before. So why do they care about verification all of a sudden?
Previously, when Will told managers that 50% of their team's time is spent on testing, they didn't believe him. Now they correct him and tell him it is 99%. The implication of AI and Amdahl's law is that correctness is now important. So no need to mention that, thanks to the AI wave, business is booming for Antithesis.
Amdahl's law is a nice angle to look at this. But I think there is another reason for this, as Steve Klabnik mentioned on day 2. Previously, no matter how buggy it was, you had written your software, understood it, and tried it. Using AI breaks all three: now you don't have a way to validate the software without formal methods, property-based testing, etc.
Then Will went on to set up the roadmap and expectations for the software reliability folks.
This feels like the Eternal September (1993/1994), when the unwashed masses started onboarding onto the internet. Forums got flooded, the norms changed. The in-crowd protested, but it was for the good. It was an overall positive. We should keep that retrospective view in mind.
What about the payoffs? Will showed the Rembrandt painting titled the "Parable of the Workers in the Vineyard", which depicts the Bible story of vineyard workers getting paid at the end of the day, where the workers who joined in the last couple of hours of the day were paid the same as the ones who toiled all day. The parable is interpreted to mean that even those who are converted late in life earn equal rewards along with those converted early, and that people who convert early in life need not feel jealous of those later converts.
Will reiterated: Don't feel resentful. This is what winning looks like. Other people coming and coopting your thing is actually what winning looks like. It is okay to win! Your position will get demolished/bastardized, but the world will have moved slightly towards your position. This is the transition from defiant to ridiculous in the above table.
(My aside: I, for one, am tired of winning! Too much winning going on on all fronts recently. I feel like the word "winning" is getting devalued. Also I personally do not agree with the parable's lesson. Even monkeys have this injustice instinct built in. Don't go philosophizing over me.)
Anyway, Will's takeaway message is this. The masses are coming. It is our community's time to shine. Software reliability tools had been for the elite, but it is changing. It is time to teach others.
Teach others?! On day two, Steve Klabnik also reiterated this message. It is time for others to learn from this community. But neither elaborated on how this teaching/learning will take place. And I remain skeptical. Yeah, I do blog about this stuff, and enthusiasts and people in the know follow and they say they benefit and learn. But I am skeptical about how this would scale. Learning is an active process: it requires active participation and effort on the learner's side. Some educators even claim there is no teaching, there is only learning. I am worried people will follow easy non-solution trends, like I don't know HOPE: Heuristic Oversight of Probabilistically-correct Execution. Or I don't know AGILE: Assert Goodness, Iterate Later, Eventually. The braindead solutions always get more popular. Thinking is hard, and human brains are optimized to be lazy.
Let me talk about the talk mechanics to wrap this up. Overall, this was a good show, in the best sense of the word. The delivery of the talk looked effortless, but it is clear Will put a lot of work into this presentation to make it this smooth. He had so many zingers and in-jokes. The band analogy is wonderful. The Rembrandt painting story is really memorable. These set the stage well, and help people manage expectations for the roadmap. This is a technical talk, presented as a nontechnical talk.
It was very entertaining, as well as informative and thought-provoking. Will's liberal arts background comes through clearly. And the clever use of memes was also a pattern shared among the best presenters at the conference. For a conference like this, the point is to score laughs, and entertain as much as teach.
In the previous guide, a robust Primary-Replica topology for Valkey was established. Read scaling is now active, and a hot copy of the data is securely stored on a second node. But there is a catch. If a primary node crashes, the replica will remain faithful and wait for instructions. It will not automatically take … Continued
At Percona, we’re passionate about open source database software, helping organizations of all sizes run, manage, and optimize their databases with the freedom and transparency that open source provides. That spirit of openness doesn’t stop at our products, it runs through everything we do, including how we encourage our own people to innovate. We recently … Continued
In the recent open-source data landscape, Valkey has emerged as a prominent player. Born as a Linux Foundation-backed, fully open-source fork of Redis (following Redis’s recent licensing changes), Valkey serves as a high-performance, in-memory key-value data store. Whether Valkey is deployed as a primary database, an ephemeral cache, or a rapid message broker, a single … Continued
We’re thrilled to welcome the open source database community back in person for Percona Live 2026, taking place May 27–29 in the Bay Area. After the energy of past events, there’s nothing like being together again — swapping war stories over coffee, sketching architectures on napkins, and learning from the people building and running databases … Continued
We ran a multiplayer DOOM server in pure SQL on different data stack architectures, recorded nice videos, and measured what breaks first.
Click here to go directly to the benchmark page (with videos).
Here is a video of CedarDB being “DOOMbench”-ed:
Why DOOM?
Pedantic note: The original DOOMQL uses raycasting, not BSP trees, making it technically more Wolfenstein than DOOM as some people pointed out.
Last year, we published DOOMQL: a multiplayer DOOM-like game running entirely inside SQL, using recursive CTEs for raycasting and a real client-server architecture where players connect directly to the database. We were very excited when it hit the front page of Hacker News.
DOOMBench builds on DOOMQL and turns it into a stress-test for different data stacks.
Latency numbers and throughput charts are easy to report but hard to feel.
A video of a database struggling to render a video game frame can be felt instantly!
Let’s look at the three dimensions DOOMQL covers:
Raw analytical performance: DOOMQL uses recursive CTEs to render a game world in ASCII-art using raycasting. That’s as number-crunchy as SQL gets! Some might interject that this is not a workload representative of the real world, to which I reply: Might be true, but there’s precedent.
Transaction Processing: DOOMQL uses a client-server architecture. Clients connect directly to the database and insert their inputs into an inputs table: WASD to move, X to shoot. That’s not going to be a lot of transactions (think: 10 players sending an input every 200 ms each -> 50 transactions per second), but latency is a big issue here. If you’ve ever played a multiplayer shooter with a high ping, you understand. Furthermore, there’s also a game loop the server has to run multiple times a second. This can reach from 100 ticks per minute (Runescape) to 128 ticks per second (Valorant).
Atomicity: Nothing feels worse than being shot by a player who was already dead on your screen. Good database systems can execute transactions in an atomic fashion. Either everything applies or nothing: There cannot be a player that has 0 or fewer hit points but hasn’t been killed and respawned yet. This is not really a metric to measure: It either applies, or it doesn’t. Fortunately, nearly all serious databases implement such ACID guarantees nowadays.
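The transaction numbers above are easy to sanity-check. The following is a small sketch of the arithmetic only; the helper names are made up for illustration.

```python
def inputs_per_second(players: int, input_interval_ms: float) -> float:
    """Insert transactions per second if every player sends at a fixed interval."""
    return players * 1000.0 / input_interval_ms


def tick_budget_ms(ticks: float, per_seconds: float = 1.0) -> float:
    """Time available for one server tick, in milliseconds."""
    return per_seconds * 1000.0 / ticks


tps = inputs_per_second(players=10, input_interval_ms=200)  # 50.0
slow_budget = tick_budget_ms(100, per_seconds=60)           # 600.0 ms per tick
fast_budget = tick_budget_ms(128)                           # 7.8125 ms per tick
doom_budget = tick_budget_ms(35)                            # about 28.6 ms per tick
```

The write volume is tiny, but the per-tick budget is not: at 35 ticks per second the database has under 29 ms to read the pending inputs, apply them, and commit.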
The interesting part is that it’s very hard for your data stack to be good at both analytical and transactional processing.
Analytics wants to crunch a lot of data and is usually bandwidth-bound (memory, disk, caches). Transaction processing wants writes to feel snappy and is usually latency-bound.
Both approaches traditionally use different data layouts, data structures, and system architecture. Let’s explore them in the context of DOOM!
What does a database running DOOM look like?
Let’s use Postgres as an example. Here’s a video of it running DOOMbench:
Let’s go over what we see:
The main view on the left shows the player’s view: the raycast game view itself, a minimap with the player’s sight cone, and a score screen including player health, ammo and kill count.
Right of that is a minimap of the world state. It’s the state Postgres is currently in, i.e. not the view rendered to the client, but the state of the database at the current tick.
Below that, we see the player input, as well as some performance numbers: the server tick rate and FPS, both current and as historic chart.
So how does data flow through the whole system? Let’s look at what has to happen for a new view to be rendered:
1. Inputs
The player presses ‘W’ which appends a new row into the inputs table: insert into inputs(player_id, action, timestamp) values (47, 'W', now()).
2. Game tick
The next game tick will then read that row, update the player position (as long as the player isn’t dead, the move is blocked by walls, etc…). We limit the server tick rate to 35 ticks per second (same as the original DOOM). Ticks and input are processed in lockstep.
3. Rendering
Clients can request a frame by querying a view that does all the raycasting behind the scenes on demand: select full_row from frames_by_row f where player_id = 47 order by f.row asc (see here)
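The input and tick steps above can be sketched end to end with Python's built-in sqlite3 module. This is a toy under stated assumptions, not DOOMQL's actual code: only the inputs table and its three columns come from the text, while the players and walls tables, the MOVES mapping, and the tick function are hypothetical, and SQLite stands in for Postgres.

```python
import sqlite3
import time

# Hypothetical key-to-direction mapping; only W/A/S/D movement is modeled.
MOVES = {"W": (0, -1), "S": (0, 1), "A": (-1, 0), "D": (1, 0)}


def setup() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table players(id integer primary key, x integer, y integer, hp integer);
        create table walls(x integer, y integer);
        create table inputs(player_id integer, action text, timestamp real);
    """)
    return db


def press(db: sqlite3.Connection, player_id: int, action: str) -> None:
    """Step 1: a keypress appends one row to the inputs table."""
    with db:
        db.execute(
            "insert into inputs(player_id, action, timestamp) values (?, ?, ?)",
            (player_id, action, time.time()),
        )


def tick(db: sqlite3.Connection) -> int:
    """Step 2: apply all pending inputs in one transaction, then clear them."""
    with db:  # commits on success, rolls back on error
        rows = db.execute(
            "select rowid, player_id, action from inputs order by timestamp, rowid"
        ).fetchall()
        for _rowid, player_id, action in rows:
            dx, dy = MOVES.get(action, (0, 0))
            # Move only living players, and only if the target cell is not a wall.
            db.execute(
                """update players set x = x + ?, y = y + ?
                   where id = ? and hp > 0
                     and not exists (select 1 from walls w
                                     where w.x = players.x + ?
                                       and w.y = players.y + ?)""",
                (dx, dy, player_id, dx, dy),
            )
        if rows:
            db.execute("delete from inputs where rowid <= ?", (max(r[0] for r in rows),))
    return len(rows)


db = setup()
with db:
    db.execute("insert into players(id, x, y, hp) values (47, 1, 1, 100)")
    db.execute("insert into walls(x, y) values (1, 0)")
press(db, 47, "W")  # blocked: (1, 0) is a wall
press(db, 47, "D")  # allowed: moves to (2, 1)
processed = tick(db)
position = db.execute("select x, y from players where id = 47").fetchone()
pending = db.execute("select count(*) from inputs").fetchone()[0]
```

Running the whole tick in one transaction is what gives the atomicity property described earlier: a reader either sees the state before the tick or after it, never a half-applied one.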
The rendering loop is decoupled from the game tick loop (game design 101), leading to a true mixed workload: Every client wants to maximize its own FPS (no VSYNC here, analytical workload), while the server also must be able to still process all the inputs at 35 ticks per second (transactional workload), while players continue to happily send their inputs whenever they please (transactional workload #2).
Since rendering is very(!) expensive and the game tick loop is decoupled from the rendering loop, the player’s view can get heavily out of sync with the game state.
You can see this with Postgres above: While it can process about 10 ticks per second, rendering the view takes multiple seconds.
The player makes a smooth 360 degree turn, as visible in the game state, but the output never catches up.
While the server already knows what happened to you, you don’t! I think we can all agree that a Counter-Strike pro wouldn’t call this playable.
OLAP: Let’s do everything in a data lake!
Okay, while Postgres is a battle-tested database system, it doesn’t seem to be a fit for our workload since it’s just not fast enough to push enough frames per second.
Let’s go with a pure OLAP system instead! They are purpose-built to answer complex analytical queries, so they should push a lot of FPS, right?
True, but unfortunately they are really bad at transactions and usually don’t have a way to do live transaction processing at all.
Where’s the value in rendering a lot of frames if your input isn’t processed?
There are two ways to get around this limitation: Two systems with an ETL pipeline in between (Extract-Transform-Load) or what I’ve come to lovingly call the nesting doll approach.
Let’s look at both approaches!
ETL
The concept is pretty simple: Let’s use a system that’s really good at transaction processing, and a system that is really good at analytical processing and
insert a pipeline in the middle replicating the data (a so-called ETL pipeline). The transactional system can focus on processing all the player inputs and running
the game loop, the analytical system can push the FPS.
Here’s such a setup. It uses Postgres as the transactional system and DuckDB as the analytical system. A simple CDC (Change-Data-Capture) loop runs once a second
and copies over all tables from Postgres to DuckDB. Here’s the result:
Doesn’t look much better, right?
But it is, in fact, much better! If you look at the chart, you see that DuckDB pushes a respectable 10 FPS (compared to the 0.3 of Postgres).
But since the game state visible to DuckDB only updates once a second, 9 out of 10 frames just render the same view!
This system split is awesome for raw analytical performance but kind of useless if you need a tight feedback loop.
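The "9 out of 10 frames" figure follows directly from the two rates. Here is a tiny simulation of that reasoning; it is my own sketch, with a made-up function name, and it models the replicated state as a version counter that advances once per sync interval.

```python
def repeated_frame_fraction(fps: int, sync_interval_ms: int, duration_ms: int) -> float:
    """Fraction of rendered frames that show the same replicated state as the previous frame."""
    frame_interval_ms = 1000 // fps
    frames = 0
    repeats = 0
    last_version = None
    for t in range(0, duration_ms, frame_interval_ms):
        version = t // sync_interval_ms  # which snapshot the analytical side sees
        if version == last_version:
            repeats += 1
        last_version = version
        frames += 1
    return repeats / frames


# 10 FPS on the analytical side, CDC copy once a second, 10 seconds of play.
stale = repeated_frame_fraction(fps=10, sync_interval_ms=1000, duration_ms=10_000)  # 0.9
# If the state were refreshed as often as frames are rendered, nothing repeats.
fresh = repeated_frame_fraction(fps=10, sync_interval_ms=100, duration_ms=10_000)   # 0.0
```

Rendering faster does not help here: raising fps only raises the repeated fraction, since the freshness of the picture is bounded by the sync interval.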
The Nesting Doll Approach
Having a second system and an ETL pipeline sucks: Apart from the replication lag we just encountered, you also have to maintain multiple systems and the pipeline in-between.
There is another approach, though: If everyone really likes using Postgres, but Postgres is not fast enough on analytical workloads, why not just co-locate a fast analytical engine
inside your Postgres?
One such approach is pg_clickhouse which we already discussed in a previous post on this blog.
It provides access to the ClickHouse database engine from inside Postgres and can push table scans to the far more capable ClickHouse analytical engine.
Here’s DOOMQL on pg_clickhouse:
As you can see, it shows more or less the same performance as Postgres. Improving table scan performance with modern engines is great, but in the end it’s still the Postgres query
optimizer and execution engine which are the bottleneck.
This is especially true for DOOMQL where tables are pretty small (so no need for fast table scans), but the queries themselves are very complex.
HTAP is the holy grail?
So far, we’ve seen:
Postgres handling transactions, but being unable to push frames.
DuckDB behind an ETL pipeline pushes frames but renders stale state.
Bolting a fast analytical engine onto Postgres doesn’t help because the bottleneck is the query executor (and its execution model), not the bare table scans.
So what if your database was just good at both?
That’s the premise of HTAP (Hybrid Transactional/Analytical Processing).
A database built in such a way that it can handle writes with low latency and run complex analytical queries on the same data in parallel, without an ETL pipeline.
So no replication lag, no stale reads, and especially no second system and data pipeline to maintain.
But if that’s so desirable, why aren’t all systems like that?
For most of database history, the hardware made you pick “either/or”.
OLTP systems were designed around the catastrophically bad random I/O of spinning disks.
To make sure that a newly inserted record touched as few different places on disk as possible, OLTP systems are usually row-oriented and new rows are just appended.
OLAP systems obviously have to work with the same disk limitations, but want to scan a lot of data in as few reads as possible. Since they were usually bottlenecked
by the measly throughput of an HDD, they try to read as little as possible, making extensive use of compression schemes and structuring their data layout into columns.
Since a query usually doesn’t touch all fields of a record (I’m not interested in the player’s password if I just want to find out their position on the map), this massively
reduces the amount of data to be scanned. Unfortunately, this is terrible for OLTP: Adding a new row now means I have to update all its columns, which are spread all over the disk.
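The layout tradeoff can be illustrated with two toy in-memory stores. This sketch is an assumption-laden simplification: the class names and the cost counters (values read per scan, separate locations written per insert) are invented for illustration and ignore pages, compression, and indexes.

```python
from typing import Dict, List, Sequence, Tuple


class RowStore:
    """Each record is stored contiguously; new rows are appended."""

    def __init__(self, fields: Sequence[str]) -> None:
        self.fields = list(fields)
        self.rows: List[tuple] = []

    def insert(self, record: Dict[str, object]) -> int:
        self.rows.append(tuple(record[f] for f in self.fields))
        return 1  # one location written

    def scan(self, wanted: Sequence[str]) -> Tuple[List[tuple], int]:
        idx = [self.fields.index(f) for f in wanted]
        out = [tuple(row[i] for i in idx) for row in self.rows]
        # Whole records are read, even the fields the query ignores.
        return out, len(self.rows) * len(self.fields)


class ColumnStore:
    """Each field is stored as its own array."""

    def __init__(self, fields: Sequence[str]) -> None:
        self.columns: Dict[str, list] = {f: [] for f in fields}

    def insert(self, record: Dict[str, object]) -> int:
        for name, column in self.columns.items():
            column.append(record[name])
        return len(self.columns)  # one location written per column

    def scan(self, wanted: Sequence[str]) -> Tuple[List[tuple], int]:
        cols = [self.columns[f] for f in wanted]
        out = list(zip(*cols))
        # Only the requested columns are read.
        return out, sum(len(c) for c in cols)


fields = ["name", "password", "x", "y"]
row_store, col_store = RowStore(fields), ColumnStore(fields)
for i in range(100):
    player = {"name": f"p{i}", "password": "secret", "x": i, "y": 2 * i}
    row_writes = row_store.insert(player)
    col_writes = col_store.insert(player)

row_result, row_reads = row_store.scan(["x", "y"])
col_result, col_reads = col_store.scan(["x", "y"])
```

Both stores return the same positions, but the row store reads 400 values to do so and the column store 200, while a single insert touches 1 location in the row store and 4 in the column store.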
Nowadays, entire companies exist just to move data between OLTP and OLAP systems. But the foundational assumptions of those systems don’t hold anymore:
A single server can have dozens of cores and hundreds of GiB of RAM, which is often enough to keep your entire hot dataset in main memory.
Modern NVMe SSDs can do hundreds of thousands of random IOPS.
So, the hardware constraint that forced specialization is mostly gone, but most database architectures haven’t caught up.
They are still organized around the tradeoffs of the 90’s and 00’s.
CedarDB is built from scratch for the new reality: The storage layer, query optimizer, and execution engine are all designed to handle both workloads natively.
Instead of bolting an analytical engine onto a transactional one or vice versa, CedarDB follows one coherent architecture that assumes fast storage, abundant DRAM, and many cores.
But enough theory, let’s see CedarDB in action:
The difference is immediately visible. CedarDB can push ~30 FPS at 30 ticks per second without replication lag: Each frame shows the current system state.
DOOMbench records a median lag of 44 milliseconds, meaning it takes just 44 milliseconds for a keypress to lead to an observable outcome.
Still not enough for a Counter-Strike pro, but enough to actually play the game!
Is it a contrived workload? Absolutely! But the underlying pattern (make observations on fresh data) shows up everywhere.
Take for example dashboards, interactive analytics, or AI agents acting on their own decisions.
The DOOMbench web page
But enough about the videos. Head over to cedardb.com/doombench for the full result table, or keep reading for a summary.
DOOMbench measures four things:
Tickrate: This is a pure OLTP measurement without rendering any frames. Four players move around and shoot while the server processes game ticks as fast as possible. How fast can your database run the game loop?
Static FPS: This is a pure OLAP measurement without any movement or ticks. Four players query their rendered view as fast as possible. This is raw analytical query throughput. We sum up and report the FPS of all four clients.
Median Lag: The metric every eSports gamer cares about. Time from button press to the rendered view reflecting that input. This captures OLTP performance, OLAP performance and replication lag in a single number.
DOOMscore: The HTAP benchmark. Four clients playing the actual game with the game loop ticking at 35 Hz (original DOOM tick rate). How many combined frames per second can the database render while keeping up with the game loop? Systems that can’t sustain 35 ticks per second get penalized proportionally: If you only manage half the tick rate, your DOOMscore halves too.
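The proportional penalty in the DOOMscore description can be written down as a one-line formula. The sketch below is my reading of that sentence, not the benchmark's published implementation: the function name, and in particular the assumption that tick rates above the target earn no bonus, are mine.

```python
ORIGINAL_DOOM_TICK_RATE = 35.0  # ticks per second, as stated in the text


def doomscore(combined_fps: float, sustained_tick_rate: float,
              target_tick_rate: float = ORIGINAL_DOOM_TICK_RATE) -> float:
    """Combined FPS, scaled down proportionally when the game loop falls behind."""
    # Assumption: the penalty factor is capped at 1.0 (no reward for ticking faster).
    penalty = min(1.0, sustained_tick_rate / target_tick_rate)
    return combined_fps * penalty


on_target = doomscore(combined_fps=40.0, sustained_tick_rate=35.0)   # 40.0
half_ticks = doomscore(combined_fps=40.0, sustained_tick_rate=17.5)  # 20.0
```

A system that renders quickly but manages only half the tick rate ends up with half the score, which matches the example given in the description.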
Each system runs the same DOOMQL codebase on the same hardware. Since some systems (CockroachDB, DuckDB) have slight syntax deviations, DOOMbench allows you to declare database-specific SQL overrides.
Apart from the four numbers above, DOOMbench also records a video replay of the same scene for every database which you can watch.
DOOMbench is open source. If you want to add additional systems, feel free to open a PR!
DOOMbench currently only works with Postgres-compatible systems, but we’d like to add other systems as well.
Should I care?
About the benchmark?
Probably not! Every vendor pushes their own benchmark where they are the best, to the surprise of absolutely no one.
This benchmark isn’t different: It uses very obscure and arcane SQL features like recursion and very involved string manipulation.
But it generates videos that make the tradeoffs in your data stack instantly visible. That has to count for something, right?
About HTAP?
Depends on your workload. Plenty of workloads are fine with stale data. If you’re reading the report after having a coffee
anyway, you probably don’t care if it’s five minutes out of date.
If you want to make decisions without a human in the loop, or use your database for interactive workloads (you, your customer, or your AI agent
changes some parameter and expects an instant result), HTAP is a game changer! Or if you want to play DOOM, I guess…
What’s next?
We’re open-sourcing DOOMbench. Missing a system? Unhappy with our methodology? Open a pull request!
I’m also working on a BSP tree implementation in recursive SQL, so we’ll hopefully have a real DOOM inside SQL soon.
If you want to run DOOMbench yourself, you can check out the code here.
All database systems are dockerized and should work out of the box.
Want to try out CedarDB in your own stack? Get started here or get in touch
Why is it so hard to teach an old database new tricks? VillageSQL CTO Steve Schirripa breaks down a year of engineering hurdles, debugging nightmares, and core principles discovered while building the VillageSQL Extension Framework.
We recently looked at how various open-source database engines maintain their secondary indexes (in a previous analysis) and found significant differences. The maintenance of indexes is not the only aspect where storage engines differ, another significant difference is how they handle simple row updates. These updates highlight how these open-source databases organize data and manage … Continued
In this post, we share the architectural decisions (and the alternatives considered) behind Ring’s billion-scale semantic video search on Amazon RDS for PostgreSQL with pgvector, along with cost-performance-scale challenges, key lessons, and future directions. The Ring team designed their vector search architecture for global scale, to support millions of customers with vector embeddings, the key technology for numerical representations of visual content generated by an AI model. By converting video frames into vectors (arrays of numbers that capture what’s happening in each frame), Ring can store these representations in a database and search them using similarity search. When you type “package delivery,” the system converts that text into a vector and finds the video frames whose vectors are most similar, delivering relevant results in under 2 seconds.
The latest release of the Percona Operator for MySQL, 1.1.0, is here. It brings point-in-time recovery, incremental backups, zstd backup compression, configurable asynchronous replication retries, and a set of stability fixes. This post walks through the highlights and how they help your MySQL deployments on Kubernetes. Percona Operator for MySQL 1.1.0 Running stateful databases … Continued
Introduction: Recently I was having a conversation with a DB Enthusiast, and he mentioned that when he was a fresher, he tuned an ETL/reporting query that was running for 8-10 hours via a nightly job by 1/3rd. He went to his manager, saying that he reduced the query execution time, thinking that the manager would … Continued
Amazon Aurora Serverless is an on-demand, auto scaling configuration for Aurora that scales up to support your most demanding workloads and down to zero when you don’t need it. The latest improvements deliver up to 30% better performance and enhanced scaling that understands your workload. These enhancements are available at no additional cost for a better price-performance ratio. In this post, we’ll share recent performance and scaling improvements with benchmark results, showing how Aurora Serverless can now scale up to 45.0% faster with a 32.9% faster workload completion time.
Having a separate DR cluster for production databases is a modern-day necessity for tech and other related businesses that rely heavily on their database systems. Setting up such a [DC -> DR] topology for Percona XtraDB Cluster (PXC), which is a virtually synchronous cluster, can be a bit challenging in a complex … Continued
AskTom Live is a great source of information from Oracle developer advocates and product managers, but I recently came across a clickbait marketing title ("Not All Binary Protocols Are Created Equal: The Science Behind OSON's 529x Performance Advantage") that compares apples to oranges. It's an opportunity to explain what BSON, the binary JSON format used by MongoDB, actually is.
TL;DR: If you want to compare with OSON, the Oracle Database datatype for JSON, you should compare it with the Mutable BSON Document, which is the structure that MongoDB uses to access documents when reading and updating individual fields. Raw BSON is closer to protobuf: a compact serialization format for disk or network transfer, with access metadata removed and no blocks or headers.
I left the following comment on the YouTube video, but it does not seem to be publicly visible, so here it is.
Let me explain how Oracle Database and MongoDB handle disk-based data access, and you will understand the different design purposes of OSON and BSON, and why you are not testing the right thing to compare them.
Oracle Database, like many traditional databases, uses the same format on disk (blocks) and in memory (buffers), so all the transient metadata that helps access the data in memory must also be stored on persistent storage. This applies to table blocks (which contain a table directory, a row directory, and even lock flags, ITLs, that need to be cleaned up later), and the same idea was used for OSON (header, dictionary, sorted field IDs, offset arrays). Think of it as a mini database with its own catalog, just as the Oracle database has its dictionary and segment headers, which map physical extents and blocks. Accessing the on-disk OSON structure directly therefore makes sense: it is designed to be used through buffers that match the disk blocks.
MongoDB with WiredTiger, however, uses a smarter cache where the in-memory structures are optimized for RAM: it adds pointers instead of disk offsets, builds an Elements Vector for O(1) field access, and adds skiplists to navigate fields, all when data is loaded into the database cache. So there are two formats. The mutable BSON is what the database actually works on in memory for query processing and updates. The on-disk raw BSON deliberately strips any unnecessary metadata and is compressed, to maximize the OS filesystem cache usage. This fits the major advantage of MongoDB for documents: reading or writing a document in a single I/O.
Raw BSON is a serialization format for disk and network. It is not meant to be accessed partially, because MongoDB has a powerful mutable BSON format in memory with O(1) access through its Elements Vector indexing. The O(n) sequential scan, the "no partial updates" limitation, and the field position penalties you describe are properties of the serialization format, not of how MongoDB actually processes queries. By definition, a serialization format is read sequentially, even though BSON can jump between fields. Don't read raw BSON directly unless you need the full document. Use the MongoDB server and drivers to access BSON, and learn how to use it correctly.
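To make the distinction concrete, here is a minimal Python sketch. It is not MongoDB's code: `encode_bson`, `scan_get`, `build_index`, and `indexed_get` are hypothetical helpers written for this illustration, they handle only int32 and string fields in a flat document, and the dictionary of offsets is merely a stand-in for the idea of an in-memory element index, not the actual Mutable BSON implementation. The byte layout follows the public BSON specification (length prefix, typed elements, trailing NUL).

```python
import struct

INT32, STRING = 0x10, 0x02  # BSON element type codes

def encode_bson(doc):
    """Serialize a flat dict of int32/str values to raw BSON bytes."""
    body = b""
    for name, value in doc.items():
        key = name.encode("utf-8") + b"\x00"
        if isinstance(value, int):
            body += bytes([INT32]) + key + struct.pack("<i", value)
        elif isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += bytes([STRING]) + key + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type for {name!r}")
    # Total length includes the 4-byte prefix and the trailing NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def iter_elements(raw):
    """Walk the elements in order, yielding (name, type, value_offset)."""
    pos, end = 4, len(raw) - 1
    while pos < end:
        etype = raw[pos]
        name_end = raw.index(b"\x00", pos + 1)
        name = raw[pos + 1:name_end].decode("utf-8")
        value_at = name_end + 1
        if etype == INT32:
            size = 4
        elif etype == STRING:
            size = 4 + struct.unpack_from("<i", raw, value_at)[0]
        else:
            raise ValueError(f"unsupported element type {etype:#x}")
        yield name, etype, value_at
        pos = value_at + size

def decode_value(raw, etype, value_at):
    """Decode one int32 or string value at a known offset."""
    length_or_int = struct.unpack_from("<i", raw, value_at)[0]
    if etype == INT32:
        return length_or_int
    start = value_at + 4
    return raw[start:start + length_or_int - 1].decode("utf-8")

def scan_get(raw, field):
    """Raw-format access: sequential scan. Returns (value, elements visited)."""
    for visited, (name, etype, value_at) in enumerate(iter_elements(raw), 1):
        if name == field:
            return decode_value(raw, etype, value_at), visited
    raise KeyError(field)

def build_index(raw):
    """One pass at 'load' time: remember where every field's value lives."""
    return {name: (etype, at) for name, etype, at in iter_elements(raw)}

def indexed_get(raw, index, field):
    """In-memory style access: jump straight to the value."""
    etype, value_at = index[field]
    return decode_value(raw, etype, value_at)

doc = {f"f{i}": i for i in range(100)}
raw = encode_bson(doc)
value, visited = scan_get(raw, "f99")   # has to walk every preceding element
index = build_index(raw)                # paid once, when the document is loaded
assert indexed_get(raw, index, "f99") == value
```

Reading the last field through `scan_get` visits every element, which is the behavior a benchmark measures when it pokes at raw BSON directly. Once an index has been built at load time, each lookup goes straight to the value, which is closer in spirit to how a server working on an in-memory structure would serve field access.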
With this understanding, you can see that the "529x performance" clickbait title comes from a mistake: you used raw BSON to access individual fields, bypassing everything MongoDB does when serving a query. It would be like using BBED to query Oracle Datafiles without going through the instance — no buffer cache, no row directory navigation, no dictionary lookups — and then concluding that Oracle's storage format is slow.
Notably, the original OSON VLDB paper (Liu et al., 2020), with Zhen Hua Liu as lead author, doesn't make the claims this video does. That paper honestly compares OSON against Oracle's own JSON text storage, not against MongoDB's query processing. It compares encoding sizes with BSON, which is legitimate for a serialization format comparison (though it overlooks that BSON in MongoDB is compressed on disk and over the network). The paper's authors understood they were comparing serialization formats and storage approaches within Oracle, not benchmarking MongoDB's actual runtime performance. I believe OSON is the optimal format for Oracle because it was integrated into the existing instance, cache, and SecureFiles, which were created a long time ago. Conversely, BSON is ideal for MongoDB, as it capitalizes on the document database's purpose and the WiredTiger architecture.