a curated list of database news from authoritative sources

January 20, 2026

What Oracle Missed, We Fixed: More Performant Query Processing in Percona Server for MySQL, Part 2

Remember when Percona significantly improved query processing time by fixing an optimizer bug? I described all the details in the More Performant Query Processing in Percona Server for MySQL blog post. This time, we dug deeper into all the ideas from Enhanced for MySQL, and based on our analysis, we proposed several new improvements. All […]

January 19, 2026

LLMs and your career

The most conservative way to build a career as a software developer is 1) to be practical and effective at problem solving but 2) not to treat all existing code as a black box. 1 means that as a conservative developer you should generally use PostgreSQL or MySQL (or whatever existing database), Rails or .NET (or whatever existing framework), and adapt code from Stack Overflow or LLMs. 2 means that you're curious and work over time to better understand how web servers and databases and operating systems and the browser actually work so that you can make better decisions for your own problems as you adapt other people's code and ideas.

Coding via LLM is not fundamentally different from coding with Rails or coding by perusing Stack Overflow. It's faster and more direct but it's still potentially just a human mindlessly adapting existing code.

The people who were only willing to look at existing frameworks and libraries and applications as black boxes were already not the most competitive when it came to finding and retaining work. And on the other hand, the most technically interesting companies always wanted to hire developers who understood fundamentals because they're 1) operating at such a scale that the way the application is written matters or they're 2) building PostgreSQL or MySQL or Rails or .NET or Stack Overflow or LLMs, etc.

The march of software has always been to reduce the need for (ever larger sizes of) SMBs (and teams within non-SMBs) to hire developers to solve problems or increase productivity. LLMs are part of that march. That doesn't change that at some point companies (or teams) need to hire developers because the business or its customer base has become too complex or too large.

The jobs that were dependent on fundamentals of software aren't going to stop being dependent on fundamentals of software. And if more non-developers are using LLMs it's going to mean all the more stress on tools and applications and systems that rely on fundamentals of software.

All of this is to say that if you like doing software development, I don't think interesting software development jobs are going to go away. So keep learning and keep building compilers and databases and operating systems and keep looking for companies that have compiler and database and operating system products, or companies with other sorts of interesting problems where fundamentals matter due to their scale.

January 18, 2026

MongoDB compared to Oracle Database Maximum Availability Architecture

Users looking at Oracle alternatives see the limitations of PostgreSQL in terms of high availability and ask:

Will MongoDB give me the same high availability (HA), disaster recovery (DR), and zero/near-zero data loss I’ve relied on in Oracle’s Maximum Availability Architecture (MAA)?

I previously compared Oracle Maximum Availability Architecture with YugabyteDB’s built-in high availability features here. I've done the same with MongoDB, which also offers built-in high availability through Raft-based replication to a quorum, but for a document-model application rather than SQL.

Oracle MAA and MongoDB

Oracle DBAs are used to a mature, integrated stack—RAC, Data Guard, GoldenGate, Flashback, and RMAN—refined for decades to work together in the MAA framework. MongoDB instead takes a cloud-native, distributed approach, embedding replication, failover, and sharding directly in its core engine rather than through separate options. While the technologies differ—MongoDB uses pull-based logical replication instead of broadcasting physical log changes—they pursue the same goals.

MongoDB vs Oracle MAA Levels

Each row compares MongoDB with the four Oracle MAA tiers (🥉 Bronze, 🥈 Silver, 🥇 Gold, 💰 Platinum):

  • Main Goal – MongoDB: Cloud-Native Resilience | 🥉 Bronze: Backup | 🥈 Silver: MAA Bronze + HA | 🥇 Gold: MAA Silver + DR | 💰 Platinum: MAA Gold + Multi-Master
  • HA/DR Product – MongoDB: Native in MongoDB core (replica sets, sharding, multi-region, backups) | 🥉 Bronze: RMAN | 🥈 Silver: RAC | 🥇 Gold: RAC + Data Guard (ADG) | 💰 Platinum: RAC + Data Guard + GoldenGate
  • RPO (Recovery Point Objective) – MongoDB: 0 with writeConcern: "w:majority" (sync to quorum), which is the default | 🥉 Bronze: Minutes (archivelog backups) | 🥈 Silver: Same as Bronze (no extra DB protection) | 🥇 Gold: 0 with Maximum Protection, 0 or more with Maximum Availability | 💰 Platinum: Seconds if tuned with low replication lag
  • RTO (Recovery Time Objective) – MongoDB: 5–15 seconds for automatic primary election | 🥉 Bronze: Hours–Days depending on database size | 🥈 Silver: Seconds for server crash, but DB is not protected | 🥇 Gold: Minutes with FSFO (Sync) | 💰 Platinum: Seconds if no conflicts in active-active failover
  • Corrupt Block Detection – MongoDB: Automatic via storage engine checksums and replica healing | 🥉 Bronze: RMAN detection + manual blockrecover | 🥈 Silver: No additional protection | 🥇 Gold: Automatic with lost-write detection (ADG) | 💰 Platinum: No additional protection beyond Gold
  • Online Upgrade – MongoDB: Zero downtime via rolling upgrades | 🥉 Bronze: Minutes–hours for major releases | 🥈 Silver: Some minor rolling upgrades (OS/patch) | 🥇 Gold: Near-zero downtime with transient logical | 💰 Platinum: Yes, if GoldenGate active-active is configured correctly
  • Human Error Mitigation – MongoDB: Point-in-time restores from Atlas backups or filesystem snapshots | 🥉 Bronze: Flashback Database, Snapshot Standby, LogMiner, Flashback Query, Flashback Transaction, dbms_rolling, dbms_redefinition, ASM | 🥈 Silver: Same + RAC | 🥇 Gold: Same + ADG | 💰 Platinum: Similar to Gold (GoldenGate can replay changes)
  • Online DDL – MongoDB: Non-blocking schema changes (document model) | 🥉 Bronze: No transactional DDL; EBR (Edition-Based Redefinition) | 🥈 Silver: Same as Bronze | 🥇 Gold: Same as Bronze | 💰 Platinum: Same as Gold
  • Active-Active – MongoDB: Via sharded clusters or Atlas Global Clusters; shard-key design avoids conflicts | 🥉 Bronze: No — only standby server offline unless licensed | 🥈 Silver: RAC active-active within the same datacenter | 🥇 Gold: No — standby is read-only (Active DG) | 💰 Platinum: Yes — GoldenGate active-active, requires conflict handling
  • Application Continuity – MongoDB: Automatic failover with drivers + retryable writes | 🥉 Bronze: CMAN proxy | 🥈 Silver: Transparent failover + Transaction Guard (RAC) | 🥇 Gold: Transparent failover (Data Guard) | 💰 Platinum: Transparent failover + conflict handling
  • Complexity – MongoDB: Low (built-in distributed features) | 🥉 Bronze: Low | 🥈 Silver: High (clusterware) | 🥇 Gold: Medium (RAC + DG) | 💰 Platinum: High (RAC + DG + GoldenGate)
  • Price (Options) – 🥉 Bronze: EE (~$50K/CPU) | 🥈 Silver: EE + 50% (RAC option) | 🥇 Gold: EE + RAC + 25% (ADG option) | 💰 Platinum: EE + RAC + ADG + 75% (GoldenGate; often ~2.5× EE base)

Understanding Oracle MAA Levels

Oracle’s Maximum Availability Architecture (MAA) is organized into four tiers, each building on the previous:

  1. 🥉 Bronze – Backup Only: RMAN online backups, archived log retention, manual restore. RPO in minutes to hours, RTO in hours or days.
  2. 🥈 Silver – Bronze + High Availability: Add Real Application Clusters (RAC) for server-level HA and limited rolling upgrades. Very low RTO for server failures but no additional database protection beyond Bronze’s backups.
  3. 🥇 Gold – Silver + Disaster Recovery: Add Data Guard (often Active Data Guard) for synchronous or asynchronous failover to remote standby. With FSFO (Fast Start Failover) in sync mode, RPO can be zero, RTO in minutes.
  4. 💰 Platinum – Gold + Multi-Master: Add GoldenGate logical replication for active-active writes across sites and versions. Enables global presence but requires manual conflict handling and careful design.

In Oracle’s world, HA/DR is not one thing — it’s a stack of separate licensed products you combine according to your needs and budget. The complexity and cost rise quickly as you move up the tiers.

Focus on features rather than product names, which can be misleading. For example, Autonomous Data Guard (RPO > 0, no real-time query, no rolling upgrade) in the Autonomous Database managed service differs from Active Data Guard, which provides all features of MAA Gold, but is not part of the managed service.

How MongoDB Builds HA and DR Natively

MongoDB’s availability model takes a fundamentally different approach: it is a distributed database by design, with no single authoritative copy bottleneck like a monolithic RDBMS. Replication, failover, sharding, and multi-region deployment are built in, not optional add-ons.

Key components:

  • Replica Sets – Every MongoDB production deployment runs as a replica set by default.
    • Automatic leader (primary) election in 5–30 seconds if a node fails.
    • Tunable write concern for RPO control (e.g., "w:majority" for RPO=0).
    • Flow control to throttle writes when majority-committed replication lag reaches a threshold, preventing secondaries from falling too far behind.
  • Sharding – Horizontal partitioning with each shard itself being a replica set.
    • Supports scaling out writes and reads while maintaining HA per shard.
  • Multi-Region Clusters – Place replica set members in multiple regions/zones for DR.
    • Writes propagate to remote members, optionally synchronously (Atlas global writes).
  • Backups & Point-in-Time Restore – Atlas offers continuous cloud backups with PIT restore. Self-managed deployments use snapshots + Oplog replay.
  • Automatic Failover & Driver-Level Continuity – Client drivers reconnect automatically. Retryable writes can resume after failover without application-level intervention.
  • Rolling Upgrades Without Downtime – Upgrade nodes one at a time, keeping the cluster online.

This means that what takes three separate licensed Oracle products to achieve in Platinum is, with MongoDB, simply a topology decision you make when deploying the cluster.
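
To make that concrete, here is a minimal application-side sketch (hostnames, database, and collection names are hypothetical, not from this post) of the two settings this comparison relies on for RPO=0 and continuity: a majority write concern and retryable writes.

# Minimal pymongo sketch (hypothetical URI, database, and collection names).
# Shows a majority write concern for RPO=0 and retryable writes for continuity
# across a primary election.
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# retryWrites is on by default in recent drivers; shown here for clarity.
client = MongoClient(
    "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0",
    retryWrites=True,
)

orders = client["shop"].get_collection(
    "orders",
    write_concern=WriteConcern(w="majority", wtimeout=5000),  # ack from a quorum
)

# If the primary fails over mid-operation, the driver retries the write once
# against the newly elected primary instead of surfacing the error.
orders.insert_one({"order_id": 1, "status": "created"})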

Key Takeaways

  • Oracle MAA levels are additive—Bronze, Silver, Gold, and Platinum. Each higher tier requires extra products and adds complexity for on‑premises deployments, and not all tiers are available in managed services. They are very mature for enterprises, but you should not underestimate the operational cost.
  • MongoDB provides built-in HA and DR. Replication, failover, sharding, and multi-region deployments are topology settings that simplify operations. However, configuration still matters: consistency is tunable per application, and settings must align with your infrastructure’s capabilities.
  • RPO and RTO targets achievable with Oracle Platinum are also achievable with MongoDB. The main difference is flashback capabilities, so you must tightly control production access and ensure all changes are automated and tested first in pre-production.
  • MongoDB rolling upgrades eliminate downtime for routine maintenance—a major change from traditional monolithic upgrade windows. Avoid running outdated versions, as was common with legacy databases.
  • Global write scenarios are possible in both, but MongoDB’s sharded architecture can automatically avoid conflicts (ACID properties over the cluster, causal consistency).

January 16, 2026

Rethinking the University in the Age of AI

Three years ago, I wrote a post titled "Getting schooled by AI, colleges must evolve". I argued that as we entered the age of AI, the value of "knowing" was collapsing, and the value of "doing" was skyrocketing. (See Bloom's taxonomy.)

Today, that future has arrived. Entry-level hiring has stalled because AI agents absorb the small tasks where new graduates once learned the craft.

So how do we prepare students for this reality? Not only do I stand by my original advice, I am doubling down. Surviving this shift requires more than minor curriculum tweaks; it requires a different philosophy of education. I find two old ideas worth reviving: a systems design mindset that emphasizes holistic foundations, and alternative education philosophies of the 1960s that give students real agency and real responsibility.


Holistic Foundations

Three years ago, I begged departments: "Don't raise TensorFlow disk jockeys. Teach databases! Teach compilers! Teach system design!" This is even more critical today. Jim Waldo, in his seminal essay On System Design, warned us that we were teaching "programming" (the act of writing code) rather than "system design" (the art of structuring complexity).

AI may have solved the tedious parts of programming but it has not solved system design. To master system design, students must understand the layers both beneath and above what AI touches. Beneath the code layer lie compilers and formal methods. Compilers matter not so students can build languages, but so they understand cost models, optimization, and the real behavior of generated code. Formal methods matter not to turn everyone into a theorem prover, but to reason about safety, liveness, and failure using invariants. When an AI suggests a concurrent algorithm, students must be able to sense/red-flag the race conditions.

Above the code layer lies system design. As Waldo observed, the best architects are synthesizers. They glue components together without breaking reliability, a skill that remains stubbornly human. System design lives in an open world of ambiguity, trade-offs, incentives, and failure modes that are never written down. This is why "mutts" with hybrid backgrounds often excel in system design. Liberal arts train you to reason about systems made of humans, incentives, and the big picture. Waldo's point rings true. Great system designers understand the environment/context first, then map that understanding into software.


Radical Agency 

My original post called for flipped classrooms and multi-year projects. These still assume a guided path. We may need to go further and embrace the philosophy of the "Summerhill" model, where the school adapts to the learner, rather than the learner fitting the school. In an age of abundant information, the scarce resource is judgment. Universities must stop micromanaging learning and start demanding ownership. The only metric that matters is learning to learn.

Replace tests with portfolios. Industry hires from portfolios, not from grades. Let students use AI freely; you cannot police it anyway. If a student claims they built a system, make them defend it in a rigorous, face-to-face oral exam using the Harkness method. Ask them to explain how the system behaves under load, under failure, and at the corner cases.

In 2023, I argued for collaboration and writing. In 2026, that is the whole game. Collaboration now beats individual brilliance. Writing is a core technical skill. A clear specification is fast becoming equivalent to correct code. Students who cannot write clearly cannot think clearly. 

I also argued for ownership-driven team projects. To make them real, universities must put skin in the game. Schools should create a small seed fund. The deal is simple: $20,000 for 1% equity in student-run startups. The goal is not financial return (which I think will materialize), but contact with reality. Students must incorporate, manage equity, run meetings, and make decisions with consequences.

Deploying Percona Operator for MySQL with OpenTaco for IaC Automation

Deploying databases on Kubernetes is getting easier every year. The part that still hurts is making deployments repeatable and predictable across clusters and environments, especially from a Continuous Integration (CI) perspective. This is where PR-based automation helps; you can review a plan, validate changes, and only apply after approval, before anything touches your cluster. If you’ve ever […]

January 15, 2026

Percona Operator for PostgreSQL 2025 Wrap Up and What We Are Focusing on Next

In 2025, the Percona Operator for PostgreSQL put most of its energy into the things that matter when PostgreSQL is running inside real Kubernetes clusters: predictable upgrades, safer backup and restore, clearer observability, and fewer surprises from image and HA version drift.  Backups and restores got more resilient and more controllable In March, Operator 2.6.0 […]

Announcing Percona ClusterSync for MongoDB: The Open Source Trail to Freedom

Migrating mission-critical databases is often compared to changing the engines on a plane while it’s in mid-flight. For MongoDB users, this challenge has been historically steep, often involving complex workarounds or proprietary tools that keep you locked into a specific ecosystem. Today, we are thrilled to announce the General Availability of Percona ClusterSync for MongoDB […]

When to use Read Replicas vs. bigger compute

A practical guide to diagnosing database slowdowns and choosing between vertical scaling and Read Replicas based on your workload, budget, and performance bottlenecks.

January 14, 2026

Debugging regressions with Postgres in IO-bound sysbench

I explained in this post that there is a possible performance regression for Postgres with IO-bound sysbench. It arrived in Postgres 16 and remains in Postgres 18. I normally run sysbench with a cached database, but I had a spare server so I repeated tests with an IO-bound workload.

The bad news for me is that I need to spend more time explaining the problem. The good news for me is that I learn more about Postgres by doing this. And to be clear, I have yet to explain the regression but this post documents my debugging efforts.

sysbench

This post explains how I use sysbench. Note that modern sysbench is a framework for running benchmarks and it comes with many built-in tests. By framework I mean that it includes much of the plumbing needed for several DBMSs, and you can write tests in Lua with support for the Lua JIT, so the client side of the tests uses less CPU.

The sysbench framework has been widely used in the MySQL community for a long time. Originally it hard-coded one test (or perhaps a few), and in too many cases, by "sysbench" people mean the classic (original) sysbench transaction: a mix of point queries, range queries, and writes. The classic transaction is now implemented by oltp_read_write.lua with help from oltp_common.lua.

The oltp_read_write.lua test is usually not good at detecting regressions but in this case (the problem motivating me to write this blog post) it has detected the regression.

How to find regressions

I have opinions about how to run DBMS benchmarks. For example, be careful about running read-only benchmarks for an LSM, because the state (shape) of the LSM tree can have a large impact on CPU overhead, and the tree might stay in the same (good or bad) state for the duration of the test. With read-write tests the LSM state should cycle between better and worse states.

But traditional b-tree storage engines also have states. One aspect of that is MVCC debt -- write-heavy tests increase the debt and the engine doesn't always do a great job of limiting that debt. So I run sysbench tests in a certain order to both create and manage MVCC debt while trying to minimize noise.

The order of the tests is listed here and the general sequence is:

  1. Load and index the tables
  2. Run a few read-only tests to let MVCC debt get reduced if it exists
  3. Optionally run more read-only tests (I usually skip these)
  4. Run write-heavy tests
  5. Do things to reduce MVCC debt (see here for Postgres and MySQL)
  6. Run read-only tests (after the tables+indexes have been subject to random writes)
  7. Run delete-only and then insert-only tests

Results

The possible regressions are:
  • update-zipf in Postgres 16, 17 and 18
  • write-only in Postgres 16, 17 and 18
  • read-write in Postgres 16, 17 and 18
  • insert in Postgres 18 - this reproduces in 18.0 and 18.1

Legend:
* relative QPS for Postgres 16.11, 17.7 and 18.1
* relative QPS is (QPS for my version / QPS for Postgres 15.15)

16.11   17.7    18.1
1.01    1.00    1.01    update-inlist
1.06    1.03    1.04    update-index
1.04    1.04    1.04    update-nonindex
1.00    1.08    1.07    update-one
0.85    0.72    0.71    update-zipf
0.88    0.86    0.84    write-only_range=10000
0.71    0.82    0.81    read-write_range=100
0.74    0.78    0.82    read-write_range=10
1.06    1.03    1.00    delete
1.02    1.00    0.80    insert

Note that when the insert test has a relative QPS of 0.80 for Postgres 18.1, that means 18.1 gets 80% of the throughput of Postgres 15.15. So this is a problem to figure out.

Explaining the insert test for Postgres 18.1

I wrote useful but messy Bash scripts to make it easier to run and explain sysbench results. One of the things I do is collect results from vmstat and iostat per test and then summarize average and normalized values from them where normalized values are: (avg from iostat or vmstat / QPS). 

And then I compute relative values for them as (value from my version / value from the base case), where the base case here is Postgres 15.15.
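
As a rough sketch of that arithmetic (illustrative Python, not the actual Bash scripts; the sample numbers are made up but chosen to approximate the wKB/o values in the tables below):

# Sketch of the normalization and relative-value arithmetic described above.
# Illustrative only; the real collection and math are done by Bash scripts
# around vmstat and iostat output.

def normalize(avg_counter: float, qps: float) -> float:
    # e.g. average KB written/s (or CPU, context switches/s) divided by QPS
    return avg_counter / qps

def relative(value: float, base_value: float) -> float:
    # the base case in this post is Postgres 15.15
    return value / base_value

# Approximate numbers for the insert test: average wKB/s and QPS
wkb_per_op_181 = normalize(avg_counter=851_000.0, qps=27_304.0)   # ~31.2 wKB/o
wkb_per_op_1515 = normalize(avg_counter=826_000.0, qps=34_164.0)  # ~24.2 wKB/o
print(relative(wkb_per_op_181, wkb_per_op_1515))  # ~1.29: 18.1 writes more per insert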

From the results below I see that from Postgres 17.7 to 18.1
  • throughput decreases by ~20% in 18.1
  • cpu/o increased by ~17% in 18.1
  • cs/o increased by ~2.6% in 18.1
  • r/o increased by ~14% in 18.1
  • rKB/o increased by ~22% in 18.1
  • wKB/o increased by ~27% in 18.1

So Postgres 18.1 does a lot more IO during the insert test. Possible reasons are that either insert processing changed to be less efficient, or there is more MVCC debt at the start of the insert test from the write-heavy tests that precede it, and thus more IO from write-back and vacuum during the insert test, which reduces throughput. At this point I don't know. A next step for me is to repeat the benchmark with the delete test removed, as that immediately precedes the insert test (see here).

Legend:
* cpu/o - CPU per operation. This includes us and sy from vmstat but
          it isn't CPU microseconds.
* cs/o - context switches per operation
* r/o - reads from storage per operation
* rKB/o - KB read from storage per operation
* wKB/o - KB written to storage per operation
* o/s - operations/s, QPS, throughput
* dbms - Postgres version

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.000780        5.017   0.407   8.937   24.2    34164   15.15
0.000755        5.090   0.409   9.002   24.648  34833   16.11
0.000800        5.114   0.418   9.146   24.626  34195   17.7
0.000939        5.251   0.477   11.186  31.197  27304   18.1

--- relative to first result
0.97            1.01    1.00    1.01    1.02    1.02    16.11
1.03            1.02    1.03    1.02    1.02    1.00    17.7
1.20            1.05    1.17    1.25    1.29    0.80    18.1

I also have results from pg_stat_all_tables collected at the end of the delete and insert tests for Postgres 17.7 and 18.1. Then I computed the difference as (results after insert test - results after delete test). From the results below I don't see a problem. Note that the difference for n_dead_tup is zero. While my analysis was limited to one of the 8 tables used for the benchmark, the values for the other 7 tables are similar.

The columns with a non-zero difference for Postgres 17.7 are:
  3996146   n_tup_ins
  3996146   n_live_tup
  3996146   n_mod_since_analyze
  3996146   n_ins_since_vacuum

The columns with a non-zero difference for Postgres 18.1 are:
  3191278   n_tup_ins
  3191278   n_live_tup
  3191278   n_mod_since_analyze
  3191278   n_ins_since_vacuum
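
For reference, this snapshot-and-diff of pg_stat_all_tables can be scripted roughly as follows (a sketch using psycopg2, with a hypothetical DSN and one sysbench table; not the script that produced the numbers above):

# Sketch of the per-table snapshot-and-diff approach described above.
# Assumes psycopg2, a hypothetical DSN, and one sysbench table (sbtest1).
import psycopg2

COUNTERS = ["n_tup_ins", "n_tup_upd", "n_tup_hot_upd", "n_live_tup",
            "n_dead_tup", "n_mod_since_analyze", "n_ins_since_vacuum"]

def snapshot(conn, table):
    # Read the cumulative counters for one table from pg_stat_all_tables
    with conn.cursor() as cur:
        cur.execute(
            f"SELECT {', '.join(COUNTERS)} FROM pg_stat_all_tables WHERE relname = %s",
            (table,),
        )
        return dict(zip(COUNTERS, cur.fetchone()))

conn = psycopg2.connect("dbname=sbtest user=postgres")  # hypothetical DSN
before = snapshot(conn, "sbtest1")   # e.g. right after the delete test
# ... run the insert test here ...
after = snapshot(conn, "sbtest1")    # right after the insert test

for name in COUNTERS:
    diff = after[name] - before[name]
    if diff != 0:
        print(f"{diff:>10}   {name}")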

Explaining the other tests for Postgres 16, 17 and 18

Above I listed the order in which the write-heavy tests are run. The regressions occur on the update-zipf, write-only and read-write tests. All of these follow the update-one test. 
  • There is no regression for write-only and read-write when I change the test order so these run prior to update-one and update-zipf
  • There is a regression if either or both of update-one and update-zipf are run prior to write-only and read-write

I assume that either the amount of MVCC debt created by update-one and update-zipf is larger starting with Postgres 16, or that starting in Postgres 16 something changed so that Postgres is less effective at dealing with that MVCC debt.

Note that the update-one and update-zipf tests are a bit awkward. For the update-one test all updates are limited to one row per table (the first row in the table). And for the update-zipf test a zipfian distribution is used to select the rows to update. So in both cases a small number of rows receive most of the updates.

Results from pg_stat_all_tables collected immediately prior to update-one and then after update-zipf are here for Postgres 15.15 and 16.11. Then I computed the difference as (results after update-zipf - results prior to update-one).

The columns with a non-zero difference for Postgres 15.15 are:
  23747485   idx_scan
  23747485   idx_tup_fetch
  23747485   n_tup_upd
  22225868   n_tup_hot_upd
    -69576   n_live_tup
   3273433   n_dead_tup
  -1428177   n_mod_since_analyze

The columns with a non-zero difference for Postgres 16.11 are:
  23102012   idx_scan
  23102012   idx_tup_fetch
  23102012   n_tup_upd
  21698107   n_tup_hot_upd
   1403905   n_tup_newpage_upd
     -3568   n_live_tup
   2983730   n_dead_tup
  -2064095   n_mod_since_analyze

I also have the vmstat and iostat data for each of the tests. In all cases, for the tests that run after update-one, the amount of CPU and IO per operation increases after Postgres 15.15.

For update-one results don't change much across versions

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.000281        12.492  0.001   0.011   0.741   162046  15.15
0.000278        12.554  0.001   0.012   0.746   162415  16.11
0.000253        11.028  0.002   0.029   0.764   174715  17.7
0.000254        10.960  0.001   0.011   0.707   172790  18.1

--- relative to first result
0.99            1.00    1.00    1.09    1.01    1.00    16.11
0.90            0.88    2.00    2.64    1.03    1.08    17.7
0.90            0.88    1.00    1.00    0.95    1.07    18.1

For update-zipf the amount of CPU and IO per operation increases after Postgres 15.15

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.000721        6.517   0.716   6.583   18.811  41531   15.15
0.000813        7.354   0.746   7.838   22.746  35497   16.11
0.000971        5.679   0.796   10.492  27.858  29700   17.7
0.000966        5.718   0.838   10.354  28.965  29289   18.1

--- relative to first result
1.13            1.13    1.04    1.19    1.21    0.85    16.11
1.35            0.87    1.11    1.59    1.48    0.72    17.7
1.34            0.88    1.17    1.57    1.54    0.71    18.1

For write-only_range=10000 the amount of CPU and IO per operation increases after Postgres 15.15

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.000784        5.881   0.799   7.611   26.835  30174   15.15
0.000920        6.315   0.85    10.033  32      26444   16.11
0.001003        5.883   0.871   11.147  33.142  25889   17.7
0.000999        5.905   0.891   10.988  34.443  25232   18.1

--- relative to first result
1.17            1.07    1.06    1.32    1.19    0.88    16.11
1.28            1.00    1.09    1.46    1.24    0.86    17.7
1.27            1.00    1.12    1.44    1.28    0.84    18.1

For read-write_range=100 the amount of CPU and IO per operation increases after Postgres 15.15

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.001358        7.756   1.758   25.654  22.615  31574   15.15
0.001649        8.111   1.86    30.99   35.413  22264   16.11
0.001629        7.609   1.856   29.901  28.681  25906   17.7
0.001646        7.484   1.978   29.456  29.138  25573   18.1

--- relative to first result
1.21            1.05    1.06    1.21    1.57    0.71    16.11
1.20            0.98    1.06    1.17    1.27    0.82    17.7
1.21            0.96    1.13    1.15    1.29    0.81    18.1

For read-write_range=10 the amount of CPU and IO per operation increases after Postgres 15.15

--- absolute
cpu/o           cs/o    r/o     rKB/o   wKB/o   o/s     dbms
0.000871        6.441   1.096   10.238  21.426  37141   15.15
0.001064        6.926   1.162   13.761  29.738  27540   16.11
0.001074        6.461   1.181   14.759  29.338  28839   17.7
0.001045        6.497   1.222   14.726  28.632  30431   18.1

--- relative to first result
1.22            1.08    1.06    1.34    1.39    0.74    16.11
1.23            1.00    1.08    1.44    1.37    0.78    17.7
1.20            1.01    1.11    1.44    1.34    0.82    18.1

The Importance of Realistic Benchmark Workloads

Unveiling the Limits: A Performance Analysis of MongoDB Sharded Clusters with plgm

In any database environment, assumptions are the enemy of stability. Understanding the point at which a system transitions from efficient to saturated is essential for maintaining uptime and ensuring a consistent and reliable user experience. Identifying these limits requires more than estimation—it demands […]

Database Transactions

What are database transactions and how do SQL databases isolate one transaction from another?

January 13, 2026

BSON vs OSON: Different design goals

There was a discussion on LinkedIn comparing OSON (the binary JSON in Oracle Database) and BSON (the binary JSON in MongoDB). To be clear, BSON and OSON aren’t directly comparable because they serve different purposes:

  • BSON is a general-purpose binary serialization format designed for transport and storage efficiency, similar to Protobuf. It prioritizes fast encode/decode operations for network transmission and disk storage across diverse applications.
  • OSON is Oracle Database's internal JSON storage format, specifically engineered for database operations, with extensive metadata to enable efficient field querying and navigation without fully decoding the document.

They share two objectives but make different trade-offs:

  • Compactness Through Binary Encoding – Both formats achieve compactness without compression through binary encoding and type-specific representations. However, BSON prioritizes encoding speed with a straightforward binary representation, whereas OSON achieves greater compactness through local dictionary compression of field names within documents.
  • Partial Navigation Without Full Scanning – Both enable efficient partial document traversal, but with different approaches. BSON uses simple size prefixes to enable jumping between fields. A BSON document is meant to be stored as a single variable-size block (possible in the WiredTiger B-Tree). OSON implements comprehensive metadata structures—such as a tree segment and jumpable offsets—so that it can reduce random reads when stored in multiple fixed-size blocks (the Oracle Database storage model).
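
As a tiny illustration of those BSON size prefixes (my example, not from the post): a BSON document starts with its total length as a little-endian int32, and each element carries enough length information to let a reader skip it without parsing the value.

# Tiny illustration of BSON size prefixes (hypothetical example).
# The first four bytes of a BSON document are its total length (little-endian int32).
import struct
import bson  # from the pymongo package

data = bson.encode({"a": "x" * 100, "b": 1})
total_len = struct.unpack_from("<i", data, 0)[0]
assert total_len == len(data)  # the 4-byte prefix covers the whole document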

They differ in one major design objective regarding updates:

  • BSON follows MongoDB's "store together what is accessed together" principle, optimizing for full-document read patterns and full-document writes to disk as a single contiguous block, rather than in-place modifications. Standard BSON documents are not designed for efficient partial updates outside of an in-memory mutable object representation.
  • OSON is designed to optimize database operations, including in-place modifications without additional memory allocation.

I mentioned the Java driver in the previous post. The Oracle Java driver supports fast access via OracleJsonObject.get(), which avoids instantiating a new object and uses the internal metadata for navigation. I wanted to see how this works, but because the Oracle Database Java driver isn’t open source, I tried Python instead, where the Oracle driver is open source. However, it doesn’t provide an equivalent to OracleJsonObject.get(), so you must decode and encode to a Python dict, similar to BSON.

I ran the following program to measure the time and size to encode and decode BSON and OSON:


import time    
import bson    
import oracledb    
from bson.codec_options import CodecOptions    
from bson.raw_bson import RawBSONDocument    

# Prepare RawBSON codec options once for lazy BSON decoding    
raw_codec_options = CodecOptions(document_class=RawBSONDocument)    

def generate_large_document(num_fields, field_length):    
    long_str = "a" * field_length    
    return {f"field_{i+1}": long_str for i in range(num_fields)}    

def compare_bson_oson(document, connection, iterations=100):    
    """Compare BSON and OSON encode/decode, plus access after decode."""    

    middle_field_name = f"field_{len(document)//2}"    

    # Precompute sizes    
    bson_data_sample = bson.encode(document)    
    bson_size = len(bson_data_sample)    

    oson_data_sample = connection.encode_oson(document)    
    oson_size = len(oson_data_sample)    

    # Timers    
    bson_encode_total = 0.0    
    bson_decode_total = 0.0    
    bson_access_after_decode_total = 0.0    
    bson_decode_raw_total = 0.0    
    bson_access_raw_total = 0.0    

    oson_encode_total = 0.0    
    oson_decode_total = 0.0    
    oson_access_after_decode_total = 0.0    

    for _ in range(iterations):    
        # BSON encode    
        start = time.perf_counter()    
        bson_data = bson.encode(document)    
        bson_encode_total += (time.perf_counter() - start)    

        # BSON decode raw (construct RawBSONDocument)    
        start = time.perf_counter()    
        raw_bson_doc = RawBSONDocument(bson_data, codec_options=raw_codec_options)    
        bson_decode_raw_total += (time.perf_counter() - start)    

        # BSON access single field from raw doc    
        start = time.perf_counter()    
        _ = raw_bson_doc[middle_field_name]    
        bson_access_raw_total += (time.perf_counter() - start)    

        # BSON full decode    
        start = time.perf_counter()    
        decoded_bson = bson.decode(bson_data)    
        bson_decode_total += (time.perf_counter() - start)    

        # BSON access after full decode    
        start = time.perf_counter()    
        _ = decoded_bson[middle_field_name]    
        bson_access_after_decode_total += (time.perf_counter() - start)    

        # OSON encode    
        start = time.perf_counter()    
        oson_data = connection.encode_oson(document)    
        oson_encode_total += (time.perf_counter() - start)    

        # OSON full decode    
        start = time.perf_counter()    
        decoded_oson = connection.decode_oson(oson_data)    
        oson_decode_total += (time.perf_counter() - start)    

        # OSON access after full decode    
        start = time.perf_counter()    
        _ = decoded_oson[middle_field_name]    
        oson_access_after_decode_total += (time.perf_counter() - start)    

    return (    
        bson_encode_total,    
        bson_decode_total,    
        bson_access_after_decode_total,    
        bson_size,    
        bson_decode_raw_total,    
        bson_access_raw_total    
    ), (    
        oson_encode_total,    
        oson_decode_total,    
        oson_access_after_decode_total,    
        oson_size    
    )    

def run_multiple_comparisons():    
    iterations = 100    
    num_fields_list = [10, 100, 1000]    
    field_sizes_list = [1000, 10000, 100000]  # small length → large length    

    # Init Oracle client    
    oracledb.init_oracle_client(config_dir="/home/opc/Wallet")    
    connection = oracledb.connect(    
        user="franck",    
        password="My Strong P455w0rd",    
        dsn="orcl_tp"    
    )    

    print(f"{'Format':<6} {'Fields':<8} {'FieldLen':<10} "    
          f"{'Encode(s)':<12} {'Decode(s)':<12} {'Access(s)':<12} {'Size(bytes)':<12} "    
          f"{'DecRaw(s)':<12} {'AccRaw(s)':<12} "    
          f"{'EncRatio':<9} {'DecRatio':<9} {'SizeRatio':<9}")    
    print("-" * 130)    

    for field_length in field_sizes_list:    
        for num_fields in num_fields_list:    
            document = generate_large_document(num_fields, field_length)    
            bson_res, oson_res = compare_bson_oson(document, connection, iterations)    

            enc_ratio = oson_res[0] / bson_res[0] if bson_res[0] > 0 else 0    
            dec_ratio = oson_res[1] / bson_res[1] if bson_res[1] > 0 else 0    
            size_ratio = oson_res[3] / bson_res[3] if bson_res[3] > 0 else 0    

            # BSON row    
            print(f"{'BSON':<6} {num_fields:<8} {field_length:<10} "    
                  f"{bson_res[0]:<12.4f} {bson_res[1]:<12.4f} {bson_res[2]:<12.4f} {bson_res[3]:<12} "    
                  f"{bson_res[4]:<12.4f} {bson_res[5]:<12.4f} "    
                  f"{'-':<9} {'-':<9} {'-':<9}")    

            # OSON row    
            print(f"{'OSON':<6} {num_fields:<8} {field_length:<10} "    
                  f"{oson_res[0]:<12.4f} {oson_res[1]:<12.4f} {oson_res[2]:<12.4f} {oson_res[3]:<12} "    
                  f"{'-':<12} {'-':<12} "    
                  f"{enc_ratio:<9.2f} {dec_ratio:<9.2f} {size_ratio:<9.2f}")    

    connection.close()    

if __name__ == "__main__":    
    run_multiple_comparisons()    

I got the following results:

  • Encode(s) — total encode time over 100 iterations.
  • Decode(s) — full decode into Python objects (dict).
  • Access(s) — access to a field from Python objects (dict).
  • DecRaw(s) — creation of a RawBSONDocument for BSON (no equivalent for OSON).
  • AccRaw(s) — single middle‑field access from a raw document (lazy decode for BSON).
  • Ratios — OSON time / BSON time.

$ TNS_ADMIN=/home/opc/Wallet python bson-oson.py  
Format Fields   FieldLen   Encode(s)    Decode(s)    Access(s)    Size(bytes)  DecRaw(s)    AccRaw(s)    EncRatio  DecRatio  SizeRatio  
----------------------------------------------------------------------------------------------------------------------------------  
BSON   10       1000       0.0005       0.0005       0.0000       10146        0.0002       0.0011       -         -         -  
OSON   10       1000       0.0013       0.0006       0.0000       10206        -            -            2.44      1.26      1.01  
BSON   100      1000       0.0040       0.0043       0.0000       101497       0.0012       0.0101       -         -         -  
OSON   100      1000       0.0103       0.0053       0.0000       102009       -            -            2.58      1.25      1.01  
BSON   1000     1000       0.0422       0.0510       0.0000       1015898      0.0098       0.0990       -         -         -  
OSON   1000     1000       0.1900       0.0637       0.0000       1021912      -            -            4.50      1.25      1.01  
BSON   10       10000      0.0019       0.0017       0.0000       100146       0.0005       0.0025       -         -         -  
OSON   10       10000      0.0045       0.0021       0.0000       100208       -            -            2.36      1.27      1.00  
BSON   100      10000      0.0187       0.0177       0.0000       1001497      0.0026       0.0225       -         -         -  
OSON   100      10000      0.1247       0.0241       0.0000       1002009      -            -            6.66      1.36      1.00  
BSON   1000     10000      0.2709       0.2439       0.0001       10015898     0.0235       0.2861       -         -         -  
OSON   1000     10000      14.4215      0.3185       0.0001       10021912     -            -            53.23     1.31      1.00  

Important nuance: In Python, oracledb.decode_oson() yields a standard dict, so we cannot measure lazy access as we can with the Java driver’s OracleJsonObject.get() method, which can skip object instantiation. We measured it for one field from the raw BSON to show that there is a cost, which is higher than accessing the dict, though still less than a microsecond for large documents. In general, since you store together what is accessed together, it often makes sense to decode to an application object.

Encoding OSON is slower than BSON, especially for large documents, by design—because it computes navigation metadata for faster reads—whereas BSON writes a contiguous field stream. For the largest case, encoding takes ~14.4 seconds over 100 iterations, which translates to roughly 144 milliseconds per operation.

Decoding BSON is marginally faster than OSON, but the difference is negligible since all decoding times are under a millisecond. OSON’s extra metadata helps mainly when reading a few fields from a large document, as it avoids instantiating an immutable object.

Raw BSON provides faster "decoding" (as it isn’t actually decoded), but slower field access. Still, this difference—less than a millisecond—is negligible except when accessing many fields, in which case you should decode once to a Python dict.

This test illustrates the different design goals of BSON and OSON. I used the Python driver to illustrate what an application does: get a document from a query and manipulate it as an application object. On the server, it is different, and queries may modify a single field in a document. OSON does this directly on the OSON datatype, as it has all the metadata, whereas BSON is accessed through a mutable BSON object.
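
To illustrate that last point, here is a hedged sketch of a single-field update in both systems (hypothetical collection and table names; the Oracle statement assumes a JSON column named doc and the JSON_TRANSFORM function):

# Sketch of a server-side partial update in MongoDB and Oracle (hypothetical names).
from pymongo import MongoClient
import oracledb

# MongoDB: the client sends only the delta; the server applies it to the stored document.
client = MongoClient("mongodb://localhost:27017")
client["app"]["orders"].update_one(
    {"_id": 42},
    {"$set": {"status": "shipped"}},
)

# Oracle: JSON_TRANSFORM patches one field inside the stored OSON value,
# using its embedded metadata to locate the field.
conn = oracledb.connect(user="franck", password="...", dsn="orcl_tp")
with conn.cursor() as cur:
    cur.execute(
        "UPDATE orders SET doc = JSON_TRANSFORM(doc, SET '$.status' = 'shipped') "
        "WHERE id = :id",
        id=42,
    )
conn.commit()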

AWS Organizations now supports upgrade rollout policy for Amazon Aurora and Amazon RDS automatic minor version upgrades

AWS Organizations now supports an upgrade rollout policy, a new capability that provides a streamlined solution for managing automatic minor version upgrades across your database fleet. This feature supports Amazon Aurora MySQL-Compatible Edition, Amazon Aurora PostgreSQL-Compatible Edition, and the Amazon RDS database engines MySQL, PostgreSQL, MariaDB, SQL Server, Oracle, and Db2. It eliminates the operational overhead of coordinating upgrades across hundreds of resources and accounts while validating changes in less critical environments before reaching production. In this post, we explore how upgrade rollout policy works, its key benefits, and how you can use it to implement a systematic approach to database maintenance across your organization.