January 19, 2026
LLMs and your career
The most conservative way to build a career as a software developer is 1) to be practical and effective at problem solving but 2) not to treat all existing code as a black box. 1 means that as a conservative developer you should generally use PostgreSQL or MySQL (or whatever existing database), Rails or .NET (or whatever existing framework), and adapt code from Stack Overflow or LLMs. 2 means that you're curious and work over time to better understand how web servers and databases and operating systems and the browser actually work so that you can make better decisions for your own problems as you adapt other people's code and ideas.
Coding via LLM is not fundamentally different from coding with Rails or coding by perusing Stack Overflow. It's faster and more direct but it's still potentially just a human mindlessly adapting existing code.
The people who were only willing to look at existing frameworks and libraries and applications as black boxes were already not the most competitive when it came to finding and retaining work. And on the other hand, the most technically interesting companies always wanted to hire developers who understood fundamentals because they're 1) operating at such a scale that the way the application is written matters or they're 2) building PostgreSQL or MySQL or Rails or .NET or Stack Overflow or LLMs, etc.
The march of software has always been to reduce the need for (ever larger sizes of) SMBs (and teams within non-SMBs) to hire developers to solve problems or increase productivity. LLMs are part of that march. That doesn't change that at some point companies (or teams) need to hire developers because the business or its customer base has become too complex or too large.
The jobs that were dependent on fundamentals of software aren't going to stop being dependent on fundamentals of software. And if more non-developers are using LLMs it's going to mean all the more stress on tools and applications and systems that rely on fundamentals of software.
All of this is to say that if you like doing software development, I don't think interesting software development jobs are going to go away. So keep learning and keep building compilers and databases and operating systems and keep looking for companies that have compiler and database and operating system products, or companies with other sorts of interesting problems where fundamentals matter due to their scale.
January 18, 2026
MongoDB compared to Oracle Database Maximum Availability Architecture
Users looking at Oracle alternatives see the limitations of PostgreSQL in terms of high availability and ask:
Will MongoDB give me the same high availability (HA), disaster recovery (DR), and zero/near-zero data loss I’ve relied on in Oracle’s Maximum Availability Architecture (MAA)?
I previously compared Oracle Maximum Availability Architecture with YugabyteDB's built-in high availability features here. I've done the same with MongoDB, which also offers built-in high availability through Raft-based replication to a quorum, but for a document-model application rather than SQL.
Oracle MAA and MongoDB
Oracle DBAs are used to a mature, integrated stack—RAC, Data Guard, GoldenGate, Flashback, and RMAN—refined for decades to work together in the MAA framework. MongoDB instead takes a cloud-native, distributed approach, embedding replication, failover, and sharding directly in its core engine rather than through separate options. While the technologies differ—MongoDB uses pull-based logical replication instead of broadcasting physical log changes—they pursue the same goals.
MongoDB vs Oracle MAA Levels
| Feature | MongoDB | Oracle🥉 MAA Bronze | Oracle🥈 MAA Silver | Oracle🥇 MAA Gold | Oracle💰 MAA Platinum |
|---|---|---|---|---|---|
| Main Goal | Cloud-Native Resilience | Backup | MAA Bronze + HA | MAA Silver + DR | MAA Gold + Multi-Master |
| HA/DR Product | Native in MongoDB core: replica sets, sharding, multi-region, backups | RMAN | RAC | RAC + Data Guard (ADG) | RAC + Data Guard + GoldenGate |
| RPO (Recovery Point Objective) | 0 with writeConcern: "w:majority" (sync to quorum), which is the default | Minutes (archivelog backups) | Same as Bronze (no extra DB protection) | 0 with Maximum Protection; 0 or more with Maximum Availability | Seconds if tuned with low replication lag |
| RTO (Recovery Time Objective) | 5–15 seconds for automatic primary election | Hours–Days depending on database size | Seconds for server crash, but DB is not protected | Minutes with FSFO (Sync) | Seconds if no conflicts in active-active failover |
| Corrupt Block Detection | Automatic via storage engine checksums and replica healing | RMAN detection + manual blockrecover | No additional protection | Automatic with lost-write detection (ADG) | No additional protection beyond Gold |
| Online Upgrade | Zero downtime via rolling upgrades | Minutes–hours for major releases | Some minor rolling upgrades (OS/patch) | Near-zero downtime with transient logical standby | Yes, if GoldenGate active-active is configured correctly |
| Human Error Mitigation | Point-in-time restores from Atlas backups or filesystem snapshots | Flashback Database, Snapshot Standby, LogMiner, Flashback Query, Flashback Transaction, dbms_rolling, dbms_redefinition, ASM | Same + RAC | Same + ADG | Similar to Gold (GoldenGate can replay changes) |
| Online DDL | Non-blocking schema changes (document model) | No transactional DDL; EBR (Edition-Based Redefinition) | Same as Bronze | Same as Bronze | Same as Gold |
| Active-Active | Via sharded clusters or Atlas Global Clusters; shard-key design avoids conflicts | No — only standby server offline unless licensed | RAC active-active within the same datacenter | No — standby is read-only (Active DG) | Yes — GoldenGate active-active, requires conflict handling |
| Application Continuity | Automatic failover with drivers + retryable writes | CMAN proxy | Transparent failover + Transaction Guard (RAC) | Transparent failover (Data Guard) | Transparent failover + conflict handling |
| Complexity | Low (built-in distributed features) | Low | High (clusterware) | Medium (RAC + DG) | High (RAC + DG + GoldenGate) |
| Price (Options) | — | EE (~$50K/CPU) | EE + 50% (RAC option) | EE + RAC + 25% (ADG option) | EE + RAC + ADG + 75% (GoldenGate; often ~2.5× EE base) |
Understanding Oracle MAA Levels
Oracle’s Maximum Availability Architecture (MAA) is organized into four tiers, each building on the previous:
- 🥉 Bronze – Backup Only: RMAN online backups, archived log retention, manual restore. RPO in minutes to hours, RTO in hours or days.
- 🥈 Silver – Bronze + High Availability: Add Real Application Clusters (RAC) for server-level HA and limited rolling upgrades. Very low RTO for server failures but no additional database protection beyond Bronze’s backups.
- 🥇 Gold – Silver + Disaster Recovery: Add Data Guard (often Active Data Guard) for synchronous or asynchronous failover to remote standby. With FSFO (Fast Start Failover) in sync mode, RPO can be zero, RTO in minutes.
- 💰 Platinum – Gold + Multi-Master: Add GoldenGate logical replication for active-active writes across sites and versions. Enables global presence but requires manual conflict handling and careful design.
In Oracle’s world, HA/DR is not one thing — it’s a stack of separate licensed products you combine according to your needs and budget. The complexity and cost rise quickly as you move up the tiers.
Focus on features rather than product names, which can be misleading. For example, Autonomous Data Guard (RPO > 0, no real-time query, no rolling upgrade) in the Autonomous Database managed service differs from Active Data Guard, which provides all features of MAA Gold, but is not part of the managed service.
How MongoDB Builds HA and DR Natively
MongoDB’s availability model takes a fundamentally different approach: it is a distributed database by design, with no single authoritative copy bottleneck like a monolithic RDBMS. Replication, failover, sharding, and multi-region deployment are built in, not optional add-ons.
Key components:
- Replica Sets – Every MongoDB production deployment runs as a replica set by default.
  - Automatic leader (primary) election in 5–30 seconds if a node fails.
  - Tunable write concern for RPO control (e.g., "w:majority" for RPO=0); a driver-level sketch follows this list.
  - Flow control to throttle writes when majority-committed replication lag reaches a threshold, preventing secondaries from falling too far behind.
- Sharding – Horizontal partitioning, with each shard itself being a replica set.
  - Supports scaling out writes and reads while maintaining HA per shard.
- Multi-Region Clusters – Place replica set members in multiple regions/zones for DR.
  - Writes propagate to remote members, optionally synchronously (Atlas global writes).
- Backups & Point-in-Time Restore – Atlas offers continuous cloud backups with PIT restore. Self-managed deployments use snapshots + oplog replay.
- Automatic Failover & Driver-Level Continuity – Client drivers reconnect automatically. Retryable writes can resume after failover without application-level intervention.
- Rolling Upgrades Without Downtime – Upgrade nodes one at a time, keeping the cluster online.
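As a concrete illustration of the write-concern and retryable-write knobs above, here is a minimal sketch using the pymongo driver. The URI, database, and collection names are hypothetical; `retryWrites` and `w="majority"` are standard driver options.

```python
from pymongo import MongoClient, WriteConcern

# Hypothetical replica-set URI. Retryable writes (on by default) let the
# driver safely resend a write after an automatic primary election.
client = MongoClient(
    "mongodb://node1,node2,node3/?replicaSet=rs0",
    retryWrites=True,
)

# RPO=0 behavior: the insert is acknowledged only once a majority of the
# replica set has replicated it, so a failover cannot lose the write.
orders = client.get_database("shop").get_collection(
    "orders", write_concern=WriteConcern(w="majority")
)
orders.insert_one({"order_id": 1, "status": "created"})
```

With this configuration, a primary failure between the insert and its acknowledgment results in either a retried write or an error, never a silently lost acknowledged write.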
This means that what takes three separate licensed Oracle products to achieve in Platinum is, with MongoDB, simply a topology decision you make when deploying the cluster.
Key Takeaways
- Oracle MAA levels are additive—Bronze, Silver, Gold, and Platinum. Each higher tier requires extra products and adds complexity for on‑premises deployments, and not all tiers are available in managed services. They are very mature for enterprises, but you should not underestimate the operational cost.
- MongoDB provides built-in HA and DR. Replication, failover, sharding, and multi-region deployments are topology settings that simplify operations. However, configuration still matters: consistency is tunable per application, and settings must align with your infrastructure’s capabilities.
- RPO and RTO targets achievable with Oracle Platinum are also achievable with MongoDB. The main difference is Oracle's flashback capabilities, which MongoDB lacks, so you must tightly control production access and ensure all changes are automated and tested first in pre-production.
- MongoDB rolling upgrades eliminate downtime for routine maintenance—a major change from traditional monolithic upgrade windows. Avoid running outdated versions, as was common with legacy databases.
- Global write scenarios are possible in both, but MongoDB's sharded architecture can automatically avoid conflicts (ACID properties across the cluster, causal consistency).
January 16, 2026
Rethinking the University in the Age of AI
Three years ago, I wrote a post titled "Getting schooled by AI, colleges must evolve". I argued that as we entered the age of AI, the value of "knowing" was collapsing, and the value of "doing" was skyrocketing. (See Bloom's taxonomy.)
Today, that future has arrived. Entry-level hiring has stalled because AI agents absorb the small tasks where new graduates once learned the craft.
So how do we prepare students for this reality? Not only do I stand by my original advice, I am doubling down. Surviving this shift requires more than minor curriculum tweaks; it requires a different philosophy of education. I find two old ideas worth reviving: a systems design mindset that emphasizes holistic foundations, and alternative education philosophies of the 1960s that give students real agency and real responsibility.
Holistic Foundations
Three years ago, I begged departments: "Don't raise TensorFlow disk jockeys. Teach databases! Teach compilers! Teach system design!" This is even more critical today. Jim Waldo, in his seminal essay On System Design, warned us that we were teaching "programming" (the act of writing code) rather than "system design" (the art of structuring complexity).
AI may have solved the tedious parts of programming, but it has not solved system design. To master system design, students must understand the layers both beneath and above what AI touches. Beneath the code layer lie compilers and formal methods. Compilers matter not so students can build languages, but so they understand cost models, optimization, and the real behavior of generated code. Formal methods matter not to turn everyone into a theorem prover, but to reason about safety, liveness, and failure using invariants. When an AI suggests a concurrent algorithm, students must be able to sense the race conditions and flag them.
Above the code layer lies system design. As Waldo observed, the best architects are synthesizers. They glue components together without breaking reliability, a skill that remains stubbornly human. System design lives in an open world of ambiguity, trade-offs, incentives, and failure modes that are never written down. This is why "mutts" with hybrid backgrounds often excel in system design. Liberal arts train you to reason about systems made of humans, incentives, and the big picture. Waldo's point rings true: great system designers understand the environment and context first, then map that understanding into software.
Radical Agency
My original post called for flipped classrooms and multi-year projects. These still assume a guided path. We may need to go further and embrace the philosophy of the "Summerhill" model, where the school adapts to the learner, rather than the learner fitting the school. In an age of abundant information, the scarce resource is judgment. Universities must stop micromanaging learning and start demanding ownership. The only metric that matters is learning to learn.
Replace tests with portfolios. Industry hires from portfolios, not from grades. Let students use AI freely; you cannot police it anyway. If a student claims they built a system, make them defend it in a rigorous, face-to-face oral exam using the Harkness method. Ask them to explain how the system behaves under load, under failure, and at the corner cases.
In 2023, I argued for collaboration and writing. In 2026, that is the whole game. Collaboration now beats individual brilliance. Writing is a core technical skill. A clear specification is fast becoming equivalent to correct code. Students who cannot write clearly cannot think clearly.
I also argued for ownership-driven team projects. To make them real, universities must put skin in the game. Schools should create a small seed fund. The deal is simple: $20,000 for 1% equity in student-run startups. The goal is not financial return (though I think it will materialize), but contact with reality. Students must incorporate, manage equity, run meetings, and make decisions with consequences.
Deploying Percona Operator for MySQL with OpenTaco for IaC Automation
January 15, 2026
Percona Operator for PostgreSQL 2025 Wrap Up and What We Are Focusing on Next
Announcing Percona ClusterSync for MongoDB: The Open Source Trail to Freedom
When to use Read Replicas vs. bigger compute
January 14, 2026
Debugging regressions with Postgres in IO-bound sysbench
I explained in this post that there is a possible performance regression for Postgres with IO-bound sysbench. It arrived in Postgres 16 and remains in Postgres 18. I normally run sysbench with a cached database, but I had a spare server so I repeated tests with an IO-bound workload.
The bad news for me is that I need to spend more time explaining the problem. The good news for me is that I learn more about Postgres by doing this. And to be clear, I have yet to explain the regression but this post documents my debugging efforts.
sysbench
This post explains how I use sysbench. Note that modern sysbench is a framework for running benchmarks and it comes with many built-in tests. By framework I mean that it includes much of the plumbing for several DBMSs, and you can write tests in Lua with LuaJIT support so the client side of the tests uses less CPU.
The sysbench framework has been widely used in the MySQL community for a long time. Originally it hard-coded one (or perhaps a few) tests, and in too many cases by sysbench people mean the classic (original) sysbench transaction: a mix of point queries, range queries, and writes. The classic transaction is now implemented by oltp_read_write.lua with help from oltp_common.lua.
The oltp_read_write.lua test is usually not good at detecting regressions but in this case (the problem motivating me to write this blog post) it has detected the regression.
How to find regressions
I have opinions about how to run DBMS benchmarks. For example, be careful about running read-only benchmarks for an LSM because the state (shape) of the LSM tree can have a large impact on CPU overhead and the LSM state (shape) might be in the same (bad or good) state for the duration of the test. With read-write tests the LSM state should cycle between better and worse states.
But traditional b-tree storage engines also have states. One aspect of that is MVCC debt -- write-heavy tests increase the debt and the engine doesn't always do a great job of limiting that debt. So I run sysbench tests in a certain order to both create and manage MVCC debt while trying to minimize noise.
The order of the tests is listed here and the general sequence is:
- Load and index the tables
- Run a few read-only tests to let MVCC debt get reduced if it exists
- Optionally run more read-only tests (I usually skip these)
- Run write-heavy tests
- Do things to reduce MVCC debt (see here for Postgres and MySQL)
- Run read-only tests (after the tables+indexes have been subject to random writes)
- Run delete-only and then insert-only tests
The regression shows up in these tests:
- update-zipf in Postgres 16, 17 and 18
- write-only in Postgres 16, 17 and 18
- read-write in Postgres 16, 17 and 18
- insert in Postgres 18 - this reproduces in 18.0 and 18.1
I wrote useful but messy Bash scripts to make it easier to run and explain sysbench results. One of the things I do is collect results from vmstat and iostat per test and then summarize average and normalized values from them, where a normalized value is (average from iostat or vmstat) / QPS; a sketch of the computation follows.
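The scripts themselves are in Bash, but the normalization step is simple enough to sketch in a few lines of Python. The metric names and numbers here are hypothetical placeholders, not measured results.

```python
# Given per-test averages from vmstat/iostat and the test's QPS,
# dividing by QPS yields per-operation costs (cpu/o, cs/o, r/o, ...).
def normalize(averages: dict[str, float], qps: float) -> dict[str, float]:
    return {f"{name}/o": value / qps for name, value in averages.items()}

# Placeholder averages for two Postgres versions (not real measurements).
pg177 = normalize({"cpu": 62.0, "cs": 41000.0, "r": 5200.0}, qps=9800.0)
pg181 = normalize({"cpu": 58.0, "cs": 33500.0, "r": 4700.0}, qps=7800.0)

# Relative change for each normalized metric, as reported below.
for name in pg177:
    print(name, f"{(pg181[name] / pg177[name] - 1) * 100:+.1f}%")
```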
From the results below, comparing Postgres 17.7 to 18.1, I see that in 18.1:
- throughput decreases by ~20%
- cpu/o increases by ~17%
- cs/o increases by ~2.6%
- r/o increases by ~14%
- rKB/o increases by ~22%
- wKB/o increases by ~27%

* cpu/o - CPU per operation. This includes us and sy from vmstat.
From more debugging I found that:
- There is no regression for write-only and read-write when I change the test order so these run prior to update-one and update-zipf
- There is a regression if either or both of update-one and update-zipf are run prior to write-only and read-write
Note that the update-one and update-zipf tests are a bit awkward. For the update-one test all updates are limited to one row per table (the first row in the table). And for the update-zipf test a zipfian distribution is used to select the rows to update. So in both cases a small number of rows receive most of the updates.
Results from pg_stat_all_tables collected immediately prior to update-one and then after update-zipf are here for Postgres 15.15 and 16.11. Then I computed the difference as (results after update-zipf - results prior to update-one).
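As an illustration of that snapshot-and-diff approach (not the author's actual scripts), here is a minimal Python sketch; the connection string and table name are hypothetical.

```python
import psycopg  # psycopg 3

# Counters from pg_stat_all_tables that reflect MVCC debt.
COLS = ["n_tup_upd", "n_tup_hot_upd", "n_dead_tup", "autovacuum_count"]

def snapshot(conn, table):
    with conn.cursor() as cur:
        cur.execute(
            f"SELECT {', '.join(COLS)} FROM pg_stat_all_tables WHERE relname = %s",
            (table,),
        )
        return dict(zip(COLS, cur.fetchone()))

with psycopg.connect("dbname=sbtest") as conn:  # hypothetical DSN
    before = snapshot(conn, "sbtest1")          # prior to update-one
    input("run update-one and update-zipf, then press Enter...")
    after = snapshot(conn, "sbtest1")           # after update-zipf
    for col in COLS:
        print(col, after[col] - before[col])
```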
The Importance of Realistic Benchmark Workloads
Database Transactions
January 13, 2026
BSON vs OSON: Different design goals
There was a discussion on LinkedIn comparing OSON (the binary JSON in Oracle Database) and BSON (the binary JSON in MongoDB). To be clear, BSON and OSON aren’t directly comparable because they serve different purposes:
- BSON is a general-purpose binary serialization format designed for transport and storage efficiency, similar to Protobuf. It prioritizes fast encode/decode operations for network transmission and disk storage across diverse applications.
- OSON is Oracle Database's internal JSON storage format, specifically engineered for database operations, with extensive metadata to enable efficient field querying and navigation without fully decoding the document.
They share two objectives but make different trade-offs:
- Compactness Through Binary Encoding – Both formats achieve compactness without compression through binary encoding and type-specific representations. However, BSON prioritizes encoding speed with a straightforward binary representation, whereas OSON achieves greater compactness through local dictionary compression of field names within documents.
- Partial Navigation Without Full Scanning – Both enable efficient partial document traversal, but with different approaches. BSON uses simple size prefixes to enable jumping between fields. A BSON document is meant to be stored as a single variable-size block (possible in the WiredTiger B-Tree). OSON implements comprehensive metadata structures—such as a tree segment and jumpable offsets—so that it can reduce random reads when stored in multiple fixed-size blocks (the Oracle Database storage model). A small illustration of BSON's size prefixes follows this list.
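To make the size-prefix point concrete, here is a small sketch using the `bson` package that ships with pymongo; the document content is arbitrary.

```python
import struct
import bson  # ships with pymongo

doc = bson.encode({"name": "a" * 100, "n": 42})

# Per the BSON spec, bytes 0-3 hold the total document size as a
# little-endian int32, so a reader can skip the document without decoding it.
(total_size,) = struct.unpack_from("<i", doc, 0)
print(total_size == len(doc))  # True

# String values carry their own int32 length prefix, which is what lets a
# scanner jump field-to-field instead of parsing every byte.
```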
They differ in one major design objective regarding updates:
- BSON follows MongoDB's "store together what is accessed together" principle, optimizing for full-document read patterns and full-document writes to disk as a single contiguous block, rather than in-place modifications. Standard BSON documents are not designed for efficient partial updates outside of an in-memory mutable object representation (a driver-level example follows this list).
- OSON is designed to optimize database operations, including in-place modifications without additional memory allocation.
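To see what that trade-off means through the driver, here is a hedged pymongo sketch of a single-field update; the URI and names are hypothetical. The `$set` is applied server-side, and per the full-document-write model above, the stored document is rewritten as one contiguous block rather than patched in place.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
coll = client.get_database("test").get_collection("docs")

coll.insert_one({"_id": 1, "a": 1, "b": 2})
# A partial update operator: only field "b" changes logically, but the
# stored BSON document is written back as a whole contiguous value.
coll.update_one({"_id": 1}, {"$set": {"b": 3}})
print(coll.find_one({"_id": 1}))  # {'_id': 1, 'a': 1, 'b': 3}
```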
I mentioned the Java driver in the previous post. The Oracle Java driver supports fast access via OracleJsonObject.get(), which avoids instantiating a new object and uses the internal metadata for navigation. I wanted to see how this works, but because the Oracle Database Java driver isn’t open source, I tried Python instead, where the Oracle driver is open source. However, it doesn’t provide an equivalent to OracleJsonObject.get(), so you must decode and encode to a Python dict, similar to BSON.
I ran the following program to measure the time and size to encode and decode BSON and OSON:
import time
import bson
import oracledb
from bson.codec_options import CodecOptions
from bson.raw_bson import RawBSONDocument
# Prepare RawBSON codec options once for lazy BSON decoding
raw_codec_options = CodecOptions(document_class=RawBSONDocument)
def generate_large_document(num_fields, field_length):
long_str = "a" * field_length
return {f"field_{i+1}": long_str for i in range(num_fields)}
def compare_bson_oson(document, connection, iterations=100):
"""Compare BSON and OSON encode/decode, plus access after decode."""
middle_field_name = f"field_{len(document)//2}"
# Precompute sizes
bson_data_sample = bson.encode(document)
bson_size = len(bson_data_sample)
oson_data_sample = connection.encode_oson(document)
oson_size = len(oson_data_sample)
# Timers
bson_encode_total = 0.0
bson_decode_total = 0.0
bson_access_after_decode_total = 0.0
bson_decode_raw_total = 0.0
bson_access_raw_total = 0.0
oson_encode_total = 0.0
oson_decode_total = 0.0
oson_access_after_decode_total = 0.0
for _ in range(iterations):
# BSON encode
start = time.perf_counter()
bson_data = bson.encode(document)
bson_encode_total += (time.perf_counter() - start)
# BSON decode raw (construct RawBSONDocument)
start = time.perf_counter()
raw_bson_doc = RawBSONDocument(bson_data, codec_options=raw_codec_options)
bson_decode_raw_total += (time.perf_counter() - start)
# BSON access single field from raw doc
start = time.perf_counter()
_ = raw_bson_doc[middle_field_name]
bson_access_raw_total += (time.perf_counter() - start)
# BSON full decode
start = time.perf_counter()
decoded_bson = bson.decode(bson_data)
bson_decode_total += (time.perf_counter() - start)
# BSON access after full decode
start = time.perf_counter()
_ = decoded_bson[middle_field_name]
bson_access_after_decode_total += (time.perf_counter() - start)
# OSON encode
start = time.perf_counter()
oson_data = connection.encode_oson(document)
oson_encode_total += (time.perf_counter() - start)
# OSON full decode
start = time.perf_counter()
decoded_oson = connection.decode_oson(oson_data)
oson_decode_total += (time.perf_counter() - start)
# OSON access after full decode
start = time.perf_counter()
_ = decoded_oson[middle_field_name]
oson_access_after_decode_total += (time.perf_counter() - start)
return (
bson_encode_total,
bson_decode_total,
bson_access_after_decode_total,
bson_size,
bson_decode_raw_total,
bson_access_raw_total
), (
oson_encode_total,
oson_decode_total,
oson_access_after_decode_total,
oson_size
)
def run_multiple_comparisons():
iterations = 100
num_fields_list = [10, 100, 1000]
field_sizes_list = [1000, 10000, 100000] # small length → large length
# Init Oracle client
oracledb.init_oracle_client(config_dir="/home/opc/Wallet")
connection = oracledb.connect(
user="franck",
password="My Strong P455w0rd",
dsn="orcl_tp"
)
print(f"{'Format':<6} {'Fields':<8} {'FieldLen':<10} "
f"{'Encode(s)':<12} {'Decode(s)':<12} {'Access(s)':<12} {'Size(bytes)':<12} "
f"{'DecRaw(s)':<12} {'AccRaw(s)':<12} "
f"{'EncRatio':<9} {'DecRatio':<9} {'SizeRatio':<9}")
print("-" * 130)
for field_length in field_sizes_list:
for num_fields in num_fields_list:
document = generate_large_document(num_fields, field_length)
bson_res, oson_res = compare_bson_oson(document, connection, iterations)
enc_ratio = oson_res[0] / bson_res[0] if bson_res[0] > 0 else 0
dec_ratio = oson_res[1] / bson_res[1] if bson_res[1] > 0 else 0
size_ratio = oson_res[3] / bson_res[3] if bson_res[3] > 0 else 0
# BSON row
print(f"{'BSON':<6} {num_fields:<8} {field_length:<10} "
f"{bson_res[0]:<12.4f} {bson_res[1]:<12.4f} {bson_res[2]:<12.4f} {bson_res[3]:<12} "
f"{bson_res[4]:<12.4f} {bson_res[5]:<12.4f} "
f"{'-':<9} {'-':<9} {'-':<9}")
# OSON row
print(f"{'OSON':<6} {num_fields:<8} {field_length:<10} "
f"{oson_res[0]:<12.4f} {oson_res[1]:<12.4f} {oson_res[2]:<12.4f} {oson_res[3]:<12} "
f"{'-':<12} {'-':<12} "
f"{enc_ratio:<9.2f} {dec_ratio:<9.2f} {size_ratio:<9.2f}")
connection.close()
if __name__ == "__main__":
run_multiple_comparisons()
I got the following results:
- `Encode(s)` — total encode time over 100 iterations.
- `Decode(s)` — full decode into Python objects (dict).
- `Access(s)` — access to a field of the decoded Python dict.
- `DecRaw(s)` — creation of a `RawBSONDocument` for BSON (no equivalent for OSON).
- `AccRaw(s)` — single middle-field access from a raw document (lazy decode for BSON).
- Ratios — OSON time / BSON time.
$ TNS_ADMIN=/home/opc/Wallet python bson-oson.py
Format Fields FieldLen Encode(s) Decode(s) Access(s) Size(bytes) DecRaw(s) AccRaw(s) EncRatio DecRatio SizeRatio
----------------------------------------------------------------------------------------------------------------------------------
BSON 10 1000 0.0005 0.0005 0.0000 10146 0.0002 0.0011 - - -
OSON 10 1000 0.0013 0.0006 0.0000 10206 - - 2.44 1.26 1.01
BSON 100 1000 0.0040 0.0043 0.0000 101497 0.0012 0.0101 - - -
OSON 100 1000 0.0103 0.0053 0.0000 102009 - - 2.58 1.25 1.01
BSON 1000 1000 0.0422 0.0510 0.0000 1015898 0.0098 0.0990 - - -
OSON 1000 1000 0.1900 0.0637 0.0000 1021912 - - 4.50 1.25 1.01
BSON 10 10000 0.0019 0.0017 0.0000 100146 0.0005 0.0025 - - -
OSON 10 10000 0.0045 0.0021 0.0000 100208 - - 2.36 1.27 1.00
BSON 100 10000 0.0187 0.0177 0.0000 1001497 0.0026 0.0225 - - -
OSON 100 10000 0.1247 0.0241 0.0000 1002009 - - 6.66 1.36 1.00
BSON 1000 10000 0.2709 0.2439 0.0001 10015898 0.0235 0.2861 - - -
OSON 1000 10000 14.4215 0.3185 0.0001 10021912 - - 53.23 1.31 1.00
Important nuance: In Python, oracledb.decode_oson() yields a standard dict, so we cannot measure lazy access as we can with the Java driver's OracleJsonObject.get() method, which can skip object instantiation. We measured access to one field from the raw BSON to show that there is a cost, which is higher than accessing the dict, though still only a few milliseconds per access even for the largest documents. In general, since you store together what is accessed together, it often makes sense to decode to an application object.
Encoding OSON is slower than BSON, especially for large documents, by design: OSON computes navigation metadata for faster reads, whereas BSON writes a contiguous field stream. For the largest case, encoding takes ~14 seconds over 100 iterations, which translates to roughly 140 milliseconds per operation.
Decoding BSON is marginally faster than OSON, but the difference is small: even for the largest documents, a full decode takes only a few milliseconds. OSON's extra metadata helps mainly when reading a few fields from a large document, as it avoids instantiating a new object.
Raw BSON provides faster "decoding" (as it isn't actually decoded), but slower field access. Still, this difference of a few milliseconds at most is negligible, except when accessing many fields, in which case you should decode once to a Python dict.
This test illustrates the different design goals of BSON and OSON. I used the Python driver to illustrate what an application does: get a document from a query and manipulate it as an application object. On the server it is different, and queries may modify a single field in a document. OSON can do this directly on the OSON datatype, as it has all the metadata, whereas BSON must be accessed through a mutable BSON object.