March 10, 2025
Publish and Perish: Why Ponder Stibbons Left the Ivory Tower
(With apologies to Terry Pratchett)
Ponder Stibbons sat in the shadowy recesses of the Uncommon Room, clutching a lukewarm cup of tea and the last vestiges of his patience. Across from him, the Dean, wrapped in his usual self-satisfaction, puffed his pipe and surveyed the room as if it were all a great and glorious joke that he alone understood.
"Leaving?" the Dean spluttered, when Ponder finally managed to slide the conversation in that direction. "What do you mean, leaving? Where would you even go?"
"Industry," said Ponder, trying not to make it sound like a curse. "Databases. Big ones."
The Dean blinked. "But we have databases here. The Archchancellor’s hat alone contains centuries of magical indexing..."
"Yes, but in the real world, we use them to actually store and retrieve information, not just argue about what information is worth storing in the first place," Ponder snapped. "And I’ll be paid properly."
"Pah," scoffed the Dean. "The true academic mind isn’t in it for the money."
"Yes, I noticed that," Ponder said. "But, funnily enough, the people running the place seem to be. We have more suits than I can count, and none of them seem to know how to count. We used to do research, but now we do ‘strategic impact planning’ and ‘value-oriented magical development initiatives’."
The Dean made a valiant effort at looking wise. "Well, you have to admit, Stibbons, there’s no point in just doing magic for magic’s sake."
"Then what’s the point of this place?" Ponder snapped. "If we’re not here to push the boundaries of knowledge, then we’re just a very exclusive, very inefficient vocational school for people too dim-witted to teach themselves. The students don’t want to learn magic, they want to finish a course of study in three semesters so they can get a cushy post enchanting chairs for the Patrician’s office."
The Dean waved a hand vaguely. "That’s always been the case."
"It wasn’t always quite this blatant," Ponder retorted. "Back in the day, we had time to actually think about things, research things, discuss things. Now it’s all quotas, reviews, assessments! And if the students don’t like you, they file complaints! Whining course reviews! As if the study of magic were meant to be… comfortable! We used to throw fireballs at people! Now we have to ‘create a nurturing environment for knowledge acquisition’."
"Ah," said the Dean, puffing his pipe. "Well, times change."
"Not here, they don’t," said Ponder bitterly. "That’s the whole problem. The world outside moves forward while we shuffle our papers and complain that we don’t have enough funding to investigate basic transmutation spells."
The Dean chuckled. "You always were an idealist, Stibbons. Research takes time. Why, I myself have been preparing to write a paper on the comparative metaphysical mass of a thaum for—"
"Twenty years!" snapped Ponder. "And you’ve produced nothing!"
"Ah, but when I do…" The Dean waggled his fingers mysteriously.
"And even if you did, it wouldn’t get published," Ponder continued. "Do you know what happens to papers these days? You send them off, wait a year, and then get them back covered in cryptic, punishing comments from anonymous reviewers who seem to hate the very idea of scholarship."
"Ah yes, peer review!" said the Dean enthusiastically. "Ensuring the highest quality!"
"Ensuring nobody gets anything done!" Ponder snapped. "It’s not review, it’s ritualized academic hazing. Half the time the reviewers are just competitors trying to torpedo your work so their own gets published first. It’s like trying to hold a debate where the other side gets to set your chair on fire before you start speaking."
"Ah, well," the Dean said philosophically. "At least we all suffer equally."
"No, we don’t!" Ponder threw up his hands. "It’s a zero-sum game! Everyone hoards their ideas because collaboration just means giving someone else ammunition to shoot down your next grant proposal! We’re supposed to be discovering new knowledge, and instead we spend all our time writing meticulously worded rebuttals to complaints about our font choice."
"Well, I suppose that’s just how academia works," the Dean said, smiling in what he clearly thought was a reassuring way.
Ponder sighed. "You know what’s coming, don’t you? The golems are already doing spellwork. Automated spell matrices. Soon, students won’t even need to be here. They’ll just buy the knowledge in convenient pre-packaged modules."
"Nonsense," the Dean huffed. "You can’t replace a wizard with a machine."
"That’s exactly what I thought," said Ponder. "Until I have seen them in the wild. It turned out you very well can."
The Dean froze. "What?"
"Oh yes. A spell-engine that automates theorem derivation and spell stabilization. Does in minutes what takes us years. Accurate, tireless, and best of all, doesn’t insist on being called ‘Professor’."
The Dean was pale. "But… but the prestige, Stibbons! The robes! The titles! The long lunches!"
"Industry has long lunches too," said Ponder. "And they reimburse you in two days instead of two months."
The Dean looked positively ill. "And you’re going to… what? Spend your days solving real problems? Making a tangible difference?"
"Yes."
"But… how will you cope?" The Dean’s voice was faint. "Without the endless meetings? The unread grant proposals? The departmental infighting over whose name goes first on a paper no one will read?"
Ponder stood, dusted off his robes, and picked up his satchel. "I think I’ll manage."
March 09, 2025
Comparing Execution Plans: MongoDB vs. Compatible APIs
MongoDB is the standard API for document databases, and some cloud providers have created services with a similar API. AWS and Azure call theirs 'DocumentDB', and Oracle provides the MongoDB API as a proxy on top of its SQL database. These databases offer a subset of features from past versions of MongoDB, but user experience and performance are also crucial.
Oracle claims better performance, but their tests lack real-world queries. I will use their "autonomous" managed service to compare execution plans and identify which one minimizes unnecessary reads in a common OLTP use case: a where clause with pagination.
TL;DR: MongoDB has better performance and more indexing possibilities.
Document Model (Order - Order Detail)
I used a simple schema of orders and order lines, ideal for a document model. I illustrated it with UML notation to distinguish strong and weak entities, representing the association as a composition (⬧-).
In a SQL database, a one-to-many relationship between orders and line items requires two tables because of the first normal form. In MongoDB, a composition relationship allows the weak entity (Order Detail) to be embedded within the strong entity (Order) as an array, simplifying data management and enhancing performance, as we will see when indexing it.
I will insert orders with few attributes. The country and creation date are fields in Order. The line number, product, and quantity are fields of Detail, which is an embedded array in Order:
+------------------------------+
| Order                        |
+------------------------------+
| country_id: Number           |
| created_at: Date             |
| order_details: Array         |
|   +------------------------+ |
|   | Detail                 | |
|   +------------------------+ |
|   | line: Number           | |
|   | product_id: Number     | |
|   | quantity: Number       | |
|   +------------------------+ |
+------------------------------+
Sample Data
I generated one million documents for my example. I'll focus on predictable metrics like the number of documents examined rather than execution time, so that it can be easily reproduced with a small dataset. To simulate products with fluctuating popularity, I use a randomized logarithmic value to create product IDs:
const bulkOps = [];
for (let i = 0; i < 1000000; i++) {
const orderDetails = [];
for (let line = 1; line <= 10; line++) {
orderDetails.push({
line: line,
product_id: Math.floor(Math.log2(1 + i * Math.random())),
quantity: Math.floor(100 * Math.random()),
});
}
bulkOps.push({
insertOne: {
document: {
country_id: Math.floor(10 * Math.random()),
created_at: new Date(),
order_details: orderDetails
}
}
});
}
db.orders.bulkWrite(bulkOps).insertedCount;
Access Pattern and ESR Index
Users seek insights into product usage through a query for the most recent orders that include a specific product, in a specific country. Following the ESR rule, I created an index with equality fields in front of the key, followed by the fields for ordering results.
db.orders.createIndex( {
"country_id": 1,
"order_details.product_id": 1,
"created_at": -1
});
Query the 10 last orders for a product / country
I queried the ten most recent orders in country 1 that include product 5:
print(
db.orders.find({
country_id: 1,
order_details: { $elemMatch: { product_id: 5 } }
}).sort({ created_at: -1 }).limit(10)
);
The user can analyze this data to understand the last orders for this product.
Execution plan for MongoDB API on Oracle Database
I display the execution plan for this query:
print(
db.orders.find( {
country_id: 1,
order_details: { $elemMatch: { product_id: 5 } }
}).sort({ created_at: -1 }).limit(10)
.explain("executionStats")
);
When using the "MongoDB API for Oracle Database", the query is rewritten to SQL, since the collection resides in SQL tables with the OSON data type, employing internal functions to emulate MongoDB BSON documents. The explain() method reveals the executed queries:
This is interesting because it shows how storing JSON in a SQL database differs from using MongoDB: it requires an emulation layer that generates complex, non-standard SQL. Unfortunately, this output does not show execution statistics.
To gain more insights, I gathered the SQL statement from V$SQL and ran it with the MONITOR hint to generate a SQL Monitor report:
select /*+ FIRST_ROWS(10) MONITOR */ "DATA",rawtohex("RESID"),"ETAG"
from "ORA"."orders"
where JSON_EXISTS("DATA"
,'$?( (@.country_id.numberOnly() == $B0) &&
( exists(@.order_details[*]?( (@.product_id.numberOnly() == $B1) )) ) )' passing 1 as "B0", 5 as "B1" type(strict))
order by JSON_QUERY("DATA", '$.created_at[*].max()') desc nulls last
fetch next 10 rows only
;
Here is the SQL Monitor report:
- 276 rows have been read from the index (INDEX RANGE SCAN). The access predicates are internal virtual columns and undocumented functions to apply the equality conditions:
"orders"."SYS_NC00005$" = SYS_CONS_ANY_SCALAR(1, 3) AND "orders"."SYS_NC00006$" = SYS_CONS_ANY_SCALAR(5, 3)
- The index entries go through a deduplication step (HASH UNIQUE).
- 276 documents are fetched from the SQL table (TABLE ACCESS BY ROWID).
- They are finally sorted for Top-k (SORT ORDER BY STOPKEY) to return 31 documents, from which 10 are fetched to provide the result.
More operations occur to transform this result into MongoDB-compatible documents, but they apply to only 10 documents because they happen after the limit (COUNT STOPKEY). What is more problematic is the unnecessary work that happens before it: reading, hashing, and sorting rows that do not contribute to the result. That is 276 rows instead of 10 in this small example, but it can be many more in a larger database. The SORT ORDER BY is a blocking operation that must read all its input rows before it can output one.
Oracle Database's index usage does not adhere to the MongoDB ESR (Equality, Sort, Range) rule. It only used the index for the equality predicates, country_id and product_id, rendering the created_at field ineffective and failing to prevent a blocking sort operation. This occurs on 276 rows in this small example, but it can impact many more rows in a production database.
Since the index is only part of the filtering process, the execution plan may switch to another index. For example, an index starting with created_at could assist with sort().limit(), but it may read too many countries and products.
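Such an alternative index is not created in this post; a hypothetical definition, through the same MongoDB API, could look like this:
db.orders.createIndex( {
  "created_at": -1,
  "country_id": 1,
  "order_details.product_id": 1
});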
The result from Oracle Database is compatible with MongoDB, but the performance and scalability are not.
Execution plan for MongoDB
Here is the execution plan on a real MongoDB database - I've run it on an Atlas free cluster:
db> print(
... db.orders.find( {
... country_id: 1,
... order_details: { $elemMatch: { product_id: 5 } }
... }).sort({ created_at: -1 }).limit(10)
... .explain("executionStats").executionStats
... );
{
executionSuccess: true,
nReturned: 10,
executionTimeMillis: 0,
totalKeysExamined: 10,
totalDocsExamined: 10,
executionStages: {
isCached: false,
stage: 'LIMIT',
nReturned: 10,
executionTimeMillisEstimate: 0,
works: 11,
advanced: 10,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
limitAmount: 10,
inputStage: {
stage: 'FETCH',
filter: {
order_details: { '$elemMatch': { product_id: { '$eq': 5 } } }
},
nReturned: 10,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 10,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
docsExamined: 10,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 10,
executionTimeMillisEstimate: 0,
works: 10,
advanced: 10,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 0,
keyPattern: {
country_id: 1,
'order_details.product_id': 1,
created_at: -1
},
indexName: 'country_id_1_order_details.product_id_1_created_at_-1',
isMultiKey: true,
multiKeyPaths: {
country_id: [],
'order_details.product_id': [ 'order_details' ],
created_at: []
},
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: {
country_id: [ '[1, 1]' ],
'order_details.product_id': [ '[5, 5]' ],
created_at: [ '[MaxKey, MinKey]' ]
},
keysExamined: 10,
seeks: 1,
dupsTested: 10,
dupsDropped: 0
}
}
}
}
Here MongoDB didn't read more rows than necessary:
- Index scan (stage: 'IXSCAN') with a single access (seeks: 1) to the values of the equality condition.
- Read only the ten index entries (keysExamined: 10) needed for the result.
- No sort operation: the ten documents (nReturned: 10) are read (stage: 'FETCH') already sorted on the index key.
This is summarized by:
executionSuccess: true,
nReturned: 10,
totalKeysExamined: 10,
totalDocsExamined: 10,
When the number of keys examined matches the number of documents returned, it indicates optimal execution with no unnecessary operations.
This alignment ensures efficiency in processing, as all examined keys are relevant to the returned documents.
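If you only want those three counters, a small mongosh sketch (same query as above, reading the standard executionStats fields) extracts them directly:
const stats = db.orders.find({
  country_id: 1,
  order_details: { $elemMatch: { product_id: 5 } }
}).sort({ created_at: -1 }).limit(10)
  .explain("executionStats").executionStats;
// An optimal plan for this query shows
// totalKeysExamined == totalDocsExamined == nReturned
print({
  nReturned: stats.nReturned,
  totalKeysExamined: stats.totalKeysExamined,
  totalDocsExamined: stats.totalDocsExamined
});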
You can also look at the visual execution plan in MongoDB Compass:
Using document data modeling and MongoDB indexes allows you to access only the necessary data, ensuring that query costs are directly related to the results obtained.
Conclusion on Documents (vs. relational) and MongoDB (vs. emulations)
In a SQL database, the Orders - Order Details example requires two tables and a join to filter results. The join itself may not be too expensive, but SQL databases lack multi-table indexes. They do unnecessary work reading and joining rows that will be discarded later.
The document model, with embedded entities, allows for comprehensive indexing, offering optimal access unlike normalized tables in relational databases. MongoDB shines with indexes that follow Equality, Sort, and Range, and they can cover documents and sub-documents, with multiple keys per document.
While some SQL databases have copied the MongoDB API to provide better developer experience to those who are not SQL experts, they do not gain the same benefits as MongoDB, provide fewer indexing possibilities, and incur additional operations when executing queries.
March 07, 2025
Long-term backup options for Amazon RDS and Amazon Aurora
How to run load tests in real-time data systems
Dedicated Poolers
March 06, 2025
Authentication Best Practices: Convex, Clerk and Next.js
Simplify Valkey Adoption with Percona Support and Services
To B or not to B: B-Trees with Optimistic Lock Coupling
B-Trees are one of the few specialized data structures which truly stood the test of time, being over 50 years old. They were invented in year zero of unix time, on a system with a whopping 7.25MB of disk storage — not DRAM! They are only one month younger than the idea of relational databases itself and witnessed the birth of multi-core architectures, complex cache hierarchies, fast network-attached storage and many more groundbreaking changes. However, while most technologies that old would show their age and could comfortably retire, B-Trees are still ubiquitous for data storage.
March 03, 2025
We added the Backup Database engine to ClickHouse
Postgres 17.4 vs sysbench on a large server
Postgres has done a great job at avoiding performance regressions over time. This post has results from sysbench on a large server for all point releases of Postgres 17.x, and for the latest point release of each major version from Postgres 10 through 16.
This work was done by Small Datum LLC and not sponsored.
tl;dr - over time ...
- For the 27 microbenchmarks there is one regression that arrives in 11.x and remains through 17.x
- Postgres has many big improvements
- I am still trying to explain the regression but the problem isn't obvious.
- The regression first arrives in Postgres 11.0
- Flamegraphs don't show an obvious problem
- Perf HW counters show an increase in memory system stalls
The tests run with 8 tables and 10M rows/table. There are 40 client threads, read-heavy microbenchmarks run for 180 seconds and write-heavy run for 300 seconds.
Relative QPS is computed as: (QPS for $version) / (QPS for Postgres 10.23)
Notes on the charts
- the y-axis shows the relative QPS
- the y-axis starts at 0.80 to make it easier to see differences
- in some cases the y-axis truncates the good outliers, cases where the relative QPS is greater than 1.5. I do this to improve readability for values near 1.0. Regardless, the improvements are nice.
- performance is stable over time in most cases
- performance gets much better for hot-points with Postgres 17.0
- performance is stable over time with small improvements for most tests
- performance gets ~1.2X better over time on the scan test
- these do a short range scan with aggregation, and =10, =100 and =10000 are the number of rows scanned per query
- performance is stable over time on the tests that do shorter range scans (=100, =10)
- performance drops by ~10% starting in 11.22 on the test with a longer range scan (=10000)
- depending on the test, performance over time is either stable, has a small improvement or has a large improvement
For the read-only_range=10000 tests where QPS drops by ~10% starting in 11.22, the problem is that Postgres uses more CPU per query starting in 11.22 (see the cpu/o column, which stands for CPU per operation or CPU per query).
Metrics per test from vmstat and iostat are here and I highlighted results for the read-only_range=X tests that do a range scan with aggregation.
Things I have checked for read-only_range=10000
- output from ps for 10.23, 11.22 and 12.22 looks similar
- output from vmstat for 10.23, 11.22 and 12.22 looks similar with a few small differences. This output is from the middle of the microbenchmark.
- both swpd and free are larger in 11.22 and 12.22
- both buff and cache are larger in 10.23
- for 10.23, 11.22 and 12.22 those values don't change during the microbenchmark
- Flamegraphs for 10.23, 11.22 and 12.22 are here and look similar. The numbered (*.1.svg, *.2.svg) files are taken in sequence and then all are combined to create the *.all.svg file.
- Output with perf HW counters for 10.23, 11.0 and 11.10 are here. These show there are more memory system stalls starting in 11.0. I haven't explained the file naming scheme, but the data in the table below is from this file.
- Flat (no flamegraphs) perf profiles are here for 10.23, 11.0 and 11.10. There are ~8 samples per DBMS version and the 7th sample is here for 10.23, for 11.0 and for 11.10. From a quick check of them I don't see obvious problems.
Why Companies Are Saying No to Proprietary Software, Including MongoDB
February 28, 2025
Our K-Drama Journey
Finding a show that suits everyone in our family has always been a challenge. Intense action/stress is too much for our youngest daughter, and we prefer to keep things PG-13.
We’ve finally found a great solution: Korean-dramas. We’ve been watching them back to back recently, and they’ve been a hit across all ages.
Hometown Cha-Cha-Cha (2021) This was our gateway into K-dramas. Since it’s dubbed in English, it was easy to start. Set in a small seaside town, it’s a wholesome, family-friendly show with both comedic moments and touching scenes. It also gave us a glimpse into everyday Korean life. It's a solid feel-good watch.
Queen of Tears (2024) We picked this next because it was also dubbed. This one was a commitment (16 episodes, each 1.5 hours long) but it was worth it. It had more romance, more drama, better cinematography, and a stellar cast. The intense drama in some scenes had our youngest bawling. We got so hooked that we ended up binge-watching the last six episodes over a weekend.
Start-Up (2020) This was our first subtitled K-drama, so now we’re listening to the original Korean audio. Another love triangle with a childhood connection, a decent plot, and comedic moments. Again, sixteen 1.5-hour episodes. One of the main actors was also in Hometown Cha-Cha-Cha, so we’re starting to recognize actors.
Observations on K-Dramas
Recurring Themes: Early love, childhood connections, and fated relationships. The main couple is always "meant to be."
Female Leads Have Erratic Behavior: Is it just me, or do Korean female leads suddenly become very clingy and irrational in romance? They expect grand gestures and act capriciously. Is this cultural, or just a K-drama trope?
Male Leads Are Idealized: The male protagonists are always near-perfect—handsome, successful, and incredibly capable. Doesn't this set unrealistic expectations for real-life relationships?
Drinking Culture: Drinking seems to be a big part of Korean life in these shows. Whenever characters are stressed or sad, they get completely wasted. And this is portrayed as funny, rather than problematic.
Seoul National University: Two of the shows (maybe all three) had the male lead finish Seoul National University. Must be a big thing.
Hard Work and Cow Dung: For some reason these get featured back to back in K-dramas, the latter as comedic relief I guess.
Quirky eating: The more dramatically the male lead slurps his noodles, the more the female lead seems drawn to him. Is this an actual thing?
The Korean Language & Culture: Korean sounds familiar at times, possibly due to its distant connection to Turkish. Korean also sounds like the ideal language for ranting. Finally, the honorifics system is fascinating.
Better Than Turkish Dramas: We never clicked with Turkish dramas: too many long stares, very strained plots, and huge plot holes. K-dramas have better pacing and more engaging storylines.
Coda
Watching K-dramas has been a great experience. They spark discussions, keep us engaged in predicting plot twists, and help the kids develop storytelling and observational skills. Watching these shows also makes us want to visit Korea.
Got any recommendations for what we should watch next? Of course, my son and I watched Squid Game together, but that’s definitely not family-friendly. These more traditional K-dramas have been a much better fit.
Minimal downtime Postgres major version upgrades with EDB Postgres Distributed
This is an external post of mine. Click here if you are not redirected.
February 27, 2025
Order-preserving or-expansion in MongoDB and $in:[,,] treated as a range skip scan, or exploded to an equality for the ESR rule
I concluded the previous post with a question. We have explored how queries with range and equality filters behave differently based on the order of fields in the index key. Now, let’s examine queries that use the $in[] operator.
This operator can be viewed as an equality filter that handles multiple values or as a range filter that skips scans across multiple ranges. In the previous post, we saw that MongoDB can use an index scan for both.
I use the same collection as in the previous post:
mdb> db.demoesr.drop();
mdb> db.demoesr.insertMany([
{ e:42, s:"a", r:10 },
{ e:42, s:"b", r:20 },
{ e:42, s:"b", r:10 },
{ e:42, s:"d", r:30 },
{ e:99, r:11 }
]);
I will query one value of "e" for various values of "r" and sort by "s". Rather than creating an Equality, Sort, Range index (ESR), I believe the Range condition is much more selective and crucial than the sort, so I will create an index with the order of Equality, Range, Sort (ERS):
test> db.demoesr.createIndex(
{ e: 1 , r: 1, s : 1 }
);
e_1_r_1_s_1
This post aims to determine whether a $in[] filter functions similarly to an Equality filter with multiple values or resembles a Range filter that skips some ranges.
I run the following query where e:{ $eq: 42 } and r:{ $in:[...]} with a list of more than 200 values:
test> db.demoesr.find(
{ e:{ $eq: 42 }, r:{ $in:[
201, 200, 199, 198, 197, 196, 195, 194, 193, 192, 191, 190, 189, 188, 187, 186, 185, 184, 183, 182, 181, 180, 179, 178, 177, 176, 175, 174, 173, 172, 171, 170, 169, 168, 167, 166, 165, 164, 163, 162, 161, 160, 159, 158, 157, 156, 155, 154, 153, 152, 151, 150, 149, 148, 147, 146, 145, 144, 143, 142, 141, 140, 139, 138, 137, 136, 135, 134, 133, 132, 131, 130, 129, 128, 127, 126, 125, 124, 123, 122, 121, 120, 119, 118, 117, 116, 115, 114, 113, 112, 111, 110, 109, 108, 107, 106, 105, 104, 103, 102, 101, 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49, 48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
] } } ,
{ e:1,s:1,r:1,"_id":0 }
).sort({ s: 1 }).explain("executionStats").executionStats
;
{
executionSuccess: true,
nReturned: 4,
totalKeysExamined: 5,
totalDocsExamined: 0,
executionStages: {
stage: 'SORT',
nReturned: 4,
sortPattern: { s: 1 },
inputStage: {
stage: 'PROJECTION_COVERED',
nReturned: 4,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'IXSCAN',
nReturned: 4,
keyPattern: { e: 1, r: 1, s: 1 },
indexName: 'e_1_r_1_s_1',
isMultiKey: false,
multiKeyPaths: { e: [], r: [], s: [] },
direction: 'forward',
indexBounds: {
e: [ '[42, 42]' ],
r: [
'[1, 1]', '[2, 2]', '[3, 3]', '[4, 4]', '[5, 5]',
'[6, 6]', '[7, 7]', '[8, 8]', '[9, 9]', '[10, 10]',
'[11, 11]', '[12, 12]', '[13, 13]', '[14, 14]', '[15, 15]',
'[16, 16]', '[17, 17]', '[18, 18]', '[19, 19]', '[20, 20]',
'[21, 21]', '[22, 22]', '[23, 23]', '[24, 24]', '[25, 25]',
'[26, 26]', '[27, 27]', '[28, 28]', '[29, 29]', '[30, 30]',
'[31, 31]', '[32, 32]', '[33, 33]', '[34, 34]', '[35, 35]',
'[36, 36]', '[37, 37]', '[38, 38]', '[39, 39]', '[40, 40]',
'[41, 41]', '[42, 42]', '[43, 43]', '[44, 44]', '[45, 45]',
'[46, 46]', '[47, 47]', '[48, 48]', '[49, 49]', '[50, 50]',
'[51, 51]', '[52, 52]', '[53, 53]', '[54, 54]', '[55, 55]',
'[56, 56]', '[57, 57]', '[58, 58]', '[59, 59]', '[60, 60]',
'[61, 61]', '[62, 62]', '[63, 63]', '[64, 64]', '[65, 65]',
'[66, 66]', '[67, 67]', '[68, 68]', '[69, 69]', '[70, 70]',
'[71, 71]', '[72, 72]', '[73, 73]', '[74, 74]', '[75, 75]',
'[76, 76]', '[77, 77]', '[78, 78]', '[79, 79]', '[80, 80]',
'[81, 81]', '[82, 82]', '[83, 83]', '[84, 84]', '[85, 85]',
'[86, 86]', '[87, 87]', '[88, 88]', '[89, 89]', '[90, 90]',
'[91, 91]', '[92, 92]', '[93, 93]', '[94, 94]', '[95, 95]',
'[96, 96]', '[97, 97]', '[98, 98]', '[99, 99]', '[100, 100]',
... 101 more items
],
s: [ '[MinKey, MaxKey]' ]
},
keysExamined: 5,
seeks: 1
}
}
}
}
This is the best way to filter the documents for retrieval, utilizing multiple ranges with a skip scan as discussed in the previous post. However, because it doesn't adhere to the ESR rule, the index failed to return the entries in the expected order, necessitating an additional SORT operation.
Sort Merge over an OR Expansion
I used more than 200 values to demonstrate the general behavior of the $in[] operator within the IXSCAN range. MongoDB optimizes performance when the list contains 200 values or fewer.
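As far as I can tell, this threshold is governed by a server parameter; assuming it is internalQueryMaxScansToExplode (defaulting to 200), you can check it on your own deployment with:
// Assumption: the relevant parameter is internalQueryMaxScansToExplode
db.adminCommand( { getParameter: 1, internalQueryMaxScansToExplode: 1 } );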
Here is the execution plan with a smaller list:
test> db.demoesr.find(
{ e:{ $eq: 42 }, r:{ $in:[10,30] } } ,
{ e:1,s:1,r:1,"_id":0 }
).sort({ s: 1 }).explain("executionStats").executionStats
;
{
executionSuccess: true,
nReturned: 3,
totalKeysExamined: 3,
totalDocsExamined: 0,
executionStages: {
stage: 'PROJECTION_DEFAULT',
nReturned: 3,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'SORT_MERGE',
nReturned: 3,
sortPattern: { s: 1 },
inputStages: [
{
stage: 'IXSCAN',
nReturned: 2,
keyPattern: { e: 1, r: 1, s: 1 },
indexName: 'e_1_r_1_s_1',
isMultiKey: false,
multiKeyPaths: { e: [], r: [], s: [] },
direction: 'forward',
indexBounds: {
e: [ '[42, 42]' ],
r: [ '[10, 10]' ],
s: [ '[MinKey, MaxKey]' ]
},
keysExamined: 2,
seeks: 1
},
{
stage: 'IXSCAN',
nReturned: 1,
keyPattern: { e: 1, r: 1, s: 1 },
indexName: 'e_1_r_1_s_1',
isMultiKey: false,
multiKeyPaths: { e: [], r: [], s: [] },
indexBounds: {
e: [ '[42, 42]' ],
r: [ '[30, 30]' ],
s: [ '[MinKey, MaxKey]' ]
},
keysExamined: 1,
seeks: 1
}
]
}
}
}
When processing 200 or fewer values, MongoDB treats each value as its own equality filter, utilizing a separate IXSCAN per value. If the following field in the index key is used to sort the result, each index scan returns its keys in the correct sort order. MongoDB reads from these sorted segments and merges them while preserving the order, effectively performing a SORT_MERGE that bypasses a blocking sort operation.
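To confirm which shape the optimizer picked without reading the whole explain output, here is a small sketch that walks the winning plan and lists its stage names (the queryPlan fallback accounts for newer server versions that nest the plan):
// List the stage names of the winning plan, e.g. SORT_MERGE vs. SORT
function stages(plan) {
  const names = [ plan.stage ];
  if (plan.inputStage) names.push(...stages(plan.inputStage));
  if (plan.inputStages) plan.inputStages.forEach(s => names.push(...stages(s)));
  return names;
}
const wp = db.demoesr.find(
  { e: { $eq: 42 }, r: { $in: [10, 30] } },
  { e: 1, s: 1, r: 1, _id: 0 }
).sort({ s: 1 }).explain().queryPlanner.winningPlan;
print(stages(wp.queryPlan || wp));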
Since each IXSCAN treats the $in[] value as an Equality instead of a Range, my index configuration consists of Equality, Equality, Sort rather than Equality, Range, Sort. It follows the Equality, Sort, Range (ESR) rule.
I've rarely seen such intelligent optimization in a database before. Oracle Database can convert a list into a concatenation (UNION ALL) using a technique known as OR Expansion, yet it lacks a merge sort feature on top of it. On the other hand, PostgreSQL can perform a merge sort on concatenated UNION but does not support the OR Expansion transformation to get it transparently. YugabyteDB employs a similar method for batching the nested loop join by pushing down the join condition as an array.
The internal name of this feature in the MongoDB query planner is "Explode for Sort": a combination of index scan explosion and merge sort. It is particularly advantageous with pagination, when the limit is pushed down to each index scan.
It can be used with $or:{} or $in:[]. For example, the following query reads one key from each index scan:
test> db.demoesr.find(
{
e: { $eq: 42 },
$or: [ { r: 10 } , { r: 30 } ]
}).sort( { s: 1 } ).limit(1);
{
nReturned: 1,
totalKeysExamined: 2,
totalDocsExamined: 0,
executionStages: {
stage: 'LIMIT',
nReturned: 1,
limitAmount: 1,
inputStage: {
stage: 'PROJECTION_DEFAULT',
nReturned: 1,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'SORT_MERGE',
nReturned: 1,
sortPattern: { s: 1 },
inputStages: [
{
stage: 'IXSCAN',
nReturned: 1,
keysExamined: 1,
seeks: 1
},
{
stage: 'IXSCAN',
nReturned: 1,
keysExamined: 1,
seeks: 1
}
]
}
}
}
}
This feature is also documented in the ESR rule additional considerations: https://www.mongodb.com/docs/manual/tutorial/equality-sort-range-rule/#additional-considerations
Skip Scan in MongoDB with Range, Equality (RE) index as an alternative to Equality, Sort, Range (ESR)
To achieve the ideal index for each query while adhering to the Equality, Sort, and Range (ESR) rule, you may end up with too many indexes. Instead of creating the best index for every query, the most effective indexing strategy is using a small set of indexes that addresses performance and scalability requirements.
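Before deciding which indexes to keep, it helps to check whether the existing ones are actually used. One way is the $indexStats aggregation stage, which reports per-index access counters; a small sketch, shown here on the demoesr collection used below:
db.demoesr.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, "accesses.ops": 1, "accesses.since": 1 } }
]);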
Indexes that start with a field that is not filtered by an equality predicate can still be beneficial if that field has a limited number of distinct values. The index scan applies the bounds on the following field of the index key and skips over the leading field's values, seeking to the next matching range. We will illustrate this with a collection similar to the one in the previous post:
mdb> db.demoesr.drop();
mdb> db.demoesr.insertMany([
{ e:42, s:"a", r:10 },
{ e:42, s:"b", r:20 },
{ e:42, s:"b", r:10 },
{ e:42, s:"d", r:30 },
{ e:99, r:11 } // add one to skip
]);
My query finds the documents with e = 42 and r > 10:
test> db.demoesr.find(
{ r:{$gt:10} , e:{$eq: 42} } ,
{e:1,s:1,r:1,"_id":0 }
)
;
[
{ e: 42, s: 'b', r: 20 },
{ e: 42, s: 'd', r: 30 }
]
Without an index, this reads all documents with a COLLSCAN:
db.demoesr.find(
{ r:{$gt:10} , e:{$eq: 42} } ,
{e:1,s:1,r:1,"_id":0 }
).explain("executionStats").executionStats
;
{
executionSuccess: true,
nReturned: 2,
totalKeysExamined: 0,
totalDocsExamined: 5,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 2,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'COLLSCAN',
filter: {
'$and': [ { e: { '$eq': 42 } }, { r: { '$gt': 10 } } ]
},
nReturned: 2,
direction: 'forward',
docsExamined: 5
}
}
}
Although my demo collection is small, the execution plan highlights the scalability challenge: it examines all documents but returns only a subset, which is inefficient.
ESR index (Equality, Sort, Range)
With the index created in the previous post, my query uses an index scan:
test> db.demoesr.createIndex(
{ e: 1 , s: 1, r : 1 }
);
e_1_s_1_r_1
test> db.demoesr.find(
{ r:{$gt:10} , e:{$eq: 42} } ,
{e:1,s:1,r:1,"_id":0 }
).explain("executionStats").executionStats
;
{
executionSuccess: true,
nReturned: 2,
totalKeysExamined: 5,
totalDocsExamined: 0,
executionStages: {
stage: 'PROJECTION_COVERED',
nReturned: 2,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'IXSCAN',
nReturned: 2,
keyPattern: { e: 1, s: 1, r: 1 },
indexName: 'e_1_s_1_r_1',
isMultiKey: false,
multiKeyPaths: { e: [], s: [], r: [] },
direction: 'forward',
indexBounds: {
e: [ '[42, 42]' ],
s: [ '[MinKey, MaxKey]' ],
r: [ '(10, inf.0]' ]
},
keysExamined: 5,
seeks: 3
}
}
}
This plan is acceptable if the equality part (e: [ '[42, 42]' ]) is highly selective. Since the subsequent bound on "s" is not filtering, all its values were read. However, a skip scan was used to seek to the next value at the end of each range of the next bound (r: [ '(10, inf.0]' ]).
ER index (Equality, Range)
Since I don't do any sort and read all values for "s", having an index on the two fields used for filtering is preferable:
test> db.demoesr.createIndex(
{ e: 1 , r : 1 }
);
e_1_r_1
test> db.demoesr.find(
{ r:{$gt:10} , e:{$eq: 42} } ,
{e:1,s:1,r:1,"_id":0 }
).explain("executionStats").executionStats
;
{
executionSuccess: true,
nReturned: 2,
executionTimeMillis: 3,
totalKeysExamined: 2,
totalDocsExamined: 2,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 2,
executionTimeMillisEstimate: 0,
works: 4,
advanced: 2,
needTime: 0,
needYield: 0,
saveState: 0,
restoreState: 0,
isEOF: 1,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'FETCH',
nReturned: 2,
inputStage: {
stage: 'IXSCAN',
nReturned: 2,
keyPattern: { e: 1, r: 1 },
indexName: 'e_1_r_1',
isMultiKey: false,
multiKeyPaths: { e: [], r: [] },
direction: 'forward',
indexBounds: { e: [ '[42, 42]' ], r: [ '(10, inf.0]' ] },
keysExamined: 2,
seeks: 1
}
}
}
}
This decreased the number of seeks since no values were skipped. It reads a single range (e: [ '[42, 42]' ], r: [ '(10, inf.0]' ]) encompassing all the keys for the returned documents. You achieve the optimal index for your filter when seeks: 1 and keysExamined = nReturned.
RE index (Range, Equality)
Now, imagine that a similar index was created for another critical query with the fields arranged in a different order, as this use case had an equality predicate on "r" and reads many values of "e". I don't want to create another index for my query if I don't need to, so let's remove the indexes created above and check the execution plan with an index starting with "r":
test> db.demoesr.dropIndexes("*");
{
nIndexesWas: 3,
msg: 'non-_id indexes dropped for collection',
ok: 1
}
test> db.demoesr.createIndex(
{ r: 1, e : 1 }
);
test> db.demoesr.find(
{ r:{$gt:10} , e:{$eq: 42} } ,
{e:1,s:1,r:1,"_id":0 }
).explain("executionStats").executionStats
;
{
executionSuccess: true,
nReturned: 2,
totalKeysExamined: 3,
totalDocsExamined: 2,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 2,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'FETCH',
nReturned: 2,
docsExamined: 2,
alreadyHasObj: 0,
inputStage: {
stage: 'IXSCAN',
nReturned: 2,
keyPattern: { r: 1, e: 1 },
indexName: 'r_1_e_1',
isMultiKey: false,
multiKeyPaths: { r: [], e: [] },
direction: 'forward',
indexBounds: { r: [ '(10, inf.0]' ], e: [ '[42, 42]' ] },
keysExamined: 3,
seeks: 2
}
}
}
}
In my case, I have only one value of "e" to skip, as { e:99, r:11 } is in the first range r: [ '(10, inf.0]' ] but not in the following one e: [ '[42, 42]' ], which is evident from the one additional seek. If there aren't too many of those, this index remains efficient, and there's no need to create an additional one with a better key field ordering for this query.
Keep in mind that if there's no range predicate in your query, the IXSCAN cannot be utilized, as we discussed in the previous post. Without an index that begins with "e", this will result in a COLLSCAN:
db.demoesr.find(
{ e:{$eq: 42} } , {e:1,s:1,r:1,"_id":0 }
).explain("executionStats").executionStats;
The workaround remains unchanged from the previous post, which involves adding an infinite range filter to the first field of the index:
db.demoesr.find(
{ r:{$gte:MinKey}, e:{$eq: 42} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1}).explain("executionStats").executionStats;
We have seen equality predicates, where the index scan can seek directly to the desired point, and range predicates, where multiple values can be examined but skipped according to the following ranges. You may wonder if a filter on a list of values ($in) is processed like multiple equalities or a single range with skips. That will be the next topic.