A curated list of database news from authoritative sources

November 23, 2025

Data Locality vs. Independence: Which Should Your Database Prioritize?

Understand how the principle of "store together what is accessed together" is a game-changer for database performance and scalability.

When your application needs several pieces of data at once, the fastest approach is to read them from a single location in a single call. In a document database, developers can decide what is stored together, both logically and physically.

Fragmentation has never been beneficial for performance. In databases, the proximity of data — on disk, in memory or across the network — is crucial for scalability. Keeping related data together allows a single operation to fetch everything needed, reducing disk I/O, memory cache misses and network round-trips, thereby making performance more predictable.

The principle “store together what is accessed together” is central to modeling in document databases. Its purpose is to let developers control the physical storage layout, even with flexible data structures.

In contrast, SQL databases were designed for data independence — allowing users to interact with a logical model separate from the physical implementation managed by a database administrator.

Today, the trend is not to separate development and operations, allowing faster development cycles without the complexity of coordinating multiple teams or shared schemas. Avoiding the separation into logical and physical models further simplifies the process.

Understanding the core principle of data locality is essential today, especially as many databases emulate document databases or offer similar syntax on top of SQL. To qualify as a document database, it’s not enough to accept JSON documents with a developer-friendly syntax.

The database must also preserve those documents intact in storage so that accessing them has predictable performance. Whether they expose a relational or document API, it is essential to know if your objective is data independence or data locality.

Why Locality Still Matters in Modern Infrastructure

Modern hardware still suffers from penalties for scattered access. Hard disk drives (HDDs) highlighted the importance of locality because seek and rotational latency are more impactful than transfer speed, especially for online transactional processing (OLTP) workloads.

While solid state drives (SSDs) remove mechanical delays, random writes remain expensive, and cloud storage adds latency due to network access to storage. Even in-memory access isn’t immune: on multisocket servers, non-uniform memory access (NUMA) causes varying access times depending on where the data was loaded into memory by the first access, relative to the CPU core that processes it later.

Scale-out architecture further increases complexity. Vertical scaling — keeping all reads and writes on a single instance with shared disks and memory — has capacity limits. Large instances are expensive, and scaling them down or up often requires downtime, which is risky for always-on applications.

For example, you might need your maximum instance size for Black Friday but would have to scale up progressively in the lead-up, incurring downtime as usage increases. Without horizontal scalability, you end up provisioning well above your average load “just in case,” as in on-premises infrastructures sized years in advance for occasional peaks — something that can be prohibitively costly in the cloud.

Horizontal scaling allows adding or removing nodes without downtime. However, more nodes increase the likelihood of distributed queries, in which operations that once hit local memory must now traverse the network, introducing unpredictable latency. Data locality becomes critical with scale-out databases.

To create scalable database applications, developers should understand storage organization and prioritize single-document operations for performance-critical transactions. CRUD functions (insert, find, update, delete) targeting a single document in MongoDB are always handled by a single node, even in a sharded deployment. If that document isn’t in memory, it can be read from disk in a single I/O operation. Modifications are applied to the in-memory copy and written back as a single document during asynchronous checkpoints, avoiding on-disk fragmentation.
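
To make this concrete, here is a minimal sketch of such a single-document operation in mongosh (the "orders" collection and its fields are hypothetical, chosen only for illustration):

// The whole business aggregate lives in one document.
db.orders.insertOne({
  _id: "order-1001",
  customerId: "cust-42",
  status: "open",
  items: [
    { sku: "A1", qty: 2, price: 10 },
    { sku: "B7", qty: 1, price: 25 }
  ]
});

// Single-document read-modify-write: routed to one node, fetched in one I/O
// if not already in memory, and written back as a single unit at checkpoint.
db.orders.updateOne(
  { _id: "order-1001" },
  { $set: { status: "shipped" }, $push: { items: { sku: "C3", qty: 1, price: 5 } } }
);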

In MongoDB, the WiredTiger storage engine stores each document’s fields together in contiguous storage blocks, allowing developers to follow the principle “store together what is accessed together.” When queries avoid cross-document joins such as the $lookup operation, this design prevents internal scatter-gather operations and supports predictable performance regardless of document size, update frequency or cluster scale.

The Relational Promise: Physical Data Independence

For developers working with NoSQL databases, what I described above seems obvious: There is a single data model — the domain model — defined in the application, and the database stores exactly that model.

The MongoDB data modeling workshop defines a database schema as the physical model that describes how the data is organized in the database. In relational databases, the logical model is typically independent of the physical storage model, regardless of the data type used, because they serve different purposes.

SQL developers work with a relational model that is mapped to their object model via object relational mapping (ORM) tooling or hand-coded SQL joins. The models and schemas are normalized for generality, not necessarily optimized for specific application access patterns.

The goal of the relational model was to serve online interactive use by non-programmers and casual users by providing an abstraction that hides physical concerns. This includes avoiding data anomalies through normalization and enabling declarative query access without procedural code. Physical optimizations, like indexes, are considered implementation details. You will not find CREATE INDEX in the SQL standard.

In practice, a SQL query planner chooses access paths based on statistics. When writing JOIN clauses, the order of tables in the FROM clause should not matter. The SQL query planner reorders based on cost estimates. The database guarantees logical consistency, at least in theory, even with concurrent users and internal replication. The SQL approach is database-centric: rules, constraints and transactional guarantees are defined in the relational database, independent of specific use cases or table sizes.

Today, most relational databases sit behind applications. End users rarely interact with them directly, except in analytical or data science contexts. Applications can enforce data integrity and handle anomalies in code, and developers understand data structures and algorithms. Nonetheless, relational database experts still advise keeping constraints, stored procedures, transactions, and joins within the database.

The physical storage remains abstracted — indexes, clustering, and partitions are administrator-level, not application-level, concepts, as if the application developers were like the non-programmer casual users described in the early papers about relational databases.

How Codd’s Rules Apply to SQL/JSON Documents

Because data locality matters, some relational databases have mechanisms to enforce it internally. For example, Oracle has long supported “clustered tables” for co-locating related rows from multiple tables, and more recently offers a choice for JSON storage as either native binary JSON (OSON) or decomposed relational rows (JSON-relational duality views). However, those physical attributes are declared and deployed in the database using a specific data definition language (DDL) and are not exposed to the application developers. This reflects Codd’s “independence” rules:

  • Rule 8: Physical data independence
  • Rule 9: Logical data independence
  • Rule 10: Integrity independence
  • Rule 11: Distribution independence

Rules 8 and 11 relate directly to data locality: The user is not supposed to care whether data is physically together or distributed. The database is open to users who are unaware of the physical data model, access paths and algorithms. Developers do not know what is replicated, sharded or distributed across multiple data centers.

Where the SQL Abstraction Begins to Weaken

In practice, no relational database perfectly achieves these rules. Performance tuning often requires looking at execution plans and physical data layouts. Serializable isolation is rarely used due to scalability limitations of two-phase locking, leading developers to fall back to weaker isolation levels or to explicit locking (SELECT ... FOR UPDATE). Physical co-location mechanisms — hash clusters, attribute clustering — exist, but are difficult to size and maintain optimally without precise knowledge of access patterns. They often require regular data reorganization as updates can fragment it again.

The normalized model is inherently application-agnostic, so optimizing for locality often means breaking data independence (denormalizing, maintaining materialized views, accepting stale reads from replicas, disabling referential integrity). With sharding, constraints like foreign keys and unique indexes generally cannot be enforced across shards. Transactions must be carefully ordered to avoid long waits and deadlocks. Even with an abstraction layer, applications must be aware of the physical distribution for some operations.

The NoSQL Approach: Modeling for Access Patterns

As data volumes and latency expectations grow, a different paradigm has emerged: give developers complete control rather than an abstraction with some exceptions.

NoSQL databases adopt an application-first approach: The physical model matches the access patterns, and the responsibility for maintaining integrity and transactional scope is pushed to the application. Initially, many NoSQL stores delegated all responsibility, including consistency, to developers, acting as “dumb” key-value or document stores. Most lacked ACID (atomicity, consistency, isolation and durability) transactions or query planners. If secondary indexes were present, they needed to be queried explicitly.

This NoSQL approach was the opposite of the relational database world: Instead of one shared, normalized database, there were many purpose-built data stores per application. This reduces performance and scalability surprises, but at the price of more complexity.

MongoDB’s Middle Road for Flexible Schemas

MongoDB evolved by adding essential relational database capabilities — indexes, query planning, multidocument ACID transactions — while keeping the application-first document model. When you insert a document, it is stored as a single unit.

In WiredTiger, the MongoDB storage engine, BSON documents (binary JSON with additional datatypes and indexing capabilities) are stored in B-trees with variable-sized leaf pages, allowing large documents to remain contiguous, which differs from the fixed-size page structures used by many relational databases. This avoids splitting a business object across multiple blocks and ensures consistent latency for operations that appear as a single operation to developers.

Updates in MongoDB are applied in memory. Committing them as in-place changes on disk would fragment pages. Instead, WiredTiger uses reconciliation to write a complete new version at checkpoints — similar to copy-on-write filesystems, but with a flexible block size. This may cause write amplification, but preserves document locality. With appropriately sized instances, these writes occur in the background and do not affect in-memory write latency.

Locality defined in the application’s document schema flows all the way down to the storage layer, something that relational database engines typically cannot match with their goal of physical data independence.

How Data Locality Improves Application Performance

Designing for locality simplifies development and operations in several ways:

  • Transactions: A business change affecting a single aggregate (in the domain-driven design sense) becomes a single atomic read–modify–write on one document — no multiple roundtrips like BEGIN, SELECT ... FOR UPDATE, multiple updates and COMMIT.
  • Queries and indexing: Related data in one document avoids SQL joins and ORM lazy/eager mapping. A single compound index can cover filters and projections across fields that would otherwise be in separate tables, ensuring predictable plans without join-order uncertainty.
  • Development: The same domain model in the application is used directly as the database schema. Developers can reason about access patterns without mapping to a separate model, making latency and plan stability predictable.
  • Scalability: Most operations targeting a single aggregate, with shard keys chosen accordingly, can be routed to one node, avoiding scatter–gather fan-out for critical use cases.

MongoDB’s optimistic concurrency control avoids locks, though it requires retry logic on write conflict errors. For single-document calls, retries are handled transparently by the database, which has a complete view of the transaction intent, making it simpler and faster.
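
For multi-document transactions, by contrast, the retry typically lives in the application or in a driver helper. Here is a minimal mongosh sketch, assuming a replica set and hypothetical database and collection names; the withTransaction helper retries the callback on transient write conflicts:

const session = db.getMongo().startSession();
try {
  session.withTransaction(() => {
    const orders = session.getDatabase("shop").orders;        // hypothetical
    const inventory = session.getDatabase("shop").inventory;  // hypothetical
    orders.insertOne({ _id: "order-1002", items: [{ sku: "A1", qty: 1 }] });
    inventory.updateOne({ sku: "A1" }, { $inc: { qty: -1 } });
  });
} finally {
  session.endSession();
}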

Embedding vs. Referencing in Document Data Modeling

Locality doesn’t mean “embed everything.” It means: Embed what you consistently access together. Bounded one-to-many relationships (such as an order and its line items) are candidates for embedding. Rarely updated references and dimensions can also be duplicated and embedded. High-cardinality or unbounded-growth relationships, or independently updated entities, are better represented as separate documents and can be co-located via shard keys.

MongoDB’s compound and multikey indexes support embedded fields, maintaining predictable, selective access without joins. Embedding within the same document is the only way to guarantee co-location at the block level. Multiple documents in a single collection are not stored close together, except for small documents inserted at the same time, as they might share the same block. In sharding, the shard key ensures co-location on the same node but not within the same block.
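
For example (hypothetical collection and field names), one compound index can cover a filter that spans a parent field and an embedded array, becoming multikey on the array path:

// Compound index on a scalar field plus an embedded array field;
// it is automatically multikey on "items.sku".
db.orders.createIndex({ customerId: 1, "items.sku": 1 });

// Served by that single index, with no $lookup needed.
db.orders.find({ customerId: "cust-42", "items.sku": "A1" });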

In MongoDB, locality is an explicit design choice in domain-driven design:

  • Identify aggregates that change and are read together.
  • Store them in one document when appropriate.
  • Use indexes aligned with access paths.
  • Choose shard keys so related operations route to one node (see the sketch after this list).
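
As a sketch of that last point (sharded cluster assumed; database, collection and field names are hypothetical):

// Shard on the field used by the critical access pattern, so single-customer
// operations are routed to one shard.
sh.shardCollection("shop.orders", { customerId: "hashed" });

// The filter carries the shard key, so mongos targets a single shard.
db.orders.find({ customerId: "cust-42" });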

What MongoDB Emulations Miss About Locality

Given the popularity of the document model, some cloud services offer MongoDB-like APIs on top of SQL databases. These systems may expose a MongoDB-like API while retaining a relational storage model, which typically does not maintain the same level of physical locality.

Relational databases store rows in fixed-size blocks (often 8 KB). Large documents must be split across multiple blocks. Here are some examples in popular SQL databases:

  • PostgreSQL JSONB: Stores JSON in heap tables and large documents in many chunks, using TOAST, the oversized attribute storage technique. The document is compressed and split into chunks stored in another table, accessed via an index. Reading a large document is like a nested loop join between the row and its TOAST table.
  • Oracle JSON-Relational Duality Views: Map JSON documents to relational tables, preserving data independence rather than physical locality. Elements accessed together may be scattered across blocks, requiring internal joins, multiple I/Os and possibly network calls in distributed setups.

In both scenarios, the documents are divided into either binary chunks or normalized tables. Although the API resembles MongoDB, it remains a SQL database that lacks data locality. Instead, it provides an abstraction that keeps the developer unaware of internal processes until they inspect the execution plan and understand the database internals.

Conclusion

“Store together what is accessed together” reflects realities across sharding, I/O patterns, transactions, and memory cache efficiency. Relational database engines abstract away physical layout, which works well for centralized, normalized databases serving multiple applications in a single monolithic server. At a larger scale, especially in elastic cloud environments, horizontal sharding is essential — and often incompatible with pure data independence. Developers must account for locality.

In SQL databases, this means denormalizing, duplicating reference data, and avoiding cross-shard constraints. The document model, when the database truly enforces locality down to the storage layer, offers an alternative to this abstraction and its exceptions.

In MongoDB, locality can be explicitly defined at the application level while still providing indexing, query planning and transactional features. When assessing “MongoDB-compatible” systems built on relational engines, it is helpful to determine whether the engine stores aggregates contiguously on disk and routes them to a single node by design. If not, the performance characteristics may differ from those of a document database that maintains physical locality.

Both approaches are valid. In database-first deployment, developers depend on in-database declarations to ensure performance, working alongside the database administrator and using tools like execution plans for troubleshooting. In contrast, application-first deployment shifts more responsibility to developers, who must validate both the application’s functionality and its performance.

November 22, 2025

Challenges compiling old C++ code on modern Linux

I often compile old versions of MySQL, MariaDB, Postgres and RocksDB in my search for performance regressions. Compiling is easy with Postgres as they do a great job at avoiding compilation warnings and I never encounter broken builds. Certainly the community gets the credit for this, but I suspect their task is easier because they use C.  This started as a LinkedIn post.

I expect people to disagree, and I am far from a C++ expert, but here goes ...

tl;dr - if you maintain widely used header files (widely used by C++ projects) consider not removing that include that you don't really need (like <cstdint>) because such removal is likely to break builds for older releases of projects that use your include.

I have more trouble compiling older releases of C++ projects. For MySQL I have a directory in github that includes patches that must be applied. And for MySQL I have to patch all 5.6 versions, 5.7 versions up to 5.7.33 and 8.0 versions up to 8.0.23. The most common reason for the patch is missing C++ includes (like <cstdint>).

For RocksDB with gcc I don't have to patch files but I need to use gcc-11 for RocksDB 6.x and gcc-12 for RocksDB 7.x.

For RocksDB with clang I don't have to patch files for RocksDB 8.x, 9.x and 10.x while I do have to patch 6.x and 7.x. For RocksDB 7.10 I need to edit two files to add <cstdint>. The files are:

  • table/block_based/data_block_hash_index.h
  • util/string_util.h
All of this is true for Ubuntu 24.04 with clang 18.1.3 and gcc 13.3.0.

November 21, 2025

INNER JOIN and LEFT OUTER JOIN in MongoDB (with $lookup and $unwind)

We often explain $lookup as similar to a relational JOIN, but there’s a key difference in the returned output format. In SQL, a JOIN returns a flat, tabular result set, which means values from the “one” side of a one‑to‑many relationship are repeated for each matching row on the “many” side. In MongoDB, $lookup outputs related data as arrays embedded within each document, so the “one” side appears only once and related items are grouped together.

Here’s the funny part: Relational databases were invented to avoid duplication through relational normalization—yet a JOIN immediately reintroduces duplication in the query results, consumed as one stream of records. That’s why application code or ORMs often have to break those repeated values back into a nested structure. In MongoDB, when data is stored using references, $lookup can resolve those references and embed the related documents directly in the output, avoiding duplication and producing a structure that’s naturally more application‑friendly.

MongoDB is a document database. Either you embed on write, storing related data in a single document, or you embed on read with $lookup, resolving references across multiple documents, but the result is the same: an object aggregate in JSON, or rather BSON, that is fetched as-is by the drivers.

Still, MongoDB can provide the equivalent of an INNER JOIN or a LEFT OUTER JOIN by adding an $unwind stage after the $lookup.

I'll demonstrate this with the traditional department and employee schema, using the names from the SCOTT example schema of Oracle database, but as MongoDB collections:

db.createCollection("dept");
db.dept.insertMany([
  { deptno: 10, dname: "ACCOUNTING", loc: "NEW YORK" },
  { deptno: 20, dname: "RESEARCH", loc: "DALLAS" },
  { deptno: 30, dname: "SALES", loc: "CHICAGO" },
  { deptno: 40, dname: "OPERATIONS", loc: "BOSTON" }
]);
db.createCollection("emp");
db.emp.insertMany([
  { empno: 7839, ename: "KING",  deptno: 10 },
  { empno: 7698, ename: "BLAKE", deptno: 30 },
  { empno: 7782, ename: "CLARK", deptno: 10 },
  { empno: 7566, ename: "JONES", deptno: 20 },
  { empno: 7788, ename: "SCOTT", deptno: 20 },
  { empno: 7902, ename: "FORD",  deptno: 20 },
  { empno: 7369, ename: "SMITH", deptno: 20 },
  { empno: 7499, ename: "ALLEN", deptno: 30 },
  { empno: 7521, ename: "WARD",  deptno: 30 },
  { empno: 7654, ename: "MARTIN",deptno: 30 },
  { empno: 7844, ename: "TURNER",deptno: 30 },
  { empno: 7876, ename: "ADAMS", deptno: 20 },
  { empno: 7900, ename: "JAMES", deptno: 30 },
  { empno: 7934, ename: "MILLER",deptno: 10 }
]);

Note that the "OPERATIONS" department has no employees at this point in time.

$lookup: Join with no data duplication

In a SQL database, joining tables duplicates data to ensure each record has all necessary information for independent processing. When loading this data as objects rather than records, the application must detect where objects begin and end and eliminate duplicates—or rely on an ORM to do this automatically. Here is an example of a join to show the departments with their employees:

SQL> select dept.*,empno,ename 
     from dept left outer join emp on dept.deptno = emp.deptno
;

   DEPTNO DNAME         LOC            EMPNO ENAME
_________ _____________ ___________ ________ _________
       10 ACCOUNTING    NEW YORK       7,782 CLARK
       10 ACCOUNTING    NEW YORK       7,839 KING
       10 ACCOUNTING    NEW YORK       7,934 MILLER
       20 RESEARCH      DALLAS         7,369 SMITH
       20 RESEARCH      DALLAS         7,566 JONES
       20 RESEARCH      DALLAS         7,788 SCOTT
       20 RESEARCH      DALLAS         7,876 ADAMS
       20 RESEARCH      DALLAS         7,902 FORD
       30 SALES         CHICAGO        7,499 ALLEN
       30 SALES         CHICAGO        7,521 WARD
       30 SALES         CHICAGO        7,654 MARTIN
       30 SALES         CHICAGO        7,698 BLAKE
       30 SALES         CHICAGO        7,844 TURNER
       30 SALES         CHICAGO        7,900 JAMES
       40 OPERATIONS    BOSTON

15 rows selected.

MongoDB provides a consistent document model across both application code and database storage, and returns the same join result without duplication:

db.dept.aggregate([
  { $lookup: {
      from: "emp",
      localField: "deptno",
      foreignField: "deptno",
      as: "employees"
    }
  }
])

[
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: [
      { _id: ObjectId('691f1623f4a22d6613d4b0f1'), empno: 7839, ename: 'KING', deptno: 10 },
      { _id: ObjectId('691f1623f4a22d6613d4b0f3'), empno: 7782, ename: 'CLARK', deptno: 10 },
      { _id: ObjectId('691f1623f4a22d6613d4b0fe'), empno: 7934, ename: 'MILLER', deptno: 10 }
    ]
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'),deptno: 20,dname: 'RESEARCH',loc: 'DALLAS',
    employees: [
      { _id: ObjectId('691f1623f4a22d6613d4b0f4'), empno: 7566, ename: 'JONES', deptno: 20 },
      { _id: ObjectId('691f1623f4a22d6613d4b0f5'), empno: 7788, ename: 'SCOTT', deptno: 20 },
      { _id: ObjectId('691f1623f4a22d6613d4b0f6'), empno: 7902, ename: 'FORD', deptno: 20 },
      { _id: ObjectId('691f1623f4a22d6613d4b0f7'), empno: 7369, ename: 'SMITH', deptno: 20 },
      { _id: ObjectId('691f1623f4a22d6613d4b0fc'), empno: 7876, ename: 'ADAMS', deptno: 20 }
    ]
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'),deptno: 30,dname: 'SALES',loc: 'CHICAGO',
    employees: [
      { _id: ObjectId('691f1623f4a22d6613d4b0f2'), empno: 7698, ename: 'BLAKE', deptno: 30 },
      { _id: ObjectId('691f1623f4a22d6613d4b0f8'), empno: 7499, ename: 'ALLEN', deptno: 30 },
      { _id: ObjectId('691f1623f4a22d6613d4b0f9'), empno: 7521, ename: 'WARD', deptno: 30 },
      { _id: ObjectId('691f1623f4a22d6613d4b0fa'), empno: 7654, ename: 'MARTIN', deptno: 30 },
      { _id: ObjectId('691f1623f4a22d6613d4b0fb'), empno: 7844, ename: 'TURNER', deptno: 30 },
      { _id: ObjectId('691f1623f4a22d6613d4b0fd'), empno: 7900, ename: 'JAMES', deptno: 30 }
    ]
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dd'), deptno: 40,dname: 'OPERATIONS', loc: 'BOSTON',
    employees: []
  }
]

The $lookup operation in MongoDB is similar to a LEFT OUTER JOIN except that it keeps the data in a structured schema, to be consumed by the application.

$lookup + $unwind: Inner join (with duplication)

The result of $lookup is the easiest for the application, but if you need a record-like output to process further, $unwind can unnest the array into as many documents as there are array items:

db.dept.aggregate([
  { $lookup: {
      from: "emp",
      localField: "deptno",
      foreignField: "deptno",
      as: "employees"
    }
  },
  { $unwind: 
       "$employees" 
  }
])

[
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f1'), empno: 7839, ename: 'KING', deptno: 10 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f3'), empno: 7782, ename: 'CLARK', deptno: 10 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0fe'), empno: 7934, ename: 'MILLER', deptno: 10 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f4'), empno: 7566, ename: 'JONES', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f5'), empno: 7788, ename: 'SCOTT', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f6'), empno: 7902, ename: 'FORD', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f7'), empno: 7369, ename: 'SMITH', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0fc'), empno: 7876, ename: 'ADAMS', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'), deptno: 30, dname: 'SALES', loc: 'CHICAGO',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f2'), empno: 7698, ename: 'BLAKE', deptno: 30 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'), deptno: 30, dname: 'SALES', loc: 'CHICAGO',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f8'), empno: 7499, ename: 'ALLEN', deptno: 30 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'), deptno: 30, dname: 'SALES', loc: 'CHICAGO',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f9'), empno: 7521, ename: 'WARD', deptno: 30 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'), deptno: 30, dname: 'SALES', loc: 'CHICAGO',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0fa'), empno: 7654, ename: 'MARTIN', deptno: 30 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'), deptno: 30, dname: 'SALES', loc: 'CHICAGO',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0fb'), empno: 7844, ename: 'TURNER', deptno: 30 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0dc'), deptno: 30, dname: 'SALES', loc: 'CHICAGO',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0fd'), empno: 7900, ename: 'JAMES', deptno: 30 }
  }
]

Internally, the implementation avoids materializing the array and immediately returns one document per match, the two stages $lookup and $unwind being optimized into a single lookup-unwind stage that acts like a SQL join.
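
You can check this coalescing with explain on the pipeline; in recent MongoDB versions, the absorbed $unwind typically appears as an attribute of the $lookup stage rather than as a separate stage (the exact field name, such as "unwinding", may vary by version):

db.dept.explain().aggregate([
  { $lookup: { from: "emp", localField: "deptno", foreignField: "deptno", as: "employees" } },
  { $unwind: "$employees" }
])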

Note that because the employees array for department 40 is empty, no document for that department appears in the output. This is exactly like an INNER JOIN in SQL.

$lookup + $unwind preserving empty arrays: Left outer join

It is possible to preserve the LEFT OUTER JOIN output of $lookup through $unwind with preserveNullAndEmptyArrays: true:

db.dept.aggregate([
  { $lookup: {
      from: "emp",
      localField: "deptno",
      foreignField: "deptno",
      as: "employees"
    }
  },
  { $unwind: {
       path: "$employees" ,
       preserveNullAndEmptyArrays: true
     }
  }
])

[
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f1'), empno: 7839, ename: 'KING', deptno: 10 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f3'), empno: 7782, ename: 'CLARK', deptno: 10 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0da'), deptno: 10, dname: 'ACCOUNTING', loc: 'NEW YORK',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0fe'), empno: 7934, ename: 'MILLER', deptno: 10 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f4'), empno: 7566, ename: 'JONES', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    employees: { _id: ObjectId('691f1623f4a22d6613d4b0f5'), empno: 7788, ename: 'SCOTT', deptno: 20 }
  },
  {
    _id: ObjectId('691ef750f4a22d6613d4b0db'), deptno: 20, dname: 'RESEARCH', loc: 'DALLAS',
    ... (output truncated)
  }
]

The remaining documents follow the same pattern, and department 40 (OPERATIONS) is preserved in the output even though it has no matching employees, which is the LEFT OUTER JOIN behavior.

November 20, 2025

Performing Standby Datacentre Promotions of a Patroni Cluster

Unlike a standard multi-node Postgres replication cluster, a cluster managed by Patroni executes all failovers automatically. However, this is not the case for inter-datacentre failovers, when, for instance, a standby datacentre must take over from a failed primary. The following describes the mechanisms required to perform such a procedure when the case arises. […]

November 19, 2025

Data Retention Policy Implementation – How and Why

Is your PostgreSQL database feeling sluggish? Are SQL statements taking more time than in earlier days? Are you experiencing performance cliffs (unexpected, sudden drops in performance)? Are backups taking a long time to complete? Are you getting a bigger bill for storage? Are standby rebuilds and development refreshes becoming a herculean task? Are the auditors […]

November 18, 2025

Introducing the GA Release of the New Percona Operator for MySQL: More Replication Options on Kubernetes

The Percona Cloud Native team is happy to announce the general availability of the Percona Operator for MySQL, based on Percona Server for MySQL. This release introduces an additional Kubernetes-native approach to deploying and managing MySQL clusters with synchronous Group Replication, delivering the consistency required for organizations with business continuity needs. With this release, Percona […]

November 17, 2025

Nested Loop and Hash Join for MongoDB $lookup

SQL databases offer several join algorithms. The query planner selects the most efficient one by evaluating cardinality and estimated cost. For example, a Nested Loop join is ideal when the outer table has few rows and an index allows fast access to the inner table. In contrast, a Hash Join is better suited for situations where the outer table contains many rows and the inner table must be fully scanned, resulting in fewer costly loops.

While MongoDB provides similar algorithms, adapted to flexible documents, being a NoSQL database means it shifts more responsibility to the developer. Developers must design for optimal data access already in the data model, but this has the advantage of more predictable performance.

I'll base my example on a question on Reddit: Optimizing a MongoDB JOIN with $lookup and $limit. I use a collection of users, where each user references a profile. The profile has a status. The query lists the users with no profile or with a profile with a status equal to 2.

In my demo, I set up two profile keys: "_id," which is automatically indexed in MongoDB, and an "ID" field, which is not. This setup illustrates both situations—an indexed lookup table versus a non-indexed one. Normally, you'd use just one method, depending on which join algorithm you favor.

db.profiles.drop()
db.users.drop()

db.profiles.insertMany([
  { _id:102, ID: 102, status: 2 },
  { _id:201, ID: 201, status: 1 },
  { _id:302, ID: 302, status: 2 },
  { _id:403, ID: 403, status: 3 }
]);

db.users.insertMany([
  { name: "Alice" , profileID: 403 },   // profile status = 3
  { name: "Bob", profileID: 102 },      // profile status = 2
  { name: "Charlie", profileID: 201 },  // profile status = 1
  { name: "Diana", profileID: 102 },    // profile status = 2
  { name: "Eve" },                      // no profile
  { name: "Franck" , profileID: 403 },  // profile status = 3
  { name: "Gaspar" , profileID: 403 },  // profile status = 3
  { name: "Hans" , profileID: 403 },    // profile status = 3
  { name: "Iona" , profileID: 403 },    // profile status = 3
  { name: "Jane" , profileID: 403 },    // profile status = 3
  { name: "Karl" , profileID: 403 },    // profile status = 3
  { name: "Lili" },                     // no profile
  { name: "Math" },                     // no profile
  { name: "Niall" },                    // no profile
  { name: "Oscar" , profileID: 403 },   // status = 3  
  { name: "Paula" , profileID: 102 },   // status = 2  
  { name: "Quentin" , profileID: 201 }, // status = 1  
  { name: "Ravi" , profileID: 102 },    // status = 2  
  { name: "Sofia" },                    // no profile  
  { name: "Takumi" , profileID: 403 },  // status = 3  
  { name: "Uma" , profileID: 403 },     // status = 3  
  { name: "Viktor" , profileID: 403 },  // status = 3  
  { name: "Wafa" , profileID: 403 },    // status = 3  
  { name: "Ximena" , profileID: 403 },  // status = 3  
  { name: "Yara" },                     // no profile  
  { name: "Zubair" },                   // no profile 
]);

Here is my query on this small data set:

db.users.aggregate([  
  {  
    $lookup: {  
      from: "profiles",  
      localField: "profileID",  
      foreignField: "ID",  
      as: "profile"  
    }  
  },  
  {  
    $match: {  
      $or: [  
        { profile: { $eq: [] } }, // no profile  
        { profile: { $elemMatch: { status: 2 } } } // profile status = 2  
      ]  
    }  
  },  
  {  
    $project: {  
      _id: 0,  
      name: 1,  
      "profile.status": 1 // keep only the status field from joined profile  
    }  
  }  
]);  

Note that the first optimization I did was replacing the join expression (let: { userId: "$_id" }, pipeline: [ { $match: { $expr: { $eq: ["$userId", "$$userId"] } } } ]) with localField/foreignField, to allow the push-down of the join predicate (EQ_LOOKUP).

Here is the result:

[
  { name: 'Bob', profile: [ { status: 2 } ] },
  { name: 'Diana', profile: [ { status: 2 } ] },
  { name: 'Eve', profile: [] },
  { name: 'Lili', profile: [] },
  { name: 'Math', profile: [] },
  { name: 'Niall', profile: [] },
  { name: 'Paula', profile: [ { status: 2 } ] },
  { name: 'Ravi', profile: [ { status: 2 } ] },
  { name: 'Sofia', profile: [] },
  { name: 'Yara', profile: [] },
  { name: 'Zubair', profile: [] },
]
Type "it" for more
test>

To scale the number of users, I multiply each existing user by 10,000 using the following script:

const currentUsers = db.users.find({},{_id:0, name:1, profileID:1});
currentUsers.forEach(userDoc => {  
    print(`Inserting 10,000 documents for: ${JSON.stringify(userDoc)}`);  
    for (let i = 0; i < 10000; i++) { 
      const newUsers = [];
      const clone = { ...userDoc };
      clone.name=`${i}${clone.name}`
      newUsers.push(clone);
      db.users.insertMany(newUsers);
    }
})

I now have 260,026 users.
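
A quick sanity check (not part of the original run):

db.users.countDocuments()  // expected: 260026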

Indexed nested loop join

I run my query with explain("executionStats") and extract the most important statistics about the time and the number of documents examined:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "_id",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
]).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

The lookup stage returned all documents, as the join must happen before filtering, in 2.5 seconds:

 nReturned: 260026
 executionTimeMillis: 2456
 totalKeysExamined: 190019
 totalDocsExamined: 450045
 EQ_LOOKUP strategy: IndexedLoopJoin

The equality lookup used an indexed loop join strategy, with an index scan for each document:

  • nReturned: 260026: All local documents with their joined profile arrays
  • executionTimeMillis: 2456: Total execution time including both join and filter stages
  • totalKeysExamined: 190019: Only keys that found matches in the profiles collection's index on "_id" are counted (lookup_query_stats). The index can determine "key not found" without actually examining a key entry.
  • totalDocsExamined: 450045: users collection scan (260,026) + profile documents fetched (190,019)

The number of profile documents examined is high compared to the number of profiles in the collection, so another algorithm can be faster: read all the profiles once and probe them from a hash table.

Hash Join on small unindexed table

MongoDB 8.0 chooses a hash join on the following conditions:

  • allowDiskUse: true is set (required for spilling if hash table exceeds memory)
  • Foreign collection is small enough - controlled by these parameters: internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin (default: 10,000 docs), internalQueryCollectionMaxDataSizeBytesToChooseHashJoin (default: 100 MB), and internalQueryCollectionMaxStorageSizeBytesToChooseHashJoin (default: 100 MB) (see the sketch after this list)
  • No compatible index exists, disk use is allowed and hash join is more efficient
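
On a self-managed deployment, you can inspect these thresholds with the getParameter admin command (a sketch; it requires the appropriate privileges, and the parameter names are the internal knobs quoted above):

db.adminCommand({
  getParameter: 1,
  internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin: 1,
  internalQueryCollectionMaxDataSizeBytesToChooseHashJoin: 1,
  internalQueryCollectionMaxStorageSizeBytesToChooseHashJoin: 1
})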

To show that, I use the "ID" field, which is not indexed, for the lookup:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "ID",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
],{ allowDiskUse: true } ).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

The HashJoin completed 3.3x faster (750ms vs 2,456ms) with significantly different execution patterns:

 nReturned: 260026
 executionTimeMillis: 750
 totalKeysExamined: 0
 totalDocsExamined: 260030
 EQ_LOOKUP strategy: HashJoin

HashJoin works in two phases, with no index required (totalKeysExamined: 0):

  • Build Phase - Scans the foreign collection once to build an in-memory hash table keyed by foreignField values. It has read the 4 profiles.
  • Probe Phase - Scans the local collection once, probing the hash table for each local key. It has read 260,026 users.

The total is 260,030 documents examined.

Nested loop join without index

A third option is a nested loop join that scans the foreign collection in each loop, rather than using an index or building a hash table. I disable disk usage to avoid a hash join, and use the non-indexed field to avoid an indexed nested loop:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "ID",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
],{ allowDiskUse: false } ).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

Here are the execution statistics:

 nReturned: 260026
 executionTimeMillis: 1618
 totalKeysExamined: 0
 totalDocsExamined: 1300130
 EQ_LOOKUP strategy: NestedLoopJoin

As with the other algorithms, 260,026 users were read. Since there is no index on the foreign field, there is no index scan at all (totalKeysExamined: 0). This caused the system to scan the 4 profiles for each user, resulting in a total of 260,026 + 4 × 260,026 = 1,300,130 documents examined.

In this example, this approach examines about five times more documents than HashJoin and about three times more than IndexedLoopJoin, because NestedLoopJoin repeatedly scans the foreign collection. Interestingly, because the lookup collection is very small and sequential scans are cache-friendly, its execution time is nevertheless lower than that of the indexed nested loop in this case (1,618 ms vs. 2,456 ms), as index seeks incur additional costs.

Summary

Like with any database, it is important to understand the join algorithms. When you use a lookup in the aggregation pipeline, MongoDB selects the join strategy based on the query, the collections and the existing indexes. The three algorithms are:

  1. IndexedLoopJoin - Best when:

    • Compatible index exists on foreignField
    • Low to medium match rate
    • Foreign collection is large
  2. HashJoin - Best when:

    • allowDiskUse: true is set
    • Foreign collection is small (< 10,000 docs, < 100MB)
    • High match rate or no compatible index
  3. NestedLoopJoin - Fallback when:

    • No compatible index exists
    • allowDiskUse: false prevents HashJoin
    • Acceptable only for tiny foreign collections

Unlike SQL databases, where the optimizer makes all decisions but can also lead to surprises, MongoDB shifts responsibility to developers. You must:

  • Design your schema with join performance in mind. For bounded relationships, embedding may be the best solution.
  • Understand your data (collection sizes, match rates) to predict which strategy will be best.
  • Test different strategies by creating/dropping indexes and toggling allowDiskUse (as in the sketch after this list).
  • Measure performance using explain("executionStats") to validate your choices.
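
For example, to compare strategies on this data set, you could add an index on the foreign field and re-run the same explain (a sketch using the collections above; dropping the index restores the previous behavior):

// Enable IndexedLoopJoin for the lookup on "ID".
db.profiles.createIndex({ ID: 1 });

// Re-run the aggregation with explain("executionStats") and compare the
// strategy and totals, then drop the index to go back to HashJoin
// (with allowDiskUse: true) or NestedLoopJoin (with allowDiskUse: false).
db.profiles.dropIndex({ ID: 1 });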

This design favors predictability and control over relying on automatic optimization. So when you hear statements like 'joins are slow' or 'lookups are fast', take time to understand how these operations are actually executed before forming an opinion.

Nested Loop and Hash Join for MongoDB $lookup

SQL databases offer several join algorithms. The query planner selects the most efficient one by evaluating cardinality and estimated cost. For example, a nested loop join is ideal when the outer table has few rows and an index allows fast access to the inner table. In contrast, a hash join is better suited for situations where the outer table contains many rows and the inner table must be fully scanned, resulting in fewer costly loops.

While MongoDB provides similar algorithms, adapted to flexible documents, being a NoSQL database means it shifts more responsibility to the developer. Developers must design for optimal data access, already in the data model, but it has the advantage of resulting in more predictable performance.

I'll base my example on a question on Reddit: Optimizing a MongoDB JOIN with $lookup and $limit. I use a collection of users, where each user references a profile. The profile has a status. The query lists the users with no profile or with a profile whose status equals 2.

In my demo, I set up two profile keys: "_id", which is automatically indexed in MongoDB, and an "ID" field, which is not. This setup illustrates both situations—an indexed lookup key versus a non-indexed one. Normally, you'd use just one of them, depending on which join algorithm you favor.

db.profiles.drop()
db.users.drop()

db.profiles.insertMany([
  { _id:102, ID: 102, status: 2 },
  { _id:201, ID: 201, status: 1 },
  { _id:302, ID: 302, status: 2 },
  { _id:403, ID: 403, status: 3 }
]);

db.users.insertMany([
  { name: "Alice" , profileID: 403 },   // profile status = 3
  { name: "Bob", profileID: 102 },      // profile status = 2
  { name: "Charlie", profileID: 201 },  // profile status = 1
  { name: "Diana", profileID: 102 },    // profile status = 2
  { name: "Eve" },                      // no profile
  { name: "Franck" , profileID: 403 },  // profile status = 3
  { name: "Gaspar" , profileID: 403 },  // profile status = 3
  { name: "Hans" , profileID: 403 },    // profile status = 3
  { name: "Iona" , profileID: 403 },    // profile status = 3
  { name: "Jane" , profileID: 403 },    // profile status = 3
  { name: "Karl" , profileID: 403 },    // profile status = 3
  { name: "Lili" },                     // no profile
  { name: "Math" },                     // no profile
  { name: "Niall" },                    // no profile
  { name: "Oscar" , profileID: 403 },   // status = 3  
  { name: "Paula" , profileID: 102 },   // status = 2  
  { name: "Quentin" , profileID: 201 }, // status = 1  
  { name: "Ravi" , profileID: 102 },    // status = 2  
  { name: "Sofia" },                    // no profile  
  { name: "Takumi" , profileID: 403 },  // status = 3  
  { name: "Uma" , profileID: 403 },     // status = 3  
  { name: "Viktor" , profileID: 403 },  // status = 3  
  { name: "Wafa" , profileID: 403 },    // status = 3  
  { name: "Ximena" , profileID: 403 },  // status = 3  
  { name: "Yara" },                     // no profile  
  { name: "Zubair" },                   // no profile 
]);

Here is my query on this small data set:

db.users.aggregate([  
  {  
    $lookup: {  
      from: "profiles",  
      localField: "profileID",  
      foreignField: "ID",  
      as: "profile"  
    }  
  },  
  {  
    $match: {  
      $or: [  
        { profile: { $eq: [] } }, // no profile  
        { profile: { $elemMatch: { status: 2 } } } // profile status = 2  
      ]  
    }  
  },  
  {  
    $project: {  
      _id: 0,  
      name: 1,  
      "profile.status": 1 // keep only the status field from joined profile  
    }  
  }  
]);  

Note that the first optimization I made was to replace the join expression (let: { userId: "$_id" }, pipeline: [ { $match: { $expr: { $eq: ["$userId", "$$userId"] } } } ]) with localField/foreignField, allowing the join predicate to be pushed down (EQ_LOOKUP), as illustrated below.
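
As an illustration of the two forms on this schema (the variable name pid is mine):

// join expression form: the predicate is evaluated inside the sub-pipeline
{ $lookup: {
    from: "profiles",
    let: { pid: "$profileID" },
    pipeline: [ { $match: { $expr: { $eq: ["$ID", "$$pid"] } } } ],
    as: "profile"
} }

// equality form: localField/foreignField can be pushed down as EQ_LOOKUP
{ $lookup: {
    from: "profiles",
    localField: "profileID",
    foreignField: "ID",
    as: "profile"
} }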

Here is the result:

[
  { name: 'Bob', profile: [ { status: 2 } ] },
  { name: 'Diana', profile: [ { status: 2 } ] },
  { name: 'Eve', profile: [] },
  { name: 'Lili', profile: [] },
  { name: 'Math', profile: [] },
  { name: 'Niall', profile: [] },
  { name: 'Paula', profile: [ { status: 2 } ] },
  { name: 'Ravi', profile: [ { status: 2 } ] },
  { name: 'Sofia', profile: [] },
  { name: 'Yara', profile: [] },
  { name: 'Zubair', profile: [] },
]
Type "it" for more
test>

To scale the number of users, I multiply each existing user by 10,000 using the following script:

// materialize the original users first, so the inserts below don't feed the cursor
const currentUsers = db.users.find({}, { _id: 0, name: 1, profileID: 1 }).toArray();
currentUsers.forEach(userDoc => {
    print(`Inserting 10,000 documents for: ${JSON.stringify(userDoc)}`);
    const newUsers = [];
    for (let i = 0; i < 10000; i++) {
      const clone = { ...userDoc };
      clone.name = `${i}${clone.name}`;
      newUsers.push(clone);
    }
    db.users.insertMany(newUsers); // one batched insert per original user
})

I now have 260,026 users.

Indexed nested loop join

I run my query with explain("executionStats") and extract the most important statistics about the time and the number of documents examined:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "_id",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
]).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

The lookup stage returned all documents, since the join happens before the filter, in 2.5 seconds:

 nReturned: 260026
 executionTimeMillis: 2456
 totalKeysExamined: 190019
 totalDocsExamined: 450045
 EQ_LOOKUP strategy: IndexedLoopJoin

The equality lookup used an indexed loop join strategy, with an index scan for each document:

  • nReturned: 260026: All local documents with their joined profile arrays
  • executionTimeMillis: 2456: Total execution time including both join and filter stages
  • totalKeysExamined: 190019: Only keys that found matches in the profiles collection's index on "_id" are counted (lookup_query_stats). The index can determine "key not found" without actually examining a key entry.
  • totalDocsExamined: 450045: Users collection scan (260,026) + profile documents fetched (190,019)

The number of profiles examined is high compared to the number of profiles in the collection. Another algorithm can be faster by reading all profiles once and looking them up from a hash table.

Hash join on small unindexed table

MongoDB 8.0 chooses a hash join when the following conditions hold:

  • allowDiskUse: true is set (required for spilling if hash table exceeds memory)
  • Foreign collection is small enough—controlled by these parameters: internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin (default: 10,000 docs), internalQueryCollectionMaxDataSizeBytesToChooseHashJoin (default: 100 MB), and internalQueryCollectionMaxStorageSizeBytesToChooseHashJoin (default: 100 MB); see the sketch after this list
  • No compatible index exists, disk use is allowed and hash join is more efficient
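
These size thresholds are server parameters; as a side note, they can be read with getParameter (a sketch, assuming sufficient privileges):

db.adminCommand({
  getParameter: 1,
  internalQueryCollectionMaxNoOfDocumentsToChooseHashJoin: 1,
  internalQueryCollectionMaxDataSizeBytesToChooseHashJoin: 1,
  internalQueryCollectionMaxStorageSizeBytesToChooseHashJoin: 1
})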

To show this, I use the "ID" field, which is not indexed, for the lookup:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "ID",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
],{ allowDiskUse: true } ).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

The hash join completed 3.3x faster (750ms vs 2,456ms) with significantly different execution patterns:

 nReturned: 260026
 executionTimeMillis: 750
 totalKeysExamined: 0
 totalDocsExamined: 260030
 EQ_LOOKUP strategy: HashJoin

Hash join works in two phases, with no index required (totalKeysExamined: 0):

  • Build phase: Scans the foreign collection once to build an in-memory hash table keyed by foreignField values. It has read the four profiles.
  • Probe phase: Scans the local collection once, probing the hash table for each local key. It has read 260,026 users.

The total is 260,030 documents examined.
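
To make the two phases concrete, here is a minimal sketch in plain mongosh JavaScript (illustrative only, not MongoDB's internal implementation):

const hashTable = new Map();
// build phase: one scan of the 4 profiles, keyed by the foreignField value
db.profiles.find().forEach(p => {
  if (!hashTable.has(p.ID)) hashTable.set(p.ID, []);
  hashTable.get(p.ID).push(p);
});
// probe phase: one scan of the 260,026 users, probing the hash table per user
db.users.find().forEach(u => {
  const profile = hashTable.get(u.profileID) ?? [];  // [] when there is no match
  // the $lookup output for this user would be { ...u, profile }
});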

Nested loop join without index

A third option is a nested loop join that scans the collection in each loop, rather than using an index or building a hash table. I disable disk usage to avoid hash join and use the non-indexed field to avoid indexed nested loop:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "ID",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
],{ allowDiskUse: false } ).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

Here are the execution statistics:

 nReturned: 260026
 executionTimeMillis: 1618
 totalKeysExamined: 0
 totalDocsExamined: 1300130
 EQ_LOOKUP strategy: NestedLoopJoin

As with the other algorithms, 260,026 users were read. Since there was no index on the foreign field, there's no index scan at all (totalKeysExamined: 0). This caused the system to scan the four profiles for each user, resulting in a total of 260,026 + 4 × 260,026 = 1,300,130 documents examined.
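
Conceptually, this behaves like the following minimal sketch (illustrative only, not MongoDB's internal code):

// outer loop: one pass over the 260,026 users
db.users.find().forEach(u => {
  // inner loop: a full scan of the 4 profiles for every user, since "ID" has no index
  const matches = db.profiles.find({ ID: u.profileID ?? null }).toArray();
  // the $lookup output for this user would be { ...u, profile: matches }
});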

In this example, this approach is roughly three times more costly than indexed loop join in terms of documents examined, and five times more than hash join, because nested loop join requires repeatedly scanning the foreign collection. Interestingly, because the lookup collection is very small and sequential scans are cache-friendly, the execution time is actually lower than that of the indexed nested loop in this case, as index seeks incur additional costs.

Indexed nested loop with covering index

If your key is in a field other than "_id", as with "ID" here, you can create an index on it and the plan will use an indexed nested loop join. This has the same performance as the query using "_id" because, except for clustered collections, all indexes are secondary in MongoDB and access the document via an internal record identifier. Using "_id" avoids creating one more index. However, one reason to create another index is to have it cover all the necessary fields:

db.profiles.createIndex({ ID: 1, status: 1 })  

Now, the previous query using "ID" as the foreign field will use the index and get the same performance as the indexed nested loop join above when using "_id". However, the index is not used as a covering index because no projection is defined.

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "ID",
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
]).explain("executionStats")

xs=x["stages"][0]["$cursor"]["executionStats"];
xp=x["stages"][0]["$cursor"]["queryPlanner"]["winningPlan"]["queryPlan"];
print("nReturned:",           xs["nReturned"],
      "executionTimeMillis:", xs["executionTimeMillis"],
      "totalKeysExamined:",   xs["totalKeysExamined"],
      "totalDocsExamined:",   xs["totalDocsExamined"],
      xp["stage"]+" strategy:",  xp["strategy"],
)

Here are the execution statistics, the same as with the index on "_id", running in 2.5 seconds:

 nReturned: 260026
 executionTimeMillis: 2484
 totalKeysExamined: 190019
 totalDocsExamined: 450045
 EQ_LOOKUP strategy: IndexedLoopJoin

I can add a projection to the lookup pipeline, but be careful, it might be slower:

x=db.users.aggregate([
  { "$lookup" : {
    "from" : "profiles",
    "localField" : "profileID",
    "foreignField" : "ID",
    pipeline: [ 
     { $project: { _id: 0, ID: 1, status: 1 } } 
    ],
    "as" : "profile" }},
  {
    $match: {
      $or: [
        { "profile": { $eq: [] } }, 
        { "profile": { $elemMatch: { status: 2 } } },
      ]
    }
  },
] ).explain("executionStats")

xs=x["stages"][1];

The index is used as a covering index, but unfortunately this no longer uses the Slot-Based Query Execution Engine (SBE) and ends up taking longer:

{
  '$lookup': {
    from: 'profiles',
    as: 'profile',
    localField: 'profileID',
    foreignField: 'ID',
    let: {},
    pipeline: [ { '$project': { _id: 0, ID: 1, status: 1 } } ]
  },
  totalDocsExamined: Long('0'),
  totalKeysExamined: Long('190019'),
  collectionScans: Long('0'),
  indexesUsed: [ 'ID_1_status_1' ],
  nReturned: Long('260026'),
  executionTimeMillisEstimate: Long('21309')
}

The absence of collection scans (collectionScans: 0) confirms that an index is used, and it's the index I've created (indexesUsed: [ 'ID_1_status_1' ]). The absence of documents examined (totalDocsExamined: 0) confirms that it is used as a covering index. However, it took 21 seconds to execute.

When a pipeline is added to the lookup stage, SBE can no longer be used, so the classic engine plans each lookup query at runtime, which explains the longer elapsed time. SBE's optimized execution model for joins requires the join to be completely specified at planning time, with only localField/foreignField and no pipeline (determineSbeCompatibility()). In contrast, covering indexes need a pipeline for runtime projection control, making it impossible to use both optimizations together. Additionally, the current version (findSbeCompatibleStagesForPushdown) limits the lookup to fully local collections and considers whether an unwind follows. If you know there will be only one element, use { $arrayElemAt: ["$profile.status", 0] } instead of $unwind: "$profile" for better efficiency, as sketched below.
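
A minimal sketch of that alternative on this example's schema (profileStatus is a field name I chose for the projection):

db.users.aggregate([
  { $lookup: { from: "profiles", localField: "profileID", foreignField: "ID", as: "profile" } },
  { $project: {
      _id: 0,
      name: 1,
      // take the first (and only) matched status instead of unwinding the array
      profileStatus: { $arrayElemAt: ["$profile.status", 0] }
  } }
])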

Nested loop with materialization

There is on... (truncated)