Relational composition is to joins what the cartesian product is to tables: it produces every result that could be true, not just what is true. This often leads to SQL mistakes and can often be suspected when a SELECT DISTINCT is added after a query starts returning more rows than expected, without the root cause being understood.
In its mathematical definition, relational composition is the derived relation obtained by existentially joining two relations on a shared attribute and projecting away that attribute. In a database, it is meaningful only when a real‑world invariant ensures that the resulting pairs reflect actual facts. Otherwise, the result illustrates what E. F. Codd, in his 1970 paper A Relational Model of Data for Large Shared Data Banks, called the connection trap.
Codd uses two relations in his example: a supplier supplies parts, and a project uses parts. At an intuitive level, this connection trap mirrors a syllogism: if a supplier supplies a part and a project uses that part, a join can derive that the supplier supplies the project—even when that conclusion was never stated as a fact.
Codd observed that the connection trap was common in pre‑relational network data models, where users navigated data by following physical pointers. Path existence was often mistaken for semantic relationship. The relational model solved this problem by replacing navigational access with declarative queries over explicitly defined relations, and modern document models now do the same.
However, while the relational model removes pointer‑based navigation, it does not eliminate the trap entirely. Joins can still compute relational compositions, and without appropriate cardinality constraints or business invariants, such compositions may represent only possible relationships rather than actual ones. In this way, the connection trap can be reintroduced at query time, even in modern relational systems such as PostgreSQL, and similarly through $lookup operations in MongoDB.
PostgreSQL — reproducing the connection trap
This model declares suppliers, parts, projects, and two independent many‑to‑many relationships:
CREATE TABLE suppliers (
supplier_id TEXT PRIMARY KEY
);
CREATE TABLE parts (
part_id TEXT PRIMARY KEY
);
CREATE TABLE projects (
project_id TEXT PRIMARY KEY
);
-- Supplier supplies parts
CREATE TABLE supplier_part (
supplier_id TEXT REFERENCES suppliers,
part_id TEXT REFERENCES parts,
PRIMARY KEY (supplier_id, part_id)
);
-- Project uses parts
CREATE TABLE project_part (
project_id TEXT REFERENCES projects,
part_id TEXT REFERENCES parts,
PRIMARY KEY (project_id, part_id)
);
This follows Codd’s classic suppliers–parts–projects example, where suppliers supply parts and projects use parts as independent relationships.
The following data asserts that project Alpha uses parts P1 and P2, that supplier S1 supplies parts P1 and P2, and that supplier S2 supplies parts P2 and P3:
INSERT INTO suppliers VALUES ('S1'), ('S2');
INSERT INTO parts VALUES ('P1'), ('P2'), ('P3');
INSERT INTO projects VALUES ('Alpha');
-- Supplier capabilities
INSERT INTO supplier_part VALUES
('S1', 'P1'),
('S1', 'P2'),
('S2', 'P2'),
('S2', 'P3');
-- Project uses parts P1 and P2
INSERT INTO project_part VALUES
('Alpha', 'P1'),
('Alpha', 'P2');
The following query is valid SQL:
SELECT DISTINCT
sp.supplier_id,
pp.project_id
FROM supplier_part sp
JOIN project_part pp
ON sp.part_id = pp.part_id;
However, this query falls into the connection trap:
supplier_id | project_id
-------------+------------
S2 | Alpha
S1 | Alpha
(2 rows)
As we defined only supplier–part and project–part relationships, any derived supplier–project relationship is not a fact but a relational composition. We know that Alpha uses P1 and P2, and that part P2 can be supplied by either S1 or S2, but we have no record of which supplier actually supplies Alpha.
This query asserts “Supplier S1 supplies project Alpha”, but the data only says: “S1 and S2 supply P2” and “Alpha uses P2”.
This is the connection trap, expressed purely in SQL.
PostgreSQL — the correct relational solution
If a supplier actually supplies a part to a project, that fact must be represented directly. We need a new table:
CREATE TABLE supply (
supplier_id TEXT,
project_id TEXT,
part_id TEXT,
PRIMARY KEY (supplier_id, project_id, part_id),
FOREIGN KEY (supplier_id, part_id)
REFERENCES supplier_part (supplier_id, part_id),
FOREIGN KEY (project_id, part_id)
REFERENCES project_part (project_id, part_id)
);
These foreign keys encode subset constraints between relations and prevent inserting supplies of parts not supplied by the supplier or not used by the project.
This relation explicitly states who supplies what to which project. We assume that the real‑world fact is “Alpha gets part P2 from supplier S1”:
INSERT INTO supply VALUES
('S1', 'Alpha', 'P2');
The correct query reads from this relation:
SELECT supplier_id, project_id
FROM supply;
supplier_id | project_id
-------------+------------
S1 | Alpha
(1 row)
The relationship is now real and asserted, not inferred. In total, we have six tables:
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+----------
public | parts | table | postgres
public | project_part | table | postgres
public | projects | table | postgres
public | supplier_part | table | postgres
public | suppliers | table | postgres
public | supply | table | postgres
(6 rows)
In practice, you should either store the relationship explicitly or avoid claiming it exists. Although the relational model avoids pointers, it is still possible to join through an incorrect path, so the application must enforce the correct one.
In ad-hoc query environments such as data warehouses, data is typically organized into domains and modeled using a dimensional ("star schema") approach. Relationships like project–supplier are represented as fact tables within a single data mart, exposing only semantically valid join paths and preventing invalid joins.
MongoDB — reproducing the connection trap
The following MongoDB data mirrors the PostgreSQL example. MongoDB allows representing relationships either as separate collections or by embedding, depending on the bounded context. Here we start with separate collections to mirror the relational model:
db.suppliers.insertMany([
{ _id: "S1" },
{ _id: "S2" }
]);
db.parts.insertMany([
{ _id: "P1" },
{ _id: "P2" },
{ _id: "P3" }
]);
db.projects.insertMany([
{ _id: "Alpha" }
]);
// Supplier capabilities
db.supplier_parts.insertMany([
{ supplier: "S1", part: "P1" },
{ supplier: "S1", part: "P2" },
{ supplier: "S2", part: "P2" },
{ supplier: "S2", part: "P3" }
]);
// Project uses parts P1 and P2
db.project_parts.insertMany([
{ project: "Alpha", part: "P1" },
{ project: "Alpha", part: "P2" }
]);
Using the simple find() API, we cannot fall into the trap directly because there is no implicit connection between suppliers and projects. The application must issue two independent queries and combine the results explicitly.
Simulating the connection trap in a single query therefore requires explicit composition at the application level:
const partsUsedByAlpha = db.project_parts.find(
{ project: "Alpha" },
{ _id: 0, part: 1 }
).toArray();
const suppliersForParts = db.supplier_parts.find(
{ part: { $in: partsUsedByAlpha.map(p => p.part) } },
{ _id: 0, supplier: 1, part: 1 }
).toArray();
const supplierProjectPairs = suppliersForParts.map(sp => ({
supplier: sp.supplier,
project: "Alpha"
}));
print(supplierProjectPairs);
When forced by the application logic, here is the connection trap associating suppliers and projects:
[
{ supplier: 'S1', project: 'Alpha' },
{ supplier: 'S1', project: 'Alpha' },
{ supplier: 'S2', project: 'Alpha' }
]
As with SQL joins, a $lookup in an aggregation pipeline can fall in the same connection trap:
db.supplier_parts.aggregate([
{
$lookup: {
from: "project_parts",
localField: "part",
foreignField: "part",
as: "projects"
}
},
{ $unwind: "$projects" },
{
$project: {
_id: 0,
supplier: "$supplier",
project: "$projects.project"
}
}
]);
The result is similar and the projection removed the intermediate attributes:
{ "supplier": "S1", "project": "Alpha" }
{ "supplier": "S1", "project": "Alpha" }
{ "supplier": "S2", "project": "Alpha" }
We reproduced the connection trap by ignoring that $lookup produces a derived relationship, not a real one, and that matching keys does not carry business meaning.
MongoDB — normalized solution
As with SQL, we can add an explicit supplies collection that stores the relationship between projects and suppliers:
db.supplies.insertOne({
project: "Alpha",
supplier: "S1",
part: "P2"
});
Then we simply query this collection:
db.supplies.find(
{ project: "Alpha" },
{ _id: 0, supplier: 1, part: 1 }
);
[ { supplier: 'S1', part: 'P2' } ]
The document model is a superset of the relational model as relations can be stored as flat collections. The difference is that referential integrity is enforced by the application rather than in-database. To enforce relationships in the database, they must be embedded as sub-documents and arrays.
MongoDB — domain-driven solution
It's not the only solution in a document database, as we can store a schema based on the domain model rather than normalized. MongoDB allows representing this relationship as part of an aggregate. In a project‑centric bounded context, the project is the aggregate root, and the supplier information can be embedded as part of the supply fact:
db.projects.updateOne(
{ _id: "Alpha" },
{
$set: {
parts: [
{ part: "P2", supplier: "S1" },
{ part: "P1", supplier: null }
]
}
},
{ upsert: true }
);
The query doesn't need a join and cannot fall into the connection trap:
db.projects.find(
{ _id: "Alpha" },
{ _id: 1, "parts.supplier": 1 }
);
[
{
_id: 'Alpha',
parts: [
{ supplier: 'S1' },
{ supplier: null }
]
}
]
This avoids the connection trap by construction. It may look like data duplication—the same supplier name may appear in multiple project documents—and indeed this would be undesirable in a fully normalized model shared across all business domains. However, this structure represents a valid aggregate within a bounded context.
In this context, the embedded supplier information is part of the supply fact, not a reference to a global supplier record. If a supplier’s name changes, it is a business decision, not a database decision, whether that change should be propagated to existing projects or whether historical data should retain the supplier name as it was at the time of supply.
Even when propagation is desired, MongoDB allows updating embedded data efficiently:
db.projects.updateMany(
// document filter
{ "parts.supplier": "S1" },
// document update using the array's item from array filter
{
$set: {
"parts.$[p].supplier": "Supplier One"
}
},
// array filter defining the array's item for the update
{
arrayFilters: [{ "p.supplier": "S1" }]
}
);
This update is not atomic across documents, but each document update is atomic and the operation is idempotent and can be safely retried or executed within an explicit transaction if full atomicity is required.
Conclusion
The connection trap occurs whenever relationships are inferred from shared keys, at query time, instead of being explicitly represented as facts, at write time. In SQL, this means introducing explicit association tables and enforcing integrity constraints, rather than deriving then though joins. In MongoDB, it means modeling relationships as explicit documents or embedded subdocuments rather than deriving them through lookups.
In a relational database, the schema is designed to be normalized and independent of specific use cases. All many‑to‑many and fact‑bearing relationships must be declared explicitly, and queries must follow the correct relational path. Referential and cardinality constraints are essential to restrict to only actual facts.
In MongoDB, the data model is typically driven by the domain and the application’s use cases. In a domain-driven design (DDD), strong relationships are modeled as aggregates, embedding related data directly within a document in MongoDB collections. This makes the intended semantics explicit and avoids inferring relationships at query time. Apparent duplication is not a flaw here, but a deliberate modeling choice within a bounded context.
Ultimately, the connection trap is not fully avoided by the data model, but can be a query-time error with joins and projections: deriving relationships that were never asserted. Whether using normalized relations or domain‑driven documents, the rule is the same—if a relationship is a fact, it must be stored as one.
by Franck Pachot