You may have learned about normal forms when databases were designed before the applications that used them. At that time, relational data models focused on enterprise-wide entities, defined before access patterns were known, so future applications could share a stable, normalized schema.
Today, we design databases for specific applications or bounded domains. Instead of defining a full model up front, we add features incrementally, gather feedback, and let the schema evolve with the application.
Normal forms aren't just relational theory—they describe real data dependencies. MongoDB's document model doesn't exempt you from thinking about normalization—it gives you more flexibility in how you apply it.
Example: Pizzerias
We're starting a new business: a large network of pizzerias across many areas with a wide variety of pizzas. But let's start small.
Tabular: One Pizza, One Area
As a minimum viable product (MVP), each pizzeria has one manager, sells only one variety, and delivers to one area. You can choose any database for this: key-value, relational, document, or even a spreadsheet. The choice will matter only when your product evolves.
Here is our first pizzeria:
{
name: "A1 Pizza",
manager: "Bob",
variety: "Thick Crust",
area: "Springfield"
}
With no repeating groups or multi-valued attributes, the model is already in First Normal Form (1NF). Because the MVP data model is simple—one value per attribute and a single key—there are no dependencies that would violate higher normal forms.
Many database designs start out fully normalized, not because the designer worked through every normal form, but because the initial dataset is too simple for complex dependencies to exist.
Normalization becomes necessary later, as business rules evolve and new varieties, areas, and independent attributes introduce dependencies that higher normal forms address.
1NF: More Menu Options
The business gets off to a good start and evolves: a pizzeria can now offer several varieties.
Storing multiple varieties in a single field, as follows, would violate 1NF:
{
name: "A1 Pizza",
manager: "Bob",
varieties: "Thick Crust, Stuffed Crust",
area: "Springfield"
}
1NF requires atomic values: each field should hold one indivisible piece of data. A comma-separated string breaks this rule. You can manipulate it as a character string, but you can't treat each entry as a distinct pizza variety, so you can't easily query or update individual varieties, or index them efficiently.
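To make the difference concrete, here is a minimal plain-JavaScript sketch (no database involved) contrasting a substring test on the comma-separated string with an element test on an array. The helper names `hasVarietyInString`, `hasVarietyInArray`, and `removeFromString` are hypothetical, introduced only for illustration.

```javascript
// Hypothetical helpers: a non-atomic string field versus an array of atomic values.
const asString = { name: "A1 Pizza", varieties: "Thick Crust, Stuffed Crust" };
const asArray = { name: "A1 Pizza", varieties: ["Thick Crust", "Stuffed Crust"] };

// String.prototype.includes is a substring test: it cannot tell a whole
// variety from a fragment of one.
function hasVarietyInString(doc, variety) {
  return doc.varieties.includes(variety);
}

// Array.prototype.includes compares each element as a whole value.
function hasVarietyInArray(doc, variety) {
  return doc.varieties.includes(variety);
}

// Removing one variety from the string means parsing and re-joining it.
function removeFromString(doc, variety) {
  const parts = doc.varieties.split(",").map((s) => s.trim());
  return parts.filter((v) => v !== variety).join(", ");
}

console.log(hasVarietyInString(asString, "Crust")); // true (a false positive)
console.log(hasVarietyInArray(asArray, "Crust")); // false
console.log(hasVarietyInArray(asArray, "Stuffed Crust")); // true
```

Both functions have the same body, yet only the array version treats each variety as an atomic value.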
SQL and NoSQL databases avoid this pattern for different reasons. In a relational database, the logical model must be independent of cardinalities and access patterns. Because the relational model doesn't know whether there are two or one million pizza varieties, it treats every one-to-many relationship as unbounded and stores it in a separate table as a set of pizzeria–variety relationships rather than embedding varieties within the pizzeria entity.
Once we understand the application domain, we can set realistic bounds. A menu with thousands of pizza varieties would be impractical from a business perspective long before it hit any database limit, so storing the varieties together can be acceptable. Since object-oriented applications work with richer structures than two-dimensional tables, such lists are better represented as arrays than as comma-separated strings. This version also adds the manager's email and sets the delivery area aside for now; both come back in later sections:
{
name: "A1 Pizza",
manager: "Bob",
email: "bob@a1-pizza.it",
varieties: ["Thick Crust", "Stuffed Crust"]
}
Arrays of atomic values satisfy a document-oriented equivalent of 1NF—each element is atomic and independently addressable—even though the document model isn't bound by the relational requirement of flat tuples. While SQL databases provide abstraction and logical-physical data independence, MongoDB keeps data colocated down to the storage and CPU caches for more predictable performance.
Normal form definitions assume keys for each 1NF relation. In a document model, multiple relations can appear as embedded sub-documents or arrays. Treating the parent key and the array element together as a composite key lets us apply higher normal forms to analyze partial and transitive dependencies within a single document.
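As a sketch of this idea, the following plain-JavaScript function flattens an embedded array into rows whose composite key is the parent key plus the array element, which is roughly what `$unwind` produces on the server. The helper `flatten` and the second pizzeria "B2 Pizza" are hypothetical, and the pizzeria `name` is assumed to act as the parent key.

```javascript
// Hypothetical helper: turn each element of an embedded array into a row
// keyed by (parent key, element), so dependencies can be analyzed per row.
function flatten(docs, parentKey, arrayField, elementName) {
  const rows = [];
  for (const doc of docs) {
    for (const element of doc[arrayField] ?? []) {
      rows.push({ [parentKey]: doc[parentKey], [elementName]: element });
    }
  }
  return rows;
}

const pizzerias = [
  { name: "A1 Pizza", varieties: ["Thick Crust", "Stuffed Crust"] },
  { name: "B2 Pizza", varieties: ["Thick Crust"] }, // hypothetical second pizzeria
];

const rows = flatten(pizzerias, "name", "varieties", "variety");
console.log(rows);

// Assuming no duplicate elements within one document, (name, variety) is unique.
const keys = new Set(rows.map((r) => `${r.name}|${r.variety}`));
console.log(keys.size === rows.length); // true
```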
2NF: Pizza Pricing
We want to add the price of the pizzas to our database. If each pizzeria defines its own base price, it can be added to the items of the varieties array:
{
name: "A1 Pizza",
manager: "Bob",
email: "bob@a1-pizza.it",
varieties: [
{ name: "Thick Crust", basePrice: 10 },
{ name: "Stuffed Crust", basePrice: 12 }
]
}
Second Normal Form (2NF) builds on 1NF by requiring that every non-key attribute depends on the entire primary key, not just part of it. This only becomes relevant when dealing with composite keys.
In our embedded model, consider the composite key ("pizzeria", "variety") for each item in the varieties array. If the price depends on the pizzeria and variety together—meaning different pizzerias can set different prices for the same variety—then "basePrice" depends on the full composite key, and we satisfy 2NF.
However, if prices are standardized across all pizzerias—the same variety costs the same everywhere—then a partial dependency exists: "basePrice" depends only on "variety", not on the full ("pizzeria", "variety") key. This violates 2NF.
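A functional dependency X → Y holds when no two rows agree on X but differ on Y. The following plain-JavaScript sketch uses a hypothetical `fdHolds` helper and made-up sample prices (including a hypothetical "B2 Pizza") to test whether `basePrice` depends on `variety` alone. Keep in mind that sample data can only refute a dependency; whether it really holds is a business rule.

```javascript
// Hypothetical helper: does the functional dependency lhs -> rhs hold in rows?
// It holds if rows with equal lhs values always have equal rhs values.
function fdHolds(rows, lhs, rhs) {
  const seen = new Map();
  for (const row of rows) {
    const left = JSON.stringify(lhs.map((a) => row[a]));
    const right = JSON.stringify(rhs.map((a) => row[a]));
    if (seen.has(left) && seen.get(left) !== right) return false;
    seen.set(left, right);
  }
  return true;
}

// Standardized prices: the same variety costs the same everywhere.
const standardized = [
  { pizzeria: "A1 Pizza", variety: "Thick Crust", basePrice: 10 },
  { pizzeria: "A1 Pizza", variety: "Stuffed Crust", basePrice: 12 },
  { pizzeria: "B2 Pizza", variety: "Thick Crust", basePrice: 10 },
];

// Per-pizzeria prices: the hypothetical B2 Pizza sets its own price.
const perPizzeria = [
  { pizzeria: "A1 Pizza", variety: "Thick Crust", basePrice: 10 },
  { pizzeria: "B2 Pizza", variety: "Thick Crust", basePrice: 9 },
];

// variety -> basePrice holding means a partial dependency (2NF violation).
console.log(fdHolds(standardized, ["variety"], ["basePrice"])); // true
console.log(fdHolds(perPizzeria, ["variety"], ["basePrice"])); // false
console.log(fdHolds(perPizzeria, ["pizzeria", "variety"], ["basePrice"])); // true
```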
To resolve this, we define pricing in a separate collection where the base price depends only on the pizza variety:
{ variety: "Thick Crust", basePrice: 10 }
{ variety: "Stuffed Crust", basePrice: 12 }
We can remove the base price from the pizzeria's varieties array and retrieve it from the pricing collection at query time:
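db.createView(
  "pizzeriasWithPrices",  // view name
  "pizzerias",            // source collection
  [
    // one document per pizzeria/variety pair
    { $unwind: "$varieties" },
    // fetch the matching reference price into a priceInfo array
    {
      $lookup: {
        from: "pricing",
        localField: "varieties.name",
        foreignField: "variety",
        as: "priceInfo"
      }
    },
    // unwinding an empty priceInfo drops varieties that have no price
    { $unwind: "$priceInfo" },
    // copy the price into the variety, then remove the helper field
    { $addFields: { "varieties.basePrice": "$priceInfo.basePrice" } },
    { $project: { priceInfo: 0 } }
  ]
);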
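To see what the view returns without a running server, here is a plain-JavaScript sketch that mimics its stages on in-memory arrays. It only approximates the server's semantics (for example, it ignores how `$lookup` matches array or missing fields), and `emulateView` and the "Calzone" variety are hypothetical.

```javascript
// In-memory approximation of the view pipeline, for illustration only:
// $unwind varieties -> $lookup pricing -> $unwind priceInfo -> $addFields/$project.
const pizzeriaDocs = [
  {
    name: "A1 Pizza",
    manager: "Bob",
    // "Calzone" is a hypothetical variety with no entry in pricing
    varieties: [{ name: "Thick Crust" }, { name: "Stuffed Crust" }, { name: "Calzone" }],
  },
];
const pricingDocs = [
  { variety: "Thick Crust", basePrice: 10 },
  { variety: "Stuffed Crust", basePrice: 12 },
];

function emulateView(pizzerias, pricing) {
  const out = [];
  for (const p of pizzerias) {
    // $unwind: one output document per element of the varieties array
    for (const v of p.varieties) {
      // $lookup: every pricing document whose variety equals the element's name
      const priceInfo = pricing.filter((pr) => pr.variety === v.name);
      // $unwind on priceInfo: an empty array produces no output document
      for (const info of priceInfo) {
        // $addFields copies the price; priceInfo itself is never emitted ($project)
        out.push({ ...p, varieties: { ...v, basePrice: info.basePrice } });
      }
    }
  }
  return out;
}

const viewRows = emulateView(pizzeriaDocs, pricingDocs);
console.log(viewRows.map((d) => d.varieties));
```

The inner-join behavior is worth noting: because `$unwind` without `preserveNullAndEmptyArrays` drops documents whose `priceInfo` array is empty, a variety missing from `pricing` disappears from the view.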
Alternatively, we can use the pricing collection as a reference, where the application retrieves the price and stores it in the pizzeria document for faster reads.
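A minimal sketch of that write path, assuming the application already holds the pricing documents in memory: the hypothetical `addVariety` helper copies the reference price at write time and rejects unknown varieties. In a real service, the returned document would then be persisted, for example with a `$push` update.

```javascript
// Hypothetical application-side helper: look up the reference price once,
// at write time, and embed a copy in the pizzeria document.
function addVariety(pizzeria, pricing, varietyName) {
  const ref = pricing.find((p) => p.variety === varietyName);
  if (!ref) throw new Error(`Unknown variety: ${varietyName}`);
  // Already offered: nothing to add
  if (pizzeria.varieties.some((v) => v.name === varietyName)) return pizzeria;
  return {
    ...pizzeria,
    varieties: [...pizzeria.varieties, { name: varietyName, basePrice: ref.basePrice }],
  };
}

const pricing = [
  { variety: "Thick Crust", basePrice: 10 },
  { variety: "Stuffed Crust", basePrice: 12 },
];

let a1 = { name: "A1 Pizza", manager: "Bob", email: "bob@a1-pizza.it", varieties: [] };
a1 = addVariety(a1, pricing, "Thick Crust");
a1 = addVariety(a1, pricing, "Stuffed Crust");
console.log(a1.varieties);
```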
To avoid update anomalies, the application updates all affected documents when a variety's price changes:
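// Multi-document transactions require a replica set or a sharded cluster
const session = db.getMongo().startSession();
const sessionDB = session.getDatabase(db.getName());
session.startTransaction();
try {
  // Change the reference price...
  sessionDB.getCollection("pricing").updateOne(
    { variety: "Thick Crust" },
    { $set: { basePrice: 11 } }
  );
  // ...and every copy embedded in the pizzerias that offer this variety
  sessionDB.getCollection("pizzerias").updateMany(
    { "varieties.name": "Thick Crust" },
    { $set: { "varieties.$[v].basePrice": 11 } },
    { arrayFilters: [{ "v.name": "Thick Crust" }] }
  );
} catch (error) {
  // Roll back both updates if either one fails
  session.abortTransaction();
  throw error;
}
session.commitTransaction();
session.endSession();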
SQL databases avoid such multiple updates because they're designed for direct end-user access, sometimes bypassing the application layer. Without applying normal forms to break dependencies into multiple tables, there's a risk of overlooking replicated data. A document database, by contrast, is typically updated through an application service that is responsible for maintaining consistency.
While normalizing to 2NF is possible, it may not always be the best choice in a domain-driven design. Keeping the price embedded in each pizzeria allows updates to be propagated asynchronously and leaves room for a future requirement where some pizzerias set their own prices. Integrity holds as long as the application applies the related updates consistently, atomically when necessary.
In practice, many applications accept this controlled duplication when price changes are infrequent and prefer fast single-document reads over perfectly normalized writes.
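When duplicates are accepted, and especially when updates are propagated asynchronously, the owning service may want to verify that the copies converge. Here is a sketch of such a check, with both collections mocked as in-memory arrays; `findPriceDrift` and the "B2 Pizza" document are hypothetical.

```javascript
// Hypothetical consistency check: list embedded prices that no longer match
// the reference pricing collection.
function findPriceDrift(pizzerias, pricing) {
  const reference = new Map(pricing.map((p) => [p.variety, p.basePrice]));
  const drift = [];
  for (const p of pizzerias) {
    for (const v of p.varieties) {
      const expected = reference.get(v.name);
      if (expected !== undefined && expected !== v.basePrice) {
        drift.push({ pizzeria: p.name, variety: v.name, embedded: v.basePrice, expected });
      }
    }
  }
  return drift;
}

const pricing = [
  { variety: "Thick Crust", basePrice: 11 },
  { variety: "Stuffed Crust", basePrice: 12 },
];
const pizzerias = [
  { name: "A1 Pizza", varieties: [{ name: "Thick Crust", basePrice: 11 }, { name: "Stuffed Crust", basePrice: 12 }] },
  { name: "B2 Pizza", varieties: [{ name: "Thick Crust", basePrice: 10 }] }, // not yet updated
];

console.log(findPriceDrift(pizzerias, pricing));
```

Once pizzerias are allowed to set their own prices, such a check would need to exempt intentional overrides.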
3NF: Manager's Contacts
So far, each pizzeria has a single email contact:
{
name: "A1 Pizza",
manager: "Bob",
email: "bob@a1-pizza.it",
varieties: [
{ name: "Thick Crust", basePrice: 10 },
{ name: "Stuffed Crust", basePrice: 12 }
]
}
Third Normal Form (3NF) builds on 2NF by requiring that non-key attributes depend only on the primary key, not on other non-key attributes. When a non-key attribute depends on another non-key attribute, we have a transitive dependency.
Here, the email actually belongs to the manager, not the pizzeria directly. This creates a transitive dependency: "pizzeria" → "manager" → "email". Since "email" depends on "manager" (a non-key attribute) rather than directly on the pizzeria, this violates 3NF.
We can normalize this by grouping the manager's attributes into an embedded subdocument:
{
name: "A1 Pizza",
manager: { name: "Bob", email: "bob@a1-pizza.it" },
varieties: [
{ name: "Thick Crust", basePrice: 10 },
{ name: "Stuffed Crust", basePrice: 12 }
]
}
Now the email is clearly an attribute of the manager entity embedded within the pizzeria. If a pizzeria has multiple managers, we can simply use an array of subdocuments without creating new collections or changing index definitions.
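Existing documents in the previous shape need to be migrated. A minimal plain-JavaScript sketch of that reshaping, assuming documents where `manager` is still a string; `nestManager` is hypothetical. It is idempotent, so running it twice leaves migrated documents unchanged.

```javascript
// Hypothetical migration helper: move the manager's attributes from the
// pizzeria level into an embedded manager subdocument.
function nestManager(doc) {
  if (typeof doc.manager !== "string") return doc; // already migrated
  const { manager, email, ...rest } = doc;
  return { ...rest, manager: { name: manager, email } };
}

const before = {
  name: "A1 Pizza",
  manager: "Bob",
  email: "bob@a1-pizza.it",
  varieties: [
    { name: "Thick Crust", basePrice: 10 },
    { name: "Stuffed Crust", basePrice: 12 },
  ],
};

const after = nestManager(before);
console.log(after.manager); // { name: 'Bob', email: 'bob@a1-pizza.it' }
```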
A generic relational model would probably split this into multiple tables, with manager being a foreign key to a "contacts" table. However, in our business domain, we don't manage contacts outside of pizzerias. Even if the same person manages multiple pizzerias, they're recorded as separate manager entries. Bob may have multiple emails and use different ones for each of his pizzerias.
4NF: Delivery Areas
We want to record the areas where a pizzeria can deliver its pizza varieties:
{
name: "A1 Pizza",
manager: { name: "Bob", email: "bob@a1-pizza.it" },
offerings: [
{ variety: { name: "Thick Crust", basePrice: 10 }, area: "Springfield" },
{ variety: { name: "Thick Crust", basePrice: 10 }, area: "Shelbyville" }
]
}
Fourth Normal Form (4NF) addresses multi-valued dependencies. A multi-valued dependency exists when one attribute determines a set of values for another attribute, independent of all other attributes. 4NF requires that a relation have no non-trivial multi-valued dependencies except on superkeys.
If varieties and areas were dependent—for example, if certain varieties were only available in certain areas—then storing ("variety", "area") combinations would represent a single multi-valued fact, and there would be no 4NF violation.
However, since our pizzerias deliver all varieties to all areas, these are independent multi-valued dependencies: "pizzeria" →→ "variety" and "pizzeria" →→ "area". Storing all combinations creates redundancy: if we add a new area, we must add entries for every variety.
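For a single pizzeria, the two dependencies are independent when its set of (variety, area) pairs is exactly the Cartesian product of the varieties and the areas it offers. A plain-JavaScript sketch of that test follows; `isCartesianProduct` and the restriction rule are hypothetical, names are assumed not to contain "|", and, as before, data can only suggest independence, which is ultimately a business rule.

```javascript
// Hypothetical helper: do one pizzeria's (variety, area) pairs equal
// varieties × areas? If so, the attributes are independent and storing
// every pair is redundant.
function isCartesianProduct(offerings) {
  const pairs = new Set(offerings.map((o) => `${o.variety}|${o.area}`));
  const varieties = new Set(offerings.map((o) => o.variety));
  const areas = new Set(offerings.map((o) => o.area));
  // Every pair is drawn from varieties × areas, so equal sizes mean equal sets.
  return pairs.size === varieties.size * areas.size;
}

const allCombinations = [
  { variety: "Thick Crust", area: "Springfield" },
  { variety: "Thick Crust", area: "Shelbyville" },
  { variety: "Stuffed Crust", area: "Springfield" },
  { variety: "Stuffed Crust", area: "Shelbyville" },
];
// Hypothetical rule: Stuffed Crust is not delivered to Shelbyville.
const restricted = allCombinations.slice(0, 3);

console.log(isCartesianProduct(allCombinations)); // true: store two independent arrays
console.log(isCartesianProduct(restricted)); // false: the pairs carry information
```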
We normalize by storing each independent fact in a separate array:
{
name: "A1 Pizza",
manager: { name: "Bob", email: "bob@a1-pizza.it" },
varieties: [
{ name: "Thick Crust", basePrice: 10 },
{ name: "Stuffed Crust", basePrice: 12 }
],
deliveryAreas: ["Springfield", "Shelbyville"]
}
With this schema, we avoid violating 4NF because delivery areas and varieties are stored independently—even though the document model allows us to embed them together.
BCNF: Per-Area Pricing
Our network grows further. Some pizzerias now charge different prices depending on the delivery area—distant areas cost more:
{
name: "A1 Pizza",
manager: { name: "Bob", email: "bob@a1-pizza.it" },
offerings: [
{ variety: "Thick Crust", area: "Springfield", price: 10 },
{ variety: "Thick Crust", area: "Shelbyville", price: 11 },
{ variety: "Stuffed Crust", area: "Springfield", price: 12 },
{ variety: "Stuffed Crust", area: "Shelbyville", price: 13 }
]
}
The composite key for each offering is ("pizzeria", "variety", "area"). The price depends on the full key, satisfying 2NF and 3NF.
Now our franchise assigns an area manager to each area—one manager per area, regardless of pizzeria. We add it to our offerings:
offerings: [
{ variety: "Thick Crust", area: "Springfield", price: 10, areaManager: "Alice" },
{ variety: "Stuffed Crust", area: "Springfield", price: 12, areaManager: "Alice" },
{ variety: "Thick Crust", area: "Shelbyville", price: 11, areaManager: "Eve" },
{ variety: "Stuffed Crust", area: "Shelbyville", price: 13, areaManager: "Eve" }
]
Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF. It requires that for every non-trivial functional dependency X → Y, the determinant X must be a superkey. Unlike 3NF, BCNF doesn't make an exception for dependencies where the dependent attribute is part of a candidate key.
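This model passes 3NF only under one more business rule: each area manager is responsible for exactly one area, as in this example. Then "areaManager" → "area" holds too, so ("pizzeria", "variety", "areaManager") is also a candidate key and "areaManager" is a prime attribute. 3NF tolerates "area" → "areaManager" because the dependent attribute is prime, but BCNF does not: its determinant ("area") is not a superkey of the offerings relation. The area alone doesn't uniquely identify an offering; you need the full ("pizzeria", "variety", "area") key for that. (If a manager could cover several areas, "areaManager" would be a non-prime attribute depending on part of the key, which already violates 2NF.)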
The practical problem: if Alice is replaced by Carol for Springfield, we must update every offering for that area across every pizzeria. The relational solution is to extract area managers to a separate table.
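For comparison, here is a plain-JavaScript sketch of that relational decomposition, using the four offerings of a single pizzeria. The helper `decompose` is hypothetical; it refuses to split when an area has two managers, because the decomposition is only lossless if "area" → "areaManager" really holds.

```javascript
// Hypothetical sketch of the BCNF decomposition: move area managers into a
// separate mapping keyed by their determinant (area).
function decompose(offerings) {
  const areaManagers = new Map();
  for (const o of offerings) {
    const existing = areaManagers.get(o.area);
    if (existing !== undefined && existing !== o.areaManager) {
      throw new Error(`Area ${o.area} has two managers`);
    }
    areaManagers.set(o.area, o.areaManager);
  }
  // Offerings keep their key and price, without the dependent attribute.
  const slim = offerings.map(({ areaManager, ...rest }) => rest);
  return { slim, areaManagers };
}

const offerings = [
  { variety: "Thick Crust", area: "Springfield", price: 10, areaManager: "Alice" },
  { variety: "Stuffed Crust", area: "Springfield", price: 12, areaManager: "Alice" },
  { variety: "Thick Crust", area: "Shelbyville", price: 11, areaManager: "Eve" },
  { variety: "Stuffed Crust", area: "Shelbyville", price: 13, areaManager: "Eve" },
];

const { slim, areaManagers } = decompose(offerings);
// Replacing Alice now changes a single entry instead of every Springfield offering.
areaManagers.set("Springfield", "Carol");
// Joining back when needed restores the original shape.
const rejoined = slim.map((o) => ({ ...o, areaManager: areaManagers.get(o.area) }));
console.log(rejoined);
```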
In MongoDB, we can keep the embedded structure and handle updates explicitly:
db.pizzerias.updateMany(
{ "offerings.area": "Springfield" },
{ $set: { "offerings.$[o].areaManager": "Carol" } },
{ arrayFilters: [{ "o.area": "Springfield" }] }
)
This trades strict BCNF compliance for simpler queries and faster reads. The application ensures consistency during updates.
5NF: Adding Pizza Sizes
We now offer multiple sizes (Small, Medium, Large). Sizes, varieties, and delivery areas are all independent—any combination is valid.
Storing every combination explodes quickly:
offerings: [
{ variety: "Thick Crust", size: "Large", area: "Springfield" },
{ variety: "Thick Crust", size: "Large", area: "Shelbyville" },
{ variety: "Thick Crust", size: "Medium", area: "Springfield" },
// ... 150 entries for 5 varieties × 3 sizes × 10 areas
]
Fifth Normal Form (5NF), also called Project-Join Normal Form, addresses join dependencies. A relation is in 5NF if every non-trivial join dependency is implied by its candidate keys. Informally, it cannot be split into smaller projections (other than splits along its keys) that, when joined, reconstruct the original exactly, without losing information or introducing spurious tuples.
When valid combinations can be reconstructed from independent sets (the Cartesian product of varieties, sizes, and areas), storing all combinations explicitly creates redundancy and risks inconsistency. This violates 5NF (and, because the three sets are fully independent, 4NF as well).
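The following plain-JavaScript sketch tests one specific join dependency: whether a set of (variety, size, area) triples equals the join of its three pairwise projections. The helper `satisfiesJoinDependency` and the irregular sample are hypothetical. A full Cartesian product passes the test, which is why it can be stored as smaller pieces; the fix above goes further and decomposes it into single-attribute sets.

```javascript
// Hypothetical helper: does joining the projections (variety, size),
// (size, area), and (variety, area) give back exactly the original triples?
function satisfiesJoinDependency(triples) {
  const key = (...parts) => parts.join("|");
  const vs = new Set(triples.map((t) => key(t.variety, t.size)));
  const sa = new Set(triples.map((t) => key(t.size, t.area)));
  const va = new Set(triples.map((t) => key(t.variety, t.area)));
  const original = new Set(triples.map((t) => key(t.variety, t.size, t.area)));

  const varieties = new Set(triples.map((t) => t.variety));
  const sizes = new Set(triples.map((t) => t.size));
  const areas = new Set(triples.map((t) => t.area));

  // Enumerate the natural join of the three projections. Every original
  // triple is in the join, so the only possible failure is a spurious tuple.
  for (const v of varieties)
    for (const s of sizes)
      for (const a of areas)
        if (vs.has(key(v, s)) && sa.has(key(s, a)) && va.has(key(v, a)) && !original.has(key(v, s, a)))
          return false;
  return true;
}

const product = [];
for (const variety of ["Thick Crust", "Stuffed Crust"])
  for (const size of ["Large", "Medium"])
    for (const area of ["Springfield", "Shelbyville"]) product.push({ variety, size, area });

// Hypothetical set of triples that does not decompose without loss.
const irregular = [
  { variety: "Thick Crust", size: "Large", area: "Springfield" },
  { variety: "Thick Crust", size: "Medium", area: "Shelbyville" },
  { variety: "Stuffed Crust", size: "Large", area: "Shelbyville" },
];

console.log(satisfiesJoinDependency(product)); // true
console.log(satisfiesJoinDependency(irregular)); // false
```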
The fix stores each independent fact separately:
{
name: "A1 Pizza",
varieties: ["Thick Crust", "Stuffed Crust"],
sizes: ["Large", "Medium"],
deliveryAreas: ["Springfield", "Shelbyville"]
}
Adding a new size requires updating one array—not hundreds of entries. The application or query logic reconstructs valid combinations when needed.
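A plain-JavaScript sketch of that reconstruction, using hypothetical helpers `isOffered` and `combinations`: a single combination can be checked without materializing the product, and the full product is generated lazily only when needed.

```javascript
const a1 = {
  name: "A1 Pizza",
  varieties: ["Thick Crust", "Stuffed Crust"],
  sizes: ["Large", "Medium"],
  deliveryAreas: ["Springfield", "Shelbyville"],
};

// Hypothetical helper: check one combination without building all of them.
function isOffered(doc, variety, size, area) {
  return doc.varieties.includes(variety) && doc.sizes.includes(size) && doc.deliveryAreas.includes(area);
}

// Hypothetical generator: yield every valid combination (the Cartesian product) on demand.
function* combinations(doc) {
  for (const variety of doc.varieties)
    for (const size of doc.sizes)
      for (const area of doc.deliveryAreas) yield { variety, size, area };
}

console.log(isOffered(a1, "Stuffed Crust", "Medium", "Shelbyville")); // true
console.log(isOffered(a1, "Stuffed Crust", "Small", "Shelbyville")); // false
console.log([...combinations(a1)].length); // 8

// Adding a size touches one array; the product grows automatically.
a1.sizes.push("Small");
console.log([...combinations(a1)].length); // 12
```

On the server, a filter such as `{ varieties: "Stuffed Crust", sizes: "Medium", deliveryAreas: "Shelbyville" }` performs the same membership test, since a query on an array field matches any of its elements.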
6NF: Tracking Price History
Our finance team needs to track price changes over time. We could embed the history:
offerings: [
{
variety: "Thick Crust",
area: "Springfield",
currentPrice: 12,
priceHistory: [
{ price: 10, effectiveDate: ISODate("2024-01-01") },
{ price: 11, effectiveDate: ISODate("2024-03-15") },
{ price: 12, effectiveDate: ISODate("2024-06-01") }
]
}
]
This works for moderate history but grows unboundedly over time.
Sixth Normal Form (6NF) decomposes relations until each one holds a key and at most one non-key attribute. In temporal designs, the validity time is part of the key, so every row represents one fact at one point in time:
// price_history collection
{ pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 10, effectiveDate: ISODate("2024-01-01") }
{ pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 11, effectiveDate: ISODate("2024-03-15") }
{ pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 12, effectiveDate: ISODate("2024-06-01") }
6NF is rarely used for operational data because it requires extensive joins for common queries. However, for auditing, analytics, and temporal queries—where you need to answer "what was the price on March 10th?"—it provides a clean model for tracking changes over time.
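Answering "what was the price on March 10th?" means finding the row with the latest `effectiveDate` that is not after the requested date. A plain-JavaScript sketch over the rows above follows; `priceOn` is hypothetical, and `new Date(...)` stands in for mongosh's `ISODate(...)`.

```javascript
// Hypothetical helper: the price in effect on a date is the row with the
// latest effectiveDate that is not after that date.
function priceOn(history, { pizzeria, variety, area }, date) {
  let best = null;
  for (const row of history) {
    if (row.pizzeria !== pizzeria || row.variety !== variety || row.area !== area) continue;
    if (row.effectiveDate > date) continue;
    if (best === null || row.effectiveDate > best.effectiveDate) best = row;
  }
  return best ? best.price : undefined;
}

const history = [
  { pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 10, effectiveDate: new Date("2024-01-01") },
  { pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 11, effectiveDate: new Date("2024-03-15") },
  { pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield", price: 12, effectiveDate: new Date("2024-06-01") },
];
const offering = { pizzeria: "A1 Pizza", variety: "Thick Crust", area: "Springfield" };

console.log(priceOn(history, offering, new Date("2024-03-10"))); // 10
```

In MongoDB, the same question maps to a `find` on `price_history` with `effectiveDate: { $lte: date }`, sorted by `effectiveDate` descending with `limit(1)`; an index on the offering fields plus `effectiveDate` would typically serve it.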
Summary
Normal forms are not a relic of relational theory. They describe fundamental data dependencies present in any system, regardless of storage technology. MongoDB’s document model does not remove the need to consider normalization. Instead, it lets you decide where, when, and how strictly to apply it, based on domain boundaries and access patterns.
In relational/SQL databases, schemas are usually designed as enterprise-wide information models. Many applications and users share the same database, accepting ad hoc SQL. To avoid update, insertion, and deletion anomalies in this shared environment, the schema must enforce functional dependencies, making higher normal forms essential. Because the database is the system of record, normalization centralizes integrity rules in the data model.
Modern architectures, by contrast, often follow Domain-Driven Design (DDD). Each bounded context owns its data model, which evolves with the application. With CQRS and microservices, each aggregate is updated only through a single application service that encapsulates business rules. Here, the database is not a shared integration point but a private persistence detail of the service.
MongoDB fits this style well:
- Documents model aggregates as they exist in the domain
- Arrays capture bounded one-to-many relationships
- Denormalization and controlled duplication improve read performance and scalability
- Consistency is enforced by application logic, not global database constraints
Because one service owns all updates, violating higher normal forms can be acceptable—and sometimes beneficial—provided the service preserves its invariants. Normalization becomes a design tool, not a rigid checklist.
In short:
- Use relational normalization when the database is a shared, queryable system of record accessed by many applications and users via SQL.
- Use document modeling with selective denormalization when building domain-aligned services with clear ownership, CQRS, and microservices.
Normal forms still matter—but in MongoDB, they guide your choices instead of dictating your schema.
by Franck Pachot