a curated list of database news from authoritative sources

March 17, 2025

Vibe data engineering

You're probably sick of hearing people talk about "vibe coding", but what about "vibe data engineering?" ;)

Normalization and Relational Division in SQL and MongoDB

MongoDB promotes document data modeling rather than the highest normal forms of a relational model. To understand this approach, it's interesting to examine the reasons why E. F. Codd introduced normalization, as he describes in his article NORMALIZED DATA BASE STRUCTURE: A BRIEF TUTORIAL (1971), and compare them with modern application design:

make formatted data bases readily accessible to users (especially casual users) who have little or no training in programming

casual user at a terminal often has occasion to require tables to be printed out or displayed

Normalizing data structures into a set of two-dimensional tables was originally intended to simplify these structures for non-programmers and interactive users, a role taken by data scientists today. They accessed databases through early computer devices (TTY or CRT terminals and dot matrix printers).
This approach contrasts with the modern graphical user interface experience, where users can navigate and unfold nested structures using a mouse click.

Moreover, when the database client is a program rather than a human, it interacts with object-oriented structures that form a graph of interconnected objects instead of independent tables. In both cases, a document model, whether normalized or not, can be a better fit than the set of tables described by Codd (as long as data consistency and schema validation are maintained, of course).

It's also important to note that Codd's description of normalization focuses on the logical structure and the API, not the physical storage:

remember we are not discussing the physical representation

Data model to process data

Critiques of the document model proposed by MongoDB sometimes mention Codd's relational theory as if the document model were heretical to established norms. It's important to clarify that physically storing nested structures doesn't violate the principles of normalized logical structures.
For modern applications, a document model API can be the best fit, just as highly normalized structures were once ideal for supporting casual users' ad-hoc queries. Both models serve different needs, and understanding this flexibility is crucial for leveraging the right approach in modern database applications.

Representing data as a set of independent tables makes it easy to display, but how do we efficiently process it, given the absence of navigable pointers and procedural languages?
To address this challenge, Codd introduced relational algebra, with powerful set-based operations such as projection (π), selection (σ), join (⋈), union (∪), intersection (∩), difference (−), product (×), and division (÷):

Now, it may be contended that tables are inadequate data structures to support facile manipulation of relationships between and within the tables. Application of elementary operations on relations (operations such as projection, join, and division) shows that this contention is false.

SQL is widely recognized as the declarative, English-based language that implements the operations defined by Codd's relational algebra, designed primarily for human users to interact with databases. However, the same operations can be achieved using MongoDB's aggregation pipelines on documents.

Notably, one operation, relational division — cited as essential for efficient data processing on normalized tables — can be quite complex to achieve in SQL databases but is more straightforward in MongoDB.

Relational Division

Relational division is essential for identifying relationships between tables based on subsets instead of individual rows. It finds records linked to every item in a specific subset. Practical business examples include:

  • Finding employees who possess all required certifications.
  • Identifying customers who purchase all items in a bundle.
  • Locating students who have completed all required courses.
  • Determining vendors supplying all contract materials.
  • Discovering authors collaborating with all team members.
  • Selecting athletes participating in all league events.
  • Finding tutors who cover every curriculum topic.
  • Identifying devices compatible with all software platforms.
  • Finding pilots who can fly all planes in the hangar.

This operation is vital for scenarios where comprehensive coverage or participation across a set of criteria is necessary, and while complex in SQL, it's more straightforward to implement in MongoDB.
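
Formally, given a relation R(x, y) and a set S(y), the division R ÷ S returns the x values that are paired in R with every y in S: { x | ∀ y ∈ S : (x, y) ∈ R }. In the pilot example below, R is the pilot_skills relation and S is the hangar relation.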

Relational Division in SQL (normalized - 3NF)

The "pilots who can fly all planes in the hangar" example is taken from the excellent article Divided We Stand: The SQL of Relational Division by Joe Celko (Redgate).

In PostgreSQL, the following normalized tables can store pilot and plane entities, as well as the relationships for planes in the hangar and pilot skills:

CREATE TABLE pilots (
    pilot_id SERIAL PRIMARY KEY,
    pilot_name VARCHAR NOT NULL
);

CREATE TABLE planes (
    plane_id SERIAL PRIMARY KEY,
    plane_name VARCHAR NOT NULL
);

CREATE TABLE pilot_skills (
    pilot_id INT REFERENCES pilots(pilot_id),
    plane_id INT REFERENCES planes(plane_id),
    PRIMARY KEY (pilot_id, plane_id)
);

CREATE TABLE hangar (
    plane_id INT PRIMARY KEY REFERENCES planes(plane_id)
);

I insert the same data as in the Redgate article:

INSERT INTO pilots (pilot_name) VALUES
    ('Celko'),
    ('Higgins'),
    ('Jones'),
    ('Smith'),
    ('Wilson')
;

INSERT INTO planes (plane_name) VALUES
    ('Piper Cub'),
    ('B-1 Bomber'),
    ('B-52 Bomber'),
    ('F-14 Fighter'),
    ('F-17 Fighter')
;

INSERT INTO pilot_skills (pilot_id, plane_id) VALUES
    (1, 1), -- Celko: Piper Cub
    (2, 3), (2, 4), (2, 1), -- Higgins: B-52 Bomber, F-14 Fighter, Piper Cub
    (3, 3), (3, 4), -- Jones: B-52 Bomber, F-14 Fighter
    (4, 2), (4, 3), (4, 4), -- Smith: B-1 Bomber, B-52 Bomber, F-14 Fighter
    (5, 2), (5, 3), (5, 4), (5, 5)  -- Wilson: B-1 Bomber, B-52 Bomber, F-14 Fighter, F-17 Fighter
;

INSERT INTO hangar (plane_id) VALUES
    (2), -- B-1 Bomber
    (3), -- B-52 Bomber
    (4)  -- F-14 Fighter
;

To find pilots who can fly all planes in the hangar, use a double negation: identify all pilots for whom no plane in the hangar exists that they cannot fly. This corresponds to a double NOT EXISTS in SQL:

postgres=> SELECT p.pilot_id, p.pilot_name
FROM pilots p
WHERE NOT EXISTS (
    SELECT *
    FROM hangar h
    WHERE NOT EXISTS (
        SELECT *
        FROM pilot_skills ps
        WHERE ps.pilot_id = p.pilot_id AND ps.plane_id = h.plane_id
    )
);

 pilot_id | pilot_name 
----------+------------
        4 | Smith
        5 | Wilson
(2 rows)

If you find it more readable, the second negation can be expressed with EXCEPT (or MINUS in some databases) instead of NOT EXISTS:

postgres=> SELECT p.pilot_id, p.pilot_name
FROM pilots p
WHERE NOT EXISTS (
    SELECT plane_id FROM hangar
    EXCEPT
    SELECT plane_id FROM pilot_skills ps WHERE ps.pilot_id = p.pilot_id
);

 pilot_id | pilot_name
----------+------------
        4 | Smith
        5 | Wilson
(2 rows)

If you feel more like an accountant than a logician, you can bypass SQL's lack of relational division by counting pilot skills and comparing them to the number of planes in the hangar:

postgres=> SELECT p.pilot_id, p.pilot_name
FROM pilots p
JOIN pilot_skills ps ON p.pilot_id = ps.pilot_id
JOIN hangar h ON ps.plane_id = h.plane_id
GROUP BY p.pilot_id, p.pilot_name
HAVING COUNT(DISTINCT h.plane_id) = (SELECT COUNT(*) FROM hangar);

 pilot_id | pilot_name 
----------+------------
        4 | Smith
        5 | Wilson
(2 rows)

Like division in mathematics, relational division may have a remainder. How can you list pilots with additional skills (planes they can fly that are not in the hangar)? How would you identify pilots lacking required skills and specify the missing skills?
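
One way to answer both questions on the normalized schema above, as a sketch rather than the article's own solution:

-- Extra skills: for qualifying pilots, the planes they can fly that are not in the hangar
-- (a pilot with no extra skills simply returns no rows here)
SELECT p.pilot_id, p.pilot_name, pl.plane_name AS extra_skill
FROM pilots p
JOIN pilot_skills ps ON ps.pilot_id = p.pilot_id
JOIN planes pl ON pl.plane_id = ps.plane_id
WHERE ps.plane_id NOT IN (SELECT plane_id FROM hangar)
AND NOT EXISTS (
    SELECT plane_id FROM hangar
    EXCEPT
    SELECT plane_id FROM pilot_skills ps2 WHERE ps2.pilot_id = p.pilot_id
);

-- Missing skills: for each pilot, the hangar planes they cannot fly
SELECT p.pilot_id, p.pilot_name, pl.plane_name AS missing_skill
FROM pilots p
CROSS JOIN hangar h
JOIN planes pl ON pl.plane_id = h.plane_id
WHERE NOT EXISTS (
    SELECT 1 FROM pilot_skills ps
    WHERE ps.pilot_id = p.pilot_id AND ps.plane_id = h.plane_id
);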

E. F. Codd stated that processing normalized tables would not be difficult thanks to relational operations, but what happens when no SQL database implements division directly? A document model may be easier.

Relational Division in SQL (unnormalized - 0NF)

Let's violate the first normal form and store the skills as an array (which is a PostgreSQL datatype) embedded with the pilot:

CREATE TABLE pilotsWithSkills (
    pilot_name VARCHAR PRIMARY KEY,
    planes VARCHAR[] -- An array of plane names
);

INSERT INTO pilotsWithSkills (pilot_name, planes)
VALUES
 ('Celko', ARRAY['Piper Cub']),
 ('Higgins', ARRAY['B-52 Bomber', 'F-14 Fighter', 'Piper Cub']),  
 ('Jones', ARRAY['B-52 Bomber', 'F-14 Fighter']),
 ('Smith', ARRAY['B-1 Bomber', 'B-52 Bomber', 'F-14 Fighter']),
 ('Wilson', ARRAY['B-1 Bomber', 'B-52 Bomber', 'F-14 Fighter', 'F-17 Fighter']);

I can use the <@ operator with a subquery to find the pilots who can fly all planes in the hangar:

SELECT pilot_name
FROM pilotsWithSkills
WHERE array(
   select plane_name from hangar join planes using (plane_id)
) <@ planes
;

With a GIN index on the array, the execution plan can use a Bitmap Index Scan with a Recheck Cond to optimize the search.
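
A sketch of how this could be set up and checked (the index name is mine, not from the original article):

CREATE INDEX pilotsWithSkills_planes_gin ON pilotsWithSkills USING GIN (planes);

EXPLAIN (ANALYZE)
SELECT pilot_name
FROM pilotsWithSkills
WHERE array(
   select plane_name from hangar join planes using (plane_id)
) <@ planes
;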

I can also store the list as JSONB instead of an ARRAY and use jsonb_agg() instead of array().
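
A possible sketch of that JSONB variant (the table name is mine, not from the original article):

CREATE TABLE pilotsWithSkillsJsonb (
    pilot_name VARCHAR PRIMARY KEY,
    planes JSONB -- A JSON array of plane names
);

SELECT pilot_name
FROM pilotsWithSkillsJsonb
WHERE (
   SELECT jsonb_agg(plane_name) FROM hangar JOIN planes USING (plane_id)
) <@ planes
;

But another possibility is to use a document database.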

Relational Division in MongoDB

To avoid the complexity of embedding arrays or JSON in SQL tables, MongoDB can store and index documents natively:

db.pilotSkills.insertMany([
  { pilot_name: 'Celko', planes: ['Piper Cub'] },
  { pilot_name: 'Higgins', planes: ['B-52 Bomber', 'F-14 Fighter', 'Piper Cub'] },
  { pilot_name: 'Jones', planes: ['B-52 Bomber', 'F-14 Fighter'] },
  { pilot_name: 'Smith', planes: ['B-1 Bomber', 'B-52 Bomber', 'F-14 Fighter'] },
  { pilot_name: 'Wilson', planes: ['B-1 Bomber', 'B-52 Bomber', 'F-14 Fighter', 'F-17 Fighter'] }
]);

db.hangar.insertMany([
  { plane_name: 'B-1 Bomber' },
  { plane_name: 'B-52 Bomber' },
  { plane_name: 'F-14 Fighter' }
]);

db.pilotSkills.createIndex({ planes: 1 });

Here is how I find the pilots who can fly all planes in the hangar:

mdb> db.pilotSkills.find({
      planes: { $all: db.hangar.distinct("plane_name") }
    } , { pilot_name: 1, _id: 0})
;

[ { pilot_name: 'Smith' }, { pilot_name: 'Wilson' } ]

If you look at the execution plan (adding .explain()), you will see that the MongoDB Multi-Plan optimizer has considered an index scan on each of the three values separately, in case one is more selective, as well as the combination of the three with an AND_SORTED stage.
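
For example, a quick way to look at it (a sketch):

db.pilotSkills.find({
  planes: { $all: db.hangar.distinct("plane_name") }
}).explain("executionStats");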

It is also possible to do it in a single aggregation pipeline, and even answer a more complex question, such as listing each pilot's extra skills (the planes not in the hangar) for a relational division with remainder:

db.pilotSkills.aggregate([
  // Lookup hangar data to include required planes
  {
    $lookup: {
      from: "hangar",
      pipeline: [
        { $group: { _id: null, plane_names: { $addToSet: "$plane_name" } } }
      ],
      as: "hangar_planes"
    }
  },
  // Unwind the hangar_planes array
  {
    $unwind: "$hangar_planes"
  },
  // Match pilots who can fly all planes in the hangar
  {
    $match: {
      $expr: {
        $setIsSubset: ["$hangar_planes.plane_names", "$planes"]
      }
    }
  },
  // Add a field for extra skills beyond the required planes
  {
    $addFields: {
      extraSkills: {
        $filter: {
          input: "$planes",
          as: "plane",
          cond: { $not: { $in: ["$$plane", "$hangar_planes.plane_names"] } }
        }
      }
    }
  },
  // Project only the desired fields
  {
    $project: {
      pilot_name: 1,
      extraSkills: 1,
      _id: 0
    }
  }
]);

[
  { pilot_name: 'Smith', extraSkills: [] },
  { pilot_name: 'Wilson', extraSkills: [ 'F-17 Fighter' ] }
]

With $setIsSubset, MongoDB implements relational division with a better developer experience than SQL's subqueries with double negations or counting. PostgreSQL JSONB adds operations on documents, but can become more complex, for example for the division with remainder. MongoDB handles this through a series of steps in an aggregation pipeline: retrieving the planes from the hangar, unwinding them to compare with the pilot's array of skills, applying the division logic between the two, calculating the remainder, and projecting the pilot names with their extra skills.

It may be surprising, but SQL must use complex subqueries, while MongoDB provides a direct method for this relational algebra operator, enabling efficient processing of normalized collections.

Percona Server for MySQL 8.4.2 vs 8.0.40: Comparison of Variables and Keywords

In this blog, we will look at the differences between LTS (Long Term Stable) versions of Percona Server for MySQL. Released in April 2019, MySQL 8.0 represented a major change from the previous version, 5.7, introducing significant changes to the data dictionary and enabling many features and enhancements. It also provided no direct downgrade path, […]

Local first.

Run Tinybird on your machine. Deploy your data project locally or to the cloud in the exact same way, as it should be.

March 16, 2025

At what level of concurrency do MySQL 5.7 and 8.0 become faster than 5.6?

Are MySQL 5.7 and 8.0 faster than 5.6? That depends a lot on the workload -- both types of SQL and amount of concurrency. Here I summarize results from sysbench on a larger server (48 cores) using 1, 4, 6, 8, 10, 20 and 40 clients to show how things change.

tl;dr

  • the workload here is microbenchmarks with a database cached by InnoDB
  • 5.7.44 is faster than 8.0.x at all concurrency levels on most microbenchmarks
  • for 5.6.51 vs 8.0.x
    • for point queries, 5.6.51 is faster at <= 8 clients
    • for range queries without aggregation 5.6.51 is always faster
    • for range queries with aggregation 5.6.51 is faster except at 40 clients
    • for writes, 5.6.51 is almost always faster at 10 or fewer clients (excluding update-index)
Performance summaries

For point queries:
  • 5.7.44 is always faster than 8.0
  • 8.0.28 suffers from bug 102037 - found by me with sysbench, fixed in 8.0.31
  • at what level of concurrency do most things get faster in 5.7 & 8.0 vs 5.6?
    • 5.7.44 becomes faster than 5.6.51 at 6+ clients
    • 8.0.x becomes faster than 5.6.51 at between 10 and 20 clients
    • Two of the microbenchmarks are always faster in 5.6.51 - both do non-covering queries on a secondary index
For range queries without aggregation
  • 5.7.44 is always faster than 8.0.x
  • 5.6.51 is always faster than 5.7.44 and 8.0.x
For range queries with aggregation
  • 5.7.44 is almost always faster than 8.0.x
  • 5.7.44 becomes faster than 5.6.51 at 6+ clients
  • 8.0.x becomes faster than 5.6.51 at 40 clients
For writes
  • For update-index
    • 5.7.44 and 8.0.x are always faster than 5.6.51 at 4+ clients
    • There is an odd drop from ~6X to ~3X for 8.0.32 and 8.0.39 at 20 clients
  • 5.7.44 is mostly faster than 8.0.x for 1 to 20 clients and they have similar results at 40 clients
  • 5.7.44 & 8.0.x are always faster than 5.6.51 at 20+ clients
Builds, configuration and hardware

I compiled MySQL from source for versions 5.6.51, 5.7.44, 8.0.28, 8.0.32, 8.0.39 and 8.0.41.

The server is an ax162-s from Hetzner with 48 cores (AMD EPYC 9454P), 128G RAM and AMD SMT disabled. It uses Ubuntu 22.04 and storage is ext4 with SW RAID 1 over 2 locally attached NVMe devices. More details on it are here. At list prices a similar server from Google Cloud costs 10X more than from Hetzner.

The configuration files are named my.cnf.cz11a_c32r128 and here for 5.6.51, 5.7.44, 8.0.28, 8.0.32, 8.0.39 and 8.0.41.

Benchmark

I used sysbench and my usage is explained here. To save time I only run 27 of the 42 microbenchmarks and most test only 1 type of SQL statement. Benchmarks are run with the database cached by InnoDB.

The tests run with 8 tables and 10M rows/table. There are 40 client threads, read-heavy microbenchmarks run for 180 seconds and write-heavy run for 300 seconds.

The command lines to run all tests are:
bash r.sh 8 10000000 180 300 md2 1 1 1
bash r.sh 8 10000000 180 300 md2 1 1 4
bash r.sh 8 10000000 180 300 md2 1 1 6
bash r.sh 8 10000000 180 300 md2 1 1 8
bash r.sh 8 10000000 180 300 md2 1 1 10
bash r.sh 8 10000000 180 300 md2 1 1 20
bash r.sh 8 10000000 180 300 md2 1 1 40

Results

For the results below I split the microbenchmarks into 4 groups: point queries, range queries without aggregation, range queries with aggregation, writes. The spreadsheet with all data is here. Files with performance summaries for relative and absolute QPS are here. Values from iostat and vmstat per microbenchmark are here for 1 client, 4 clients, 6 clients, 8 clients, 10 clients, 20 clients and 40 clients. These help to explain why something is faster or slower because they show how much HW is used per query.

The relative QPS is the following where $version is >= 5.7.44.
(QPS for $version) / (QPS for MySQL 5.6.51)
The numbers in the spreadsheets are the relative QPS. When the relative QPS is > 1 then $version is faster than MySQL 5.6.51. When it is 3.0 then $version is 3X faster than the base case.

Results: charts 

Notes on the charts

  • the y-axis shows the relative QPS
  • the y-axis starts at 0.80 to make it easier to see differences
  • in some cases the y-axis truncates the good outliers, cases where the relative QPS is greater than 1.5. I do this to improve readability for values near 1.0. Regardless, the improvements are nice.
Results: point queries

Summary
  • 5.7.44 is always faster than 8.0
  • 8.0.28 suffers from bug 102037 - found by me with sysbench, fixed in 8.0.31
  • at what level of concurrency do most things get faster in 5.7 & 8.0 vs 5.6?
    • 5.7.44 becomes faster than 5.6.51 at 6+ clients
    • 8.0.x becomes faster than 5.6.51 at between 10 and 20 clients
    • Two of the microbenchmarks are always faster in 5.6.51 - both do non-covering queries on a secondary index
Results: range queries without aggregation

Summary
  • 5.7.44 is always faster than 8.0.x
  • 5.6.51 is always faster than 5.7.44 and 8.0.x
Results: range queries with aggregation

Summary
  • 5.7.44 is almost always faster than 8.0.x
  • 5.7.44 becomes faster than 5.6.51 at 6+ clients
  • 8.0.x becomes faster than 5.6.51 at 40 clients
Results: writes

The relative speedup for the update-index microbenchmark is frequently so large that it obscures the smaller changes on other microbenchmarks. So here I truncate the y-axis for some of the charts (for 6+ clients) and the section that follows has the charts without truncation.

Summary
  • For update-index
    • 5.7.44 and 8.0.x are always faster than 5.6.51 at 4+ clients
    • There is an odd drop from ~6X to ~3X for 8.0.32 and 8.0.39 at 20 clients but you can't see that on the charts in this section because of the truncation. It is visible in the next section. From vmstat I see an increase in CPU/operation (cpu/o) and context switches /operation (cs/o) at 20 clients but not at 40 clients.
  • 5.7.44 is mostly faster than 8.0.x for 1 to 20 clients and they have similar results at 40 clients
  • 5.7.44 & 8.0.x are always faster than 5.6.51 at 20+ clients
Results: charts for writes without truncation

The y-axis is truncated in the charts for writes in the previous section for 6+ clients. This section has those charts without truncation.

March 15, 2025

Postgres 17.4 vs sysbench on a large server, revisited

I recently shared results for Postgres vs sysbench on a large server. The results were mostly boring (it is rare for me to spot regressions in Postgres) but there was one microbenchmark where there was a problem. The problem microbenchmark does a range scan with aggregation and the alleged regression arrived in Postgres 11. With advice from Postgres experts it looked like the problem was an intermittent change in the query plan.

In this post I explain additional tests that I did, and in this case the alleged regression was still there, but like many things in DBMS-land it depends; there is nuance. For now I assume the problem is from a change in the query plan and I will run more tests with more instrumentation to investigate that. Here the alleged regression might be ~5% and only at the highest concurrency level (40 clients).

Postgres, the DBMS and community, are not fans of query plan hints, and the sysbench tests that I use don't add hints for queries. Query plan hints are possible in Postgres via the pg_hint_plan extension. Query plan hints have been good to me with web-scale MySQL. For some of the web-scale workloads that I support, the SQL and schema don't change much and query plan hints have two benefits -- plan stability and CPU reduction. By CPU reduction I mean that the CPU overhead from the optimizer is reduced because it has less work to do.

tl;dr

  • There might be a regression for some range queries, but it is small here (~5%) and only occurs at the highest concurrency level (40 clients). I assume this is from a change in the query plan.
  • I have yet to explain the alleged regression
  • I like query plan hints

The problem microbenchmark

The microbenchmarks in question do a range scan with aggregation and have a name like read-only_range=$X where X has been 10, 100 and 1000. The value of X is the length of the range scan. These are run via oltp_read_only.lua and use these SQL statements.

Build, configuration, hardware, benchmark

These are described in my previous post. But a few things have changed for this report

  • I only tested Postgres versions 10.23, 11.0 and 11.10
  • I repeated the benchmark for 1, 10, 20 and 40 client threads. Previously I only ran it for 40.
  • I ran the read-only_range=$X microbenchmark for X in 10, 100, 1000, 2000, 4000, 8000, 16000 and 32000. Previously I ran it for X in 10, 100, 10000.
What didn't change is that the tests are run with 8 tables and 10M rows/table, read-heavy microbenchmarks run for 180 seconds and write-heavy run for 300 seconds.

Results: overview

For the results below I split the microbenchmarks into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, one group has queries without aggregation and the other has queries with aggregation. The spreadsheet with all data and charts is here. It has a tab for 1, 10, 20 and 40 clients (named dop=$X for X in 1, 10, 20 and 40).

Files with performance summaries are here. These include summaries of results from vmstat and iostat for each microbenchmark which are here for 1 client, 10 clients, 20 clients and 40 clients.

The relative QPS is the following where $version is either 11.0 or 11.10.
(QPS for $version) / (QPS for Postgres 10.23)
The numbers in the spreadsheets are the relative QPS. When the relative QPS is > 1 then $version is faster than Postgres 10.23.  When it is 3.0 then $version is 3X faster than the base case.

Notes on the charts

  • the y-axis shows the relative QPS
  • the y-axis starts at 0.90 to make it easier to see differences
  • there are 4 charts per section, one for each of 1, 10, 20 and 40 clients
Results: point queries

Summary
  • Postgres 11.x is always faster than 10.23 and usually about 3% faster
Results: range queries without aggregation

Summary
  • There are no regressions
  • Postgres 11.0 & 11.10 get up to 11% more QPS than 10.23 for the range-covered and range-notcovered microbenchmarks
  • For the scan microbenchmark QPS is mostly unchanged between Postgres 10.23, 11.0 and 11.10, but in one case Postgres 11 was slightly slower (relative QPS for 11.0 was 0.99 at 20 clients).
Results: range queries with aggregation

I repeated the read-only microbenchmark using range queries of length 10, 100, 1000, 2000, 4000, 8000, 16000 and 32000.

Summary:
  • In general, the advantage for Postgres 11.0 & 11.10 vs 10.23 was largest for the longest range scans (16000 & 32000) and next largest for the shortest range scans (10 & 100).
  • The comparison for range scans of length 1000, 2000, 4000 and 8000 was interesting. Here the benefit for Postgres 11.0 & 11.10 was not as large and in one case (range=8000 at 40 clients) there was a small regression (~5%). Perhaps there is a change in the query plan.
Results: writes

Summary:
  • Postgres 11.0 & 11.10 were almost always faster than 10.23 and up to 1.75X faster
  • In one case (update-one microbenchmark at 20 clients) Postgres 11.0 & 11.10 were ~5% slower than 10.23. And this is odd because 11.0 and 11.10 were ~1.7X faster at 40 clients on the same microbenchmark. I can only wave my hands for this one. But I don't think this is a regression.
    • The update-one microbenchmark is run by oltp_update_non_index.lua (the name means it updates non-indexed columns), the SQL for the update is here and the schema is here.
    • From vmstat and iostat metrics for 20 clients and for 40 clients and looking at the CPU /operation (cpu/o) and context switches /operation (cs/o)
      • For 20 clients these are slightly larger for 11.0 & 11.10 vs 10.23
      • For 40 clients these are significantly smaller for 11.0 & 11.10 vs 10.23
    • The microbenchmarks that aren't read-only are run for 600 seconds each and it is possible for performance variance to come from write debt (writeback, vacuum, etc) inherited from microbenchmarks that preceded update-one. The order in which the update microbenchmarks are run is here and is update-inlist, update-index, update-nonindex, update-one, update-zipf. From the dop=20 tab on the spreadsheet, throughput is 1.1X to 1.3X larger for the 3 update microbenchmarks that precede update-one, so inherited write debt might explain these results.

March 14, 2025

Enterprise Readiness with Percona Monitoring and Management: A Look at PMM 3.0.0

Database disasters come with a hefty price tag. According to Information Technology Intelligence Consulting’s 2024 Hourly Cost of Downtime Report, 41% of enterprises face downtime costs ranging from $1 million to over $5 million per hour of outage. How can enterprises ensure their critical database infrastructure remains healthy and performant? This question keeps IT leaders […]

March 13, 2025

MongoDB Multi-Planner Optimizer and Plan Cache

Database systems utilize query planners to optimize data retrieval, primarily through two methods: Rule-Based Optimizers (RBO) and Cost-Based Optimizers (CBO).

  • Rule-Based Optimizer (RBO): This method employs predefined rules to select execution plans, resulting in stable but simplistic query plans. It often struggles with complex queries involving multiple joins, but with a document model, the join ordering is solved upfront.

  • Cost-Based Optimizer (CBO): CBO analyzes data distribution statistics to evaluate potential plans and choose the most cost-effective option. Its reliance on possibly outdated statistics can lead to suboptimal decisions, so gathering statistics is crucial. Planning complex queries can take time, so it either relies on a shared plan cache or switches back to simple genetic algorithms when there are many joins.

MongoDB utilizes a document model that minimizes joins, focusing primarily on selecting the appropriate index. It defers execution decisions until runtime instead of relying on RBO or CBO to pick one. Through a shared plan cache and multi-planner mechanism, MongoDB dynamically evaluates and adjusts query plans for optimal execution.

Let's explain with a simple example, using the most challenging situation for a query planner: column skew, where a field has some unique values and a few popular ones. A business case I encountered was at a coffee capsule vendor. Online customers use their customer ID and order some capsules every three months on average. Shops have a special customer ID and record thousands of orders every day. For a query on customer ID and some other criteria, the best index differs, but the application obeys the 'parse one, execute many' best practice, and the first execution determines the plan for the others. Tom Kyte told the story of a database that was slow when it rained (because the cached plan depended on whether the first user to parse the statement came by bike or by car).

Initialize a collection

I'll build a simple example, a collection of ten million documents with two fields: a and b:

  • a is very selective for a few documents (with value less than 50) but the millions of remaining documents all have the same value of 50.
  • b is uniform, with values from 0 to 999,999, and good selectivity: one value returns ten documents.

Here is how I generated this with a loop on i: a is generated with Math.min(i,50) and b with i%1e6:

// Drop the collection if it already exists
db.franck.drop();
// Insert documents with different data distributions
const bulk = db.franck.initializeUnorderedBulkOp();
const num=1e7;
for (let i = 0; i < num; i++) {
    bulk.insert({ a: Math.min(i,50), b: i%1e6 });
}
const r = bulk.execute();

console.log(`Bulk Operation Summary: Inserted: ${r.insertedCount}, Matched: ${r.matchedCount}, Modified: ${r.modifiedCount}, Deleted: ${r.deletedCount}, Upserted: ${r.upsertedCount}`);

This inserted ten million documents:

test> console.log(`Bulk Operation Summary: Inserted: ${r.insertedCount}, Matched: ${r.matchedCount}, Modified: ${r.modifiedCount}, Deleted: ${r.deletedCount}, Upserted: ${r.upsertedCount}`);             
Bulk Operation Summary: Inserted: 10000000, Matched: 0, Modified: 0, Deleted: 0, Upserted: 0

Create Indexes

As the goal is to show two possible execution plans, I create two indexes, one on each field:

test> db.franck.createIndex({ a: 1 });
a_1
test> db.franck.createIndex({ b: 1 });
b_1

Verify Data

test> // "b" has a good selectivity

test> db.franck.countDocuments({ b: 42 });
10

test> // "a" has a better selectivity when a<50

test> db.franck.countDocuments({ a: 42         });
1

test> // "a" has a very bad selectivity when a=50 (popular value)

test> db.franck.countDocuments({ a: 50         });
9999950

My query will have a predicate on each column. I'll test { a: 42, b: 42 } which returns one document, and { a: 50, b: 42 } which returns ten documents.

Execution profile and plan cache

I'll look at the statistics executions with profiling and at the query plan cache, so I reset them in my lab:

db.franck.getPlanCache().clear();
db.franck.getPlanCache().list();
db.setProfilingLevel(0);
db.system.profile.drop();

First execution and plan cache

I set profiling, and I run a first execution that returns one document:

test> db.setProfilingLevel(2 );
{ was: 0, slowms: 0, sampleRate: 1, ok: 1 }
test> db.franck.find({ a: 42, b: 42 });
[ { _id: ObjectId('67d34e6128ca7c6f95a00acb'), a: 42, b: 42 } ]
test> db.setProfilingLevel(0);
{ was: 2, slowms: 0, sampleRate: 1, ok: 1 }

On the initial execution, the MongoDB query planner generated all candidate plans and stored them in the cache: one plan executes an IXSCAN on index a, another performs an IXSCAN on index b, and a third combines both indexes with an AND_SORTED operation:

test> db.franck.getPlanCache().list();

[
  {
    version: '1',
    queryHash: '20F294D4',
    planCacheKey: '2AAF6E88',
    isActive: false,
    works: Long('2'),
    timeOfCreation: ISODate('2025-03-13T21:43:31.189Z'),
    createdFromQuery: { query: { a: 42, b: 42 }, sort: {}, projection: {} },
    cachedPlan: {
      stage: 'FETCH',
      filter: { b: { '$eq': 42 } },
      inputStage: {
        stage: 'IXSCAN',
        keyPattern: { a: 1 },
        indexName: 'a_1',
        isMultiKey: false,
        multiKeyPaths: { a: [] },
        isUnique: false,
        isSparse: false,
        isPartial: false,
        indexVersion: 2,
        direction: 'forward',
        indexBounds: { a: [ '[42, 42]' ] }
      }
    },
    creationExecStats: [
      {
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        totalKeysExamined: 1,
        totalDocsExamined: 1,
        executionStages: {
          stage: 'FETCH',
          filter: { b: { '$eq': 42 } },
          nReturned: 1,
          executionTimeMillisEstimate: 0,
          works: 2,
          advanced: 1,
          needTime: 0,
          needYield: 0,
          saveState: 0,
          restoreState: 0,
          isEOF: 1,
          docsExamined: 1,
          alreadyHasObj: 0,
          inputStage: {
            stage: 'IXSCAN',
            nReturned: 1,
            executionTimeMillisEstimate: 0,
            works: 2,
            advanced: 1,
            needTime: 0,
            needYield: 0,
            saveState: 0,
            restoreState: 0,
            isEOF: 1,
            keyPattern: { a: 1 },
            indexName: 'a_1',
            isMultiKey: false,
            multiKeyPaths: { a: [] },
            isUnique: false,
            isSparse: false,
            isPartial: false,
            indexVersion: 2,
            direction: 'forward',
            indexBounds: { a: [Array] },
            keysExamined: 1,
            seeks: 1,
            dupsTested: 0,
            dupsDropped: 0
          }
        }
      },
      {
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        totalKeysExamined: 2,
        totalDocsExamined: 2,
        executionStages: {
          stage: 'FETCH',
          filter: { a: { '$eq': 42 } },
          nReturned: 1,
          executionTimeMillisEstimate: 0,
          works: 2,
          advanced: 1,
          needTime: 1,
          needYield: 0,
          saveState: 0,
          restoreState: 0,
          isEOF: 0,
          docsExamined: 2,
          alreadyHasObj: 0,
          inputStage: {
            stage: 'IXSCAN',
            nReturned: 2,
            executionTimeMillisEstimate: 0,
            works: 2,
            advanced: 2,
            needTime: 0,
            needYield: 0,
            saveState: 0,
            restoreState: 0,
            isEOF: 0,
            keyPattern: { b: 1 },
            indexName: 'b_1',
            isMultiKey: false,
            multiKeyPaths: { b: [] },
            isUnique: false,
            isSparse: false,
            isPartial: false,
            indexVersion: 2,
            direction: 'forward',
            indexBounds: { b: [Array] },
            keysExamined: 2,
            seeks: 1,
            dupsTested: 0,
            dupsDropped: 0
          }
        }
      },
      {
        nReturned: 1,
        executionTimeMillisEstimate: 0,
        totalKeysExamined: 2,
        totalDocsExamined: 1,
        executionStages: {
          stage: 'FETCH',
          filter: { '$and': [ [Object], [Object] ] },
          nReturned: 1,
          executionTimeMillisEstimate: 0,
          works: 2,
          advanced: 1,
          needTime: 1,
          needYield: 0,
          saveState: 0,
          restoreState: 0,
          isEOF: 0,
          docsExamined: 1,
          alreadyHasObj: 0,
          inputStage: {
            stage: 'AND_SORTED',
            nReturned: 1,
            executionTimeMillisEstimate: 0,
            works: 2,
            advanced: 1,
            needTime: 1,
            needYield: 0,
            saveState: 0,
            restoreState: 0,
            isEOF: 0,
            failedAnd_0: 0,
            failedAnd_1: 0,
            inputStages: [ [Object], [Object] ]
          }
        }
      }
    ],
    candidatePlanScores: [ 2.5002, 1.5002, 1.5001 ],
    indexFilterSet: false,
    estimatedSizeBytes: Long('4693'),
    host: '4b49f6ca6400:27017'
  }
]

Those plans were evaluated (creationExecStats) with { a: { '$eq': 42 } } and { b: { '$eq': 42 } } and scored. The best score goes to the most selective one, the index on a, which examines one key and fetches one document.

Next executions with the same values

I run the same query nine more times:

db.setProfilingLevel(2 );
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.franck.find({ a: 42, b: 42 });
db.setProfilingLevel(0);

I extract some interesting information from the profiling of those ten executions:

test> db.system.profile.find({ ns: "test.franck" }).sort({ ts: 1 }).limit(100).forEach((doc) => { if (doc.execStats) console.log(`${doc.ts.toISOString().slice(11, 23)} works: ${doc.execStats.works.toString().padStart(5)} keys: ${doc.keysExamined.toString().padStart(5)} docs: ${doc.docsExamined.toString().padStart(5)} ret: ${doc.nreturned.toString().padStart(5)} ${doc.execStats.stage.padStart(12)} ${doc.planSummary.padStart(12)} exec(plan): ${doc.millis.toString().padStart(5)}ms (${doc.planningTimeMicros.toString().padStart(8)}us) query/plan ${doc.queryHash}/${doc.planCacheKey} ${doc.queryFramework} ${doc.fromPlanCache ? 'fromPlanCache ' : ''}${doc.fromMultiPlanner ? 'fromMultiPlanner ' : ''}${doc.replanned ? doc.replanReason : ''}`); });

21:43:31.191 works:     3 keys:     1 docs:     1 ret:     1        FETCH IXSCAN { a: 1 } exec(plan):     7ms (    7063us) query/plan 20F294D4/2AAF6E88 classic fromMultiPlanner                                                             
21:56:49.938 works:     3 keys:     1 docs:     1 ret:     1        FETCH IXSCAN { a: 1 } exec(plan):     9ms (    8899us) query/plan 20F294D4/2AAF6E88 classic fromMultiPlanner                                                             
21:56:49.982 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     1ms (    1264us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.014 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     233us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.053 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     240us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.085 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     235us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.108 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     254us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.133 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     230us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.157 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     246us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache                                                                
21:56:50.188 works:     2 keys:     1 docs:     1 ret:     1  CACHED_PLAN IXSCAN { a: 1 } exec(plan):     0ms (     246us) query/plan 20F294D4/2AAF6E88 classic fromPlanCache     

In the first two executions, all plans were evaluated using the Multi-Planner. From the third execution onward, only the winning plan from the Plan Cache was executed, resulting in decreased execution time, including planning time. The best index, which was fully executed, is the one on a.

More executions with different values

I execute the same query shape but with a different value, for which I know that the index on a is not good:

db.setProfilingLevel(2);
db.franck.find({ a: 50, b<... (truncated)
                                    

Using Barman Cloud Utilities for Backups and WAL Archiving in PostgreSQL

In this brief blog post, we will talk about Barman cloud utilities, which greatly ease the process of storing backups on cloud platforms like GCP, AWS, Azure, etc. Backups are of paramount importance, and in PostgreSQL, we also need to retain the WAL files, which can be used for various purposes like incremental backups or […]
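
As a minimal sketch of what such a setup can look like (bucket and server names are mine, not from the article), the cloud utilities take a destination URL and a server name, and barman-cloud-wal-archive is typically called from PostgreSQL's archive_command:

barman-cloud-backup --cloud-provider aws-s3 s3://my-backups/pg pg-prod
# in postgresql.conf: archive_command = 'barman-cloud-wal-archive --cloud-provider aws-s3 s3://my-backups/pg pg-prod %p'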

IO devices and latency

Take an interactive journey through the history of IO devices, and learn how IO device latency affects performance.

March 12, 2025

HAProxy-Patroni Setup Using Health Check Endpoints and Debugging

Patroni has a REST API that allows HAProxy and other kinds of load balancers to perform HTTP health checks. This blog post explains how HAProxy uses health check endpoints with Patroni and how to debug status issues. HAProxy and Patroni setup: Sample configuration: OPTIONS /primary: This is the primary health […]