September 11, 2025
September 09, 2025
Beyond EOL: The Real Benefits of Upgrading to MySQL 8.4
OrioleDB Patent: now freely available to the Postgres community
September 08, 2025
Resilience of MongoDB's WiredTiger Storage Engine to Disk Failure Compared to PostgreSQL and Oracle
There have been jokes that have contributed to persistent myths about MongoDB's durability. The authors of those myths ignore that MongoDB's storage engine is among the most robust in the industry. MongoDB uses WiredTiger (created by the same author as Berkeley DB), which provides block corruption protection stronger than that of many other databases. In this article, I'll show how to reproduce a simple write loss, in a lab, and see how the database detects it to avoid returning corrupt data.
I like comparing the trade-offs and implementations of various databases, but I stick to discussing those I'm familiar with. Let's deliberately corrupt PostgreSQL, Oracle, and MongoDB at the storage level to see what happens. This experiment can be reproduced in a lab and your comments are welcome.
PostgreSQL
To expose the issue when a database doesn't detect lost writes, I chose PostgreSQL for this demonstration. As of version 18, PostgreSQL enables checksums by default. I'm testing it with the Release Candidate in a Docker lab:
docker run --rm -it --cap-add=SYS_PTRACE postgres:18rc1 bash
# Install some utilities
apt update -y && apt install -y strace
# Start PostgreSQL
POSTGRES_PASSWORD=x \
strace -fy -e trace=pread64,pwrite64 \
docker-entrypoint.sh postgres &
# Connect to PostgreSQL
psql -U postgres
I've started PostgreSQL, tracing the read and write calls with strace.
I check that data checksums are enabled:
postgres=# show data_checksums;
data_checksums
----------------
on
(1 row)
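As a side note, clusters created before version 18 (or initialized without checksums) can have checksums turned on explicitly. A minimal sketch, assuming a cleanly stopped cluster and $PGDATA pointing at its data directory:
# enable data checksums at cluster creation time
initdb --data-checksums -D "$PGDATA"
# or enable them later, offline, on an existing cluster (PostgreSQL 12 and later)
pg_checksums --enable -D "$PGDATA"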
I create a demo table and insert random data:
create table demo (k int primary key, v text);
copy demo from program $$
cat /dev/urandom |
base64 |
head -10000 |
awk '{print NR"\t"$0}'
$$ ;
vacuum demo;
checkpoint;
create extension pg_buffercache;
select distinct pg_buffercache_evict(bufferid) from pg_buffercache;
I triggered a checkpoint to write the pages to disk and evicted the shared buffers, so that I can see the read I/O for the next query:
set enable_seqscan to false;
select distinct pg_buffercache_evict(bufferid) from pg_buffercache;
select * from demo where k=999;
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0@j\225\1R\232\0\0\210\08\36\360\37\4 \0\0\0\0\350\237\20\0\330\237 \0"..., 8192, 24576) = 8192
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0\320\243\200\1mj\0\0\324\5\0\t\360\37\4 \0\0\0\0\340\237 \0\320\237 \0"..., 8192, 32768) = 8192
[pid 161] pread64(58</var/lib/postgresql/18/docker/base/5/16388>, "\0\0\0\0\10\232\232\1\302\260\4\0000\1`\1\0 \4 \0\0\0\0\220\237\322\0 \237\322\0"..., 8192, 114688) = 8192
k | v
-----+------------------------------------------------------------------------------
999 | MQIEZSsmBjk7MRtgIZLL/MqABsjhuMR6I4LtayWfR2764PdB+AcQt2saRtFXkgBUGCLKzM8SBmKX
(1 row)
This has read two pages from the index, and then the page that contains the row I'm querying. This page is in base/5/16388 at offset 114688.
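As a quick cross-check (a sketch run in the same psql session), the block number can also be derived from the row's ctid: 114688 / 8192 = block 14.
-- pg_relation_filepath returns the file seen in the strace output,
-- and the first component of ctid is the block number (file offset / 8192)
select pg_relation_filepath('demo') as file,
       (ctid::text::point)[0]       as block_number
from demo where k = 999;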
I use dd to save the content of this page, then update row 999, trigger a checkpoint, and flush the buffer cache:
\! dd if=/var/lib/postgresql/18/docker/base/5/16388 of=block1.tmp bs=1 skip=114688 count=8192
update demo set v='xxxxxx' where k=999;
checkpoint;
select distinct pg_buffercache_evict(bufferid) from pg_buffercache;
I query the row again, and it shows the updated value:
set enable_seqscan to false;
select distinct pg_buffercache_evict(bufferid) from pg_buffercache;
select * from demo where k=999;
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0@j\225\1R\232\0\0\210\08\36\360\37\4 \0\0\0\0\350\237\20\0\330\237 \0"..., 8192, 24576) = 8192
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0\320\243\200\1mj\0\0\324\5\0\t\360\37\4 \0\0\0\0\340\237 \0\320\237 \0"..., 8192, 32768) = 8192
[pid 161] pread64(58</var/lib/postgresql/18/docker/base/5/16388>, "\0\0\0\0\10m\261\1K~\0\08\1\200\1\0 \4 \2\3\0\0\220\237\322\0 \237\322\0"..., 8192, 114688) = 8192
k | v
-----+--------
999 | xxxxxx
(1 row)
It reads the same index pages, and the table page at offset 114688 now contains the new value.
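Optionally, the pageinspect contrib extension can list the line pointers of block 14; a small sketch, which should show both the old and the new version of row 999 in the same page:
create extension if not exists pageinspect;
-- dump the line pointers and tuple headers of block 14 of the demo table
select lp, lp_flags, t_xmin, t_xmax, t_ctid
from heap_page_items(get_raw_page('demo', 14));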
To simulate disk corruption, I copy the previously saved block back over this location, and query again:
\! dd of=/var/lib/postgresql/18/docker/base/5/16388 if=block1.tmp bs=1 seek=114688 conv=notrunc
set enable_seqscan to false;
select distinct pg_buffercache_evict(bufferid) from pg_buffercache;
select * from demo where k=999;
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0@j\225\1R\232\0\0\210\08\36\360\37\4 \0\0\0\0\350\237\20\0\330\237 \0"..., 8192, 24576) = 8192
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0\320\243\200\1mj\0\0\324\5\0\t\360\37\4 \0\0\0\0\340\237 \0\320\237 \0"..., 8192, 32768) = 8192
[pid 161] pread64(58</var/lib/postgresql/18/docker/base/5/16388>, "\0\0\0\0\10\232\232\1\302\260\4\0000\1`\1\0 \4 \0\0\0\0\220\237\322\0 \237\322\0"..., 8192, 114688) = 8192
k | v
-----+------------------------------------------------------------------------------
999 | MQIEZSsmBjk7MRtgIZLL/MqABsjhuMR6I4LtayWfR2764PdB+AcQt2saRtFXkgBUGCLKzM8SBmKX
(1 row)
There's no error because the restored block has a correct checksum: it is a valid, well-formed block from the same table, just not the current version of the page. However, it shows a row value that should no longer be there. This is the kind of error that can happen when the storage misses a write, leaving a stale version of a block in place, or writes a valid block to the wrong location.
Checksums are still useful when the corruption is not aligned with well-formatted blocks. For example, I overwrite the second half of the page with the first half of the previously saved block:
checkpoint;
\! dd of=/var/lib/postgresql/18/docker/base/5/16388 if=block1.tmp bs=1 seek=118784 count=4096 conv=notrunc
set enable_seqscan to false;
select distinct pg_buffercache_evict(bufferid) from pg_buffercache;
select * from demo where k=999;
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0@j\225\1R\232\0\0\210\08\36\360\37\4 \0\0\0\0\350\237\20\0\330\237 \0"..., 8192, 24576) = 8192
[pid 161] pread64(59</var/lib/postgresql/18/docker/base/5/16394>, "\0\0\0\0\320\243\200\1mj\0\0\324\5\0\t\360\37\4 \0\0\0\0\340\237 \0\320\237 \0"..., 8192, 32768) = 8192
[pid 161] pread64(58</var/lib/postgresql/18/docker/base/5/16388>, "\0\0\0\0\10\232\232\1\302\260\4\0000\1`\1\0 \4 \0\0\0\0\220\237\322\0 \237\322\0"..., 8192, 114688) = 8192
2025-09-08 17:58:41.876 UTC [161] LOG: page verification failed, calculated checksum 20176 but expected 45250
2025-09-08 17:58:41.876 UTC [161] STATEMENT: select * from demo where k=999;
2025-09-08 17:58:41.876 UTC [161] LOG: invalid page in block 14 of relation "base/5/16388"
2025-09-08 17:58:41.876 UTC [161] STATEMENT: select * from demo where k=999;
2025-09-08 17:58:41.876 UTC [161] ERROR: invalid page in block 14 of relation "base/5/16388"
2025-09-08 17:58:41.876 UTC [161] STATEMENT: select * from demo where k=999;
ERROR: invalid page in block 14 of relation "base/5/16388"
Here, the calculated checksum does not match the one stored in the page, and an error is raised. PostgreSQL checksums can detect some block corruption, but a bug, or a malicious user with access to the filesystem, can still change the data without being detected.
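As another side note, checksums can also be verified offline with the pg_checksums tool, which scans every block of a cleanly stopped cluster. A minimal sketch, assuming the data directory seen in the traces above:
# the server must be stopped cleanly before running this scan
pg_checksums --check -D /var/lib/postgresql/18/docker
Like the read-time verification, this scan catches blocks whose checksum no longer matches their content, but it would not flag the relocated-but-valid block from the first experiment.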
Oracle Database
To detect lost writes like the one I simulated above, Oracle Database compares the block checksum with the standby databases, as there is a low chance that the corruption happened in both environments. I've demonstrated this in the past with a similar demo: 18c new Lost Write Protection
WiredTiger (MongoDB storage engine)
MongoDB employs the WiredTiger storage engine, which is designed to prevent lost writes and detect disk failures that might return the wrong page. To achieve this, WiredTiger stores a checksum alongside each page address within the pointers between the B-tree pages, in an address cookie:
An address cookie is an opaque set of bytes returned by the block manager to reference a block in a B-tree file. It includes an offset, size, checksum, and object id.
In my lab, I first start a MongoDB container and compile wt, a command-line utility that allows direct interaction with WiredTiger files. This tool enables me to examine the storage engine without relying on the MongoDB query layer, and I'll use it in this series of blog posts:
docker run --rm -it --cap-add=SYS_PTRACE mongo bash
# install required packages
apt-get update && apt-get install -y git xxd strace curl jq python3 python3-dev python3-pip python3-venv python3-bson build-essential cmake gcc g++ libstdc++-12-dev libtool autoconf automake swig liblz4-dev zlib1g-dev libmemkind-dev libsnappy-dev libsodium-dev libzstd-dev
# get latest WiredTiger
curl -L $(curl -s https://api.github.com/repos/wiredtiger/wiredtiger/releases/latest | jq -r '.tarball_url') -o wiredtiger.tar.gz
# Compile
mkdir /wiredtiger && tar -xzf wiredtiger.tar.gz --strip-components=1 -C /wiredtiger ; cd /wiredtiger
mkdir build && cmake -S /wiredtiger -B /wiredtiger/build \
-DHAVE_BUILTIN_EXTENSION_SNAPPY=1 \
-DCMAKE_BUILD_TYPE=Release \
-DENABLE_WERROR=0 \
-DENABLE_QPL=0 \
-DCMAKE_C_FLAGS="-O0 -Wno-error -Wno-format-overflow -Wno-error=array-bounds -Wno-error=format-overflow -Wno-error=nonnull" \
-DPYTHON_EXECUTABLE=$(which python3)
cmake --build /wiredtiger/build
export PATH=$PATH:/wiredtiger/build:/wiredtiger/tools
It takes some time, but the wt command line utility will make the investigation easier. That's an advantage of MongoDB pluggable storage—you can examine it in layers.
I create a demo table and insert 10,000 records:
root@7d6d105a1663:/tmp# wt create table:demo
root@7d6d105a1663:/tmp# ls -alrt
total 68
drwxr-xr-x. 1 root root 4096 Sep 8 19:21 ..
-rw-r--r--. 1 root root 21 Sep 8 19:35 WiredTiger.lock
-rw-r--r--. 1 root root 50 Sep 8 19:35 WiredTiger
-rw-r--r--. 1 root root 299 Sep 8 19:35 WiredTiger.basecfg
-rw-r--r--. 1 root root 4096 Sep 8 19:35 demo.wt
-rw-r--r--. 1 root root 4096 Sep 8 19:35 WiredTigerHS.wt
-rw-r--r--. 1 root root 1475 Sep 8 19:35 WiredTiger.turtle
-rw-r--r--. 1 root root 32768 Sep 8 19:35 WiredTiger.wt
drwxrwxrwt. 1 root root 4096 Sep 8 19:35 .
root@7d6d105a1663:/tmp# wt list
colgroup:demo
file:demo.wt
table:demo
cat /dev/urandom |
base64 |
head -10000 |
awk '{print "i",NR,$0}' |
wt dump -e table:demo
...
Inserted key '9997' and value 'ILZeUq/u/ErLB/i7LOUb4nwYP6D535trb8Mt3vcJXXRAqLeAiYIHn5bEWs1buflmiZMYd3rMMvhh'.
Inserted key '9998' and value 'y+b0eTV/4Ao12qRqtHhgP2xGUr+C9ZOfvOG3ZwbdDNXvpnbM1/laoJ9Yzyt6cbLJOR6jdQktpgFM'.
Inserted key '9999' and value 'cJY9uWtopqFOuZjggkZDWVZEEdygpMLyL7LscqehnKoVY7BrmTh4ZzyTLrZ1glROwLtZYbvLbu5c'.
Inserted key '10000' and value 'QYwyjaRxa9Q+5dvzwQtvv2QE/uS/vhRPCVsQ6p7re/L2yDrVRxyqkvSyMHeRCzMIsIovrCUJpPXI'.
I read record 9999 with wt and use strace to see the read calls:
strace -yy -e trace=pread64 -xs 36 wt dump -e table:demo <<<"s 9999"
...
pread64(6</tmp/demo.wt>, "\x00...\x1f\x63\x77\x41"..., 28672, 806912) = 28672
9999
cJY9uWtopqFOuZjggkZDWVZEEdygpMLyL7LscqehnKoVY7BrmTh4ZzyTLrZ1glROwLtZYbvLbu5c
...
This record is stored at offset 806912, in a 28672-byte block. I save this block with dd:
dd if=demo.wt of=block1.tmp bs=1 skip=806912 count=28672
28672+0 records in
28672+0 records out
28672 bytes (29 kB, 28 KiB) copied, 0.0680832 s, 421 kB/s
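Out of curiosity, the header of the saved block can be inspected right away; the four bytes starting at offset 32 should be the block checksum, 1f 63 77 41, which is also visible at the end of the 36-byte read trace above:
# show the first 36 bytes of the saved block, ending with the checksum field
xxd -l 36 block1.tmp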
I update this record to "xxxxxx" and trace the write calls:
strace -yy -e trace=pwrite64 -xs 36 wt dump -e table:demo <<<"u 9999 xxxxxx"
...
Updated key '9999' to value 'xxxxxx'.
pwrite64(6</tmp/demo.wt>, "\x00...\x02\xb1\x1f\xf5"..., 28672, 847872) = 28672
pwrite64(6</tmp/demo.wt>, "\x00...\x4d\x19\x14\x4e"..., 4096, 876544) = 4096
pwrite64(6</tmp/demo.wt>, "\x00...\x4f\x74\xdb\x1e"..., 4096, 880640) = 4096
pwrite64(6</tmp/demo.wt>, "\x00...\x21\xcc\x25\x06"..., 4096, 884736) = 4096
...
This writes a new 28672-byte block (WiredTiger does not write in place, which helps avoid corruption) and updates the B-tree branch pages.
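The traces show that the new version of the leaf went to offset 847872 rather than over the old block at offset 806912. As a quick sanity check (a sketch, assuming the verify command is available in this wt build):
# the data file grows instead of being rewritten in place
ls -l demo.wt
# read every block of the table and validate checksums and structure
wt verify table:demo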
I can read this new value:
strace -yy -e trace=pread64 -xs 36 wt dump -e table:demo <<<"s 9999"
...
pread64(6</tmp/demo.wt>, "\x00...\x02\xb1\x1f\xf5"..., 28672, 847872) = 28672
9999
xxxxxx
...
To simulate disk corruption, I do the same as I did on PostgreSQL: replace the current block with the old one. I save the current block before overwriting it:
dd if=demo.wt of=block2.tmp bs=1 skip=847872 count=28672
28672+0 records in
28672+0 records out
28672 bytes (29 kB, 28 KiB) copied, 0.0688249 s, 417 kB/s
dd of=demo.wt if=block1.tmp bs=1 seek=847872 conv=notrunc
28672+0 records in
28672+0 records out
28672 bytes (29 kB, 28 KiB) copied, 0.0666375 s, 430 kB/s
If I try to read the record in this block, the corruption is detected:
strace -yy -e trace=pread64 -xs 36 wt dump -e table:demo <<<"s 9999"
...
pread64(6</tmp/demo.wt>, "\x00...\x1f\x63\x77\x41"..., 28672, 847872) = 28672
[1757361305:392519][8246:0x7fe9a087e740], wt, file:demo.wt, WT_SESSION.open_cursor: [WT_VERB_DEFAULT][ERROR]: __wti_block_read_off, 279: demo.wt: potential hardware corruption, read checksum error for 28672B block at offset 847872: block header checksum of 0x4177631f doesn't match expected checksum of 0xf51fb102
[1757361305:392904][8246:0x7fe9a087e740], wt, file:demo.wt, WT_SESSION.open_cursor: [WT_VERB_DEFAULT][ERROR]: __bm_corrupt_dump, 86: {0: 847872, 28672, 0xf51fb102}: (chunk 1 of 28): 00 00 00 00 00 00 00 00 3a 00 00 00 00 00 00 00 40 6e 00 00 a8 02 00 00 07 04 00 01 00 70 00 00 1f 63 77 41 01 00 00 00 11 39 36 39 33 80 8c 43 38 30 58 51 66 64
...
Even though the checksum is correct for the block itself, WiredTiger detected that the block's header checksum, 0x4177631f (visible as 1f 63 77 41 in the hexadecimal dump, or \x1f\x63\x77\x41 in the read trace), differs from the expected checksum, 0xf51fb102, stored in the address cookie.
The expected 0xf51fb102 was visible as \x02\xb1\x1f\xf5 in the write call of the update, and appears as 02 b1 1f f5 in the block I saved before overwriting it:
root@7d6d105a1663:/tmp# xxd -l 36 block2.tmp
00000000: 0000 0000 0000 0000 5c00 0000 0000 0000 ........\.......
00000010: f96d 0000 a802 0000 0704 0001 0070 0000 .m...........p..
00000020: 02b1 1ff5
Even with access to the files, it would be extremely difficult to corrupt the data in an undetected way, because any change must update the block's checksum, and that checksum is itself referenced by the address cookies stored in other blocks. Undetected corruption is highly unlikely: blocks are not updated in place, and a failure to write a block would break the pointers.
WiredTiger is open source, and you can check the WT_BLOCK_HEADER definition. In this structure, the block size (disk_size) field appears just before the checksum field: for example, 00 70 00 00 = 0x00007000 = 28,672 bytes, followed by the checksum 02 b1 1f f5 = 0xf51fb102. One advantage of WiredTiger is that B-tree leaf blocks can have flexible sizes, which MongoDB uses to keep each document as one chunk on disk and improve data locality.
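Both fields can be read back from the saved block with xxd in little-endian mode; a small sketch, assuming the block header starts at byte 28, right after the page header, as in the dump above:
# two little-endian 32-bit words starting at byte 28 of the saved block:
# disk_size (00007000 = 28672), followed by the checksum (f51fb102)
xxd -e -g 4 -s 28 -l 8 block2.tmp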
Checksum verification is implemented in block_read.c and performs two validations:
- It checks that the checksum stored in the block header matches the expected checksum from the address cookie (the B-tree pointer created when the block was written).
- It zeroes out the checksum field in the header and recomputes the checksum over the block content, verifying it also matches the expected checksum. This ensures both the block’s integrity and its identity.
Conclusion
PostgreSQL requires you to enable checksums to detect data corruption. This detects when a page’s checksum does not match its content. However, if the system erroneously writes a different, but valid, block from the same table in place of the intended one, or misses a write and the previous version of the block remains, PostgreSQL cannot identify this issue. As a result, some disk failures may escape detection and return wrong results.
Oracle Database stores a checksum in each block and can be configured to verify it on read. With a Data Guard standby and some network overhead, the database can transmit checksums over the network to verify data integrity when reading.
MongoDB WiredTiger enables checksums by default and can detect the wrong blocks without the need to contact replicas. It embeds the expected checksum inside the B-tree address cookie so that every internal B-tree pointer to a leaf page includes the checksum for the referenced page. If an obsolete or different page is swapped in, any mismatch will be detected because the pointer's checksum won’t match. WiredTiger uses copy-on-write, not in-place overwrites, further reducing the risk of corruption.
Here is a description of WiredTiger by Keith Bostic:
And an article by Bob Liles about how MongoDB Atlas deals with that: Managing Data Corruption in the Cloud
Disaggregation: A New Architecture for Cloud Databases
This short VLDB'25 paper surveys disaggregation for cloud databases. It has several insightful points, and I found it worth summarizing.
The key advantage of the cloud over on-prem is elastic scalability: users can scale resources up and down and pay only for what they use. Traditional database architectures, like shared-nothing, do not fully exploit this. Thus, cloud-native databases increasingly adopt disaggregated designs.
Disaggregation is primarily motivated by the asymmetry between compute and storage:
- Compute is far more expensive than storage in the cloud.
- Compute demand fluctuates quickly; storage grows slowly.
- Compute can be stateless and easier to scale, while storage is inherently stateful.
Decoupling them lets compute scale elastically while storage remains relatively stable and cheap.
Review of Disaggregation in the Clouds
Early cloud-native systems like Snowflake and Amazon Aurora separate compute and storage into independent clusters. Modern systems push disaggregation further. Socrates splits storage into three services: Logging service (small footprint, strict latency), Page cache, and Durable page store. This way each service can be tuned for its performance/cost tradeoffs. The logging service, for example, can use faster storage hardware.
Other disaggregation examples include computation pushdown (Redshift Spectrum, S3 Select), intermediate caching (Snowflake), a metadata service (Lakehouse), and memory disaggregation (PolarDB). Many other functions (indexing, concurrency control, query optimization) remain underexplored. There is room for a unified middleware layer between compute and storage that can consolidate these.
The paper doesn't mention it, but this discussion mirrors the microservices trend in systems design. Breaking monoliths into smaller, independently scalable services improves modularity and resource efficiency, and also enables better sharing and pooling of resources across workloads. We may see disaggregated databases evolve the way microservices did: first a simple split, then a bunch of fine-grained services, and eventually a need for orchestration layers, observability, and service meshes. Today it is compute and storage; tomorrow it could be dozens of database microservices (maybe even including concurrency control), stitched together by a middleware layer that looks suspiciously like Kubernetes.
Tradeoffs in disaggregated design
The main tradeoff is performance. Since disaggregated components are physically separate, the communication overhead can be high. A 2019 study shows a 10x throughput hit compared to a tuned shared-nothing system. Optimizations can help narrow the gap, but disaggregation should be applied only when its benefits outweigh the network cost. This tradeoff also motivates research into techniques that reduce communication overhead. Hello, distributed systems research!
Rethinking Core Protocols
Many distributed database protocols assume shared-nothing architecture, so with disaggregation, some of these assumptions no longer hold. This creates new opportunities, and not just more problems.
For example, 2PC normally faces a blocking problem because a failed node's log is inaccessible. With disaggregated storage, logs live in a shared, reliable service. Cornus 2PC protocol (2022) leverages this: active nodes can vote NO on behalf of failed nodes by writing directly to their logs. A compare-and-swap API ensures only one decision is recorded.
Disaggregating the Query Engine
Pushdown reduces data movement by executing operators closer to storage. This idea has been studied in database machines, Smart SSDs, and PIM, but fits especially well in the cloud. Leveraging this, PushdownDB uses S3 Select to push down both basic and advanced operators. It cuts query time 6.7x and cost 30%. FlexPushdownDB combines pushdown with caching so that operators like filters or hash probes can run locally on cached data and remotely via pushdown, with results merged. This hybrid mode outperforms either technique alone by 2.2x.
Enabling New Capabilities and Embracing New Hardware
Modern applications want queries to reflect the latest transactions, not data from hours ago. HTAP systems support this but require migration to new engines. Disaggregated architectures offer an opportunity here, and Hermes (VLDB'25) exploits this by placing itself between the compute and storage. Hermes intercepts transactional logs and analytical reads, merging recent updates into queries on the fly. Updates are later batched into stable storage.
Disaggregation also motivates and eases the adoption of new hardware. Different components can use specialized hardware such as GPUs, RDMA, or CXL to achieve the best cost-performance tradeoff. The paper cites a GPU-based DuckDB engine (VLDB'25) that achieves large speedups by exploiting parallelism.
Discussion
So this paper mentions several new research directions. What should systems researchers in academia work on?
Here is what I think would be an impactful work. Take a monolithic database (like Postgres, RocksDB, or MySQL) and transform it to a disaggregated database. But not just for shits and giggles. Study the efficiency tradeoffs in alternative designs. Also study the software engineering tradeoffs, cost-to-production tradeoffs, resilience tradeoffs and metastability risks. Compare and contrast different transformation paths. This would provide a good roadmap (and paths to avoid minefields) for many databases that consider a similar redesign/implementation. Last year I had reviewed this paper which made a small crack at this, but I believe there is a lot of research and work here to be tackled.
For distributed protocol designers, the rethinking core protocols section provides a good blueprint. Pick other protocols, such as consensus, leader election, replication, caching, and revisit them in the disaggregated setting, and consider the new opportunities that open up alongside the challenges introduced.
Swimming with Sharks: Analyzing Encrypted Database Traffic Using Wireshark
Directly query the underlying ClickHouse database in Tinybird via the native HTTP interface
Our Myrtle Beach vacation
This year was a bastard. Not from work. God, no, I find work relaxing. Reading papers, inventing algorithms, ripping apart distributed systems with TLA+ models -- that's how I have fun. I can do that all day with a grin on my face. But the minute I need to do adulting (like simply calling the cable company to ask why they keep increasing our bill when I'm not looking), I stress and procrastinate for weeks. And this year, I had a lot of adulting to do to put our house on the market and plan a move to California, all while juggling three kids' school and EC activities. I was pretty stressed most of the time, and I've been grinding my teeth at night like a mule chewing rocks.
Anywho, we botched our great escape to California. The house didn't sell as quickly as we hoped it would, and we are stuck in Buffalo for another year. Summer disappeared in a cloud of errands and disappointment, and suddenly it was late August, with our kids twitching with pre-school nerves. There was still some time left to salvage the wreck. We needed a beach, any beach. Myrtle Beach! We had heard good things about it. One of my friends called it the Redneck Riviera. Good enough for me, and far enough from Buffalo, so we decided to give it a try.
Planning is not my strong suit. My wife took the reins. She scoured hotel reviews like a CIA interrogator, picking through tales of bedbugs, mold, and broken elevators. She has a radar for doom. Without her, I'd have booked us straight into some cockroach casino on the boardwalk. But she nailed it. Ten days before departure, she locked down an Airbnb room inside a proper resort hotel, facing the ocean. We chose Airbnb for the better customer service and because the photos showed the exact floor and view we would get. Had we booked with the resort directly, there would have been no telling which floor or room we would end up with.
The best thing about the vacation is anticipation and the wait. We counted down the days, giddy with excitement.
And then, the drive. Always the drive. That is how the Demirbas family rolls: no planes if at all possible. Planes are a scam. For five people, it's bankruptcy on wings. You waste a whole day shuffling through TSA lines, just to pray nervously that your plane doesn't get canceled on you, and, if you are lucky, to sit in a recycled-air canister for hours. We once drove from Buffalo to Seattle, and back. And another time to Florida and back. For us, seven hours on asphalt, Buffalo to Boston, is a warm-up lap. Myrtle Beach was thirteen. Still doable. Just load the audiobooks, clamp the Bose headphones on my head, and hit the highway. Driving is my meditation: the road pours itself under my car, like some childhood arcade game where the scenery and the other cars scroll past for hours as I nudge the steering wheel left and right to keep up.
We left Buffalo at 8:30 in our tightly packed Highlander. By noon, the youngest announced that she hadn't hit the bathroom that morning, and we stopped at a McDonald's a little south of Pittsburgh. We mostly pass as a normal American family, but at this stop we stood out like a sore thumb. We received a lot of cold blond stares. I understood why later, after we drove another 30 minutes: the barns started shouting TRUMP in dripping paint, and we entered West Virginia. God's Country, they call it. Heaven on earth. But it was just some green hills, and the scenery didn't impress me much.
Our next stop was a rest area in North Carolina, which turned out to be the cleanest, most immaculate rest area I'd ever seen. Somebody in Raleigh must be laundering money through landscaping contracts, but damn if it didn't impress us. Even the butterflies were impressive!
Then Myrtle Beach: 85-degree weather, ocean air, a great view from our flat, and a nice waterpark at the resort. Southern hospitality is real; everyone was smiling. Compared to the winter-scarred faces in Buffalo, it felt like stepping onto another planet. The Carolinas had already shoved their kids back into classrooms, so we owned the pools and ruled the lazy river. The kids tore through the slides. I soaked in the jacuzzi like a tired warrior. At night we binge-watched Quantum Leap. We would have watched during the day too, but my wife dragged us out for beach walks and waterpark raids. Sometimes we need the push.
By the third day, the spell had taken hold. I started to relax. Sleep came easy, deeper than at home. The wave sounds and the sea view worked their magic. Staying right on the beach was great. No hauling gear, no logistics. Step out the door, fall into the ocean, and crawl back to the flat when you're cooked. The flat was clean, spacious, and blessed with a kitchen, so we could gorge on comfort food without shame.
We wondered if we had made a mistake by picking a resort 4-5 miles north of the Boardwalk. When we visited the Boardwalk on the third day, we realized it was overrated anyway. It was full of tourist-trap shops and neon lights, and featured a SkyWheel, which we didn't bother to try. We didn't need the Boardwalk. Myrtle Beach itself is the show: the waves, the horizon, and the beach.
Of course, I had to ruin myself. The kids used sunscreen like sensible citizens, and I, an idiot heir to Turkish tanning lore, slathered on olive oil (which I swiped from our kitchen). If it fries an egg, it'll bronze a body, right? Well... I roasted into a lobster, alright... But I ended up slowly shedding skin like a reptile for days afterwards.
The drive back was clean. Salem's Grill in Pittsburgh was our mandatory detour. You go to great food, if great food doesn't get to you. We hit it before 7pm and dined like kings until closing at 8pm. We were back home before midnight. Uneventful driving, the way I like it.
But vacations are lies, sweet lies. Within days the teeth grinding returned. The adult machinery reloaded with forms to sign, kids to shuttle, bills to pay. Adulting feels like having to deal with a constant deluge of junk mail and random chores from the universe.
And, then the saddest part... we will be shipping Ahmet to college. He leaves for Caltech soon (must be his mother's genes). I am proud, of course, but I will miss him a lot. I bought the plane ticket yesterday after weeks of pretending I didn't have to. Kids grow fast. Too fast... It isn't fair.
Building a DOOM-like multiplayer shooter in pure SQL
DOOMQL: A DOOM-like multiplayer shooter in pure SQL
I recently stumbled across Patrick’s excellent DOOM clone running in a browser powered by DuckDB-WASM. Ever since I read that, I have wanted to push his awesome idea to the logical extreme: build a multiplayer DOOM-like shooter entirely in SQL, with CedarDB doing all the heavy lifting. During a month of parental leave (i.e., a lot of sleepless nights), I tried exactly that.
Here’s a sneak peek at DOOMQL:
September 06, 2025
Oldest recorded transaction
September 05, 2025
Automating vector embedding generation in Amazon Aurora PostgreSQL with Amazon Bedrock
Group database tables under AWS Database Migration Service tasks for PostgreSQL source engine
September 03, 2025
Recent Reads (September 25)
Small Gods (1992)
I absolutely loved Small Gods. Pratchett takes on religion, faith, and power. The story follows Om, a god trapped in the body of a tortoise, who has only one true believer left: a novice named Brutha. The central premise is that gods and mythical beings exist because people believe in them, and their power fades as belief fades.
There’s no point in believing in things that exist.
The book is funny, clever, and surprisingly philosophical. Pratchett skewers organized religion, but he also asks bigger questions: What is faith? What is belief? How do institutions shape people, and how do people shape institutions? It's satire, but not heavy-handed. Like Vonnegut, he writes with a wink, yet there's real depth under the jokes.
Gods don't like people not doing much work. People who aren't busy all the time might start to think.
Why not? If enough people believe, you can be god of anything….
The figures looked more or less human. And they were engaged in religion. You could tell by the knives (it's not murder if you do it for a god).
The trouble with being a god is that you've got no one to pray to.
I came across Small Gods after reading Gaiman's American Gods (2001), which credits it as inspiration. Both explore gods and belief, but Pratchett's is lighter, sharper, and full of characters you actually care about. The audiobook is narrated by Andy Serkis, and he's brilliant. My precious!
Some more quotes from the book:
"But is all this true?" said Brutha. / Didactylos shrugged. "Could be. Could be. We are here and it is now. The way I see it is, after that, everything tends towards guesswork." / "You mean you don't KNOW it's true?" said Brutha. / "I THINK it might be," said Didactylos. "I could be wrong. Not being certain is what being a philosopher is all about."
What have I always believed? That on the whole, and by and large, if a man lived properly, not according to what any priests said, but according to what seemed decent and honest inside, then it would, at the end, more or less, turn out all right.
"Sometimes the crime follows the punishment, which only serves to prove the foresight of the Great God." / "That's what my grandmother used to say," said Brutha automatically. / "Indeed? I would like to know more about this formidable lady." / "She used to give me a thrashing every morning because I would certainly do something to deserve it during the day," said Brutha. / "A most complete understanding of the nature of mankind."
Probably the last man who knew how it worked had been tortured to death years before. Or as soon as it was installed. Killing the creator was a traditional method of patent protection.
Last night there seemed to be a chance. Anything was possible last night. That was the trouble with last nights. They were always followed by this mornings.
The Turtle Moves!
I Shall Wear Midnight (2010)
I had read The Shepherd's Crown earlier, and it made me a witch lover. I loved this book too. Pratchett's witches are sharp, strong, and unforgettable. Tiffany Aching, the protagonist, is brave, clever, and endlessly practical: “I make it my business. I'm a witch. It's what we do. When it's nobody else's business, it's my business.”
Pratchett's prose is excellent and witty. He makes you laugh and think at the same time. Lines like “You've taken the first step.” / “There's a second step?” / “No; there's another first step. Every step is a first step if it's a step in the right direction” stayed with me. There’s so much care in how he builds characters, their choices, and their world. The book is about people, not the fantasy world, and that is Pratchett's genius. The witches, the townsfolk, even the Nac Mac Feegles ... they all feel alive.
I also loved how the story quietly mirrors what researchers like us do: “We look to the edges. There’s a lot of edges, more than people know. Between life and death, this world and the next, night and day, right and wrong … an’ they need watchin’. We watch ’em, we guard the sum of things.” That is the witches' charter, but also a motto for formal methods researchers, or anyone keeping an eye on the boundaries of a complex system.
Another great line: “Well, as a lawyer I can tell you that something that looks very simple indeed can be incredibly complicated, especially if I'm being paid by the hour. The sun is simple. A sword is simple. A storm is simple. Behind everything simple is a huge tail of complicated.”
Quantum Leap (TV series 2022)
As a kid growing up in Turkey, I watched the original Quantum Leap, and it felt magical. We were filled with wildly incorrect optimism about science and technology, but it felt inspiring. Some lies are useful. The reboot was fun to watch with my kids too. Raymond Lee as Ben Song is a strong lead: he's the physicist who ends up stuck leaping into the bodies of other people. Caitlin Bassett guides him as his fiancée and observer.
The show is well-paced and fun for family viewing. Many episodes lean into socially conscious themes, which I appreciated, but at times it becomes unintentionally offensive: portraying “dumb Americans” with exaggerated Southern accents or mocking Salem settlers in ways that feel worse than silly. The writers clearly aimed for humor or commentary, but the execution backfired. I won't spoil the full story, but the second season tries a pivot/twist, only to throw it out again. What were they thinking? Moments like this make me want to be a TV writer.
September 02, 2025
Automating Amazon RDS and Amazon Aurora recommendations via notification with AWS Lambda, Amazon EventBridge, and Amazon SES
Postgres 18 beta3, large server, sysbench
This post has performance results for Postgres 18 beta3, beta2, beta1, 17.5, and 17.4 using the sysbench benchmark and a large server. The working set is cached and the benchmark is run with high concurrency (40 connections). The goal is to search for CPU and mutex regressions. This work was done by Small Datum LLC and was not sponsored.
tl;dr
- There might be small regressions (~2%) for several range queries that don't do aggregation. This is similar to what I reported for 18 beta3 on a small server, but here it only occurs for 3 of the 4 microbenchmarks and on the small server it occurs on all 4. I am still uncertain about whether this really is a regression.
- x10b_c32r128 is functionally the same as x10a_c32r128 but adds io_method=sync
- x10d_c32r128 starts with x10a_c32r128 and adds io_method=io_uring
Benchmark
The tests are run using 8 tables with 10M rows per table. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.
I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for Postgres 17.5)
Asymmetric Linearizable Local Reads
People want data fast. They also want it consistent. Those two wants pull in opposite directions. This VLDB'25 paper offers another take on this conundrum. Rather than assuming a symmetric network environment where all replicas face similar latencies, the paper emphasizes that in practice some replicas sit close to the leader while others are stranded halfway across the globe. By embracing this asymmetry, the authors propose two new algorithms: Pairwise-Leader (PL) and Pairwise-All (PA). Both cut read latency compared to prior approaches. PL could even achieve 50x latency improvements in some cases.
Aleksey and I did our usual thing. We recorded our first blind read of the paper. You can watch it here (link to come soon), if you like seeing two people puzzle through a paper in real time. I also annotated a copy while reading which you can access here.
We liked the ideas, even though the protocols themselves didn't thrill us particularly. I particularly liked finding another good example of the use of synchronized time in distributed database systems. Another study to add to my survey.
Background
At the heart of the problem is linearizability (see my explanation for a primer), a strong consistency condition that ensures operations on a distributed object appear to occur atomically in a single total order consistent with real time. If one operation finishes before another begins, then all replicas must reflect this order. This strong model spares developers from reasoning about concurrency anomalies: if you read after a write, you are guaranteed to observe it. That means there are no stale reads.
While linearizability makes life simple for the developers, it makes it harder for the system, which has to make sure every replica behaves like one machine. This is typically enforced by state machine replication (SMR) protocols such as Paxos or Raft, which order all operations through a leader. This works fine, except for reads. Reads dominate workloads in practice, and forcing every read to consult the leader or coordinate with a quorum introduces unnecessary WAN round trips. To improve efficiency, many systems have proposed linearizable local read algorithms, which allow followers to serve reads locally under certain conditions. However, this is tricky ground. Local reads introduce the risk of staleness, because that replica/follower has not yet applied the latest writes.
And here's the rub: WANs exacerbate the problem. Some replicas are close. Others are hopelessly far. The close ones see fresh data. The far ones lag. Prior work noticed this and tried various tricks to smooth out the unfairness. This paper doesn’t smooth it out. It embraces it as we will see in the coming sections.
The key problem addressed by the paper is: How can linearizable local reads be achieved in a system where replicas are asymmetric in their ability to keep up with the updates?
The Blocking Problem
Over the years, many protocols have been proposed for enabling linearizable local reads. The paper reviews these under three broad categories.
Invalidation-based algorithms (e.g., Megastore, PQL, Hermes): Replicas mark themselves invalid when they receive prepares and regain validity only after commits. But this can lead to unbounded blocking: If the leader issues prepares faster than it gathers acknowledgments in a write-heavy workload, the replicas stay perpetually invalid.
Eager Stamping algorithms (e.g., CHT): Reads are stamped with the latest prepare index, then block until the corresponding commits arrive. This avoids perpetual invalidation thanks to instance-based tracking of prepares, but it still results in blocking proportional to twice the leader's eccentricity.
Delayed Stamping algorithms (e.g., CockroachDB Global Tables): These use synchronized clocks to assign visibility windows. Here, blocking time depends on the clock skew bound Δ, which in the worst case is theoretically bounded by the relative network diameter. This worst-case bound does not apply if GPS clock synchronization (e.g., Google's Truetime or AWS Timesync) is available, so the paper, in order to make its case, assumes GPS-based synchronization is not available (which is actually pretty widely available; see https://muratbuffalo.blogspot.com/2024/12/utilizing-highly-synchronized-clocks-in.html).
OK, I owe you an explanation of the delayed stamping approach. But first, let me set this up using the paper's unifying stop/go events framework. I like this framing: when you name something, you own it. In this model, each index i on each replica has two events:
- A stop event at which the replica stops assigning reads to indices less than i.
- A go event at which the replica can safely serve reads at i.
This abstraction ultimately guarantees linearizability by ensuring that, across all replicas, all go events for i occur at or after all stop events for i. The framework is clean and clarifies why blocking occurs.
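To make the abstraction concrete, here is a minimal sketch of how a replica could track stop/go events per index. This is my own illustration, not code from the paper, and all names are made up.

class StopGoReplica:
    """Minimal sketch of the stop/go framework (my illustration, not the paper's code)."""
    def __init__(self):
        self.stop_index = 0   # highest i whose stop event fired: new reads are stamped at least this high
        self.go_index = 0     # highest i whose go event fired: stamps up to this may be served

    def on_stop(self, i):
        # stop event for i: stop assigning reads to indices less than i
        self.stop_index = max(self.stop_index, i)

    def on_go(self, i):
        # go event for i: it is now safe to serve reads stamped with i
        self.go_index = max(self.go_index, i)

    def stamp_read(self):
        # a read picks up the current stop index as its stamp ...
        return self.stop_index

    def can_serve(self, stamp):
        # ... and blocks until the go event for that stamp has fired
        return self.go_index >= stamp

The algorithms below differ only in what fires on_stop and on_go: message arrivals (Eager Stamping), clock readings (Delayed Stamping), or scheduled times (PL and PA). Linearizability holds as long as, across replicas, every go for index i happens at or after every stop for i.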
Now let's walk through Fig. 2 and explain the two approaches using the stop/go framework.
Eager Stamping (Fig. 2a). A follower stamps a read with the highest index i it has seen a prepare for (the stop event). It can only apply the read after it has received all commits up to i (the go event). The leader is special here: since it is always up to date, its stop and go events collapse into one.
Delayed Stamping (Fig. 2b). This approach uses synchronized clocks to decouple stop and go events from message arrival. When the leader accepts an update, it assigns it a future visibility time t+α, where the visibility delay α is derived from the estimated commit time of the update. Followers stop stamping reads with indices less than i once that visibility time has passed on their clocks. They then apply the read after an additional Δ (the clock skew/uncertainty) has elapsed. Unlike Eager Stamping, the leader does not have special stop and go events; it also follows this visibility rule. I talked about this earlier when explaining CockroachDB's global transactions and the future timestamping in Aurora Limitless and DSQL.
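Here is a hedged sketch of Delayed Stamping under my reading of Fig. 2b; the class, parameter names, and clock handling are mine, and Δ and α are treated as plain numbers.

class DelayedStampingReplica:
    """Sketch of Delayed Stamping (my reading of Fig. 2b); names are illustrative."""
    def __init__(self, delta):
        self.delta = delta        # Δ: assumed clock skew / uncertainty bound
        self.visibility = {}      # index i -> visibility time (t + α) chosen by the leader

    def on_prepare(self, i, visibility_time):
        # the leader assigned visibility_time = accept_time + α, where α is derived
        # from the estimated commit time of the update
        self.visibility[i] = visibility_time

    def stamp_read(self, now):
        # stop event for i: the local clock has passed i's visibility time, so reads
        # may no longer be stamped below i
        return max((i for i, t in self.visibility.items() if now >= t), default=0)

    def go_time(self, stamp):
        # go event for i: an extra Δ past the visibility time absorbs clock skew
        return self.visibility.get(stamp, 0) + self.delta

A read stamped with index i is served once the local clock reaches go_time(i) and the update at i has been applied locally.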
Table 2 summarizes the worst-case blocking time at followers for the three categories of algorithms mentioned above and the two new algorithms introduced in this work (which we explain next).
I would be remiss (as is the paper) if I did not emphasize a common flaw shared across all these algorithms: the leader in these algorithms requires acknowledgments from all nodes (rather than just a quorum) before it can commit a write! If you want a local linearizable read from a single node, the write protocol is forced into a write-all model to ensure that the one-node read quorum intersects with the write quorum. This design hurts both availability and tail latency for the algorithms in Table 2. In contrast, in our Paxos Quorum Reads (PQR 2019) work we avoided the write-all model: PQR used only LSN tracking and quorum acknowledgments, and needed no clocks. I discuss PQR at the end of this post.
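A quick back-of-the-envelope check of why this happens (my arithmetic, not the paper's): with the usual quorum intersection condition R + W > N, a read quorum of size one forces the write quorum to include every node.

# Quorum intersection check (illustrative numbers, not from the paper).
N = 5            # cluster size
R = 1            # local read: the read "quorum" is a single replica
W = N - R + 1    # smallest write quorum that still intersects every read quorum
assert W == N    # -> write-all: the leader must hear from all N nodes to commit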
Pairwise-Leader (PL)
The central idea of PL is to tailor blocking time to each replica's distance from the leader: replicas close to the leader see near-zero latency, while distant ones may fare worse than before. To get there, the paper first introduces a stepping-stone algorithm (Figure 3), which then leads to the full PL algorithm in Figure 4. The trick in the stepping stone is to deliberately delay prepare messages so that acknowledgments from all replicas reach the leader at the same time, which reduces blocking for followers closer to the leader. Specifically, this ensures that for any replica, the gap between its prepare and commit messages is just its round-trip distance to the leader. That alone already improves over older algorithms that tied blocking to the full network diameter.
PL then builds on this stepping stone by further decoupling stop and go events, borrowing the spirit of Delayed Stamping but applying it pairwise. Instead of relying on synchronized clocks, PL introduces a new event scheduling primitive that ensures a replica's stop event happens just before a visibility time, and its go event just after. Figure 4 illustrates this: each replica's worst-case blocking time becomes exactly twice the relative message delay between the leader and itself (see Table 1 for notation). In other words, nearby replicas get fast, almost instant reads, while the cost for distant ones reflects only their distance from the leader.
PL introduces a new pairwise event scheduling/synchronization primitive: instead of requiring global clock synchronization, the leader coordinates stop and go events directly with each follower. This scheduling ensures stop/go events happen at predictable real-time offsets relative to the leader's visibility time, while exploiting known lower bounds on delays to followers to maintain correctness. Yes, unfortunately, the drawback is that the delays to followers need to be reliable/predictable for correctness to hold. I discuss this problem at the end of the post.
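To make the stepping-stone idea concrete, here is a hedged sketch of the prepare-delay computation (my reading of Fig. 3; the function name and the millisecond delays are mine, and it assumes the one-way delays from the leader are known and stable, which is exactly the assumption I question below).

def schedule_prepares_pl(one_way_delay_ms):
    """Sketch of PL's stepping stone (my reading of Fig. 3).

    one_way_delay_ms maps each follower to the assumed, stable one-way delay
    from the leader. The leader holds back each prepare so that every
    acknowledgment lands back at the leader at the same time; each follower's
    prepare-to-commit gap then shrinks to its own round trip to the leader.
    """
    slowest_rtt = 2 * max(one_way_delay_ms.values())
    return {f: slowest_rtt - 2 * d for f, d in one_way_delay_ms.items()}

print(schedule_prepares_pl({"f_near": 5, "f_far": 50}))
# -> {'f_near': 90, 'f_far': 0}: the far follower gets its prepare immediately, the
#    near one 90 ms later, so both acks reach the leader together at t = 100 ms.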
Pairwise-All (PA)
PL optimizes aggressively for leader-adjacent replicas, but it penalizes distant ones. PA extends the pairwise trick to all replicas using all-to-all communication as shown in Figure 6.
PA's stepping stone in Figure 6 works like PL's but shifts the synchronization target: instead of aligning acknowledgments at the leader, it delays prepare messages so they all arrive at every replica at the same time. Each process also sends acknowledgments to all others, not just the leader. The effect is that every replica can commit once its own eccentricity time has passed since receiving a prepare. As a result, the worst-case read blocking time for each replica is its relative eccentricity.
To further reduce blocking, PA applies the same decoupling (delayed stamping) idea as PL but with all replicas aligned at the visibility time (as shown in Figure 7). The leader schedules stop events near this visibility point, and each process waits for stopped events from others before issuing its own go. Since a go event is the maximum of all stopped events, correctness holds regardless of scheduling accuracy. This ensures that every replica's worst-case blocking time is bounded by its eccentricity rather than its leader-distance. In practice, that means nearby replicas don't get PL's extreme gains, but distant ones aren't punished.
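And a matching hedged sketch for PA's stepping stone (my reading of Fig. 6; the names and millisecond delays are mine): prepares are delayed so they land at every replica at the same instant, and with all-to-all acknowledgments a replica's go is just the maximum of the stopped events it hears about.

def schedule_prepares_pa(one_way_delay_ms):
    """Sketch of PA's stepping stone (my reading of Fig. 6): delay each prepare
    so that it arrives at every replica at the same instant."""
    farthest = max(one_way_delay_ms.values())
    return {r: farthest - d for r, d in one_way_delay_ms.items()}

def go_event(stopped_events):
    """With all-to-all acknowledgments, a replica's go is the max of the stopped
    events it has heard, so correctness does not hinge on scheduling accuracy."""
    return max(stopped_events)

print(schedule_prepares_pa({"leader": 0, "f_near": 5, "f_far": 50}))
# -> {'leader': 50, 'f_near': 45, 'f_far': 0}: every replica sees the prepare at t = 50 ms.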
That's it, that's the story. Asymmetry is real. PL exploits it ruthlessly. PA makes it fair. If you're close to the leader, PL is your friend. If you're far, PA keeps you from suffering.
Discussion
Both PL and PA assume stable latencies and non-faulty processes. The paper sketches how to tolerate failures, but variance and reconfiguration remain open issues. The funny thing is that, to justify predictable network latencies, the authors cite Aleksey's paper: "Cloudy Forecast: How Predictable is Communication Latency in the Cloud?" Ironically, that paper shows the opposite: the variance can be massive, up to 10x the median in the WAN setup, and 3000x in the same-AZ setup! So either they didn't read it carefully, or they cited it for sport. Cloud networks aren't that tame and predictable for tenants.
The treatment of clock skew Δ is also odd. The paper insists Δ must be proportional to the network diameter, but that's a theoretical result, and I don't know how much it applies even to NTP-based synchronization. Moreover, in practice, GPS clocks exist, and AWS Timesync provides 50 microsecond clock uncertainty. Why not use these? The paper explicitly disallows GPS clocks to make the results from PL and PA look more favorable. A comparison against synchronized clocks would have been valuable. With clocks, blocking could be under a millisecond (as we designed in AWS DSQL) with just delayed timestamping; that would not only be a lot simpler, but would also beat anything from PL and PA significantly.
Our PQR work (2019) also tackled linearizable non-leader reads. You can also frame it as local reads, though PQR used multiple nodes. The key idea in PQR is to involve the client: The client contacts a quorum of nodes, usually gets a linearizable read in one shot, and in the rare case of an ongoing update, waits briefly and completes with a callback. PQR required no synchronized clocks and worked in a fully asynchronous model using only LSNs. It fits naturally in this space.
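To contrast with PL/PA, here is a heavily simplified, hedged sketch of the PQR read path; the node API (get returning value, LSN, and a pending flag) and the polling-based wait are invented for illustration, and the real protocol completes the rare slow path with a callback rather than polling.

def pqr_read(nodes, key, read_quorum_size):
    """Heavily simplified sketch of a Paxos Quorum Read (PQR, 2019).

    Assumes each node exposes get(key) -> (value, lsn, pending); that API is
    invented for illustration. The client reads from a quorum and returns the
    value with the highest LSN; in the rare case that value is still pending
    (an update is in flight), it waits briefly for the commit.
    """
    replies = [node.get(key) for node in nodes[:read_quorum_size]]
    value, lsn, pending = max(replies, key=lambda r: r[1])  # highest LSN wins
    if pending:
        value = wait_for_commit(nodes, key, lsn)
    return value

def wait_for_commit(nodes, key, lsn):
    # Invented helper: poll until some node reports the value at this LSN as
    # committed. The real protocol uses a callback instead of polling.
    while True:
        for node in nodes:
            value, node_lsn, pending = node.get(key)
            if node_lsn >= lsn and not pending:
                return value

Note that nothing here needs synchronized clocks: the client only compares LSNs from a quorum of replies.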
Postgres 18 beta3, small server, sysbench
This has performance results for Postgres 18 beta3, beta2, beta1 and 17.6 using the sysbench benchmark and a small server. The working set is cached and the benchmark is run with low concurrency (1 connection). The goal is to search for CPU regressions. This work was done by Small Datum LLC and was not sponsored.
tl;dr
- There might be small regressions (~2%) for several range queries that don't do aggregation. This is similar to what I reported for 18 beta1.
- Vacuum continues to be a problem for me and I had to repeat the benchmark a few times to get a stable result. It appears to be a big source of non-deterministic behavior leading to false alarms for CPU regressions in read-heavy tests that run after vacuum. In some ways, RocksDB compaction causes similar problems. Fortunately, InnoDB MVCC GC (purge) does not cause such problems.
- x10b_c8r32 is functionally the same as x10a_c8r32 but adds io_method=sync
- x10b1_c8r32 starts with x10b_c8r32 and adds vacuum_max_eager_freeze_failure_rate=0
- x10b2_c8r32 starts with x10b_c8r32 and adds vacuum_max_eager_freeze_failure_rate=0.99
Benchmark
The tests are run using 1 table with 50M rows. The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.
I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for Postgres 17.6)
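For example (numbers made up for illustration, not taken from the post's results):

# Illustrative only, not results from the post.
qps_18_beta3 = 9800.0
qps_17_6 = 10000.0
relative_qps = qps_18_beta3 / qps_17_6   # 0.98 -> about a 2% regression vs 17.6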
The numbers highlighted in yellow below might be from a small regression for range queries that don't do aggregation. But note that this does reproduce for the full table scan microbenchmark (scan). I am not certain it is a regression as this might be from non-deterministic CPU overheads for read-heavy workloads that are run after vacuum. I hope to look at CPU flamegraphs soon.
- the mapping from microbenchmark name to Lua script is here
- the range query without aggregation microbenchmarks use oltp_range_covered.lua with various flags set and the SQL statements it uses are here. All of these return 100 rows.
- the scan microbenchmark uses oltp_scan.lua which is a SELECT with a WHERE clause that filters all rows (empty result set)
col-1 col-2 col-3 col-4 col-5 -> writes