June 17, 2026
Security advisory: CVE-2026-9740 and CVE-2026-11933 in Percona Server for MongoDB
TL;DR: This advisory covers the two most important high-severity memory-safety vulnerabilities affecting MongoDB Community and our downstream Percona Server for MongoDB – CVE-2026-11933 and CVE-2026-9740. Both will be addressed in a single coordinated patch release, bundled with other recently revealed lower-scored CVE fixes: CVE-2026-9753, CVE-2026-9752, CVE-2026-9751, CVE-2026-9750, CVE-2026-9749, CVE-2026-9748, CVE-2026-9747, CVE-2026-9746, CVE-2026-9743, and CVE-2026-9741. Fixes land … Continued
The post Security advisory: CVE-2026-9740 and CVE-2026-11933 in Percona Server for MongoDB appeared first on Percona.
Fuzzy String Search for MySQL
The insert benchmark on a small server, IO-bound workload : Postgres 19 beta1
This has results for Postgres versions 19 beta1, 18.4 and 17.10 with the Insert Benchmark on a small server using a cached and CPU-bound workload. I also used MySQL 8.4.8 to see where performance was different.
Postgres continues to be boring in a good way. It is hard to find performance regressions.
tl;dr
- create index (the l.x step) is faster in Postgres 19beta1. A Postgres expert told me that the sort algorithm was changed to be more CPU efficient
- the write heavy steps (l.i1, l.i2) are 15% and 9% faster in 19 beta1 vs Postgres 17.10
- the second write heavy step (l.i2) is more than 20X faster in MySQL 8.4.8 vs Postgres thanks to the CPU overhead from get_actual_variable_range. I have written about this before.
I compiled MySQL 8.4.8 from source as well.
The server is an Beelink SER7 with a Ryzen 7 7840HS CPU with 8 cores and AMD SMT disabled, 32G of RAM. Storage is one SSD for the OS and an NVMe SSD for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04.
- l.i0
- insert 800M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
- l.x
- create 3 secondary indexes per table. There is one connection per client.
- l.i1
- use 2 connections/client. One inserts 4M rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
- l.i2
- like l.i1 but each transaction modifies 5 rows (small transactions) and 1M rows are inserted and deleted per table.
- Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
- qr100
- use 3 connections/client. One does range queries and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
- qp100
- like qr100 except uses point queries on the PK index
- qr500
- like qr100 but the insert and delete rates are increased from 100/s to 500/s
- qp500
- like qp100 but the insert and delete rates are increased from 100/s to 500/s
- qr1000
- like qr100 but the insert and delete rates are increased from 100/s to 1000/s
- qp1000
- like qp100 but the insert and delete rates are increased from 100/s to 1000/s
(QPS for my version / QPS for Postgres 17.10)
The background in the table cells is blue for big improvements and yellow for regressions. There are no regressions here.
- there are large improvements in 19 beta1 (15% and 9%). The CPU overhead is lower in 19 beta1 compared to 17.10 (see cpupq here).
- throughput for the l.i2 step is more than 20X larger for MySQL than for Postgres. From vmstat I see that the CPU overhead (cpupq here) is more than 10X larger with Postgres vs MySQL. From flamegraphs the problem is the CPU overhead in get_actual_variable_range. I have written about this before (see here). The Postgres query planner uses too much CPU skipping old versions to figure out selectivity for a query and there are too many old versions because Postgres doesn't collect them ASAP, vacuum takes time. The flamegraphs are in subdirectories here.
| dbms | l.i0 | l.x | l.i1 | l.i2 | qr100 | qp100 | qr500 | qp500 | qr1000 | qp1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| PG 17.10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| PG 18.4 | 1.01 | 1.03 | 1.00 | 1.00 | 0.98 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 |
| PG 19 beta1 | 1.01 | 1.15 | 1.05 | 1.09 | 0.97 | 1.01 | 0.96 | 1.01 | 0.97 | 1.00 |
| MySQL 8.4.8 | 0.77 | 0.89 | 0.76 | 21.62 | 0.61 | 1.07 | 0.66 | 0.93 | 0.85 | 0.84 |
June 16, 2026
Our Italy trip
We spent 10 days in Italy in early April. It was a lot of fun.
Back when I was a grad student, I stayed in Pisa for a month for a "summer school on mobile computing" in 2003. Mobile computing was the next "big" thing back then. (In retrospect, the research was misguided about the local/distributed approach to it, as most of mobile computing reconciled to cloud backends as that is often more efficient.) When I was in Pisa, I didn't travel around much, but I now realize I should have spent every weekend traveling. (In my defense, without Internet enabling trip planning/execution, and without GPS on the phone, traveling was very cumbersome, yes, back in 2003.)
Rome
Rome is amazing! The history is incredibly well preserved. Kudos to the Italians.
Rome is very walkable. The buses, on the other hand, were packed solid. My daughters were genuinely surprised by this exotic new form of transportation that does not exist in the US.
The art in the Vatican was exquisite. All the masters, all in one place. But the lines were horrendous.
Tiramisu is truly wonderful. But (please don't stone me for this) I don't like the coffee. I had promised myself I would drink coffee everywhere, and I tried. But the coffee, even the espresso, just didn't feel that good to me. It was mostly lukewarm. I've gotten used to hot coffee, so I quickly gave up my quest to find my coffee high in Italy.
As I broke the news to you in this blog post (a post about Jensen Huang, of all things), I got pickpocketed at the Trevi Fountain.
"Here's the strange part. After the initial shock and the credit-card cancellations, I found myself thinking: this person was unbelievably good at their job. I felt nothing. No bump, no distraction, nothing. The wallet disappeared from the deep front pocket of my jeans like a magic trick. Respect. I remember thinking: if only I were that good at my own work."
Venice
Venice is simply beautiful. The buildings, the canals, the whole scene were beautiful. The old town doesn't even allow motorcycles, so everything is foot-powered. Everything is well preserved.
Lake Como
Lake Como looked out of this world. It was trippy.
But there were lines everywhere. The ferries ran very inefficiently, and 2-hour wait times were the norm. And this wasn't even peak season. Italians are not the best at organizing/running things. In Italy there are lines everywhere: cafes, restaurants, ferries. Everywhere. A good chunk of my time in Italy was spent waiting in lines.
Turin
Turin is bigger and more metropolitan. It is very clean. I didn't get to see much of it, unfortunately.
The night before our early flight back to the States, we stayed near the Milan airport, in a little town called Somma Lombardo. Even that town was beautiful, with buildings dating from the 1500s, all in remarkably well-preserved shape.
So, what happened to Italy?
Italy was very active at the start of the tech revolution. Olivetti could have been the center of gravity for the personal computer revolution, you know. And then there are these magnificent brands from Italy, including Ferrari, Lamborghini, Fiat, Alfa Romeo. So why don't we see any more of that innovation?
Improve query performance with EXPLAIN plans in Amazon Aurora DSQL
Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement
Managing data retention policies is one of the most common operational tasks in MySQL. Applications continuously generate transactional, audit, logging, telemetry, and event data. Over time, these tables can grow to billions of rows, causing: Larger backups Longer recovery times Reduced buffer pool efficiency Slower index maintenance Increased storage costs Degraded query performance To address … Continued
The post Extending pt-archiver with a Partition-Aware Plug-in for Fast Retention Policy Enforcement appeared first on Percona.
The feedback loops behind Kubernetes
June 15, 2026
Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency
Overview When building high-availability MySQL environments, the choice between MySQL Group Replication (GR) and Percona XtraDB Cluster (PXC) often comes down to how they handle the eternal database dilemma: data consistency versus performance. While both provide “synchronous-like” replication, they approach the problem of stale reads—reading data that has been committed on one node but not … Continued
The post Group Replication VS Percona XtraDB Cluster: The True Cost of Consistency appeared first on Percona.
The Failover Brownout: Rethinking High Availability in MySQL Group Replication
It is time to talk again about Flow control and group replication. This time with a special eye on the use of Group Replication in the Kubernetes context. In this article we will dig a bit on how it works and what are the various side effects. The problem Recently I was refining the … Continued
The post The Failover Brownout: Rethinking High Availability in MySQL Group Replication appeared first on Percona.
See what your database is doing right now with Connections
June 13, 2026
Our MongoDB TLA+ Workshop
Shortly after I joined MongoDB research, we ran a TLA+ workshop. It was a two-day ordeal. We had a 1.5 days of instruction on TLA+ and syntax, after which we tried to help people get started with modeling. People liked learning about TLA+ on the first day, but except for a person or two, we didn't get anyone onboarded with TLA+ modeling. It was too much to offload on people and ask them to level up in a short time frame.
Well, two years after that first workshop, on May 11th, 2026, we ran a second edition of this workshop with one very big difference. What is that big difference, you ask?
AI!
AI makes formal methods not only necessary, but also more feasible and easier!
Jesse, Will, and I planned this workshop to be aggressively short. We provide under two hours of instruction, then everyone starts modeling hands-on. We act as TAs and help people as they go. The AI takes care of the syntax problem for TLA+, and also helps with modeling. We just need to teach people enough to read and make sense of a model, and Claude Code would take care of the rest. You have a problem? Ask Claude. Still stuck? Ask us.
Take away: Green Path to 100 true fans
Ok, I will cut to the chase. The workshop was a great success!
My biggest takeaway from it is this: I finally see a green path to 100 true fans. We had an order of magnitude greater conversion rate compared to the first workshop. Almost all of the attendees got started with modeling, and almost all of those modeled a part of an actual work project they are currently on.
As I've written before, with AI tools it is easy to be "technically fearless." People who had been heavy Claude users did a great job prompting and interacting with Claude, and hit the ground running. Some people were still hesitant to take the first step, but when one of us sat down with them, they quickly got over that initial friction and were off and modeling within minutes.
Workshop feedback report
We asked attendees to fill out a feedback survey afterward, and the results were encouraging. Across the board, attendees rated the workshop a 5 out of 5 and strongly agreed that TLA+ is a valuable tool for ensuring system correctness. Several attendees reported modeling real, current work projects during the hands-on session (including a concurrent rewrite of a core subsystem, a data materialization pipeline, and a distributed storage component) and found the exercise revealed liveness issues and design ambiguities they hadn't previously articulated. One attendee described how modeling gave them confidence in a design they would otherwise have had to validate the hard way through implementation.
On the question of AI integration, the feedback was uniformly positive. Multiple respondents noted that Claude Code removed syntax as a barrier entirely, letting them focus on the modeling itself rather than fighting the language. One participant said: "I have tried to write TLA+ pre-LLMs, and it was prohibitively difficult." Likelihood-to-use scores ranged from 3 to 5, with most landing at 4 or 5. The most common suggestions for improvement centered on adoption strategy: how to make TLA+ part of the default design workflow for engineers who aren't yet in the habit, perhaps through a champion system similar to what MongoDB has done with AI tooling.
TLA+ skills file
At the workshop, we brainstormed what a best-practices guide for AI-assisted TLA+ modeling should look like. Here is what we landed on.
No PlusCal! Stick to TLA+ directly. Keep it simple, easy to read, guarded commands. This is something I changed my mind on about three years ago, and I'm now happy to see that with AI assistance, there is no need to detour through PlusCal. As a mnemonic: /\ is like ∩ (intersection), so it means "and"; \/ is like ∪ (union), so it means "or."
Always include a .cfg file with the same name as the spec.
Before writing a single line, interview the user:
- Do you have a design doc?
- Do you have source code?
- Which property do you care about?
- Are you after correctness (safety), liveness, or both?
(This is a bit tongue-in-cheek.) Pretend you are Lamport. Be brutally minimalist, and only add things if you can show they are relevant.
Start from the most abstract model possible. Only go more concrete if you need to, and consult the user when you do.
Initially aim to stay within 50–200 lines. A model that stays small forces you to think clearly about what matters.
Prevent cognitive debt: explain each action with a short comment, so the reader can follow the intent without reverse-engineering it.
Finally, this is what we taught
Motivational overview
I opened with a motivational overview of modeling, drawing on my previous posts. The core message was this: as AI models write more of our code, formal modeling for concurrent and distributed systems is becoming more valuable than ever.
TLA+ is not just a verification tool. It is a design accelerator. It shifts your perspective: forces you to think about the system globally, lets you explore alternative designs quickly and safely before any code exists. It is faster to fix a conceptual model than to whack-a-mole corner cases in implementation. And thanks to LLMs, learning TLA+ syntax is no longer the bottleneck. AI and TLA+ amplify each other.
The workflow I described:
- Design first in TLA+ to produce a verified unambiguous specification.
- Hand the verified spec to an LLM to generate the corresponding code.
- Use TLA+ execution traces to derive model-based conformance tests.
- Use TLA+ invariants to derive property-based tests for the generated code.
I reinforced the message that AI + TLA+ is a massive win for the future of software development.
I also introduced a distinction I find underappreciated: the difference between modularity abstraction and modeling abstraction. Modularity abstraction hides implementation details through encapsulation and layering (the traditional approach taught in CS curricula). Modeling abstraction does something different: it reduces a system to its essential behavioral elements by stripping away everything irrelevant to the properties you want to study. While modularity abstraction aspires to conceal concurrency, modeling abstraction deliberately exposes it to maximize safe parallelism. The key skill in formal methods is knowing how to model abstractly (deciding on what to throw away) which is fundamentally different from the modularity thinking that programmers already have.
Jesse's talk
Jesse gave an introduction to TLA+ grounded in its computational model pitched for the LLM era. He started from the transition model: a TLA+ spec defines a state machine through three pieces: an initial-state predicate describing valid starting configurations, a set of actions (state-transition rules) that define legal moves, and fairness constraints that ensure the system makes progress rather than stalling forever. Each state is a complete assignment of values to the variables, and a behavior is a sequence of such states. Jesse walked through the coffee beans example to show how guards (preconditions) and next-state assignments work together.
He then explained how TLC, the TLA+ model checker, verifies specs by doing breadth-first search through the reachable state graph, exhaustively exploring all possible behaviors that the spec allows. This is what makes TLA+ different from testing: you are not sampling executions, you are covering all of them (up to the model bounds). Temporal operators like □ ("always") and ◇ ("eventually") let you express properties about entire infinite behaviors, not just individual states. The punchline Jesse made was again this: In the LLM era, you don't need to memorize the syntax to do this. You only need to understand what you are asking the model to check.
Will's talk
Will Schultz walked through a realistic data cloning specification, defining key properties formally and showing how to think about correctness in a domain most of the attendees work in daily. He then demoed Spectacle, his tool for interactive spec exploration and visualization — letting attendees see the state space as a navigable graph rather than a wall of output. He also covered the current state of AI-assisted modeling workflows with TLA+, grounding the "Claude can help" claim in actual practice.
Hands-on modeling
Then we did the hands-on portion. Attendees picked a current work project, and we worked with them as TAs. That's where the real learning happened of course!
Closing: TLA+ mental models
I closed the workshop by pointing people at where to go from here, and tried to give them a map of the conceptual territory for leveling up.
Effective TLA+ modeling rests on seven interconnected mental models. The foundation is aggressive abstraction: knowing what to discard rather than what to include. Paired with that is embracing TLA+'s global shared-memory fiction as a powerful reasoning tool rather than fighting it as unrealistic. From there, three refinement techniques: localizing guards to stable predicates (which makes slow models fast), deriving tight invariants that reflect genuine understanding of the system, and splitting atomic actions deliberately to expose real-world concurrency. These help you deepen a model until it tells you something you didn't already know.
Beyond correctness verification, these mental models serve a critical communication function. A well-written TLA+ spec is precise and executable documentation. It keeps distributed teams aligned on design intent in a way that English prose cannot. By exploring protocol alternatives through stepwise refinement and varying atomicity levels, teams can discover optimizations they wouldn't find through implementation alone.
HammerDB tproc-c on a large server, Postgres 14 to 19 beta1
This has results for HammerDB tproc-c on a large server using MySQL and Postgres. I am new to HammerDB and still figuring out how to explain and present results so I will keep this simple and just share graphs without explaining the results.
tl;dr
- There are small regressions in versions 16, 17 and 18
- NOPM usually improves a small amount in 19 beta1 relative to 18
- an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
- 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
- 128G RAM
- Ubuntu 24.04
- prior to version 18 the config file is named conf.diff.cx10a50g_c32r128 (x10a_c32r128) and is here for versions 14, 15, 16 and 17.
- for Postgres 18 and 19 I used conf.diff.cx10b_c32r128 (x10b_c32r128) with io_method=sync to be similar to the config used for versions 14 through 17.
The benchmark was run for several workloads:
- vu=10, wh=1000 - 10 virtual users, 1000 warehouses
- vu=20, wh=1000 - 20 virtual users, 1000 warehouses
- vu=40, wh=1000 - 40 virtual users, 1000 warehouses
- vu=10, wh=2000 - 10 virtual users, 2000 warehouses
- vu=20, wh=2000 - 20 virtual users, 2000 warehouses
- vu=40, wh=2000 - 40 virtual users, 2000 warehouses
- vu=10, wh=4000 - 10 virtual users, 4000 warehouses
- vu=20, wh=4000 - 20 virtual users, 4000 warehouses
- vu=40, wh=4000 - 40 virtual users, 4000 warehouses
- stored procedures are enabled
- partitioning is used because the warehouse count is >= 1000
- a 5 minute rampup is used
- then performance is measured for 60 minutes
(NOPM for some-version / NOPM for base-version)
The base version is Postgres 14.22.
A spreadsheet with absolute and relative values for NOPM is here.
Results: vu=10, wh=1000
Summary:
- There are small regressions in versions 16, 17 and 18 while NOPM improves is 19 beta1
Results: vu=20, wh=1000
Summary:
- There are small regressions in versions 16, 17 and 18 while NOPM improves is 19 beta1
Results: vu=40, wh=1000
Summary:
- There are small regressions in versions 17 and 18 while NOPM improves is 19 beta1
Results: vu=10, wh=2000
Summary:
- There are small regressions in version 18 while NOPM improves is 19 beta1
Results: vu=20, wh=2000
Summary:
- There are small regressions in versions 16, 17 and 18 while NOPM improves is 19 beta1
Results: vu=40, wh=2000
Summary:
- There are small regressions in versions 16, 17 and 18 while NOPM improves is 19 beta1
- There is no result for 18.1 because of a bug in my test scripts
Results: vu=10, wh=4000
Summary:
- There are small regressions in versions 16, 17 and 18 while NOPM improves is 19 beta1
Results: vu=20, wh=4000
Summary:
- There are small regressions in versions 16, 17 and 18
Results: vu=40, wh=4000
Summary:
- There are small regressions in versions 16, 17 and 18 while NOPM improves is 19 beta1
June 12, 2026
HorizonDB preview: automate a reproducible lab with ARM
Azure HorizonDB is a new database service on Azure. It is PostgreSQL-compatible from a developer perspective, but its storage layer differs to support built-in high availability and scalability.
Since it's currently in preview, APIs might evolve and defaults could change. This is a great opportunity to explore how well PostgreSQL works with your existing applications.
If you're setting up a lab for those tests, you don’t want to navigate the UI each time. You need something that can be recreated quickly and dismantled once finished.
Azure Resource Manager (ARM) is well suited for that.
Login to Azure
I installed the Azure CLI and logged in on my laptop:
az login
This opens the browser, lets me sign in interactively, and lets me select the tenant and subscription.
If I'm already logged in, I can get my subscription ID:
az account show --query id -o tsv
From there, everything can be automated via the CLI.
ARM template for HorizonDB preview
I created a parameterized template, in hdb-template.json:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"apiVersion": { "type": "string", "defaultValue": "2026-01-20-preview" },
"clusterName": { "type": "string" },
"location": { "type": "string" },
"poolName": { "defaultValue": "DefaultPool", "type": "string" },
"administratorLogin": { "type": "string" },
"administratorLoginPassword": { "type": "securestring" },
"tags": { "type": "object", "defaultValue": {} },
"firewallRules": { "type": "array", "defaultValue": [] },
"guid": { "type": "string", "defaultValue": "[newGuid()]" }
},
"resources": [
{
"type": "Microsoft.HorizonDB/clusters",
"apiVersion": "[parameters('apiVersion')]",
"name": "[parameters('clusterName')]",
"location": "[parameters('location')]",
"properties": {
"createMode": "Default",
"version": "17",
"zonePlacementPolicy": "BestEffort",
"replicaCount": 2,
"vCores": 2,
"administratorLogin": "[parameters('administratorLogin')]",
"administratorLoginPassword": "[parameters('administratorLoginPassword')]",
"network": { "publicNetworkAccess": "Enabled" },
"aiModelManagement": 1
},
"tags": "[parameters('tags')]"
},
{
"condition": "[greater(length(parameters('firewallRules')), 0)]",
"type": "Microsoft.Resources/deployments",
"apiVersion": "2019-08-01",
"name": "[concat('firewallRules-', parameters('guid'), '-', copyIndex())]",
"copy": {
"count": "[if(greater(length(parameters('firewallRules')), 0), length(parameters('firewallRules')), 1)]",
"mode": "Serial",
"name": "firewallRulesIterator"
},
"dependsOn": [
"[concat('Microsoft.HorizonDB/clusters/', parameters('clusterName'))]"
],
"properties": {
"mode": "Incremental",
"template": {
"$schema": "http://schema.management.azure.com/schemas/2014-04-01-preview/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"resources": [
{
"type": "Microsoft.HorizonDB/clusters/pools/firewallRules",
"name": "[concat(parameters('clusterName'),'/',parameters('poolName'),'/',parameters('firewallRules')[copyIndex()].name)]",
"apiVersion": "[parameters('apiVersion')]",
"properties": {
"Description": "[parameters('firewallRules')[copyIndex()].name]",
"StartIpAddress": "[parameters('firewallRules')[copyIndex()].startIpAddress]",
"EndIpAddress": "[parameters('firewallRules')[copyIndex()].endIpAddress]"
}
}
]
}
}
}
]
}
This template defines the HorizonDB cluster and its configuration. I generated it the first time I manually created a cluster via the UI portal. I include the firewall rules in a loop to define multiple client IPs.
Keeping firewall rules in the template makes the lab fully reproducible, including network access, without requiring additional CLI commands.
HorizonDB parameters
In hdb-parameters.json I put all my parameters used by the ARM templates:
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"clusterName": { "value": "franckpachot-hdb" },
"location": { "value": "swedencentral" },
"administratorLogin": { "value": "franck" },
"tags": { "value": { "Owner": "franckpachot" } },
"firewallRules": {
"value": [
{ "name": "home", "startIpAddress": "192.0.2.10", "endIpAddress": "192.0.2.20" },
{ "name": "shell", "startIpAddress": "52.178.13.135", "endIpAddress": "52.178.13.135" }
]
}
}
}
This file contains all environment-specific values: cluster name, region, allowed IP ranges, and admin user. You must update them. The regions where HorizonDB preview is available are listed here: https://learn.microsoft.com/en-us/azure/horizondb/overview#azure-regions
I do not store the password in the parameters file and pass it at deployment time.
Resource group
I create a resource group that will contain everything:
az group create --name franckpachot-rg --location swedencentral
This makes it easy to identify and delete everything related to the lab.
You will change the location to the region you chose above, and the resource group name will be used in the next command.
Deploy the cluster
A single command deploys the cluster from the template:
az deployment group create --resource-group franckpachot-rg --template-file hdb-template.json --parameters @hdb-parameters.json --parameters administratorLoginPassword=PostgreSQL@Azure
The template defines the infrastructure. The parameters define the environment, and the CLI command combines both.
The deployment takes about 15 minutes.
Retrieve the endpoint and connect
The endpoint is visible in the Azure portal, but it can also be retrieved directly from the CLI:
az resource show --resource-type Microsoft.HorizonDB/clusters --name franckpachot-hdb --resource-group franckpachot-rg --query properties.fullyQualifiedDomainName -o tsv
I set my PostgreSQL connection variables using the credentials used when creating it, and the hostname of the endpoint:
export PGPORT=5432
export PGUSER=franck
export PGPASSWORD=PostgreSQL@Azure
export PGDATABASE=postgres
export PGHOST=
You can then connect with psql or any PostgreSQL client, including the PostgreSQL extension for Visual Studio Code
Let's check the version banner and a few parameters:
$ psql
psql (16.2, server 17.9 (Azure HorizonDB (70f3b593ec7)(release)))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.
postgres=> select version();
version
-----------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.9 (Azure HorizonDB (70f3b593ec7)(release)) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.0, 64-bit
(1 row)
postgres=> \dconfig azure*
List of configuration parameters
Parameter | Value
--------------------------------------------+-------------------
azure.accepted_password_auth_method | md5,scram-sha-256
azure.enable_temp_tablespaces_on_local_ssd | on
azure.extensions |
azure.fabric_mirror_enabled | off
azure.service_principal_id |
azure.service_principal_tenant_id |
(6 rows)
postgres=> \dconfig shared*
List of configuration parameters
Parameter | Value
----------------------------------+------------------------------------------------------------------------------
shared_buffers | 11241MB
shared_memory_size | 11776MB
shared_memory_size_in_huge_pages | 5888
shared_memory_type | mmap
shared_preload_libraries | azure, orion_storage, pg_availability, pg_qs, pgms_stats, pgms_wait_sampling
(5 rows)
postgres=> show effective_cache_size;
effective_cache_size
----------------------
11241MB
(1 row)
postgres=> show full_page_writes;
full_page_writes
------------------
off
(1 row)
An interesting aspect is the memory setup. On a 2 vCore, 16 GiB RAM instance, shared_buffers is allocated about 11 GB, which exceeds typical PostgreSQL recommendations.
Traditionally, PostgreSQL utilizes both its buffer cache and the OS filesystem cache. In HorizonDB with disaggregated storage, the PostgreSQL fork avoids double buffering, and shared_buffers and effective_cache_size are closely aligned.
Additionally, full_page_writes is turned off. This is uncommon in PostgreSQL, but justified here because the storage layer already ensures data consistency without torn pages, and has a huge impact on performance.
To use extensions, create a parameter group and set azure.extensions. Managed services must control PostgreSQL extensibility because extensions can gain elevated system privileges.
Cleanup
The advantage of the resource group is that you can delete everything with a single command:
az group delete --name franckpachot-rg --yes
This removes the cluster and all associated resources.
Final thoughts
HorizonDB is currently in preview, and this is a perfect chance to test it!
With ARM and Azure CLI:
- You can set up a lab in just 15 minutes
- Test how PostgreSQL applications work with it
- Explore various extensions and features
- And easily clean everything up when you're done
This way, you can focus on what really matters: understanding how this PostgreSQL‑compatible platform performs with your applications and workloads. Please share your comments and feedback.
More Fake Mastodon Signups
On 2026-05-30, the Mastodon server I run started getting a flood of implausible signup requests. We think these are likely bots because our server rules and signup form clearly tell users to talk about their interest in gay leather. Most people sign up with something like “42 yo levis enthusiast into piss” or “Hi! I’m Snoofles the pup! I love bones and being a good girl for Madam!”. Instead these bios are either friendly, generic text…
I value online communities that are moderated with care and that encourage constructive interaction. I plan to use my account to share interesting stuff, connect with others who share similar interests, and contribute to discussions following the server’s rules and culture.
Or vaguely machine-flavored weirdness:
Pick the Antarctic path and commit. Custodian inspired by Brendan Fraser, active with Axe Throwing in Indianapolis.
Or text taken directly from existing Fediverse bios, like this one:
Ukraineophile, Save Western Culture and Civilization from russian kleptocracy Trying to make Putin’s bum as irritable as possible.
Often there’s nonsensical usernames (e.g. “ypy_asi”) or email addresses that don’t line up, or we get similar language from multiple email domains. I suspect there’s a few actors here, because the tactics don’t line up exactly, but it’s weird that they all started in the last few weeks.
A few of the email domains involved:
- forge45k.io.vn
- shortweb.live
- vtx.pbhak.dev
- funnyfail.app
- cyberlinkhub.com
- ptncereio.com
- sugarloafstudios.net
- a6nc1sl.jejes.de
- tiksofi.uk
- nowtopzen.com
- nanopools.info
- tmail.lt
- tmail.mx
- phugruphy.com
- sphinx.launders.money
- datamzone.com
- deisgn-ai.work.gd
- haibabon.com
- ai46boh.jejes.de
- initwag.com
- mailba.uk
- compservmail.com
- ff.zero34.qd.je
Many look to be transient domains that no longer resolve. Two (tmail.mx and tmail.lt) belong to a service which provides temporary email addresses to bypass email verification. For pbhak.dev, I think it might be someone’s personal machine which… maybe got compromised? Others, like jejes.de, look like abandoned wordpress installs; I assume they’re compromised too.
I’m seeing a fair number of European IPs for browsers–looks like VPNs and cloud providers.
Last year most of our spammers were from a single company. I’ve approved one of these inauthentic accounts, and am going to keep an eye on it to see what happens.
Write-heavy sysbench tests, a large server, modern Postgres and MySQL
This has results for modern Postgres and MySQL using write-heavy tests from sysbench and a large server. I think there are regressions in Postgres that arrive in some of versions 16, 17, 18 and 19 beta1 but I am far from certain and this blog post is just another step in my journey to figure that out.
tl;dr
- Postgres suffers a lot from throughput variation while MySQL+InnoDB does not
- InnoDB gets much better average throughput on 6 of 10 tests, similar throughput one one and then Postgres does better on 3 of 10 tests
- For tests from which I provided vmstat and iostat results, Postgres does more write IO per operation. In some cases InnoDB uses more CPU, in other cases it does not.
- Postgres from source for versions 15.17, 16.13, 17.9 and 18.3.
- MySQL from source for version 8.4.7
- an ax162s with an AMD EPYC 9454P 48-Core Processor with SMT disabled
- 2 Intel D7-P5520 NVMe storage devices with RAID 1 (3.8T each) using ext4
- 128G RAM
- Ubuntu 24.04
- the config file is named conf.diff.cx10a_c32r128 (x10a_c32r128) and is here for versions 15, 16 and 17.
- for Postgres 18 I used conf.diff.cx10b_c32r128 (x10b_c32r128) which is as close as possible to the Postgres 17 config and uses io_method=sync
- I only run the write-heavy tests (to save time)
- The tables are larger than memory and cannot be cached
- Each test (microbenchmark) is run for 2 hours when I normally run each for 15 minutes
- After each test a vacuum is done
(QPS for some version) / (QPS for Postgres 15.17)
Summary:
- throughput for Postgres drops after version 15.17. I don't know yet whether this is a regression.
- throughput for InnoDB is much better than Postgres in 6 of 10 tests, similar in one test, and much worse in 3 of 10 tests.
- Postgres suffers from too much variance
- Average throughput is ~1.55X larger for InnoDB than for Postgres
- Per operation, Postgres does ~1.20X more write IO (KB written) to storage than InnoDB
- Per operation, InnoDB uses more CPU and does more context switches. While autovacuum was enabled and was likely running during the test, my measurements exclude the manual vacuum done at the end of each test.
- Postgres suffers from too much variance
- Average throughput is ~1.09X larger for InnoDB than for Postgres
- Per operation, Postgres does ~1.30X more write IO (KB written) to storage than InnoDB
- Per operation, InnoDB uses more CPU and does more context switches. While autovacuum was enabled and was likely running during the test, my measurements exclude the manual vacuum done at the end of each test.
- Postgres suffers from too much variance
- Average throughput is ~2.06X larger for Postgres than for InnoDB
- Per operation, Postgres does ~1.67X more write IO (KB written) to storage than InnoDB
- Per operation, Postgres uses more CPU and does more context switches. This is the opposite of what happens above for update-index and update-zipf.
June 11, 2026
The insert benchmark on a small server, cached workload : Postgres 19 beta1
This has results for Postgres versions 19 beta1, 18.4 and 17.10 with the Insert Benchmark on a small server using a cached and CPU-bound workload.
Postgres continues to be boring in a good way. It is hard to find performance regressions.
tl;dr
- I don't see regressions here in 19 beta1
- I see some improvements here in 19 beta1
- index create (l.x) is faster but the step is short-running so I don't assume much from this
- the write-heavy steps (l.i1, l.i2) are faster and CPU overhead is lower in 19 beta1, I hope to explain why the CPU overhead is lower, but that waits for another day.
The server is an Beelink SER7 with a Ryzen 7 7840HS CPU with 8 cores and AMD SMT disabled, 32G of RAM. Storage is one SSD for the OS and an NVMe SSD for the database using ext-4 with discard enabled. The OS is Ubuntu 24.04.
- l.i0
- insert 30M rows per table in PK order. The table has a PK index but no secondary indexes. There is one connection per client.
- l.x
- create 3 secondary indexes per table. There is one connection per client.
- l.i1
- use 2 connections/client. One inserts 40M rows per table and the other does deletes at the same rate as the inserts. Each transaction modifies 50 rows (big transactions). This step is run for a fixed number of inserts, so the run time varies depending on the insert rate.
- l.i2
- like l.i1 but each transaction modifies 5 rows (small transactions) and 10M rows are inserted and deleted per table.
- Wait for S seconds after the step finishes to reduce variance during the read-write benchmark steps that follow. The value of S is a function of the table size.
- qr100
- use 3 connections/client. One does range queries and performance is reported for this. The second does does 100 inserts/s and the third does 100 deletes/s. The second and third are less busy than the first. The range queries use covering secondary indexes. If the target insert rate is not sustained then that is considered to be an SLA failure. If the target insert rate is sustained then the step does the same number of inserts for all systems tested. This step is frequently not IO-bound for the IO-bound workload.
- qp100
- like qr100 except uses point queries on the PK index
- qr500
- like qr100 but the insert and delete rates are increased from 100/s to 500/s
- qp500
- like qp100 but the insert and delete rates are increased from 100/s to 500/s
- qr1000
- like qr100 but the insert and delete rates are increased from 100/s to 1000/s
- qp1000
- like qp100 but the insert and delete rates are increased from 100/s to 1000/s
(QPS for my version / QPS for Postgres 17.10)
The background in the table cells is blue for big improvements and yellow for regressions. There are no regressions here.
- there are small improvements in 18.4
- there are large improvements in 19 beta1. The CPU overhead is lower in 19 beta1 compared to 17.10, ~15% lower for l.i1 and ~10% lower for l.i2. Hopefully I can explain why. But the lower CPU overhead might explain the improved performance in 19 beta1. Some of the metrics from iostat and vmstat are here.
| dbms | l.i0 | l.x | l.i1 | l.i2 | qr100 | qp100 | qr500 | qp500 | qr1000 | qp1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 17.10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 18.4 | 1.00 | 1.03 | 1.02 | 1.07 | 0.99 | 1.00 | 1.00 | 1.00 | 1.01 | 1.00 |
| 19 beta1 | 1.01 | 1.16 | 1.23 | 1.22 | 0.99 | 1.00 | 0.99 | 0.99 | 1.00 | 1.00 |
Village News: MySQL News + Events (11 June 2026)
The only scalable delete in Postgres is DROP TABLE
Prompts Don't Enforce Business Rules. Databases Do.
CedarDB co-founder Lukas Vogel on why AI agents are forcing a long-overdue conversation about where enforcement actually lives. AI agents can now write and execute SQL. That changes everything about how you think about data security.
For decades, permission checks lived in the application layer. That worked fine when applications were deterministic. But agents generate queries dynamically, decide what actions to take, and move fast. If enforcement lives above the database, eventually something gets through that shouldn’t.
The answer isn’t a better prompt. It’s moving enforcement back to where it belongs: the database itself.
Lukas sat down with Shane Snider at Data Center Knowledge to talk through why this problem is real, why it’s not actually new, and what it means for teams building agentic applications today.
“Prompts don’t enforce business rules. Databases do.”
“If you want to give an agent access to the bottom of the stack, permissions have to live at the bottom of the stack.”
CedarDB was built from scratch for exactly this. No legacy assumptions, no workarounds. Row-level security and role-based access control enforced at the database layer, transactions and analytics in one engine, and a Postgres interface your agents already know how to use.
Read the full interview at Data Center Knowledge.
Want to see how CedarDB handles enforcement for agentic workloads? Book a demo.
June 10, 2026
Automate Amazon Aurora PostgreSQL major or minor version upgrade using AWS Systems Manager and Amazon EC2
How to migrate from Oracle to Amazon Aurora PostgreSQL using AWS CloudFormation (Part 1)
Postgres 19 beta1 vs sysbench on a small server
This has results from sysbench on a small server with Postgres 19 beta1, 18.4 and 17.10. Sysbench is run with low concurrency (1 thread) and a cached database. The purpose is to search for changes in performance, often from new CPU overheads.
tl;dr
- 19beta1, 18.4 and 17.10 have mostly similar performance
- There might be small regressions (about 2%) from 17.10 to 19beta1 but my tests are not good at spotting that.
- 19beta1 is much faster on one test (read-only-count) thanks to a new query plan
The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.
The benchmark is run with 1 client, 1 table and 50M rows. The purpose is to search for CPU regressions.
I provide charts below with relative QPS (rQPS). The relative QPS is the following:
(QPS for some version) / (QPS for base version)
I describe performance changes (changes to relative QPS) in terms of basis points. Performance changes by one basis point when the difference in rQPS is 0.01. When rQPS decreases from 0.95 to 0.85 then it changed by 10 basis points.
Results: point queries
- 19beta1 is better than 17.10 by ~3 basis points for most tests
- 19beta1 is slightly better than 18.4
- 19beta1 is worse than 17.10 by ~3 basis points in 4 of 5 tests
- 19beta1 is better than 17.10 by 5 basis points in the scan test
- 19beta1 and 18.4 are similar except for the scan test where 19beta1 did better
- 19beta1 is worse than than 17.10 on two tests
- 19beta1 is better than 17.10 on five tests
- 19beta1 and 17.10 are the same on one test
- 19beta1 is ~2.5X better than 17.10 on the read-only-count test
- 19beta1 and 18.4 have similar results except for the read-only-count test
Query plans for the read-only-count test ...
For 17.10
For 19beta1
- 19beta1 is worse than 17.10 by 2 to 5 basis points
- 18.4 is worse than 17.10 by 2 to 3 basis points