May 26, 2026
The Autovacuum Scale Factor Problem at Scale - Know Your Defaults
In PostgreSQL, autovacuum and autoanalyze exist to clean up dead tuples (old versions of updated/deleted rows) and update query planner statistics, respectively. The challenge is running them frequently enough so that query plans and execution do not degrade after data modifications, but not so frequently as to cause excessive I/O overhead.
Databases often maintain a counter of the number of modifications to trigger these background jobs. Oracle Database and MySQL use a stale percentage (the ratio of modifications to total rows) for statistics gathering. SQL Server uses a dynamically decreasing percentage to ensure statistics do not remain stale for too long on massive tables. PostgreSQL uses a hybrid approach: a fixed base threshold combined with a scale factor (a percentage) that grows proportionally with the table size.
This hybrid approach hits the sweet spot for most workloads, but it often requires tuning based on your specific data. The key factor to watch is the amount of static, "cold" data in your tables. Because the scale factor is calculated against the total table size, a large volume of cold data will significantly inflate the threshold. This can delay maintenance on the active working set—the "hot" data actually used by your queries—leaving it vulnerable to stale statistics or bloat.
Here are the default base thresholds:
postgres=# \dconfig *autovacuum*threshold
List of configuration parameters
Parameter | Value
------------------------------------+-----------
autovacuum_analyze_threshold | 50
autovacuum_vacuum_insert_threshold | 1000
autovacuum_vacuum_max_threshold | 100000000
autovacuum_vacuum_threshold | 50
(4 rows)
At first glance, this suggests tables are analyzed when 50 rows are modified, and vacuumed when 50 dead tuples accumulate (from deletes or updates) or 1,000 rows are inserted. But this is only true without the scale factor—10% for statistics, 20% for vacuum:
postgres=# \dconfig *autovacuum*scale_factor
List of configuration parameters
Parameter | Value
---------------------------------------+-------
autovacuum_analyze_scale_factor | 0.1
autovacuum_vacuum_insert_scale_factor | 0.2
autovacuum_vacuum_scale_factor | 0.2
(3 rows)
Because of the scale factor, the actual trigger thresholds increase with the size of the table. For the default settings, the formulas are:
- Analyze when inserts or modifications > 10% table row count plus 50 rows
- Vacuum when dead tuples (from DELETE or UPDATE) > 20% table row count plus 50 rows
- Vacuum when inserts > 20% table row count plus 1000 rows
As these formulas show, a larger table requires a much larger accumulation of changes before maintenance fires. This is perfectly acceptable if data churn is uniformly distributed, as small changes across a massive dataset will not drastically impact query cost estimations.
However, data distribution is rarely uniform and evolves over time (e.g., seasonal sales spikes, market expanding to new countries). Because static data inflates the table row count in the formulas above, your database waits too long to trigger maintenance on the active working set.
This is the core problem with default autovacuum settings at scale: a table with 5 million rows can accumulate half a million stale modifications before the planner statistics are refreshed, and over a million dead tuples before bloat is cleaned up. The larger the table grows, the longer it waits, and the worse the situation becomes:
- Query planner statistics become increasingly stale between analyzes.
- The visibility map is stale for longer and index-only scans become less efficient.
- Dead tuple bloat accumulates more between cleanups, wasting storage and slowing scans.
To demonstrate this, I have run the following script to simulate this kind of activity, constantly inserting 100 rows and then updating them. We delete nothing because we want to keep the history, but queries operate on those recent rows. Think of it like orders being entered, then processed, and remaining stored:
\c
\o tmp.log
-- run autovacuum frequently for the demo
alter system set autovacuum_naptime = '1s';
select pg_reload_conf();
-- create a table
drop table demo;
create table demo (
id bigserial primary key, n int default 0
);
vacuum analyze demo
;
-- show the vacuum and analyze statistics,
-- insert 100 rows and update them
-- run that in a loop every 5 seconds
select relname, n_tup_ins, n_tup_upd , n_mod_since_analyze, n_ins_since_vacuum
, autovacuum_count, last_autovacuum --, vacuum_count, last_vacuum
, autoanalyze_count, last_autoanalyze --, analyze_count, last_analyze
from pg_stat_user_tables where relid='demo'::regclass
\;
insert into demo select from generate_series(1,100)
\;
update demo set n=n+1 where id in (
select id from demo order by id desc limit 100
)
\watch i=5 c=100000
For each iteration, the total number of rows inserted (n_tup_ins) and updated (n_tup_upd), as visible in pg_stat_user_tables, increases by 100. It is the X-axis on this diagram (n_tup):
The Y-axis shows the staleness of statistics (n_mod_since_analyze) and the accumulation of dead tuples (n_mod_since_vacuum) until the auto vacuum/analyze kicks in.
With 5 million rows, the last million inserted rows accumulated dead tuples. That is 20% of the total table, as defined by the default vacuum scale factor, but it likely represents 100% of the data actively read by your queries (for example, if the application processes the last year or less of a 5-year history). Furthermore, the last 500,000 rows have completely stale statistics, the 10% default analyze scale factor, and the past months may not have the same data distribution as the previous years.
Think about the impact this has on the maximum value for an ID sequence or a created_at timestamp. It also completely skews the query planner's understanding of your data distribution (such as querying by country or day of the week). I have seen this cause severe performance issues in the real world: a retail company where shops only open on Sundays during the summer, or a trading platform suddenly processing entirely new market trends. Because the statistics are stale, the planner assumes your new, active data looks exactly like your old, historical data.
As the table grows, the impact of this bloat and staleness compounds, and performance will no longer scale. Eventually, your execution plans will flip—not because the queries changed, but simply because the estimations of the query planner are completely wrong.
For very large tables where the total size increases but the active working set is a small, predictable number of rows, you can effectively disable the scale factor and rely almost entirely on the fixed threshold:
ALTER TABLE demo SET (
autovacuum_analyze_scale_factor = 0.001,
autovacuum_analyze_threshold = 10000,
autovacuum_vacuum_scale_factor = 0.001,
autovacuum_vacuum_threshold = 10000
);
This sets a nearly flat threshold that does not grow with the table size. The right threshold value depends on how many rows your active working set changes per hour and how much staleness you can tolerate. However, you must monitor the consequences of running autovacuum frequently on a growing table to ensure it does not cause localized I/O spikes.
Here is how the same run starts with the new table settings:
Auto analyze never left more than ~10,000 modified rows without refreshing statistics. This threshold grows slightly with the table (at 10 million rows it doubles to 20,000), but remains vastly better than the default. Auto vacuum follows the same pattern for dead tuples, but runs more frequently here because the insert-specific vacuum trigger was left at its default (1,000 rows + 20% scale factor), which only triggers the analyze threshold beyond 45,000 rows.
To address this unbounded growth natively, recent PostgreSQL versions introduced autovacuum_vacuum_max_threshold (with a default of 100 million—which is too high for my example). Rather than letting the scale factor dictate an endlessly growing target, this parameter imposes a hard ceiling on the vacuum threshold calculation. This means that even on a massive 1-billion-row table, autovacuum will forcibly trigger once dead tuples reach the configured maximum, serving as a built-in safety net. You can even adjust this globally via a simple config reload, or set it as a per-table storage parameter to enforce a strict upper limit on staleness without micromanaging scale factors across your entire schema.
Naturally, enforcing stricter thresholds—whether through these new maximum caps or manual table-level tuning—means autovacuum will run more frequently, which demands more background worker capacity. Historically, increasing autovacuum_max_workers to handle this extra load required a full database restart. However, PostgreSQL now thoughtfully splits this architecture: autovacuum_worker_slots reserves the hard upper bound of backend slots at postmaster startup, while autovacuum_max_workers dictates how many of those slots are actively used. This allows you to dynamically scale up your active workers on the fly (ALTER SYSTEM SET autovacuum_max_workers = 8; SELECT pg_reload_conf();) to absorb heavy maintenance workloads without incurring any downtime.
Alternatively, if your table has a clear time-based or categorical boundary between hot and cold data, partitioning is worth considering. Autovacuum operates per partition, so a current_year partition with 100,000 rows will trigger maintenance far sooner than a monolithic 5-million-row table, meaning the default scale factor will naturally behave exactly as intended.
May 25, 2026
Running TidesDB as a MySQL 9.7 storage engine
tidesdb-mysql is an experimental build that was developed to verify how TidesDB, the LSM-tree key/value engine, can work with MySQL 9.7 as a storage engine. The current build is v0.2.4, and it’s an experiment, not a finished product. So you can use it in your tests if you also want to try TidesDB with MySQL … Continued
The post Running TidesDB as a MySQL 9.7 storage engine appeared first on Percona.
Migrate from Crunchy Data PostgreSQL Operator to Percona PostgreSQL Operator: Standby Cluster Method
A Crunchy to Percona PostgreSQL migration is more straightforward than most cross-operator moves on Kubernetes, because the Percona PostgreSQL Operator is a hard fork of the Crunchy Data PostgreSQL Operator. Same Patroni HA, same pgBackRest backups, same overall CRD shape. This post walks through the safest of the three migration paths: a standby cluster method … Continued
The post Migrate from Crunchy Data PostgreSQL Operator to Percona PostgreSQL Operator: Standby Cluster Method appeared first on Percona.
May 22, 2026
MySQL 9.7.0 PGO Benchmark Analysis
Overview Servers Tested: MySQL 9.7.0 (PGO-enabled build released by Oracle) MySQL 9.7.0 Non-PGO (built without Profile-Guided Optimization — see BUILD.md) Tier Configurations: Tier 2G: 2GB InnoDB buffer pool Tier 12G: 12GB InnoDB buffer pool Tier 32G: 32GB InnoDB buffer pool View Results 📊 Interactive Reports The benchmark reports are available as interactive HTML pages … Continued
The post MySQL 9.7.0 PGO Benchmark Analysis appeared first on Percona.
May 21, 2026
Amazon Aurora MySQL 8.4 is now generally available
Knowing when new open source database engine versions release on Amazon Aurora and Amazon RDS
Chess invariants
Chess is a lot trickier than it looks. It has so many rules: castling, en passant, pawn promotion, pinning, the discovered check, and the deadlock case of stalemate.
It is a concurrent system, but with a very specific kind of concurrency: interleaved execution. More specifically, taking turns: white, then black, then white.
You know what we do with concurrent systems here? Here we model them, and we distill their invariants.
Here is some setup definitions first.
In a CS or math paper, if you write "Section 2: Model and Problem" well enough, the rest of the paper writes itself. With this setup you can sort of see what the actions will be.
In fact, forget about the actions. Let's look at some invariants.
Invariants
When deriving invariants we ask: what must always be true? I find it useful to split the safety invariants into two camps: state invariants (which are predicates over a single state) and transition invariants (which are predicates over a step). The transition invariants are not as commonly used as state invariants, but they can be very helpful, especially when you are reasoning about transitions of a system.
State invariants
TypeOK says every variable lives in the right space. It is boring, but it has caught more bugs than I would like to admit. OneKingPerColor and BothKingsOnBoard are also sanity checks.
TurnParity is the first interesting one. It ties two state variables together: WHITE moves on even moves, BLACK on odd. The MakeMove action satisfies this TurnParity.
PreviousPlayerNotInCheck restates the rule that "you must end your turn not in check" as "look back: the player who just moved is not in check". NotBothInCheck is a corollary.
Transition invariants
These are predicates over a <<state, next-state>> pair, written with the bracketed form: [][P]_vars. They express how things change with constraints. The notation is simple: x is the value of the variable x in this state, and x' denotes the value in the next-state.
MoveCountStrictlyIncreases and TurnAlternates say each step increments the move count with the colors flipping. If a transition ever messes this up, something has gone wrong.
PieceCountNonIncreasing rules out pieces appearing out of thin air. SingleCapturePerMove tightens this: at most one piece disappears per step. ExactlyTwoSquaresChange is the strongest here. It says precisely two squares change per move, the source (now empty) and the destination (now holding the moving piece).
Haha, yes, this is a model of the basic chess rules only. A useful exercise here is to consider which of these invariants survive when we add castling, pawns, en passant?
ExactlyTwoSquaresChange gets violated when we add castling: four squares change in one move. Similarly, en passant captures a piece not on the destination square, so three squares change.
PieceCountNonIncreasing survives pawn promotion (when a pawn becomes a queen, the count is unchanged).
May 20, 2026
CVE-2026-8053: “We don’t use time-series” is not a mitigation
TL;DR: A bug in MongoDB’s time-series collection code allows a user with the standard readWrite role to corrupt memory within the mongod process. Best case: your database crashes, and you spend the night writing a postmortem. Worst case: an attacker is running their code as mongod, with the same access to your data that the … Continued
The post CVE-2026-8053: “We don’t use time-series” is not a mitigation appeared first on Percona.
Adding Foreign Keys Can Cause Deadlock Trouble
Takeaway: Adding foreign keys require schema modification locks on every table involved. This can cause deadlocks on busy systems. Schema modification locks (SCH-M) are taken by DDL (Data Definition Language) statements like CREATE/ALTER/DROP. Schema stability locks (SCH-S) are taken by DML (Data Manipulation Language) statements like INSERT/UPDATE/DELETE. Those two types of locks are incompatible. Meaning, […]
The post Adding Foreign Keys Can Cause Deadlock Trouble first appeared on Michael J. Swart.Where to Find VillageSQL Next Week
Manually Migrate Hash Slots in a Valkey/Redis Cluster
This article explains how to manually migrate hash slots in Valkey/Redis clusters to expand your deployment with minimal disruption to availability. Note: Valkey 9.0 introduces the Atomic Slot Migration (ASM) feature, which significantly improves migration speed (up to 9.52 times faster) and reliability, while reducing migration complexity. So you should use ASM instead if you … Continued
The post Manually Migrate Hash Slots in a Valkey/Redis Cluster appeared first on Percona.
May 19, 2026
Not All Open Source Is Equal: Choosing a PostgreSQL Operator for Kubernetes in 2026
Choosing an open source PostgreSQL operator for Kubernetes used to be a question about features and community size. In 2026, it has become a question about licensing posture, image distribution, and whether the project you pick today will still be operationally open in three years. This is part 1 of a 3-part series on running … Continued
The post Not All Open Source Is Equal: Choosing a PostgreSQL Operator for Kubernetes in 2026 appeared first on Percona.
Automated JDBC query caching with the AWS Advanced JDBC Wrapper
Keeping pgBackRest Open, Healthy, and Community Driven
When the future of pgBackRest suddenly became uncertain, the PostgreSQL ecosystem reacted quickly. At Percona, we believed the most important question was not: what replaces it? but: how do we ensure pgBackRest remains healthy, sustainable, and open for everyone? That distinction matters. pgBackRest is critical infrastructure used by enterprises around the world to protect some … Continued
The post Keeping pgBackRest Open, Healthy, and Community Driven appeared first on Percona.
Hunting orphan objects: 45% off our ClickHouse storage bill (and a near data-loss incident)
May 18, 2026
Building an AI-powered grid investigation agent with Aurora DSQL and Amazon Bedrock AgentCore
OSTEP Chapter 15: Address Translation
This is part of our series going through OSTEP book chapters. The OSTEP textbook is freely available at Remzi's website if you like to follow along.
This chapter extends the CPU virtualization playbook to memory. It's the same recipe: let the program run directly on the hardware, but interpose at carefully chosen points so the OS retains control. For memory, this happens at every memory access. Every load, store, and instruction fetch gets translated by hardware from a virtual to a physical address.
The mechanism here is called dynamic relocation, dating to the late 1950s. The base register holds the physical address where the process's address space starts, the bounds register holds its size. On every memory reference the hardware adds base and checks against bounds. If the address is out of range, the CPU raises an exception, and the OS kills the offender.
This takes collaboration between hardware and the OS. Hardware provides privileged mode, the base/bounds registers, translation circuitry, exception generation, and privileged instructions to update the registers. The OS provides memory allocation (a free list, in the simplest case), base/bounds management across context switches, and the exception handlers themselves.
Because there is only one base/bounds pair per CPU, the OS must save and restore them in the process control block(PCB). This means that while a process is descheduled, the OS can freely move its address space and then update the saved base. The process wakes up oblivious to this, hence the name dynamic relocation.
The chapter is transparent about what base-and-bounds gets wrong. The relocated process gets a fixed-size slot, but its stack and heap occupy only a small fraction of it, which means that the space in between causes internal fragmentation. With every process getting the same fat slot regardless of actual footprint, the physical memory fills up quickly. The segmentation discussion, coming next chapter, aims to fix this.