a curated list of database news from authoritative sources

May 28, 2026

Percona Operator for PostgreSQL 3.0.0: Hard Fork, OLM Scoping, Major Upgrades

The Percona Operator for PostgreSQL 3.0.0 is here. This is the release that completes the hard fork of the operator from the Crunchy Data PostgreSQL Operator into a fully independent project, with a dedicated upstream.pgv2.percona.com API group for the inherited CRDs, an automatic CRD-rename rollout for existing 2.x installs on upgrade, and a public roadmap … Continued

The post Percona Operator for PostgreSQL 3.0.0: Hard Fork, OLM Scoping, Major Upgrades appeared first on Percona.

Guide your Amazon Aurora MySQL migration with Kiro powers

Today, we announce the Amazon Aurora MySQL power for Kiro. The power connects Kiro’s AI agent to Aurora MySQL and pairs live database access with curated best-practice guidance. You describe what you need in natural language. The agent generates the API calls, SQL, and configuration for you to review and run. In this post, we walk through how the power guides a production migration from Amazon Relational Database Service (Amazon RDS) for MySQL 8.0 to Aurora MySQL through four phases: assessment, replica creation, promotion, and post-cutover validation.

May 27, 2026

Optimize costs in Amazon Aurora

By implementing modern optimization techniques for Aurora, you can achieve additional cost reduction beyond traditional methods alone. This isn’t only about spending less—it’s about building a more efficient, scalable, and resilient database environment. In this post, we show you a structured approach to optimizing Amazon Aurora database costs. It outlines specific strategies, implementation steps, and best practices across different optimization areas.

Migrate from Crunchy Data PostgreSQL Operator to Percona PostgreSQL Operator: Backup-Restore and PV Reuse

A Percona PostgreSQL operator pgBackRest restore is the simplest way to move off the Crunchy Data PostgreSQL Operator: take a full Crunchy backup, point the new Percona cluster’s dataSource at the existing pgBackRest archive, and the cluster bootstraps from it before its first start. This post covers that path, plus a second option, persistent-volume reuse, for cases … Continued

The post Migrate from Crunchy Data PostgreSQL Operator to Percona PostgreSQL Operator: Backup-Restore and PV Reuse appeared first on Percona.

CedarDB: Features of April 2026

This post takes a closer look at some of the most impactful features we have shipped in CedarDB across our recent releases. Whether you have been following along closely or are just catching up, here is a deeper look at the additions we are most excited about.

Set-Returning Functions: Lock-Step Evaluation

v2026-04-20

When handling bulk data transformations or speeding up database inserts, a popular developer trick is to use multiple set-returning functions side-by-side in the SELECT clause to “zip” arrays together into individual rows. To improve performance and scalability, PostgreSQL changed its behavior regarding this in version 10, shifting from generating a massive cross-product to a much more efficient lock-step evaluation (see the PostgreSQL 10 Release Notes).

To guarantee seamless compatibility and keep your queries lightning-fast, CedarDB evaluates multiple set-returning functions in the SELECT list in the exact same lock-step manner.

-- Zipping up arrays in lock-step to quickly generate rows
SELECT UNNEST('{alice, bob, charlie}'::TEXT[]) AS user_name, 
 UNNEST('{active, inactive, active}'::TEXT[]) AS status, 
 UNNEST('{150, 200, 350}'::INT[]) AS score;

Instead of exploding into 27 rows of useless, cross-joined data, CedarDB cleanly steps through the arrays row-by-row to return exactly 3 perfectly paired rows. If you rely on array unnesting to batch your application’s database inserts, you can now enjoy highly scalable performance and behavior that is completely identical to modern PostgreSQL.

Note: This lock-step evaluation applies to other set-returning functions you might already know! Alongside UNNEST, you can use functions like generate_series, json_array_elements, or regexp_matches to efficiently generate and zip your data.

ON UPDATE CASCADE: Keep Your Data in Sync Automatically

v2026-04-20

Changing core identifiers, like a user’s handle or a department code, used to mean manually updating every referencing row to avoid breaking foreign key constraints. To make your life easier, CedarDB now supports ON UPDATE CASCADE. Just add this clause to your foreign key, and CedarDB will automatically propagate updates from the parent table directly to its child tables.

Say you have a platform where posts reference an author’s username. If an author changes their handle, a single UPDATE handles the rest:

CREATE TABLE authors (username TEXT PRIMARY KEY);

CREATE TABLE posts (
 post_id INTEGER PRIMARY KEY,
 created_at TIMESTAMP,
 author_username TEXT REFERENCES authors(username) ON UPDATE CASCADE
);

-- Updating 'alice' to 'alice_smith' automatically updates all her posts!
UPDATE authors SET username = 'alice_smith' WHERE username = 'alice';

Note: To guarantee predictable performance and prevent runaway loops, CedarDB currently limits this to single-level cascades. An auto-updated column cannot act as the trigger for another cascade into a third table. CedarDB validates this at table creation time, so your schema stays consistent and performant.

pg_stat_database and pg_stat_activity: Observability Out of the Box

v2026-04-20

CedarDB now implements pg_stat_database and pg_stat_activity, two of Postgres’ most widely used monitoring tables. This means your existing observability stack (pgAdmin, Datadog, or any custom dashboard that speaks Postgres) just works with CedarDB, no changes required.

pg_stat_activity gives you a live window into what your database is doing right now: active queries, connection states, and client details. Spot long-running idle transactions that are holding locks or causing WAL bloat:

SELECT pid, usename, state, xact_start, now() - xact_start AS idle_duration
FROM pg_stat_activity
WHERE state = 'idle in transaction'
 AND now() - xact_start > interval '5 minutes';

pg_stat_database complements this with per-database aggregate statistics: transactions committed and rolled back, cache hit rates, tuples returned, and more. To check your database health at a glance:

SELECT datname,
 blks_hit::float / nullif(blks_hit + blks_read, 0) AS cache_hit_ratio,
 xact_commit,
 xact_rollback
FROM pg_stat_database
WHERE datname = current_database();

VACUUM (TRUNCATE): Release Disk Space Back to the OS

v2026-04-20

CedarDB’s storage footprint grows as your data grows, but until now, the main storage file never shrank. Dropped indexes, truncated tables, and deleted data all freed up pages internally, but the underlying file stayed the same size on disk. In some cases this could leave you with a much larger file than your actual data warrants, for example after building and then dropping a large index, or after rewriting ALTER operations.

VACUUM (TRUNCATE) addresses the most straightforward case: if there are unused pages at the end of the storage file, CedarDB will truncate the file and return that space to the OS.

-- After dropping a large index or table, reclaim the trailing space
VACUUM (TRUNCATE);

CedarDB also now properly returns pages to the free pool after ALTER TABLE and ALTER INDEX statements, making them eligible for truncation. More comprehensive shrinking behavior, covering space freed in the middle of the file, will follow in future releases.

Note: Only trailing unused pages can be released to the OS today. Freed space in the middle of the file is currently retained for reuse by future writes.

json_agg and json_build_array: JSON Aggregation in SQL

v2026-04-27

Two commonly used JSON aggregation functions are now available in CedarDB: json_agg and json_build_array.

json_agg aggregates rows into a JSON array, which makes it straightforward to produce nested JSON results directly from a query. This is useful for building API responses or feeding data to applications that expect JSON without an extra serialization step:

-- Return each author with a JSON array of their post titles
SELECT a.username,
 json_agg(p.title ORDER BY p.created_at DESC) AS recent_posts
FROM authors a
JOIN posts p ON p.author_username = a.username
GROUP BY a.username;

json_build_array lets you construct a JSON array from explicit values or column references in a single row:

SELECT json_build_array(user_id, username, email) AS user_tuple
FROM users
LIMIT 5;

Together, these two functions cover the most common patterns for producing JSON output directly in SQL, without needing to post-process results in application code.


That’s it for now


Questions or feedback? Join us on Slack or reach out directly.

Do you want to try CedarDB straight away? Sign up for our free Enterprise Trial below. No credit card required.

May 26, 2026

Announcing VillageSQL Server 0.0.4

Explore VillageSQL Server 0.0.4: now featuring VEF v3, custom aggregates, parameter inference, and preview capabilities like background threads.

The Autovacuum Scale Factor Problem at Scale - Know Your Defaults

In PostgreSQL, autovacuum and autoanalyze exist to clean up dead tuples (old versions of updated/deleted rows) and update query planner statistics, respectively. The challenge is running them frequently enough so that query plans and execution do not degrade after data modifications, but not so frequently as to cause excessive I/O overhead.

Databases often maintain a counter of the number of modifications to trigger these background jobs. Oracle Database and MySQL use a stale percentage (the ratio of modifications to total rows) for statistics gathering. SQL Server uses a dynamically decreasing percentage to ensure statistics do not remain stale for too long on massive tables. PostgreSQL uses a hybrid approach: a fixed base threshold combined with a scale factor (a percentage) that grows proportionally with the table size.

This hybrid approach hits the sweet spot for most workloads, but it often requires tuning based on your specific data. The key factor to watch is the amount of static, "cold" data in your tables. Because the scale factor is calculated against the total table size, a large volume of cold data will significantly inflate the threshold. This can delay maintenance on the active working set—the "hot" data actually used by your queries—leaving it vulnerable to stale statistics or bloat.

Here are the default base thresholds:

postgres=# \dconfig *autovacuum*threshold
        List of configuration parameters
             Parameter              |   Value
------------------------------------+-----------
 autovacuum_analyze_threshold       | 50
 autovacuum_vacuum_insert_threshold | 1000
 autovacuum_vacuum_max_threshold    | 100000000
 autovacuum_vacuum_threshold        | 50
(4 rows)

At first glance, this suggests tables are analyzed when 50 rows are modified, and vacuumed when 50 dead tuples accumulate (from deletes or updates) or 1,000 rows are inserted. But this is only true without the scale factor—10% for statistics, 20% for vacuum:

postgres=# \dconfig *autovacuum*scale_factor
       List of configuration parameters
               Parameter               | Value
---------------------------------------+-------
 autovacuum_analyze_scale_factor       | 0.1
 autovacuum_vacuum_insert_scale_factor | 0.2
 autovacuum_vacuum_scale_factor        | 0.2
(3 rows)

Because of the scale factor, the actual trigger thresholds increase with the size of the table. For the default settings, the formulas are:

  • Analyze when inserts or modifications > 10% table row count plus 50 rows
  • Vacuum when dead tuples (from DELETE or UPDATE) > 20% table row count plus 50 rows
  • Vacuum when inserts > 20% table row count plus 1000 rows

As these formulas show, a larger table requires a much larger accumulation of changes before maintenance fires. This is perfectly acceptable if data churn is uniformly distributed, as small changes across a massive dataset will not drastically impact query cost estimations.

However, data distribution is rarely uniform and evolves over time (e.g., seasonal sales spikes, market expanding to new countries). Because static data inflates the table row count in the formulas above, your database waits too long to trigger maintenance on the active working set.

This is the core problem with default autovacuum settings at scale: a table with 5 million rows can accumulate half a million stale modifications before the planner statistics are refreshed, and over a million dead tuples before bloat is cleaned up. The larger the table grows, the longer it waits, and the worse the situation becomes:

  • Query planner statistics become increasingly stale between analyzes.
  • The visibility map is stale for longer and index-only scans become less efficient.
  • Dead tuple bloat accumulates more between cleanups, wasting storage and slowing scans.

To demonstrate this, I have run the following script to simulate this kind of activity, constantly inserting 100 rows and then updating them. We delete nothing because we want to keep the history, but queries operate on those recent rows. Think of it like orders being entered, then processed, and remaining stored:

\c
\o tmp.log

-- run autovacuum frequently for the demo
alter system set autovacuum_naptime = '1s';
select pg_reload_conf();

-- create a table
drop table demo;
create table demo (
 id bigserial primary key, n int default 0
);
vacuum analyze demo
;

-- show the vacuum and analyze statistics,
-- insert 100 rows and update them
-- run that in a loop every 5 seconds
select relname, n_tup_ins, n_tup_upd , n_mod_since_analyze, n_ins_since_vacuum
 , autovacuum_count,  last_autovacuum  --, vacuum_count,  last_vacuum
 , autoanalyze_count, last_autoanalyze --, analyze_count, last_analyze
from pg_stat_user_tables where relid='demo'::regclass
\;
insert into demo select from generate_series(1,100)
\;
update demo set n=n+1 where id in (
select id from demo order by id desc limit 100
)
\watch i=5 c=100000

For each iteration, the total number of rows inserted (n_tup_ins) and updated (n_tup_upd), as visible in pg_stat_user_tables, increases by 100. It is the X-axis on this diagram (n_tup):

The Y-axis shows the staleness of statistics (n_mod_since_analyze) and the accumulation of dead tuples (n_mod_since_vacuum) until the auto vacuum/analyze kicks in.

With 5 million rows, the last million inserted rows accumulated dead tuples. That is 20% of the total table, as defined by the default vacuum scale factor, but it likely represents 100% of the data actively read by your queries (for example, if the application processes the last year or less of a 5-year history). Furthermore, the last 500,000 rows have completely stale statistics, the 10% default analyze scale factor, and the past months may not have the same data distribution as the previous years.

Think about the impact this has on the maximum value for an ID sequence or a created_at timestamp. It also completely skews the query planner's understanding of your data distribution (such as querying by country or day of the week). I have seen this cause severe performance issues in the real world: a retail company where shops only open on Sundays during the summer, or a trading platform suddenly processing entirely new market trends. Because the statistics are stale, the planner assumes your new, active data looks exactly like your old, historical data.

As the table grows, the impact of this bloat and staleness compounds, and performance will no longer scale. Eventually, your execution plans will flip—not because the queries changed, but simply because the estimations of the query planner are completely wrong.

For very large tables where the total size increases but the active working set is a small, predictable number of rows, you can effectively disable the scale factor and rely almost entirely on the fixed threshold:

ALTER TABLE demo SET (  
    autovacuum_analyze_scale_factor = 0.001,  
    autovacuum_analyze_threshold    = 10000,  
    autovacuum_vacuum_scale_factor  = 0.001,  
    autovacuum_vacuum_threshold     = 10000  
);  

This sets a nearly flat threshold that does not grow with the table size. The right threshold value depends on how many rows your active working set changes per hour and how much staleness you can tolerate. However, you must monitor the consequences of running autovacuum frequently on a growing table to ensure it does not cause localized I/O spikes.

Here is how the same run starts with the new table settings:

Auto analyze never left more than ~10,000 modified rows without refreshing statistics. This threshold grows slightly with the table (at 10 million rows it doubles to 20,000), but remains vastly better than the default. Auto vacuum follows the same pattern for dead tuples, but runs more frequently here because the insert-specific vacuum trigger was left at its default (1,000 rows + 20% scale factor), which only triggers the analyze threshold beyond 45,000 rows.

To address this unbounded growth natively, recent PostgreSQL versions introduced autovacuum_vacuum_max_threshold (with a default of 100 million—which is too high for my example). Rather than letting the scale factor dictate an endlessly growing target, this parameter imposes a hard ceiling on the vacuum threshold calculation. This means that even on a massive 1-billion-row table, autovacuum will forcibly trigger once dead tuples reach the configured maximum, serving as a built-in safety net. You can even adjust this globally via a simple config reload, or set it as a per-table storage parameter to enforce a strict upper limit on staleness without micromanaging scale factors across your entire schema.

Naturally, enforcing stricter thresholds—whether through these new maximum caps or manual table-level tuning—means autovacuum will run more frequently, which demands more background worker capacity. Historically, increasing autovacuum_max_workers to handle this extra load required a full database restart. However, PostgreSQL now thoughtfully splits this architecture: autovacuum_worker_slots reserves the hard upper bound of backend slots at postmaster startup, while autovacuum_max_workers dictates how many of those slots are actively used. This allows you to dynamically scale up your active workers on the fly (ALTER SYSTEM SET autovacuum_max_workers = 8; SELECT pg_reload_conf();) to absorb heavy maintenance workloads without incurring any downtime.

Alternatively, if your table has a clear time-based or categorical boundary between hot and cold data, partitioning is worth considering. Autovacuum operates per partition, so a current_year partition with 100,000 rows will trigger maintenance far sooner than a monolithic 5-million-row table, meaning the default scale factor will naturally behave exactly as intended.

May 25, 2026

Running TidesDB as a MySQL 9.7 storage engine

tidesdb-mysql is an experimental build that was developed to verify how TidesDB, the LSM-tree key/value engine, can work with MySQL 9.7 as a storage engine. The current build is v0.2.4, and it’s an experiment, not a finished product. So you can use it in your tests if you also want to try TidesDB with MySQL … Continued

The post Running TidesDB as a MySQL 9.7 storage engine appeared first on Percona.

Migrate from Crunchy Data PostgreSQL Operator to Percona PostgreSQL Operator: Standby Cluster Method

A Crunchy to Percona PostgreSQL migration is more straightforward than most cross-operator moves on Kubernetes, because the Percona PostgreSQL Operator is a hard fork of the Crunchy Data PostgreSQL Operator. Same Patroni HA, same pgBackRest backups, same overall CRD shape. This post walks through the safest of the three migration paths: a standby cluster method … Continued

The post Migrate from Crunchy Data PostgreSQL Operator to Percona PostgreSQL Operator: Standby Cluster Method appeared first on Percona.

May 23, 2026

May 22, 2026

MySQL 9.7.0 PGO Benchmark Analysis

Overview Servers Tested: MySQL 9.7.0 (PGO-enabled build released by Oracle) MySQL 9.7.0 Non-PGO (built without Profile-Guided Optimization — see BUILD.md) Tier Configurations: Tier 2G: 2GB InnoDB buffer pool Tier 12G: 12GB InnoDB buffer pool Tier 32G: 32GB InnoDB buffer pool   View Results 📊 Interactive Reports The benchmark reports are available as interactive HTML pages … Continued

The post MySQL 9.7.0 PGO Benchmark Analysis appeared first on Percona.

May 21, 2026

Amazon Aurora MySQL 8.4 is now generally available

Today, we are excited to announce the general availability of Amazon Aurora MySQL 8.4, our latest major version, compatible with community MySQL 8.4.7. This release marks an important milestone for Aurora MySQL customers, introducing a simplified versioning model aligned directly with community MySQL, along with a streamlined patch version experience, and the full set of community MySQL 8.4 enhancements. In this post, we discuss the customer challenges that this release addresses, introduce Aurora MySQL 8.4, walk through the new versioning approach and its benefits for customers, cover the key capabilities delivered in Aurora MySQL 8.4, and show you how to get started.

Chess invariants

Chess is a lot trickier than it looks. It has so many rules: castling, en passant, pawn promotion, pinning, the discovered check, and the deadlock case of stalemate.

It is a concurrent system, but with a very specific kind of concurrency: interleaved execution. More specifically, taking turns: white, then black, then white.

You know what we do with concurrent systems here? Here we model them, and we distill their invariants.


Here is some setup definitions first.

In a CS or math paper, if you write "Section 2: Model and Problem" well enough, the rest of the paper writes itself. With this setup you can sort of see what the actions will be.

In fact, forget about the actions. Let's look at some invariants.


Invariants

When deriving invariants we ask: what must always be true? I find it useful to split the safety invariants into two camps: state invariants (which are predicates over a single state) and transition invariants (which are predicates over a step). The transition invariants are not as commonly used as state invariants, but they can be very helpful, especially when you are reasoning about transitions of a system.


State invariants


TypeOK says every variable lives in the right space. It is boring, but it has caught more bugs than I would like to admit. OneKingPerColor and BothKingsOnBoard are also sanity checks. 

TurnParity is the first interesting one. It ties two state variables together: WHITE moves on even moves, BLACK on odd. The MakeMove action satisfies this TurnParity.

PreviousPlayerNotInCheck restates the rule that "you must end your turn not in check" as "look back: the player who just moved is not in check". NotBothInCheck is a corollary.


Transition invariants

These are predicates over a <<state, next-state>> pair, written with the bracketed form: [][P]_vars. They express how things change with constraints.  The notation is simple: x is the value of the variable x in this state, and x' denotes the value in the next-state.


MoveCountStrictlyIncreases and TurnAlternates say each step increments the move count with the colors flipping. If a transition ever messes this up, something has gone wrong.

PieceCountNonIncreasing rules out pieces appearing out of thin air. SingleCapturePerMove tightens this: at most one piece disappears per step. ExactlyTwoSquaresChange is the strongest here. It says precisely two squares change per move, the source (now empty) and the destination (now holding the moving piece). 

Haha, yes, this is a model of the basic chess rules only. A useful exercise here is to consider which of these invariants survive when we add castling, pawns, en passant?

ExactlyTwoSquaresChange gets violated when we add castling: four squares change in one move. Similarly, en passant captures a piece not on the destination square, so three squares change.

PieceCountNonIncreasing survives pawn promotion (when a pawn becomes a queen, the count is unchanged).