A curated list of database news from authoritative sources.

May 21, 2025

The Future of Customer Support is Lies, I Guess

TrueNAS makes file servers, also known as Network Attached Storage (NAS). I bought one of their smaller boxes to house backups, and it’s a great little NAS. For many years TrueNAS sold boxes with a BSD-based operating system, but in the last few years they’ve released a new, Linux-backed operating system called TrueNAS SCALE, also called “Community Edition”. I was considering migrating to TrueNAS SCALE, but the docs started off with this warning:

TrueNAS Enterprise customers should consult with TrueNAS Enterprise Support before attempting to migrate.

The process requires an extended maintenance window, requires executing steps in the correct order to prevent issues with system configuration and operation, and additional system review post-migration to catch and correct any configuration issues.

Since I’d actually purchased their hardware, I wrote to their support email. I asked whether the BSD-based OS was reaching end-of-life, and if there was anything special I should know about the upgrade process for my particular machine. The response came from a customer success coordinator, whose name I’ll omit for privacy. It began:

Hi Kyle,

It sounds like you’re at an important crossroads with your TrueNAS Mini 3.0-X+ system as it transitions from the traditional BSD-based OS to the Linux-based SCALE platform, now known as TrueNAS Community Edition.

Understanding the Transition
End-of-Life for Certain Versions: The BSD-based version of TrueNAS you’re currently using (TrueNAS-13.0-U6.7) may be reaching its end-of-life phase, prompting a move to newer systems like TrueNAS SCALE, which is based on Linux and offers more modern features.

Community Edition: The Community edition represents the transition towards a more flexible, open-source platform that leverages FreeBSD jails (similar in concept to containers) for virtualization, providing scalability and extensibility.

The phrase “may be reaching its end-of-life phase” was immediately worrying: how does a TrueNAS support agent not know whether their own products are EOL? But what really shocked me was the blatant lie in the third paragraph: they told me that the new OS was based on Linux, then immediately turned around and said that the new system was based on FreeBSD. FreeBSD jails were TrueNAS’s old approach to virtualization; the new Community Edition uses Docker containers. I could not believe that a human being had actually skimmed this response, let alone written it. I was, I think, talking to a Large Language Model (LLM) wearing the face of a person.

The response went on, coining odd phrases and warning of unspecified risks. “Non-HA (Non-Redundant Array of Independent Disks) TrueNAS hardware can sometimes have specific requirements or limitations when migrating to TrueNAS Community Edition,” they said. RAID and HA are different things in TrueNAS. Moreover, the reason I wrote to support in the first place was to ask what those “specific requirements” were. I was hoping for a support article I’d somehow missed, or a canned response. Instead I got my own question, regurgitated as a warning.

I wrote with followup questions, and got more nonsense in response. When I asked how TrueNAS Community Edition used FreeBSD jails, a different customer success coordinator replied:

FreeBSD jails are similar to containers (like Docker) but built into the FreeBSD system. TrueNAS uses them to run isolated apps and services. For example, you can run Plex, Nextcloud, or other apps safely and separately using jails — all within your TrueNAS system.

They’re lightweight, secure, and don’t need a full virtual machine to run each service. It’s a great way to expand what your system can do without using much extra power or memory.

This is all lies. TrueNAS CORE, the old FreeBSD OS, ran jails. TrueNAS SCALE / Community Edition runs Docker containers.

Quoting their HA/RAID confusion, I asked whether, for TrueNAS hardware, the two were actually the same.

Good question — they’re different:

  • RAID protects your data by spreading it across multiple disks, so if one disk fails, your data is still safe.

  • HA (High Availability) means the system has two controllers. If one fails, the other takes over so the system keeps running. This is used in high-end enterprise systems (like our TrueNAS M-series), not in the MINI line.

This is, I believe, accurate, but betrays no sign that the original response conflated the two.

I asked what specific limitations my NAS model had that would cause issues migrating to Community Edition. They replied:

The MINI-3.0-X+ is compatible with Community Edition (TrueNAS CORE), but there are a few things to know:

  • No Enterprise-only features: Some features like enclosure management and support alerts are only available in the Enterprise version.

  • No HA support: Not an issue for the MINI, since it’s a single-controller system anyway.

  • Support level: CORE is community-supported, so it doesn’t come with iXsystems support like the Enterprise version does.

But overall — the MINI runs CORE just fine for most use cases!

This sounds like a reasonable response, but it too is complete nonsense. TrueNAS CORE is a completely different operating system. I asked whether my machine would run Community Edition, also known as SCALE.

I worked support for many years, helping people with desktops, systems, and networking, and occasionally with web services. I’m used to knowledge base articles and canned responses on both ends—and indeed, that’s exactly what I was hoping TrueNAS would provide. All I needed to know was why the documentation warned their customers to contact support before upgrading. Instead I got a pile of vague nonsense and lies.

I get it. Support is often viewed as a cost center, and agents are often working against a brutal, endlessly increasing backlog of tickets. There is pressure at every level to clear those tickets in as little time as possible. Large Language Models create plausible support responses with incredible speed, but their output must still be reviewed by humans. Reviewing large volumes of plausible, syntactically valid text for factual errors is exhausting, time-consuming work, and every few minutes a new ticket arrives.

Companies must do more with less. The industry is changing, and what was once a team of five support engineers becomes three. Pressure builds, and the time allocated to review the LLM’s output becomes shorter and shorter. Five minutes per ticket becomes three. The LLM gets it mostly right. Two minutes. Looks good. Sixty seconds. Click submit. There are one hundred eighty tickets still in queue, and behind every one is a disappointed customer, and behind that is the risk of losing one’s job. Thirty seconds. Click submit. Click submit. The metrics do not measure how many times the system has lied to customers.

The Future of Newspapers is Lies, I Guess

I subscribe to the Chicago Sun-Times, a non-profit newspaper. This week, they sent me a sixty-four page special insert, branded with the Chicago Sun-Times logo, full of LLM nonsense. Yesterday I wrote the following letter to the Chicago Sun-Times. That evening, they published this followup.

Dear Mr. Buscaglia and the Chicago Sun-Times,

The May 18th “Heat Index” summer insert feels remarkably like “AI slop”: text generated by a large language model. The unusually enthusiastic use of adjectives coats the entire insert with an oily sheen, but I’m also concerned that there may be wholesale fabrications in the text.

For instance, “Summer soundtracks” cites Dr. Daniel Levitin, in an interview with Scientific American, as saying “Music activates the brain’s reward centers in ways similar to other pleasurable experiences. When paired with meaningful activities, these songs become powerful memory anchors that can transport us back to those moments for decades”. These phrases, and substrings thereof, return zero matches on Kagi or Google. Scientific American’s archives include a January 22, 2001 article with several quotes from Levitin, but nothing like the purported quotes.

The “Summer food trends” article cites an interview in Bon Appetit, claiming Padma Lakshmi said, “What distinguishes this summer’s global food exploration is how these international flavors are being adapted to local, seasonal ingredients.” Lakshmi is published in an interview with BA, but I can’t find any use of the uncanny marketer-speak “this summer’s global food exploration”.

The same article also cites the National Ice Cream Retailers Association as forecasting “unexpected savory notes” and “hyperlocal ingredient sourcing” as the “fastest-growing segments in premium frozen desserts”. I find it hard to believe these are segments at all—nor do these phrases appear anywhere on the NICRA web site.

The “Summer reading list for 2025” recommends books like “The Rainmakers”, by Percival Everett—a real author, but as far as I can tell, not a real text. Immediately thereafter it offers “Salt and Honey”, by “Delia Owens”—again, a real writer, not a real book. I started reaching out to some of the experts cited in the insert to ask whether their quotes were real, then realized the Verge beat me to it.

These examples are just from a cursory skim; the insert fails my LLM “sniff test” on essentially every page. How did this happen? How do you print and distribute a full-color, sixty-four page special edition without anyone reading it first? Many of the articles have no byline, but those that do are purportedly by Marco Buscaglia. Was he responsible? Or is he too, like so many of the people cited in this nonsense publication, a victim of confabulatory applied statistics? This would be a great project for the Sun-Times newsroom, assuming OpenAI hasn’t eaten them yet.

Doubtfully,

—Kyle Kingsbury

May 20, 2025

Achieve up to 1.7 times higher write throughput and 1.38 times better price performance with Amazon Aurora PostgreSQL on AWS Graviton4-based R8g instances

In this post, we demonstrate how upgrading to Graviton4-based R8g instances running Aurora PostgreSQL-Compatible 17.4 in the Aurora I/O-Optimized cluster configuration can deliver significant price-performance gains: up to 1.7 times higher write throughput, 1.38 times better price performance, and commit latency reduced by up to 46% on r8g.16xlarge instances and 38% on r8g.2xlarge instances, compared to Graviton2-based R6g instances.

InnoDB Cluster: Set Up Router and Validate Failover

Setting up an InnoDB Cluster requires three key components: Group Replication, MySQL Shell, and MySQL Router. In the previous post, we covered the process of building a 3-node InnoDB Cluster. In this post, we shift our focus to configuring MySQL Router and validating failover functionality. Environment overview: We are using three InnoDB Cluster nodes along […]

May 19, 2025

An Introduction to Dictionary Operations in Data Masking Component

In this blog post, we will describe typical usage scenarios for dictionary operations in the Data Masking Component, which is available in Percona Server for MySQL as an open source alternative to Oracle’s enterprise version. In particular, we will consider the following functions. gen_dictionary() – a function that returns a random term from a dictionary. gen_blocklist() – […]

May 18, 2025

RocksDB 10.2 benchmarks: large & small servers with cached workload

I previously shared benchmark results for RocksDB using the larger server that I have. In this post I share more results from two other large servers and one small server. The split is arbitrary, but I mean >= 20 cores for large, 10 to 19 cores for medium, and fewer than 10 cores for small.

tl;dr

  • There are several big improvements
  • There might be a small regression in fillseq performance; I will revisit this
  • For the block cache, hyperclock does much better than LRU on CPU-bound tests

Software

I used RocksDB versions 6.29.5, 7.10.2, 8.11.4, 9.0.1, 9.1.2, 9.2.2, 9.3.2, 9.4.1, 9.5.2, 9.6.2, 9.7.4, 9.8.4, 9.9.3, 9.10.0, 9.11.2, 10.0.1, 10.1.3 and 10.2.1. Everything was compiled with gcc 11.4.0.

For 8.x, 9.x and 10.x the benchmark was repeated using both the LRU block cache (older code) and hyperclock (newer code). That was done by setting the --cache_type argument:

  • lru_cache was used for versions 7.6 and earlier
  • hyper_clock_cache was used for versions 7.7 through 8.5
  • auto_hyper_clock_cache was used for versions 8.5+

Hardware

My servers are described here. From that list I used:

  • The small server is a Ryzen 7 (AMD) CPU with 8 cores and 32G of RAM. It is v5 in the blog post.
  • The first large server has 24 cores with 64G of RAM. It is v6 in the blog post.
  • The other large server has 32 cores and 128G of RAM. It is v7 in the blog post.

Benchmark

Overviews on how I use db_bench are here and here.

Tests were run for a workload with the database cached by RocksDB that I call byrx in my scripts.

The benchmark steps that I focus on are:
  • fillseq
    • load RocksDB in key order with 1 thread
  • revrangeww, fwdrangeww
    • do reverse or forward range queries with a rate-limited writer. Report performance for the range queries
  • readww
    • do point queries with a rate-limited writer. Report performance for the point queries.
  • overwrite
    • overwrite (via Put) random keys using many threads

Relative QPS

Many of the tables below (inlined and via URL) show the relative QPS which is:
    (QPS for my version / QPS for base version)

The base version varies and is listed below. When the relative QPS is > 1.0, my version is faster than the base version. When it is < 1.0, there might be a performance regression, or there might just be noise.
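The ratio is straightforward to compute; here is a tiny sanity-check helper with made-up numbers (not taken from this post):

```python
def relative_qps(my_qps: float, base_qps: float) -> float:
    """Relative QPS = QPS for my version / QPS for the base version."""
    return my_qps / base_qps

# A value > 1.0 means the newer version is faster than the base version.
print(round(relative_qps(1.15e6, 1.0e6), 2))  # 1.15
```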

Small server

The benchmark was run using 1 client thread and 20M KV pairs. Each benchmark step was run for 1800 seconds. Performance summaries are here.

For the byrx (cached database) workload with the LRU block cache:

  • see relative and absolute performance summaries, the base version is RocksDB 6.29.5
  • fillseq is ~14% faster in 10.2 vs 6.29 with improvements in 7.x and 9.x
  • revrangeww and fwdrangeww are ~6% slower in 10.2 vs 6.29; I might revisit this
  • readww has similar perf from 6.29 through 10.2
  • overwrite is ~14% faster in 10.2 vs 6.29 with most of the improvement in 7.x

For the byrx (cached database) workload with the Hyper Clock block cache:

  • see relative and absolute performance summaries, the base version is RocksDB 8.11.4
  • there might be a small regression (~3%), or there might be noise in the results

Results from RocksDB 10.2.1 that show relative QPS for 10.2 with the Hyper Clock block cache relative to 10.2 with the LRU block cache. Here the QPS for revrangeww, fwdrangeww and readww are ~10% better with Hyper Clock.

relQPS  test
0.99    fillseq.wal_disabled.v400
1.09    revrangewhilewriting.t1
1.13    fwdrangewhilewriting.t1
1.15    readwhilewriting.t1
0.96    overwriteandwait.t1.s0

Large server (24 cores)

The benchmark was run using 16 client threads and 40M KV pairs. Each benchmark step was run for 1800 seconds. Performance summaries are here.

For the byrx (cached database) workload with the LRU block cache:

  • see relative and absolute performance summaries, the base version is RocksDB 6.29.5
  • fillseq might have a new regression of ~4% in 10.2.1, or that might be noise; I will revisit this
  • revrangeww, fwdrangeww, readww and overwrite are mostly unchanged since 8.x

For the byrx (cached database) workload with the Hyper Clock block cache:

  • see relative and absolute performance summaries, the base version is RocksDB 8.11.4
  • fillseq might have a new regression of ~8% in 10.2.1, or that might be noise; I will revisit this
  • revrangeww, fwdrangeww, readww and overwrite are mostly unchanged since 8.x

Results from RocksDB 10.2.1 that show relative QPS for 10.2 with the Hyper Clock block cache relative to 10.2 with the LRU block cache. Hyper Clock is much better for workloads that have frequent access to the block cache with multiple threads.

relQPS  test
0.97    fillseq.wal_disabled.v400
1.35    revrangewhilewriting.t16
1.43    fwdrangewhilewriting.t16
1.69    readwhilewriting.t16
0.97    overwriteandwait.t16.s0

Large server (32 cores)

The benchmark was run using 24 client threads and 50M KV pairs. Each benchmark step was run for 1800 seconds. Performance summaries are here.

For the byrx (cached database) workload with the LRU block cache:

  • see relative and absolute performance summaries, the base version is RocksDB 6.29.5
  • fillseq might have a new regression of ~10% from 7.10 through 10.2, or that might be noise; I will revisit this
  • revrangeww, fwdrangeww, readww and overwrite are mostly unchanged since 8.x

For the byrx (cached database) workload with the Hyper Clock block cache:

  • see relative and absolute performance summaries, the base version is RocksDB 7.10.2
  • fillseq might have a new regression of ~10% from 7.10 through 10.2, or that might be noise; I will revisit this
  • revrangeww, fwdrangeww, readww and overwrite are mostly unchanged since 8.x

Results from RocksDB 10.2.1 that show relative QPS for 10.2 with the Hyper Clock block cache relative to 10.2 with the LRU block cache. Hyper Clock is much better for workloads that have frequent access to the block cache with multiple threads.

relQPS  test
1.02    fillseq.wal_disabled.v400
1.39    revrangewhilewriting.t24
1.55    fwdrangewhilewriting.t24
1.77    readwhilewriting.t24
1.00    overwriteandwait.t24.s0

Chapter 3: Two Phase Locking (Concurrency Control Book)

Chapter 3 presents two-phase locking (2PL). Remember I told you in Chapter 2: Serializability Theory that the discussion is very scheduler-centric? Well, this is a deeper dive into the scheduler, using 2PL as the concurrency control mechanism. The chapter examines the design trade-offs in scheduler behavior, proves the correctness of basic 2PL, dissects how deadlocks arise and are handled, and discusses many variations and implementation issues.

Here are the section headings in Chapter 3.

  1. Aggressive and Conservative Schedulers
  2. Basic Two Phase Locking
  3. Correctness of Basic Two Phase Locking
  4. Deadlocks
  5. Variations of Two Phase Locking
  6. Implementation Issues
  7. The Phantom Problem
  8. Locking Additional Operators
  9. Multigranularity Locking
  10. Distributed Two Phase Locking
  11. Distributed Deadlocks
  12. Locking Performance
  13. Tree Locking

Yep, this is a long chapter: 65 pages. 


3.1 Aggressive and Conservative Schedulers

The chapter opens by asking: how aggressive or conservative should a scheduler be? An aggressive scheduler tries to schedule operations immediately, risking aborts later. A conservative scheduler delays operations to avoid future conflicts, often needlessly.

This choice affects throughput, latency, and starvation. If conflicts are rare, aggressive wins. If conflicts are frequent or costly, conservative may be preferable.

The book says: "A very conservative version of any type of scheduler can usually be built if transactions predeclare their readsets and writesets. This means that the TM begins processing a transaction by giving the scheduler the transaction’s readset and writeset. Predeclaration is more easily and efficiently done if transactions are analyzed by a preprocessor, such as a compiler, before being submitted to the system, rather than being interpreted on the fly."

This reminded me of Chardonnay, OSDI'23, where Bernstein is an author. (Here is my review/summary of it.) Isn't that an interesting follow-up to the above paragraph, 36 years later?


3.2 Basic Two Phase Locking

Three rules define 2PL: don’t grant conflicting locks, don’t release until the data manager (DM) acknowledges, and once you release a lock, you’re done acquiring. These encode the "growing" and "shrinking" phases that give 2PL its name.

Here is the notation the book uses: `rl_i[x]` denotes a read lock by transaction Ti on item x, `wl_i[x]` a write lock, and `wu_i[x]` means Ti releases the write lock. These are distinct from the operations themselves (like `r_i[x]` or `w_i[x]`), and the system must track both.
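Here's a toy sketch of the three rules as an in-memory scheduler (my illustration, not the book's pseudocode; it raises where a real scheduler would block or queue the request, and rule 2's DM acknowledgement is elided):

```python
from collections import defaultdict

class TwoPhaseLockError(Exception):
    pass

class Basic2PL:
    """Toy model of a basic 2PL scheduler (illustrative only)."""

    def __init__(self):
        self.read_locks = defaultdict(set)  # item -> set of txns holding rl
        self.write_locks = {}               # item -> txn holding wl
        self.shrinking = set()              # txns that have released a lock

    def _check_growing(self, txn):
        # Rule 3: once a transaction releases any lock, it acquires no more.
        if txn in self.shrinking:
            raise TwoPhaseLockError(f"T{txn} is in its shrinking phase")

    def read_lock(self, txn, item):  # rl_i[x]
        self._check_growing(txn)
        # Rule 1: rl_i[x] conflicts with wl_j[x] for j != i.
        writer = self.write_locks.get(item)
        if writer is not None and writer != txn:
            raise TwoPhaseLockError(f"T{txn} blocked: T{writer} holds wl[{item}]")
        self.read_locks[item].add(txn)

    def write_lock(self, txn, item):  # wl_i[x]
        self._check_growing(txn)
        writer = self.write_locks.get(item)
        readers = self.read_locks[item] - {txn}
        if (writer is not None and writer != txn) or readers:
            raise TwoPhaseLockError(f"T{txn} blocked on wl[{item}]")
        self.write_locks[item] = txn

    def release(self, txn, item):  # ru_i[x] / wu_i[x]
        self.shrinking.add(txn)
        self.read_locks[item].discard(txn)
        if self.write_locks.get(item) == txn:
            del self.write_locks[item]

# T1 releases x, then tries to lock y: rule 3 rejects the late acquisition.
s = Basic2PL()
s.write_lock(1, "x")
s.release(1, "x")
try:
    s.write_lock(1, "y")
except TwoPhaseLockError as e:
    print(e)  # T1 is in its shrinking phase
```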

The figure below shows an example of what would go wrong if rule 3 is violated.


3.3 Correctness of Basic Two Phase Locking

In this section, the authors prove that histories produced by 2PL schedulers are serializable. The proof hinges on showing that the serialization graph (SG) is acyclic. They derive this from the locking orderings induced by the scheduler's behavior. The key idea is that if T1 precedes T2 in SG, then T1 must have released a lock before T2 acquired a conflicting one. Since T1 does not acquire any locks after releasing that one, this precludes cycles in the SG.


3.4 Deadlocks

2PL enables serializability, but at the cost of deadlocks. The section opens with classic 2PL-induced deadlocks: mutual lock upgrades. Then it dives into detection via Waits-For Graphs (WFGs), where the idea is to add an edge when one transaction waits on another, and detect deadlocks by finding cycles.

Timeouts and explicit detection are compared. Timeouts are crude but simple (deadlocks don’t vanish on their own, so detection can be delayed safely); WFG cycle detection is precise but expensive. The section discusses how often to detect, how to tune timeouts, and how to pick a victim.
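The WFG idea fits in a few lines. Here's a sketch (my illustration; transaction names are made up) that detects a deadlock by looking for a cycle with DFS:

```python
def wfg_has_deadlock(edges):
    """edges: list of (a, b) pairs meaning 'transaction a waits for b'."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def dfs(n):
        color[n] = GRAY
        for m in graph.get(n, []):
            c = color.get(m, WHITE)
            if c == GRAY:  # back edge: a cycle, i.e. a deadlock
                return True
            if c == WHITE and dfs(m):
                return True
        color[n] = BLACK
        return False

    return any(dfs(n) for n in list(graph) if color.get(n, WHITE) == WHITE)

# T1 waits for T2 and T2 waits for T1: the classic mutual lock upgrade.
print(wfg_has_deadlock([("T1", "T2"), ("T2", "T1")]))  # True
print(wfg_has_deadlock([("T1", "T2"), ("T2", "T3")]))  # False
```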


3.5 Variations of Two Phase Locking

Conservative 2PL requires that a transaction predeclare all the data it may access and acquire all locks up front before execution begins. If even one lock is unavailable, it waits for all. This avoids deadlocks entirely but sacrifices concurrency and assumes predictability in data access.

Strict 2PL, used in nearly all real systems, releases locks only at commit time. This is stricter than the basic 2PL we discussed in 3.2, where the shrinking could happen gradually. But in return, it guarantees strictness and recoverability. To avoid cascading aborts, write locks are held until after commit; read locks can be released a bit earlier.


3.6 Implementation Issues

This section gives some guidelines (as of 1987) for implementing a lock manager, handling blocking and aborting transactions, and ensuring atomicity of reads and writes. 


3.7 The Phantom Problem

The phantom problem is set up nicely. Most real databases can dynamically grow and shrink. In addition to Read and Write, they usually support Insert and Delete. Index locking is introduced as a way to lock ranges instead of records. This helps detect conflicts for inserts into a scanned range.

Two locks conflict if there could be a record that satisfies both predicates, i.e., the predicates are mutually satisfiable. This is predicate locking. While more general than index locking, it's more expensive, as it requires the lock manager (LM) to reason about predicates. Predicate locking is rare in practice.
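The mutual-satisfiability test is easy to see for a toy predicate language restricted to inclusive ranges over one integer attribute (my simplification, not the book's formalism):

```python
def predicates_conflict(lo1, hi1, lo2, hi2):
    """Two range predicates conflict iff some record could satisfy both,
    i.e. the ranges [lo1, hi1] and [lo2, hi2] overlap."""
    return max(lo1, lo2) <= min(hi1, hi2)

# T1 scans salary in [50, 100]; T2 inserts a row with salary 75 (a point
# predicate [75, 75]). The locks conflict, which is what catches phantoms.
print(predicates_conflict(50, 100, 75, 75))    # True
print(predicates_conflict(50, 100, 150, 200))  # False
```

With arbitrary predicates, deciding mutual satisfiability gets expensive fast, which is exactly why predicate locking is rare and index (range) locking is the practical compromise.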

We had discussed index locking and predicate locking in Chapter 7 of the Gray/Reuters book, so I defer to that discussion.


3.8 Locking Additional Operators

Similar to how Chapter 2 extended serializability theory beyond read/write to include increment/decrement, Chapter 3 does the same for locking. I really like this treatment and generalization opportunity!

To add new operation types, follow these rules:

  • Make each new operation atomic w.r.t. all others.
  • Define a lock type for each operation.
  • Build a compatibility matrix where two lock types conflict iff their corresponding operations don’t commute.
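Here's a sketch of building the compatibility matrix from commutativity, with Read, Write, and a hypothetical blind Increment (adds a constant, returns nothing, so two Increments commute but an Increment does not commute with a Read or Write):

```python
def commutes(op1, op2):
    """Do these operation types commute on the same item?"""
    if op1 == "R" and op2 == "R":
        return True   # reads never change state
    if op1 == "I" and op2 == "I":
        return True   # blind increments commute: x+a+b == x+b+a
    return False      # anything involving W, or I mixed with R, conflicts

OPS = ["R", "W", "I"]
# Two lock types are compatible iff their operations commute.
compat = {(a, b): commutes(a, b) for a in OPS for b in OPS}

print(compat[("R", "R")])  # True
print(compat[("I", "I")])  # True
print(compat[("I", "R")])  # False
```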


3.9 Multigranularity Locking

Multi-granularity locking (MGL) allows each transaction to use granule sizes most appropriate to its behavior. Long transactions can lock coarser items like files; short transactions can lock individual records. In this way, long transactions don’t waste time setting too many locks, and short transactions don’t block others by locking large portions of the database that they don’t access. This balances overhead and concurrency.

We assume the lock instance graph is a tree. A lock on a coarse granule x implicitly locks all its descendants. For instance, a read lock on an area implicitly read-locks its files and records.

To make fine-grained locks compatible with this, we use intention locks. Before locking a record x, the scheduler sets intention locks (`ir`, `iw`) on x’s ancestors (database, area, file). This prevents conflicting coarse-grain locks from being acquired concurrently.
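A sketch of the idea, assuming the database → area → file → record hierarchy from the text (the matrix is the standard ir/iw/r/w one; I omit the riw/SIX mode for brevity):

```python
# Compatibility matrix: COMPAT[(held, requested)] -> may both be granted?
COMPAT = {
    ("ir", "ir"): True,  ("ir", "iw"): True,  ("ir", "r"): True,  ("ir", "w"): False,
    ("iw", "ir"): True,  ("iw", "iw"): True,  ("iw", "r"): False, ("iw", "w"): False,
    ("r",  "ir"): True,  ("r",  "iw"): False, ("r",  "r"): True,  ("r",  "w"): False,
    ("w",  "ir"): False, ("w",  "iw"): False, ("w",  "r"): False, ("w",  "w"): False,
}

def locks_for_write(path):
    """To write-lock a leaf, set iw on every ancestor (root to leaf order),
    then w on the leaf itself. path is root-to-leaf."""
    return [(node, "iw") for node in path[:-1]] + [(path[-1], "w")]

print(locks_for_write(["db", "area1", "file3", "rec42"]))
# [('db', 'iw'), ('area1', 'iw'), ('file3', 'iw'), ('rec42', 'w')]

# A long transaction asking for a coarse read lock on file3 now conflicts
# with the iw already held there, so the record-level write is protected:
print(COMPAT[("iw", "r")])  # False
```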


3.10 Distributed Two Phase Locking

Section 3.10 explains how Strict 2PL makes distributed concurrency control simpler. With Basic 2PL, one site might release locks while another is still acquiring them, which breaks the two-phase rule unless schedulers coordinate. But with Strict 2PL, locks are only released after commit, so once a scheduler sees the commit, it knows the transaction is done acquiring locks. No need to talk to other sites. The TM only sends commit after all operations are done, so this setup guarantees that each site’s local two-phase rule lines up with the global one. Strict 2PL gives you correctness without extra coordination.


3.11 Distributed Deadlocks

Distributed deadlocks arise when the global Waits-For Graph (WFG), formed by uniting local WFGs, contains a cycle even if each local WFG is acyclic. Detecting global cycles requires message exchanges among sites. A site may initiate detection when it sees a potential cycle, collecting dependency information and detecting loops.

A deadlock prevention method is to use timestamps. When a transaction T1 is about to wait for T2, the system checks their timestamps. If the wait would lead to a possible deadlock, it aborts one of them.

  • In wait-die, older transactions wait for younger ones; younger ones abort if they encounter older ones.
  • In wound-wait, older transactions preempt younger ones by aborting them; younger transactions are allowed to wait for older ones.

Wait-die favors younger transactions but can cause repeated aborts; wound-wait favors older transactions and avoids starvation.
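The two policies reduce to a one-line decision each. A sketch (my illustration; smaller timestamp means older transaction):

```python
def wait_die(requester_ts, holder_ts):
    """Requester wants a lock held by holder. Older requesters wait;
    younger requesters die (abort)."""
    return "wait" if requester_ts < holder_ts else "abort requester"

def wound_wait(requester_ts, holder_ts):
    """Older requesters wound (abort) the younger holder; younger
    requesters are allowed to wait."""
    return "abort holder" if requester_ts < holder_ts else "wait"

# T1 (ts=1, older) vs T5 (ts=5, younger):
print(wait_die(1, 5))    # wait
print(wait_die(5, 1))    # abort requester
print(wound_wait(1, 5))  # abort holder
print(wound_wait(5, 1))  # wait
```

In both policies the older transaction is never aborted, so with timestamps reassigned only at first start, every transaction eventually becomes oldest and finishes.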


3.13 Tree Locking

Suppose data items are structured as nodes in a tree, and transactions always follow tree paths. The scheduler can exploit this predictability using Tree Locking (TL), rather than requiring the grow/shrink discipline of 2PL. That is, a transaction can release a lock and later acquire another, so long as it follows the tree order (root to leaf).

TL resembles resource ordering schemes in OSes for deadlock avoidance. It ensures deadlock freedom. Once a transaction locks all children of a node it needs, it can safely release the lock on the parent. This can improve performance by holding locks for shorter durations. But it only makes sense if transactions follow predictable root-to-leaf access patterns. Otherwise, TL can reduce concurrency among transactions.

TL must also be extended if we want recoverability, strictness, or to avoid cascading aborts. For example, writes should be held until commit. In practice, most updates are on leaf nodes, making this workable.
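Here's a toy sketch of the discipline (hypothetical API, my illustration): a transaction may lock a node only if it holds the node's parent (except for its very first lock), and a released node may never be re-locked.

```python
class TreeLockViolation(Exception):
    pass

class TreeLockTxn:
    def __init__(self, parent_of):
        self.parent_of = parent_of  # child -> parent map describing the tree
        self.held = set()
        self.released = set()
        self.first = True           # the first lock may be on any node

    def lock(self, node):
        if node in self.released:
            raise TreeLockViolation(f"{node} was already released")
        if not self.first and self.parent_of.get(node) not in self.held:
            raise TreeLockViolation(f"parent of {node} not held")
        self.held.add(node)
        self.first = False

    def unlock(self, node):
        self.held.discard(node)
        self.released.add(node)

# root -> a -> leaf: lock root, then a, release root early, then the leaf.
t = TreeLockTxn({"a": "root", "leaf": "a"})
t.lock("root"); t.lock("a"); t.unlock("root"); t.lock("leaf")
print(sorted(t.held))  # ['a', 'leaf']
```

Releasing the root early is exactly the concurrency win over 2PL: another transaction can start down the tree while this one is still working at the leaves.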

This is a niche idea, but when applicable, it's elegant. I like it! The section closes by discussing locking in B-trees as an application.

May 16, 2025

Freedom and Flexibility: Rethinking Your MongoDB Cloud Strategy Beyond Atlas

Let’s be honest: Getting MongoDB up and running quickly in the cloud sounds fantastic. Services like MongoDB Atlas promise easy deployment, automated scaling, and hands-off management on AWS, Azure, and GCP. For teams looking to shed operational burdens, the appeal is strong. Click a few buttons, get a database… what’s not to like? However, as […]

May 15, 2025

New in Percona Everest 1.6.0: Easily Deploy PMM with a Single Helm Command

Monitoring your databases is critical, especially in Kubernetes environments where visibility and automation are key. That’s why, in Percona Everest 1.6.0, we introduced a highly requested feature: the ability to automatically deploy Percona Monitoring and Management (PMM) as part of the Everest Helm chart using just one flag. This simplifies the process for teams who […]