a curated list of database news from authoritative sources

April 23, 2026

Innovation From Every Corner: Inside Percona’s Build with AI Competition

At Percona, we’re passionate about open source database software, helping organizations of all sizes run, manage, and optimize their databases with the freedom and transparency that open source provides. That spirit of openness doesn’t stop at our products; it runs through everything we do, including how we encourage our own people to innovate. We recently … Continued

The post Innovation From Every Corner: Inside Percona’s Build with AI Competition appeared first on Percona.

Scaling Your Cache: A Step-by-Step Guide to Setting Up Valkey Replication

In the recent open-source data landscape, Valkey has emerged as a prominent player. Born as a Linux Foundation-backed, fully open-source fork of Redis (following Redis’s recent licensing changes), Valkey serves as a high-performance, in-memory key-value data store. Whether Valkey is deployed as a primary database, an ephemeral cache, or a rapid message broker, a single … Continued

The post Scaling Your Cache: A Step-by-Step Guide to Setting Up Valkey Replication appeared first on Percona.

April 22, 2026

Percona Live 2026 is Back in the Bay Area — Here’s Why You Don’t Want to Miss It

We’re thrilled to welcome the open source database community back in person for Percona Live 2026, taking place May 27–29 in the Bay Area. After the energy of past events, there’s nothing like being together again — swapping war stories over coffee, sketching architectures on napkins, and learning from the people building and running databases … Continued

The post Percona Live 2026 is Back in the Bay Area — Here’s Why You Don’t Want to Miss It appeared first on Percona.

Supabase is now ISO 27001 certified

Supabase is certified to ISO/IEC 27001:2022. The certificate covers our information security management system across the entire platform.

April 21, 2026

Impacts of updates in open-source databases

We recently looked at how various open-source database engines maintain their secondary indexes (in a previous analysis) and found significant differences. Index maintenance is not the only aspect where storage engines differ; another significant difference is how they handle simple row updates. These updates highlight how these open-source databases organize data and manage … Continued

The post Impacts of updates in open-source databases appeared first on Percona.

Ring’s Billion-Scale Semantic Video Search with Amazon RDS for PostgreSQL and pgvector

In this post, we share how Ring built billion-scale semantic video search on Amazon RDS for PostgreSQL with pgvector: the architectural decisions and the alternatives considered, the cost, performance, and scale challenges, key lessons, and future directions. The Ring team designed their vector search architecture for global scale, supporting millions of customers with vector embeddings, which are numerical representations of visual content generated by an AI model. By converting video frames into vectors (arrays of numbers that capture what’s happening in each frame), Ring can store these representations in a database and search them using similarity search. When you type “package delivery,” the system converts that text into a vector and finds the video frames whose vectors are most similar, delivering relevant results in under 2 seconds.
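The similarity search described above can be illustrated with a small brute-force sketch. This is not Ring's implementation and does not use pgvector: the frame names, the three-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration, and a real deployment would rely on a database index rather than scoring every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, frames, k=2):
    """Return the ids of the k stored vectors most similar to the query.

    Brute force: every stored vector is scored, which is only
    practical for tiny collections like this toy one.
    """
    ranked = sorted(
        frames.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [frame_id for frame_id, _ in ranked[:k]]


# Hypothetical embeddings; real ones have hundreds of dimensions
# and come from an embedding model, not from hand-picked numbers.
frames = {
    "frame_porch_package": [0.9, 0.1, 0.0],
    "frame_empty_driveway": [0.0, 0.2, 0.9],
    "frame_courier_at_door": [0.8, 0.3, 0.1],
}

# Stand-in for the vector a model might produce for "package delivery".
query = [1.0, 0.2, 0.0]

results = top_k(query, frames, k=2)
```

The text query and the video frames live in the same vector space, so finding relevant frames reduces to ranking stored vectors by their closeness to the query vector.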

Percona Operator for MySQL 1.1.0: PITR, Incremental Backups, and Compression

The latest release of the Percona Operator for MySQL, 1.1.0, is here. It brings point-in-time recovery, incremental backups, zstd backup compression, configurable asynchronous replication retries, and a set of stability fixes. This post walks through the highlights and how they help your MySQL deployments on Kubernetes. Percona Operator for MySQL 1.1.0: Running stateful databases … Continued

The post Percona Operator for MySQL 1.1.0: PITR, Incremental Backups, and Compression appeared first on Percona.

PostgreSQL Performance: Is Your Query Slow or Just Long-Running?

Introduction: Recently I was talking with a database enthusiast who mentioned that, early in his career, he tuned an ETL/reporting query that ran for 8-10 hours as a nightly job and cut its runtime by one third. He went to his manager, saying that he reduced the query execution time, thinking that the manager would … Continued

The post PostgreSQL Performance: Is Your Query Slow or Just Long-Running? appeared first on Percona.

Approaches to tenancy in Postgres

There are many ways to slice a Postgres database for multi-tenant applications. Let's look at the three most common approaches and the trade-offs.

April 20, 2026

Aurora Serverless: Faster performance, enhanced scaling, and still scales down to zero

Amazon Aurora Serverless is an on-demand, auto scaling configuration for Aurora that scales up to support your most demanding workloads and down to zero when you don’t need it. The latest improvements deliver up to 30% better performance and enhanced scaling that understands your workload. These enhancements are available at no additional cost for a better price-performance ratio. In this post, we’ll share recent performance and scaling improvements with benchmark results, showing how Aurora Serverless can now scale up to 45.0% faster with a 32.9% faster workload completion time.

Deploying Cross-Site Replication in Percona Operator for MySQL (PXC)

Having a separate DR cluster for production databases is a modern-day necessity for tech and other businesses that rely heavily on their database systems. Setting up such a [DC -> DR] topology for Percona XtraDB Cluster (PXC), which is a virtually synchronous cluster, can be a bit challenging in a complex … Continued

The post Deploying Cross-Site Replication in Percona Operator for MySQL (PXC) appeared first on Percona.

April 18, 2026

Mutable BSON and Oracle OSON

AskTom Live is a great source of information from Oracle developer advocates and product managers, but I recently came across a clickbait marketing title ("Not All Binary Protocols Are Created Equal: The Science Behind OSON's 529x Performance Advantage") which compares apples to oranges, and it's an opportunity to explain what BSON is, the binary JSON format used by MongoDB.

TL;DR: If you want to compare with OSON, the Oracle Database datatype for JSON, you should compare it with the mutable BSON document, which is the structure that MongoDB uses to access documents when reading and updating individual fields. Raw BSON is closer to protobuf: a compact serialization format for disk or network transfer, with access metadata removed and no blocks or headers.

I left the following comment on the YouTube video, but it does not seem to be publicly visible, so here it is.

Let me explain how Oracle Database and MongoDB handle disk-based data access, and you will understand the different design purposes of OSON and BSON, and why you are not testing the right thing to compare them.

Oracle Database, like many traditional databases, uses the same format on disk (blocks) and in memory (buffers), and must store all transient metadata that helps access it in memory on persistent storage. This applies to table blocks (which contain a table directory, a row directory, and even lock flags, ITLs, that need to be cleaned up later), and the same idea was used for OSON (header, dictionary, sorted field IDs, offset arrays). Think of it as a mini database with its catalog, like the Oracle database has its dictionary and segment headers, which map physical extents and blocks. Then accessing the on-disk OSON structure directly makes sense — it's designed to be used through buffers that match the disk blocks.

But MongoDB with WiredTiger uses a smarter cache where the in-memory structures are optimized for RAM: adding pointers instead of disk offsets, building an Elements Vector for O(1) field access, and adding skiplists to navigate fields, all when data is loaded into the database cache. So there are two formats: the mutable BSON that the database actually works on in memory for query processing and updates, and the on-disk raw BSON that deliberately strips unnecessary metadata and is compressed to maximize OS filesystem cache usage. That fits the major advantage of MongoDB for documents: reading or writing a document in a single I/O.

The raw BSON is a serialization format for disk and network, not to be accessed partially, because MongoDB has a powerful mutable BSON format in memory with O(1) access through its Elements Vector indexing. The O(n) sequential scan, the "no partial updates" limitation, and the field position penalties you describe — those are properties of the serialization format, not how MongoDB actually processes queries. And by definition, the serialization format is read sequentially, even though BSON can jump between fields. Don't do that except when you need a full document. Use the MongoDB server and drivers to access BSON, and learn how to use it correctly.
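To make the distinction concrete, here is a minimal sketch of the two access patterns. It is not MongoDB's code: the encoder handles only int32 fields from the public BSON layout, and the `IndexedDoc` class with its name-to-offset dictionary is a hypothetical stand-in for the in-memory indexing described above, not the actual Elements Vector.

```python
import struct

INT32 = 0x10  # BSON element type tag for a 32-bit integer


def encode_int32_doc(fields):
    """Serialize {name: int} as raw BSON containing only int32 elements."""
    body = b""
    for name, value in fields.items():
        body += bytes([INT32]) + name.encode("utf-8") + b"\x00"
        body += struct.pack("<i", value)
    # Total size counts the 4-byte length prefix and the trailing NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def iter_elements(raw):
    """Walk the serialized bytes in order, yielding (name, value_offset)."""
    pos = 4  # skip the length prefix
    while raw[pos] != 0:
        assert raw[pos] == INT32, "this sketch only understands int32"
        name_end = raw.index(b"\x00", pos + 1)
        yield raw[pos + 1:name_end].decode("utf-8"), name_end + 1
        pos = name_end + 1 + 4  # jump over the 4-byte value


def scan_get(raw, wanted):
    """O(n) lookup: scan the serialization from the start on every call."""
    for name, offset in iter_elements(raw):
        if name == wanted:
            return struct.unpack_from("<i", raw, offset)[0]
    raise KeyError(wanted)


class IndexedDoc:
    """Pay the scan once at load time, then answer lookups in O(1)."""

    def __init__(self, raw):
        self.raw = raw
        self.offsets = dict(iter_elements(raw))  # hypothetical index

    def get(self, name):
        return struct.unpack_from("<i", self.raw, self.offsets[name])[0]


raw = encode_int32_doc({"a": 1, "b": 2, "z": 26})
doc = IndexedDoc(raw)
```

Calling `scan_get(raw, "z")` walks past every earlier field each time, while `doc.get("z")` is a single dictionary lookup after the one-time load. Benchmarking the first pattern measures the serialization format, not a database that works on the second.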

With this understanding, you can see that the "529x performance" clickbait title comes from a mistake: you used raw BSON to access individual fields, bypassing everything MongoDB does when serving a query. It would be like using BBED to query Oracle Datafiles without going through the instance — no buffer cache, no row directory navigation, no dictionary lookups — and then concluding that Oracle's storage format is slow.

Notably, the original OSON VLDB paper (Liu et al., 2020) by Zhen Hua Liu doesn't make the claims this video does. That paper honestly compares OSON against Oracle's own JSON text storage, not against MongoDB's query processing. It compares encoding sizes with BSON, which is legitimate for a serialization format comparison (though it overlooks that BSON in MongoDB is compressed on disk and over the network). The paper authors understood they were comparing serialization formats and storage approaches within Oracle, not benchmarking MongoDB's actual runtime performance. I believe OSON is the optimal format for Oracle because it was integrated into the existing instance, cache, and securefiles, which were created a long time ago. Conversely, BSON is ideal for MongoDB, as it capitalizes on the document database's purpose and the WiredTiger architecture.

April 17, 2026

MariaDB’s Snapshot Isolation: A Fix That Breaks More Than It Fixes

Jepsen’s analysis of MySQL 8.0.34 walked through a set of concurrency and isolation anomalies in InnoDB. MariaDB, which inherits the same codebase, took the report seriously and shipped a response: a new server variable called innodb_snapshot_isolation, turned on by default starting in 11.8. The announcement claims that with the flag enabled, Repeatable Read in MariaDB … Continued

The post MariaDB’s Snapshot Isolation: A Fix That Breaks More Than It Fixes appeared first on Percona.

April 16, 2026

Build resilient Kerberos authentication for Aurora Global Database without joining Active Directory domain

In this post, we show you how to build a multi-Region Kerberos authentication system that matches your Aurora Global Database’s resilience using AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) with multi-Region replication and a one-way forest trust to your on-premises Active Directory, so your Linux clients can authenticate without joining the AD domain.

The Future of Everything is Lies, I Guess: Where Do We Go From Here?

This is a long article, so I've broken it up into a series of posts, listed below. You can also read the full work as a PDF or EPUB.

Previously: New Jobs.

Some readers are undoubtedly upset that I have not devoted more space to the wonders of machine learning—how amazing LLMs are at code generation, how incredible it is that Suno can turn hummed melodies into polished songs. But this is not an article about how fast or convenient it is to drive a car. We all know cars are fast. I am trying to ask what will happen to the shape of cities.

The personal automobile reshaped streets, all but extinguished urban horses and their waste, supplanted local transit and interurban railways, germinated new building typologies, decentralized cities, created exurban sprawl, reduced incidental social contact, gave rise to the Interstate Highway System (bulldozing Black communities in the process), gave everyone lead poisoning, and became a leading cause of death among young people. Many parts of the US are highly car-dependent, even though a third of us don’t drive. As a driver, cyclist, transit rider, and pedestrian, I think about this legacy every day: how so much of our lives are shaped by the technology of personal automobiles, and the specific way the US uses them.

I want you to think about “AI” in this sense.

Some of our possible futures are grim, but manageable. Others are downright terrifying, in which large numbers of people lose their homes, health, or lives. I don’t have a strong sense of what will happen, but the space of possible futures feels much broader in 2026 than it did in 2022, and most of those futures feel bad.

Much of the bullshit future is already here, and I am profoundly tired of it. There is slop in my search results, at the gym, at the doctor’s office. Customer service, contractors, and engineers use LLMs to blindly lie to me. The electric company has hiked our rates and says data centers are to blame. LLM scrapers take down the web sites I run and make it harder to access the services I rely on. I watch synthetic videos of suffering animals and stare at generated web pages which lie about police brutality. There is LLM spam in my inbox and synthetic CSAM on my moderation dashboard. I watch people outsource their work, food, travel, art, even relationships to ChatGPT. I read chatbots lining the delusional warrens of mental health crises.

I am asked to analyze vaporware and to disprove nonsensical claims. I wade through voluminous LLM-generated pull requests. Prospective clients ask Claude to do the work they might have hired me for. Thankfully Claude’s code is bad, but that could change, and that scares me. I worry about losing my home. I could retrain, but my core skills—reading, thinking, and writing—are squarely in the blast radius of large language models. I imagine going to school to become an architect, just to watch ML eat that field too.

It is deeply alienating to see so many of my peers wildly enthusiastic about ML’s potential applications, and using it personally. Governments and industry seem all-in on “AI”, and I worry that by doing so, we’re hastening the arrival of unpredictable but potentially devastating consequences—personal, cultural, economic, and humanitarian.

I’ve thought about this a lot over the last few years, and I think the best response is to stop. ML assistance reduces our performance and persistence, and denies us both the muscle memory and deep theory-building that comes with working through a task by hand: the cultivation of what James C. Scott would call metis. I have never used an LLM for my writing, software, or personal life, because I care about my ability to write well, reason deeply, and stay grounded in the world. If I ever adopt ML tools in more than an exploratory capacity, I will need to take great care. I also try to minimize what I consume from LLMs. I read cookbooks written by human beings, I trawl through university websites to identify wildlife, and I talk through my problems with friends.

I think you should do the same.

Refuse to insult your readers: think your own thoughts and write your own words. Call out people who send you slop. Flag ML hazards at work and with friends. Stop paying for ChatGPT at home, and convince your company not to sign a deal for Gemini. Form or join a labor union, and push back against management demands that you adopt Copilot—after all, it’s for entertainment purposes only. Call your members of Congress and demand aggressive regulation which holds ML companies responsible for their carbon and digital emissions. Advocate against tax breaks for ML datacenters. If you work at Anthropic, xAI, etc., you should think seriously about your role in making the future. To be frank, I think you should quit your job.

I don’t think this will stop ML from advancing altogether: there are still lots of people who want to make it happen. It will, however, slow them down, and this is good. Today’s models are already very capable. It will take time for the effects of the existing technology to be fully felt, and for culture, industry, and government to adapt. Each day we delay the advancement of ML models buys time to learn how to manage technical debt and errors introduced in legal filings. Another day to prepare for ML-generated CSAM, sophisticated fraud, obscure software vulnerabilities, and AI Barbie. Another day for workers to find new jobs.

Staving off ML will also assuage your conscience over the coming decades. As someone who once quit an otherwise good job on ethical grounds, I feel good about that decision. I think you will too.

And if I’m wrong, we can always build it later.

And Yet…

Despite feeling a bitter distaste for this generation of ML systems and the people who brought them into existence, they do seem useful. I want to use them. I probably will at some point.

For example, I’ve got these color-changing lights. They speak a protocol I’ve never heard of, and I have no idea where to even begin. I could spend a month digging through manuals and working it out from scratch—or I could ask an LLM to write a client library for me. The security consequences are minimal, it’s a constrained use case that I can verify by hand, and I wouldn’t be pushing tech debt on anyone else. I still write plenty of code, and I could stop any time. What would be the harm?

Right?

… Right?


Many friends contributed discussion, reading material, and feedback on this article. My heartfelt thanks to Peter Alvaro, Kevin Amidon, André Arko, Taber Bain, Silvia Botros, Daniel Espeset, Julia Evans, Brad Greenlee, Coda Hale, Marc Hedlund, Sarah Huffman, Dan Mess, Nelson Minar, Alex Rasmussen, Harper Reed, Daliah Saper, Peter Seibel, Rhys Seiffe, and James Turnbull.

This piece, like most all my words and software, was written by hand—mainly in Vim. I composed a Markdown outline in a mix of headers, bullet points, and prose, then reorganized it in a few passes. With the structure laid out, I rewrote the outline as prose, typeset with Pandoc. I went back to make substantial edits as I wrote, then made two full edit passes on typeset PDFs. For the first I used an iPad and stylus, for the second, the traditional pen and paper, read aloud.

I circulated the resulting draft among friends for their feedback before publication. Incisive ideas and delightful turns of phrase may be attributed to them; any errors or objectionable viewpoints are, of course, mine alone.

Why A Goat?

New Brand. Same Independence. If you read today’s announcement, you know Percona has a lot to say about what’s broken in modern data infrastructure. Lock-in dressed up as openness. Costs that climb while control shrinks. Vendors who made “managed” mean giving up visibility instead of gaining it. When we decided to stop being quiet about … Continued

The post Why A Goat? appeared first on Percona.