a curated list of database news from authoritative sources

May 03, 2026

BugBash'26: Day 2

Ok, finally getting some time to put my butt down and write about day 2 of BugBash.


Why do so few buildings fall down?

Brian Potter, Senior Infrastructure Fellow @ Institute for Progress, author of the Construction Physics newsletter.

Buildings rarely collapse. The rate of major structural failure is between 1 in 100,000 and 1 in 1 million. (This is how I know this is a serious statistic: it is an interval.)

Why don't more buildings fall down?

There are some technical reasons for it: buildings are simple structures with few or no moving parts. Buildings exhibit a limited number of behaviors when you load their structure: stress, deflection, vibration, creep, etc. And these behaviors are proportional to the force you apply. Finally, buildings are designed for 2X-3X the expected load.

Let's go deeper into structural elements. We have good theories for how structural elements behave, and individual components are tested extensively and are standardized. A building is exposed to a bounded load by default, and the maximum design forces (earthquakes, hurricanes) rarely occur. And many buildings are stitched together with alternate load paths, providing redundancy if a major element fails.

There are also cultural reasons why buildings don't fall down. We have building codes in place: the international building code, residential code, fire code, mechanical code, and plumbing code. These codes enforce and improve best practices over time through reactive updates. There is a saying in civil engineering: building codes are written in blood. A recent example is Boston's Big Dig, which changed the code on anchoring blocks to the ceiling of the tunnel.

Other cultural reasons: builders are required to be licensed in many countries, and the profession has a strong culture of risk aversion and conservatism. Civil engineers are really conservative people. They don't attempt to build flying spinning restaurants, for example. They still like to rely on hand calculations as backup. (There is a lesson here for the AI era.) Brian himself worked for 5 years designing just parking garages, and then 10 years designing just apartment buildings.

When these things stop being true, building collapses become more common. For example, when we had less knowledge of building techniques, failures were more common. There are studies that tie the rate of bridge failures to the lack of engineering knowledge, and show a 10X improvement as that knowledge increased.

Leaving large safety margins is also a big part of this. For building types that have little margin of safety (e.g., offshore platforms, since they need to be submersible), the rate of failure increases. For buildings with unusual or out-of-sample typologies, failure risk increases. A famous example is the Citicorp Center tower. It had an unusual structure, and a small change made to its design was later determined to be a wind hazard; the design modification was undone before the storm arrived.

Another famous example is the Tacoma Narrows Bridge, which collapsed due to extreme wind-induced oscillations, and became an important lesson.


Gary Marcus fireside chat

Gary is a Prof. Emeritus of Psychology and Neural Science at NYU. Will Wilson interviewed him for this fireside chat.

Gary says AI researchers don't want him on fireside chats, because he called bullshit on AI. He says that AI does do something, but does it badly. You cannot just pour in more money and expect to achieve AGI. We might get to AGI by other means, but not through LLMs. Neurosymbolic AI may be the way to get there. When pressed on LLMs recently crossing a threshold and taking over a lot of programming tasks, Gary tied this to neurosymbolic AI. He argued that Claude Code is not a pure LLM: you cannot scale pure LLMs, so they are using a lot of harness and tools, which apparently counts as neurosymbolic stuff. At a later point in the conversation, Gary said Claude Code is a bad attempt at symbolic AI (if-else statements, regular expressions, etc.) and may only count as a neurosymbolic hybrid. Gary gave some Richard Stallman and GNU/Linux vibes by trying to rename the advances in AI as neurosymbolic AI.

Most successful AI is narrow AI (Deep Blue, the Jeopardy computer), as opposed to broad AI, which is AGI. For AGI, multidimensional intelligence is needed. Is memorization smart? Is being useful for technical tasks smart? Calculators are smart under that definition as well. The definition should be cognitive; just being economically useful doesn't cut it as a definition of AGI.

Neural networks are parallel statistical computation. In 1967, there was a lot of excitement around NNs and how they would change the world. There was no proof that a NN trained with backpropagation converges to a specified outcome, but it was still a useful system. In 2001, Gary wrote a book on why you are not gonna get there without techniques from symbolic AI. Gary said Geoff Hinton ridiculed him, and that Hinton claimed he would be happy to debate him, but that this was a lie. Gary said he is open to the debate, but Hinton was very hostile to having any symbol manipulation, and kept saying they just need scaling.

What happened in the last few years is LLMs being put in harnesses, as they are like bulls in a china shop. Will pushed back, asking why it matters, since this type of LLM scales well enough to get a lot of use out of it already. Gary replied that we are still looking at one tiny corner, and if you care about science, you want to know what matters and what doesn't.

Gary also said that the winner-takes-all bull case for investment is wrong. First of all, it looks like there won't be a single winner capturing everything. Secondly, government nationalization is a risk for investment. Finally, the model could get stolen. Gary argued that studies show people don't get ROI on AI investment.

Gary also predicted there won't be AGI until 2027 or 2028. Phew! He said that AGI would be able to watch a new movie and understand it, but the current AI systems would not be able to do it for a new movie outside their training set, for example, "One Battle After Another", and understand the Sean Penn character. I don't know man... Maybe somebody should give this a try. Similarly, he said, current AI tech won't be able to read a new novel and understand it. Again, I think this is underestimating the current LLMs. Gary made a bet on this on Substack back in 2024, and listed 10 things that won't happen by 2027.

Gary says his predictions come from cognitive analysis, and he doesn't see a clue that there is "world building" in the models. It is only image space over time, and no world building. They are not able to do reasoning, the conceptual/algorithmic breakthrough may not come soon, and we would need 5-10 breakthroughs. World models require being neurosymbolic, but being neurosymbolic doesn't give you world models; you need ontologies.

There is also the question of what happens if the barking dog finally catches up to the car. Let's suppose AGI gets created; what happens to society? Gary said we should endow it with human values. If AGI can be jailbroken like LLMs, we are in for a bad time.


Building confidence in an always-in-motion distributed streaming system

Frank McSherry is famous for originating timely-dataflow/differential-dataflow work and bringing SQL view materialization to market at Materialize with 1PB deployed capacity. He is also famous for his "scalability but at what cost" work.

Frank said he and his company get a 10-100x benefit from AI (sorry, neurosymbolic AI). But building confidence is a process, and the talk gave his opinions on building confidence in using AI for coding.

Systems that work, work for a reason: Frank said he is a theory person, and theory people in CS (unlike those in physics) made the practice possible. He says there is one reason timely-dataflow/differential-dataflow works: virtual time (Jefferson'85)! This was originally suggested for discrete-event simulations, and as a second use case for concurrency control. Materialized collections are built on the (time, diff, data) abstraction. The changelog provides a specific collection at each time. Operations transform changelogs while preserving the virtual time. The operations compose, and this means SQL plans compose. This way Materialize pre-resolves nondeterminism at the boundary. It removes logical contention from the critical path.
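To make the (time, diff, data) idea concrete for myself, here is a minimal Python sketch. It is not the timely-dataflow/differential-dataflow API; the helper names (collection_at, map_op, filter_op, compose) are hypothetical and the changelog is just a list of tuples. It shows how a changelog yields a collection at each virtual time, and how operators that leave the timestamps untouched compose.

```python
from collections import Counter

# Hypothetical helpers for illustration only; not the Materialize or
# differential-dataflow API. A changelog is a list of (time, diff, data) updates.

def collection_at(changelog, time):
    """Accumulate all diffs with timestamp <= time into a multiset."""
    acc = Counter()
    for t, diff, data in changelog:
        if t <= time:
            acc[data] += diff
    # Records whose net multiplicity is zero are not in the collection.
    return {d: n for d, n in acc.items() if n != 0}

def map_op(fn):
    """Lift a per-record function to a changelog operator; (time, diff) are kept."""
    return lambda changelog: [(t, diff, fn(data)) for t, diff, data in changelog]

def filter_op(pred):
    """Keep only updates whose data satisfies pred; timestamps are untouched."""
    return lambda changelog: [(t, diff, data) for t, diff, data in changelog if pred(data)]

def compose(*ops):
    """Chain operators left to right into a single changelog-to-changelog plan."""
    def run(changelog):
        for op in ops:
            changelog = op(changelog)
        return changelog
    return run

log = [
    (1, +1, 3),
    (1, +1, 8),
    (2, -1, 3),   # 3 is retracted at time 2
    (2, +1, 10),
]
plan = compose(filter_op(lambda x: x > 5), map_op(lambda x: x * 2))
out = plan(log)
print(collection_at(out, 1))  # {16: 1}
print(collection_at(out, 2))  # {16: 1, 20: 1}
```

Because every operator only rewrites the data and keeps the timestamp, the output at time t is the plan applied to the input at time t, which is the composability property the talk credits to virtual time.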

Abstraction is a superpower: Frank said, when he started as a software systems builder, he thought his job was to be smart and clever, but he eventually concluded that his job is to provide effective abstractions. The job is to manage, delete, and package complexity (which no one wants). Virtual time is a great abstraction for Materialize. It is hard to misuse components that respect virtual time. Composability of virtual time made all the difference, and removed the logical contention, clearing the deck.

Use it or lose it: Dogfood your own work. Benchmark and communicate its value. Confidence is something you provide to others.


Lightning Talks

Borrowing FoundationDB's simulator for layer development

Pierre Zemb of Clever Cloud talked about his journey from HBase operational nightmares (network splits, manual repairs with hbck) to building Rust-based layers on FoundationDB. He was motivated by FDB's deterministic simulator, which abstracts every fallible interaction (network, disk, time, randomness) behind swappable interfaces. He and his team figured out how to inject their own Rust code into FDB's simulated cluster, initially just to verify transactional consistency but eventually testing increasingly rich workloads. This surfaced bugs everywhere, and made them switch to simulation-first development.
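The "swappable interfaces" idea is easy to sketch in a few lines of Python. This is my own toy illustration, not FoundationDB's simulator or Pierre's Rust code: SimClock, SimNetwork, send_with_retry, and run_simulation are all made-up names. The point is that the code under test only talks to injected interfaces, so a seeded run is fully reproducible.

```python
import random

class SimClock:
    """Simulated time: it advances only when code asks to sleep."""
    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds

class SimNetwork:
    """Simulated network that drops messages using an injected, seeded RNG."""
    def __init__(self, rng, drop_rate):
        self.rng = rng
        self.drop_rate = drop_rate

    def send(self, msg):
        # True means the message was delivered.
        return self.rng.random() >= self.drop_rate

def send_with_retry(net, clock, msg, max_attempts=5):
    """Code under test: it never touches real time, sockets, or global randomness."""
    for attempt in range(1, max_attempts + 1):
        if net.send(msg):
            return attempt
        clock.sleep(2 ** attempt)  # exponential backoff, in simulated seconds
    return None

def run_simulation(seed):
    """One simulated run; the seed determines every drop and every backoff."""
    rng = random.Random(seed)
    clock = SimClock()
    net = SimNetwork(rng, drop_rate=0.5)
    results = [send_with_retry(net, clock, f"m{i}") for i in range(10)]
    return results, clock.now

# The same seed replays the same history, so a failing seed is a reproducible bug report.
print(run_simulation(7) == run_simulation(7))  # True
```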

I didn't take notes for these two lightning talks, so I just mention them by title.

Symbolic execution for invariant discovery (not just bug finding); Anish Agarwal, Head of Product @ Olympix 

Fuzzamoto: Full system fuzzing for Bitcoin nodes; Niklas Gögge, Security Engineer @ Brink


Lightning Talks

CUDA over TCP: reverse engineering the CUDA API; Shivansh Vij, CEO @ Loophole Labs

Verifying Cedar Policy’s correctness with PBT & differential response testing; Lucas Käldström, Staff Engineer @ Upbound

Keeping up with code being written 24/7; Josh Ip, Founder & CEO @ Ranger

Hacking kiosks


Behaviors as the backbone of software correctness

Gabriela Moreira, CEO of Quint, talked about her path from Informal Systems and the blockchain space into building Quint as a friendlier alternative to TLA+. She said she loves TLA+ and its core abstraction of modeling systems as states and transitions, but only about 10% of her colleagues ever adopted it, citing the syntax as the sticking point. (I would like to propose a rule that anybody who can read and write Rust syntax doesn't get to complain about TLA+ syntax. You literally need to learn 5-10 keywords, and that's it.)

Gabriela said that Quint keeps the underlying power but offers a different syntax plus type checking, and "quint run" performs random simulation of the state space (something TLA+ also supports), which tends to be faster than full model checking. The big question here is how you know you're done. In the rest of the talk, Gabriela explained that this confidence should come from understanding, sanity checks, testing with failures, and witnesses such as vacuity checks and traces toward a property. She said reproducible examples are central: tests written as `init.then(...).then(...).expect(...)` chains also serve as documentation you can actually trust. Behaviors should become the backbone across the whole software development life cycle, enabling model-based testing, trace validation, and hybrid approaches. The AI angle gives this fresh urgency of course. Spec-driven development and model-based testing become both more necessary and considerably easier with AI in the loop.
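Here is a rough Python sketch of what random simulation with an invariant and a witness looks like, on a toy model of my own (two accounts passing coins back and forth). It is not Quint syntax and not how "quint run" is implemented; the names actions, invariant, witness, simulate, and replay are all hypothetical.

```python
import random

INIT = {"alice": 2, "bob": 0}

def actions(state):
    """Enabled transitions: move one coin in either direction if the sender has one."""
    out = []
    for src, dst in (("alice", "bob"), ("bob", "alice")):
        if state[src] > 0:
            nxt = dict(state)
            nxt[src] -= 1
            nxt[dst] += 1
            out.append((f"{src}->{dst}", nxt))
    return out

def invariant(state):
    """Coins are conserved and balances never go negative."""
    return state["alice"] + state["bob"] == 2 and min(state.values()) >= 0

def witness(state):
    """A state we expect to be reachable; never hitting it hints at a vacuous model."""
    return state["bob"] == 2

def simulate(seed, runs=50, steps=10):
    """Random walks over the state space, checking the invariant after every step."""
    rng = random.Random(seed)
    witness_trace = None
    for _ in range(runs):
        state, trace = dict(INIT), []
        for _ in range(steps):
            name, state = rng.choice(actions(state))
            trace.append(name)
            if not invariant(state):
                return {"violation": trace, "witness": witness_trace}
            if witness_trace is None and witness(state):
                witness_trace = list(trace)
    return {"violation": None, "witness": witness_trace}

def replay(trace):
    """Re-run a recorded trace from INIT, like an init.then(...).expect(...) test."""
    state = dict(INIT)
    for name in trace:
        state = dict(actions(state))[name]
    return state

report = simulate(seed=1)
print(report["violation"])  # None
```

A recorded witness trace can be kept as a regression test by replaying it and asserting on the final state, which is the "reproducible example" idea from the talk in miniature.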


Steel, Rust, and Truth

Steve Klabnik, co-author of The Rust Programming Language, opened by talking about Pittsburgh. He then talked about his grandfather Keith and his father, a tool-and-die man who literally checked tools for rust. Klabnik counted himself lucky that his own passion turned out to be economically viable. After 40 minutes of this, he posed the question hanging over the room: in this moment, in 2026, which one are you: the grandfather who rode out the change, the father who didn't make it out, or Pittsburgh itself, which had to become something else entirely? This was a talk about feelings, which is awkward territory for software people, who haven't been big on this topic but who couldn't do much to dodge this existential thinking over the last year or so.

Steve traced the history of correctness: Descartes stripping away every uncertain thing, Leibniz with his calculemus, Hilbert trying to formalize mathematics, and Gödel arriving to say sorry bro. The thread continues into our field: Hoare triples, Milner and type theory, O'Hearn and separation logic. These all aimed at the question of whether we can prove programs correct. The pragmatic answer for decades has been "it ran, is that enough?" 

Derrida in 1967 said meaning is never fully present in the sign. We never really knew reality anyway, but we'd settled on "good enough" because we wrote the code, understood it, and tried it. AI broke all three of those at once.

Steve said that the formal methods community has been taking correctness seriously for sixty years with contracts, specs, invariants, refinements, types, and proofs. He argued that this crowd already knows how to wield the tools every programmer is now going to need. Harking back to Derrida, you can't fully understand reality, but you can try to improve your grasp of it. And this craft suddenly matters to everyone.

Steve is a great speaker. The talk almost felt like stand up at times, and philosophy class at other times. He also spoke some uneasy truths to the software engineering crowd. We had been telling ourselves we were making the world a better place while "disrupting" other industries, and now the disruption has finally come to us.

May 02, 2026

Codd's Connection Trap and Oracle's JOIN TO ONE

In a previous post, I explored Codd's connection trap in PostgreSQL and MongoDB — the classic pitfall where joining two independent many-to-many relationships through a shared attribute produces spurious combinations that look like facts but aren't.

The example followed Codd's 1970 suppliers–parts–projects model: we know which suppliers supply which parts, and which projects use which parts, but joining through parts to derive supplier–project relationships is a relational composition — it tells us what could be true, not what is true.

Oracle Database 26ai introduces JOIN TO ONE, a SQL extension that structurally prevents this class of errors. In this post, I'll reproduce Codd's connection trap in Oracle, show how JOIN TO ONE catches it, and demonstrate the correct solutions.

Why This Matters: A Gap in SQL joins

When developers build joins at the application level — fetching a parent row for a given foreign key in PL/SQL or application code — they naturally get safety checks: TOO_MANY_ROWS tells them a lookup that should have been unique returned multiple rows, and NO_DATA_FOUND tells them the expected parent doesn't exist. These exceptions act as guardrails, catching data or logic errors immediately.

But when the same logic moves into a SQL JOIN, those guardrails disappear. A join that silently matches multiple rows simply multiplies the result set — no error, no warning, just quietly wrong numbers. A join that finds no match either drops the row (inner join) or pads it with NULLs (outer join), but never raises an alarm about violated assumptions.

JOIN TO ONE bridges this gap. It brings the equivalent of TOO_MANY_ROWS protection into SQL joins: if a join that you declared as "to one" ever reaches a second row, Oracle raises a runtime error instead of silently corrupting your results. The default outer-join behavior handles the "zero matches" case gracefully (like a NO_DATA_FOUND that returns NULL columns instead of erroring), and you can override it to INNER JOIN TO ONE when the absence of a match should eliminate the row.
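To illustrate the semantics described above, here is a small Python emulation over lists of dictionaries. It is only a sketch of the behavior as this post describes it, not Oracle's implementation; the function join_to_one and the exception TooManyRows are hypothetical names.

```python
class TooManyRows(Exception):
    """Raised when a lookup declared as 'to one' matches more than one row."""

def join_to_one(left_rows, right_rows, key, inner=False):
    """Join each left row to at most one right row on `key`.

    Zero matches: keep the left row with None for right-hand columns
    (outer, the default) or drop it (inner=True). Two or more matches: raise.
    """
    index = {}
    for r in right_rows:
        index.setdefault(r[key], []).append(r)
    right_cols = {c for r in right_rows for c in r}
    out = []
    for l in left_rows:
        matches = index.get(l[key], [])
        if len(matches) > 1:
            raise TooManyRows(f"{key}={l[key]!r} matched {len(matches)} rows")
        if matches:
            out.append({**matches[0], **l})
        elif not inner:
            out.append({**{c: None for c in right_cols}, **l})
    return out

supplier_part = [
    {"supplier_id": "S1", "part_id": "P1", "qty_available": 100},
    {"supplier_id": "S1", "part_id": "P2", "qty_available": 200},
]
parts = [{"part_id": "P1"}, {"part_id": "P2"}, {"part_id": "P3"}]
project_part = [
    {"project_id": "Alpha", "part_id": "P2", "qty_needed": 75},
    {"project_id": "Beta", "part_id": "P2", "qty_needed": 60},
]

print(len(join_to_one(supplier_part, parts, "part_id")))  # 2
try:
    join_to_one(supplier_part, project_part, "part_id")
except TooManyRows as e:
    print("rejected:", e)
```

The lookup into parts succeeds because part_id is unique there, while the lookup into project_part is rejected as soon as P2 matches both Alpha and Beta.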

A note on naming: JOIN TO ONE is semantically JOIN TO ZERO OR ONE AND ONLY ONE (for the default outer case) or JOIN TO ONE AND ONLY ONE (for the inner case). SQL has never been shy about verbosity, so a more precise name might have been warranted.

Schema & Sample Data

Following Codd's example, and the previous blog post, we have suppliers, parts, projects, and two independent many-to-many relationships — now with quantities to make the consequences of the trap concrete:

CREATE TABLE suppliers (
    supplier_id VARCHAR2(10) PRIMARY KEY
);

CREATE TABLE parts (
    part_id VARCHAR2(10) PRIMARY KEY
);

CREATE TABLE projects (
    project_id VARCHAR2(10) PRIMARY KEY
);

-- Supplier supplies parts
CREATE TABLE supplier_part (
    supplier_id   VARCHAR2(10) REFERENCES suppliers,
    part_id       VARCHAR2(10) REFERENCES parts,
    qty_available INT NOT NULL,
    PRIMARY KEY (supplier_id, part_id)
);

-- Project uses parts
CREATE TABLE project_part (
    project_id VARCHAR2(10) REFERENCES projects,
    part_id    VARCHAR2(10) REFERENCES parts,
    qty_needed INT NOT NULL,
    PRIMARY KEY (project_id, part_id)
);

-- Reference data
INSERT INTO suppliers VALUES ('S1');
INSERT INTO suppliers VALUES ('S2');

INSERT INTO parts VALUES ('P1');
INSERT INTO parts VALUES ('P2');
INSERT INTO parts VALUES ('P3');

INSERT INTO projects VALUES ('Alpha');
INSERT INTO projects VALUES ('Beta'); 

-- S1 supplies P1 (100 units) and P2 (200 units)
-- S2 supplies P2 (150 units) and P3 (300 units)
INSERT INTO supplier_part VALUES ('S1', 'P1', 100);
INSERT INTO supplier_part VALUES ('S1', 'P2', 200);
INSERT INTO supplier_part VALUES ('S2', 'P2', 150);
INSERT INTO supplier_part VALUES ('S2', 'P3', 300);

-- Alpha uses P1 (50 units) and P2 (75 units)
-- Beta  uses P2 (60 units) and P3 (90 units)
INSERT INTO project_part VALUES ('Alpha', 'P1', 50);
INSERT INTO project_part VALUES ('Alpha', 'P2', 75);
INSERT INTO project_part VALUES ('Beta',  'P2', 60);
INSERT INTO project_part VALUES ('Beta',  'P3', 90);

COMMIT;

The Connection Trap in Action

A developer wants to know which suppliers are connected to which projects and executes the following query:

SELECT sp.supplier_id,
       pp.project_id,
       sp.part_id,
       sp.qty_available,
       pp.qty_needed
FROM   supplier_part sp
JOIN   project_part pp ON sp.part_id = pp.part_id
ORDER  BY sp.supplier_id, sp.part_id
;

SUPPLIER_ID    PROJECT_ID    PART_ID       QTY_AVAILABLE    QTY_NEEDED
______________ _____________ __________ ________________ _____________
S1             Alpha         P1                      100            50
S1             Alpha         P2                      200            75
S1             Beta          P2                      200            60
S2             Alpha         P2                      150            75
S2             Beta          P2                      150            60
S2             Beta          P3                      300            90

6 rows selected.

6 rows from only 4 supplier-part rows and 4 project-part rows. The query asserts, for example, "S2 supplies P2 to Alpha" — but our data only says S2 can supply P2 and Alpha needs P2. The join inferred relationships through the shared attribute part_id that were never recorded as facts.

As Codd warned in his 1970 paper, this is exactly the connection trap: deriving relationships that were never asserted.

The Damage with Aggregates

Now the developer summarizes:

SELECT sp.supplier_id,
       SUM(sp.qty_available) AS total_available,
       SUM(pp.qty_needed)    AS total_needed
FROM   supplier_part sp
JOIN   project_part pp ON sp.part_id = pp.part_id
GROUP  BY sp.supplier_id
ORDER  BY sp.supplier_id
;

SUPPLIER_ID       TOTAL_AVAILABLE    TOTAL_NEEDED
______________ __________________ _______________
S1                            500             185
S2                            600             225

Compare with the actual totals from each table independently:

SELECT supplier_id, SUM(qty_available) AS total_available  
FROM   supplier_part  
GROUP  BY supplier_id  
ORDER  BY supplier_id;  

SUPPLIER_ID       TOTAL_AVAILABLE
______________ __________________
S1                            300
S2                            450

SELECT project_id, SUM(qty_needed) AS total_needed  
FROM   project_part  
GROUP  BY project_id  
ORDER  BY project_id;  

PROJECT_ID       TOTAL_NEEDED
_____________ _______________
Alpha                     125
Beta                      150

The connection trap inflated both sides:

  • S1's availability jumped from 300 to 500: P2's 200 was counted twice (once for Alpha, once for Beta)
  • S2's availability jumped from 450 to 600: P2's 150 was counted twice (once for Alpha, once for Beta)
  • Needs were scrambled: the 225 attributed to S2 mixes Alpha's and Beta's needs, and P2's demand (75 + 60) is charged to both S1 and S2

The trap corrupts aggregates in whichever direction the data happens to push — inflation, deflation, or both at once — and it does so silently. In application code, a lookup-by-key that returns two rows would raise TOO_MANY_ROWS. In a SQL join, the same situation just silently multiplies your totals.
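The numbers above can be reproduced outside Oracle. The following sketch uses Python's built-in sqlite3 module as a stand-in, with a simplified schema (TEXT columns, no reference tables), so it is an approximation of the example rather than the same environment. It also adds a row-count comparison as a cheap fan-out check, which is my own addition and not part of JOIN TO ONE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier_part (supplier_id TEXT, part_id TEXT, qty_available INT,
                            PRIMARY KEY (supplier_id, part_id));
CREATE TABLE project_part  (project_id TEXT, part_id TEXT, qty_needed INT,
                            PRIMARY KEY (project_id, part_id));
INSERT INTO supplier_part VALUES
    ('S1','P1',100), ('S1','P2',200), ('S2','P2',150), ('S2','P3',300);
INSERT INTO project_part VALUES
    ('Alpha','P1',50), ('Alpha','P2',75), ('Beta','P2',60), ('Beta','P3',90);
""")

# Aggregating over the join through part_id: totals are inflated.
joined = conn.execute("""
    SELECT sp.supplier_id, SUM(sp.qty_available), SUM(pp.qty_needed), COUNT(*)
    FROM   supplier_part sp
    JOIN   project_part pp ON sp.part_id = pp.part_id
    GROUP  BY sp.supplier_id
    ORDER  BY sp.supplier_id
""").fetchall()

# Aggregating the supplier side on its own: the real totals.
actual = conn.execute("""
    SELECT supplier_id, SUM(qty_available)
    FROM   supplier_part
    GROUP  BY supplier_id
    ORDER  BY supplier_id
""").fetchall()

# Fan-out check: a row-widening join never returns more rows than its driving table.
base_rows = conn.execute("SELECT COUNT(*) FROM supplier_part").fetchone()[0]
joined_rows = sum(row[3] for row in joined)

print(joined)                  # [('S1', 500, 185, 3), ('S2', 600, 225, 3)]
print(actual)                  # [('S1', 300), ('S2', 450)]
print(base_rows, joined_rows)  # 4 6
```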

This explains why, in a data warehouse, we denormalize into a dimensional model, or star schema, with a single fact table and dimension tables. Normalization makes the relational schema unsafe for users who see only the SQL schema, without the details of the domain model or the safeguards provided by the application.

The Join Graph: Why There Is No RWT

Oracle's JOIN TO ONE documentation introduces the concept of a Row-Widened Table (RWT) — a table from which all other tables can be reached through unique (many-to-one) joins, ensuring the query result maps one-to-one to the RWT rows. A query where such an RWT exists is a Row Widening Only Query (RWOQ), and it's almost always what you need for correct results.

Here's the join graph of our broken query:

   supplier_part ──→ parts ←── project_part
         (FK)        (PK)         (FK)

The "parts" table is a parent node reached from two sibling child nodes. This is the chasm trap:

  • Starting from "supplier_part": the path to "project_part" via "parts" goes many-to-one then one-to-many — not unique
  • Starting from "project_part": same problem in reverse
  • Starting from "parts": both children fan out

No table qualifies as an RWT. This is not an RWOQ. The output rows don't map one-to-one to any table's rows.
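The reasoning over the join graph can be written down as a small reachability check. This Python sketch encodes my reading of the RWT definition given above (a table from which every other table in the query is reachable through many-to-one joins); it is not Oracle's algorithm, and TO_ONE, reachable, and find_rwt are hypothetical names.

```python
# Directed edges child -> parent: each is a unique (many-to-one) join
# through a foreign key that references the parent's primary key.
TO_ONE = {
    "supplier_part": {"suppliers", "parts"},
    "project_part": {"projects", "parts"},
}

def reachable(start, tables):
    """Tables of the query reachable from `start` using only to-one edges."""
    seen, stack = {start}, [start]
    while stack:
        table = stack.pop()
        for parent in TO_ONE.get(table, ()):
            if parent in tables and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def find_rwt(tables):
    """Every table from which all other tables in the query are reachable."""
    tables = set(tables)
    return sorted(t for t in tables if reachable(t, tables) == tables)

# The broken query: no starting point reaches both sibling tables.
print(find_rwt({"supplier_part", "parts", "project_part"}))  # []
# A star around supplier_part: it reaches everything through to-one joins.
print(find_rwt({"supplier_part", "suppliers", "parts"}))     # ['supplier_part']
```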

How JOIN TO ONE Catches the Trap

With Oracle 26ai's JOIN TO ONE, attempting to write this query produces an error:

-- THIS FAILS — and that's exactly what we want  
SELECT sp.supplier_id,  
       pp.project_id,  
       sp.qty_available,  
       pp.qty_needed  
FROM   supplier_part sp  
JOIN TO ONE (parts p, project_part pp); 

Error at Command Line : 6 Column : 23
Error report -
SQL Error: ORA-18641: No join key found for "PROJECT_PART"

JOIN TO ONE requires every table inside the parentheses to be reachable from the leading RWT through a chain of unique joins. The path supplier_part → parts is many-to-one (valid), but parts → project_part is one-to-many (invalid). Oracle detects that part_id alone is not unique in project_part (the PK is (project_id, part_id)) and blocks the query.

Even forcing an explicit ON clause doesn't help:

SELECT sp.supplier_id,  
       pp.project_id,  
       sp.qty_available,  
       pp.qty_needed 
FROM supplier_part sp
JOIN TO ONE (
    parts p,
    project_part pp ON p.part_id = pp.part_id
);

Error at Command Line : 8 Column : 5
Error report -
SQL Error: ORA-18640: JOIN TO ONE reached multiple rows joining to "PP", resulting in a non-unique join

https://docs.oracle.com/error-help/db/ora-18640/

Oracle either rejects at parse time or raises a runtime error the moment a part matches multiple project_part rows — the SQL equivalent of the TOO_MANY_ROWS exception that application developers rely on. Instead of silently producing wrong numbers for months or years, you get an immediate, clear signal: this query structure doesn't support the one-to-one mapping you're claiming.

The Correct Solutions: Two Separate RWOQs

Since the schema doesn't record the three-way relationship, we must run two separate queries, and that's exactly what JOIN TO ONE forces us to do:

-- RWOQ 1: "What can each supplier supply?"  
-- RWT = supplier_part → unique joins to suppliers and parts  
SELECT sp.supplier_id,  
       sp.part_id,  
       sp.qty_available  
FROM   supplier_part sp  
JOIN TO ONE (suppliers s, parts p)  
ORDER  BY sp.supplier_id, sp.part_id;  

SUPPLIER_ID    PART_ID       QTY_AVAILABLE
______________ __________ ________________
S1             P1                      100
S1             P2                      200
S2             P2                      150
S2             P3                      300

-- RWOQ 2: "What does each project need?"  
-- RWT = project_part → unique joins to projects and parts  
SELECT pp.project_id,  
       pp.part_id,  
       pp.qty_needed  
FROM   project_part pp  
JOIN TO ONE (projects j, parts p)  
ORDER  BY pp.project_id, pp.part_id;  

PROJECT_ID    PART_ID       QTY_NEEDED
_____________ __________ _____________
Alpha         P1                    50
Alpha         P2                    75
Beta          P2                    60
Beta          P3                    90

Both are valid RWOQs. Clean star-shaped join graphs. No spurious combinations. Aggregates on qty_available or qty_needed are guaranteed correct.

Conclusion

Codd identified the connection trap in 1970: inferring relationships from shared attributes produces combinations that could be true, not combinations that are true. Over fifty years later, this trap remains one of the most common sources of silently wrong SQL — aggregates that are "slightly off," duplicates masked by DISTINCT, totals that nobody questions because they look plausible.

Application developers have long relied on TOO_MANY_ROWS and NO_DATA_FOUND exceptions to catch violated uniqueness assumptions in procedural lookups. But the moment those lookups become SQL joins, the safety net vanishes — a many-to-one assumption that silently becomes many-to-many just multiplies rows without complaint.

Oracle's JOIN TO ONE in Database 26ai brings that safety net back into SQL:

                            | Traditional JOIN                    | JOIN TO ONE
Connection trap             | ⚠️ Silently produces wrong results  | ⛔️ Blocked at parse/runtime
Row multiplication          | ⚠️ Cartesian per shared parent      | ⛔️ Prevented by RWOQ enforcement
Aggregates                  | ⚠️ Inflated or deflated silently    | ✅ Guaranteed by one-to-one mapping
Equivalent of TOO_MANY_ROWS | ❌ Not available in joins           | 🛑 Runtime error on violated uniqueness
Developer awareness         | ⚠️ Can go unnoticed for years       | 🛑 Immediate error

The rule is the same whether you use normalized relations, star schemas, or document models: if a relationship is a fact, it must be stored as one — not derived through joins. JOIN TO ONE ensures that when you do join, the result stays faithful to the facts your schema actually records, or the query fails.

If you think SQL databases, normalization, and referential integrity automatically protect data consistency better than denormalized models, this is proof that they do not. A document model can preserve business invariants by storing them consistently in one place, whereas normalization can break them across multiple tables that must be joined. The relational model is an abstraction that simplifies data relationships and can hide business invariants that exist in the domain model and the application. Applications must then compensate by writing safer queries, often by running multiple queries at a performance cost. The new JOIN TO ONE syntax helps SQL users find the right balance by declaring their intent: to look up additional columns from dimensions without changing the number of fact rows.

May 01, 2026

Village News: MySQL/Database News (1 May 2026)

Stay ahead with the latest MySQL & database news! Get the scoop on MySQL 9.7 LTS, the new hypergraph optimizer, and upcoming events like Percona Live ‘26.

Run an ALTER TABLE for a huge table in Aurora

Recently, we received an alert for one of our Managed Services customers indicating that the auto_increment value for the table was 80% of its maximum capacity. The column was INT UNSIGNED, which has a limit of 4,294,967,295. At 80%, we have enough time to change it to BIGINT... Right? Let’s see. So we used pt-online-schema-change … Continued

The post Run an ALTER TABLE for a huge table in Aurora appeared first on Percona.

April 30, 2026

Managing Valkey Cluster in Kubernetes

Over the last several years, Percona has introduced several rock-star Kubernetes Operators for managing MySQL, Percona XtraDB Cluster, MongoDB, and PostgreSQL. For Valkey, we are actively working with the community to contribute our knowledge, and experience to help brainstorm, develop, and test the official Valkey Operator for Kubernetes. While the Valkey Operator has not yet … Continued

The post Managing Valkey Cluster in Kubernetes appeared first on Percona.

Continued Commitment to Percona XtraDB Cluster

At Percona, our priority has always been to provide the open source database solutions that our users can count on for the long term. Percona XtraDB Cluster (PXC) is a core part of that promise, delivering the high availability, scalability, and data integrity that mission-critical MySQL deployments depend on. MariaDB has announced that September 30, … Continued

The post Continued Commitment to Percona XtraDB Cluster appeared first on Percona.

Troubleshooting logical replication delay made easy

This blog is based on a real production case in which users experienced a serious delay in logical replication. Let me try to explain how to approach similar cases and analyze them in an easy method, because lag in logical replication is a common problem, and we should expect it to come up for different … Continued

The post Troubleshooting logical replication delay made easy appeared first on Percona.

Announcing Vitess 24

The Vitess maintainers are happy to announce the release of version 24.0.0, along with version 2.17.0 of the Vitess Kubernetes Operator. Version 24.0.0 expands query serving capabilities for sharded keyspaces, modernizes Vitess's observability stack, and introduces faster replica provisioning through native MySQL CLONE support. The companion v2.17.0 operator release brings significant improvements to scheduled backups, with new cluster- and keyspace-level schedules that make production backup management much easier to configure at scale.

RLS sounds great until it isn't

PostgreSQL's Row Level Security sounds like a clean way to enforce access control at the database layer, but the foot-guns, pooling incompatibilities, and performance traps often make it more trouble than it's worth.

April 29, 2026

XtraBackup incremental prepare phase is 2x-3x faster!

TL;DR Percona XtraBackup is a 100% open-source backup solution for Percona Server for MySQL and MySQL®. It is designed for high-availability environments, performing online, non-blocking, and highly secure backups of transactional systems without interrupting your production traffic. While full backups work for small databases, large-scale systems rely on incremental backups to save space and time. … Continued

The post XtraBackup incremental prepare phase is 2x-3x faster! appeared first on Percona.

Orchestrator’s Next Chapter: What It Means for Percona Customers

Last week, ProxySQL announced that they are taking over the maintenance and development of Orchestrator, the MySQL high-availability and topology management tool originally authored by Shlomi Noach. You can read their announcement here: Announcing the future of Orchestrator. We want to briefly share Percona’s position on the news. We welcome this. Orchestrator became the de … Continued

The post Orchestrator’s Next Chapter: What It Means for Percona Customers appeared first on Percona.

April 28, 2026

Ensuring PostgreSQL Backup Continuity: A pgBackRest Update

pgBackRest is a foundational component of the PostgreSQL backup solutions supported by Percona, playing a critical role in ensuring reliable and resilient data protection for our customers. It is a testament to the strength of the open source community that pgBackRest has become such a robust, widely trusted tool over the years. Recently, changes around … Continued

The post Ensuring PostgreSQL Backup Continuity: A pgBackRest Update appeared first on Percona.

MySQL with AI Inside

Learn how to use the VillageSQL vsql-ai extension to run AI models like Claude, Gemini, ChatGPT, and local models within MySQL (VillageSQL).

Introducing the OSSCAR Index

Announcing the OSSCAR Index: a quarterly ranking of the fastest-growing open source organizations. The site, the data, and the scoring code are all open source.

April 27, 2026

DSQL SQL Dialect: How Amazon Aurora DSQL differs from single-instance PostgreSQL

This post is for database architects, developers, and DBAs who must evaluate Amazon Aurora DSQL or work with PostgreSQL workloads on a distributed database. Knowing exactly where Amazon Aurora DSQL aligns with standard PostgreSQL and where it diverges helps you to reduce risk and design schemas that perform well from day one. You might find that most existing PostgreSQL applications work with minimal changes.

How Kajabi optimized costs with Amazon Aurora upgrades

In this post, we show you how Kajabi navigated complex Aurora PostgreSQL database upgrades and achieved an 80.53% cost reduction through strategic planning and technical execution. You'll discover their hybrid approach combining Amazon Aurora blue/green deployments with PostgreSQL native replication. You'll also learn about their implementation of Aurora I/O-Optimized storage and the key lessons from their journey. Whether you're managing large-scale databases or planning your own upgrade path, Kajabi's experience offers valuable insights. You'll see how to balance performance requirements with cost optimization while maintaining continuous availability.

BugBash'26 Afternoon of Day 1

These are my notes from the afternoon sessions of BugBash'26. We had a 75-minute lunch break. Nice lunch, but there were no vegetarian entrées, which made Peter Alvaro hangry. I don't blame him, I would be too.


Informal methods

Ben Eggers, Member of Technical Staff @ OpenAI

This was a fun and thought-provoking talk. The premise is "Nothing has changed about software development". Really? After LLMs spread through software like wildfire, and got particularly good at code generation in the last 3 months?? And this is coming from an OpenAI infrastructure engineer, who was once the 8th highest 7-day token user. How come?

The talk has two parts:

  • writing code was where the hard parts surfaced
  • agents move the work, but do not obviate it

Ok, now it makes more sense. Both of these are sensible statements. Ben followed with a couple of disclaimers: he is talking about deep, narrow systems, not broad, high-surface-area systems, because he has experience in the former and didn't want to make claims about the latter. Ben is a funny guy, and refreshingly honest. He made fun of LLMs' mistakes in code generation with 3 pop quizzes throughout his talk.

Part 1: Writing code has always been where the hard part surfaced

Ben harkened back to Will Wilson's claim that before LLMs,  50% of the time was spent in writing code, and 50% in testing. Ben asked, hey, what happened to design, discovery, integration, and correctness?  He made the case that these were the hard parts, and writing code forces people to address them:

  1. decide intent
  2. discover the shape of the problem
  3. turn boundaries into contracts
  4. notice weirdness before anyone else does
  5. fill in the code

This is a long, arduous process. The slowness of writing code was load-bearing! When you noticed your code starting to cross boundaries, that exposed bad interfaces in your components. Wiring paths end-to-end helped expose missing cases. Writing tests (yes, this was the first thing we lost to the LLM wildfire) forced expected behavior to become explicit (test your interfaces).

Before LLMs, how fast code appeared matched how fast humans could reason about it. And yes, LLMs broke this part! But maybe not so drastically. Tech leads already managed stochastic work: they knew how to break down the problem and how to manage junior engineers and interns writing code. The job was always narrowing the distribution, turning a large spectrum of possible outcomes into a tighter, reliable band.

Part 2: Agents move the work, but don't make it disappear

Models crossed a usefulness threshold ~3 months ago. (This point kept coming up in many of the talks. Some people claimed it happened in November-December, but everyone (except Gary Marcus) agreed a corner was turned.)

But the code you get back is proportional to the legwork you do. Models write better code when you do the leading: tell them what success means, give them the shape of the solution, determine the behavior up-front.

You need to be incredibly specific in your prompt, and in the limit prompts become math-like! (You mean TLA+ specs?) So Ben recommends:

  • design: do the hardest parts yourself 
  • write your schemas by hand
  • write your APIs and interfaces by hand
  • correctness: give it rails

Unit testing is kinda dead, LLMs do a great job of that. But always implement tests in a different context. Write interfaces, write tests, tell the LLM not to touch the tests, and ask it to write the code. This is... exactly like managing an army of interns.
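Here is a minimal sketch of that workflow in Python, not anything shown in the talk. The names (`RateLimiter`, `check_contract`, `FixedWindowLimiter`) are hypothetical. The interface and the tests are written by hand and frozen; only the implementation class is the part you would hand off to the model.

```python
from typing import Protocol


class RateLimiter(Protocol):
    """Hand-written interface: the contract the generated code must satisfy."""

    def allow(self, key: str, now: float) -> bool:
        """Return True if `key` may proceed at time `now` (in seconds)."""
        ...


def check_contract(make_limiter) -> None:
    """Hand-written tests: the rails. The model is told not to edit these."""
    limiter = make_limiter(limit=2, window=10.0)
    assert limiter.allow("a", now=0.0)
    assert limiter.allow("a", now=1.0)
    assert not limiter.allow("a", now=2.0)  # third call in the window is rejected
    assert limiter.allow("b", now=2.0)      # keys are independent
    assert limiter.allow("a", now=10.0)     # a new window resets the count


class FixedWindowLimiter:
    """The delegated part: an implementation filled in against the contract."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._counts: dict[tuple[str, int], int] = {}

    def allow(self, key: str, now: float) -> bool:
        # Count calls per (key, window number) bucket.
        bucket = (key, int(now // self.window))
        count = self._counts.get(bucket, 0)
        if count >= self.limit:
            return False
        self._counts[bucket] = count + 1
        return True


check_contract(FixedWindowLimiter)
```

If the generated implementation fails `check_contract`, the fix goes into the implementation, never into the tests.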

So, Ben claims, nothing changed overall. Code got cheap, but correctness did not. You can outsource your coding, but you can't outsource your thinking/understanding.


Now more than ever: building reliable software in the age of agents

Ron Minsky, Co-head of Technology @ Jane Street

I didn't take many notes in this talk, and took some headspace time in the second part.

Protocol-aware deterministic simulation testing

Chaitanya Bhandari, Distributed Systems Engineer @ TigerBeetle

Chaitanya is really smart and gave a decent talk. Again, I didn't take many notes, and hit the hallway track. Waking up at 4am to fly in the same day from Buffalo to DC took its toll on me.


Fast and fault-tolerant: pick two

Matt Barrett, Founder & CEO @ Adaptive

Adaptive helps move a lot of cash around the globe. Stressful work. Matt talked about what he claims is the world's fastest (in terms of low latency) Raft implementation. It was built 8 years ago. Antithesis, who tested it a couple of months ago, said it is one of the most reliable Raft implementations they have seen. It's used for trading systems/infrastructure. It supports 100K transactions/sec, and provides low double-digit microsecond latency with low variance.

Aeron Cluster, their fast and fault-tolerant Raft, builds on open-source Aeron as a low-latency, high-throughput messaging layer. It is based on individual byte replication, not message replication! They moved from a message index to a byte index, with natural batching at all levels. Matt said business logic runs in the cluster for low latency. I don't know exactly what he means by that.
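To make the message-index versus byte-index distinction concrete for myself, here is a toy sketch in Python. It is not Aeron Cluster's code or API: `ByteLog`, `next_chunk`, and `quorum_position` are hypothetical names, and the length-prefixed framing is my assumption. The idea it illustrates is that when positions are byte offsets, a follower can be sent whatever bytes it is missing in one chunk, so batching falls out naturally.

```python
import struct


class ByteLog:
    """Append-only log addressed by byte offset rather than message number."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def append(self, payload: bytes) -> int:
        """Frame the payload with a 4-byte length prefix; return the new tail position."""
        self.buf += struct.pack(">I", len(payload)) + payload
        return len(self.buf)

    def next_chunk(self, follower_pos: int, max_bytes: int) -> bytes:
        """Return the bytes the follower is missing, up to max_bytes.

        The chunk may hold several messages or a fragment of one; the
        replication path only has to track byte positions.
        """
        return bytes(self.buf[follower_pos:follower_pos + max_bytes])


def quorum_position(positions: list[int]) -> int:
    """Highest byte position that a majority of members have reached."""
    ranked = sorted(positions, reverse=True)
    return ranked[len(positions) // 2]


leader = ByteLog()
for msg in (b"buy 10", b"sell 3", b"buy 7"):
    leader.append(msg)

# One transfer carries all three messages: batching without a batching layer.
follower_pos = 0
chunk = leader.next_chunk(follower_pos, max_bytes=64)
follower_pos += len(chunk)

# Three-member cluster: the leader, one caught-up follower, one lagging follower.
commit = quorum_position([len(leader.buf), follower_pos, 0])
```

In this toy version the commit point is just a byte offset, so acknowledging a thousand small messages costs the same bookkeeping as acknowledging one.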

But, why didn't we hear about this Raft implementation before? Also the talk did not mention any protocol innovations. It looks like there isn't much protocol level innovation at the distributed systems level or algorithmic level, and the innovation may be at the lower layers, at data handling and networking implementation. I still don't have a good idea of their Raft implementation after the talk.


Making high performance storage boring

Corwin Coburn, Uber Tech Lead, Parallel File Systems @ Google

The point is you want to keep storage boring. Storage is about writes and reads. It is a utility, and, hence, is boring. Nobody calls the plumber when things are fine. At Google, they built the fastest Lustre filesystem in the world, at 10 TB/sec.

Parallel file systems in the cloud require capacity, performance, availability, security, and ease of use. They put a lot of effort into keeping the storage boring by building on reliable infrastructure and proven software (Lustre), enforcing strict tenancy isolation, offering limited configurations, and committing to achievable SLOs.

This part is important: do not overpromise, and do not overdeliver! If you overdeliver and customers get accustomed to it (Hyrum's law), then going back to normal breaks the customers.

Since this conference cares a lot about testing, what about testing at scale?

  • try to avoid this
  • but test each component at large scale
  • and test e2e at small scale
  • do the math
  • and finally do limited test at full scale

Innovation << Reliability  << Inertia

This is also a big tenet of keeping it boring. Nobody rewrites their applications to use your uber/super API. It has to be boring. Remember, a utility is boring.

A tip for developers: when you don't use storage properly, it performs badly, and most developers don't know how to use storage properly. One important thing is: don't use filesystem metadata for query-intensive operations.
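One way I read that tip, as a hedged sketch in Python: rather than answering every query by walking directories and calling `stat` on each file, scan once and answer queries from an index. `build_index` and `files_larger_than` are hypothetical helpers of mine, not something from the talk, and a real system would also need a plan for keeping the index fresh.

```python
import os
import tempfile


def build_index(root: str) -> dict[str, int]:
    """Scan the tree once and record each file's size, keyed by relative path."""
    index: dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[os.path.relpath(path, root)] = os.stat(path).st_size
    return index


def files_larger_than(index: dict[str, int], min_bytes: int) -> list[str]:
    """Answer the query from the index, with no filesystem metadata calls."""
    return sorted(p for p, size in index.items() if size > min_bytes)


with tempfile.TemporaryDirectory() as root:
    for name, size in (("a.bin", 10), ("b.bin", 2000), ("c.bin", 5000)):
        with open(os.path.join(root, name), "wb") as f:
            f.write(b"\0" * size)
    index = build_index(root)

# The temporary directory is already gone; the query touches only the index.
big = files_larger_than(index, 1000)
```

The metadata cost is paid once in `build_index`; repeated queries afterwards never hit the filesystem's metadata servers.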