May 16, 2025
Freedom and Flexibility: Rethinking Your MongoDB Cloud Strategy Beyond Atlas
May 15, 2025
New in Percona Everest 1.6.0: Easily Deploy PMM with a Single Helm Command
Learn MySQL Performance
Every year new cohorts of programmers, DBAs, managers, and more learn MySQL. After MySQL and HeatWave Summit 2025 a few weeks ago, I realized what job needs to be done for these people and how I could help them make progress: revamping this website to focus on “the path”.
Query Response Time
Context
This is why you need to learn MySQL query response time:
This book explores that idea from various angles with a single intent: to help you achieve remarkable MySQL performance. Efficient MySQL performance means focusing on the best practices and techniques that directly affect MySQL performance—no superfluous details or deep internals required by DBAs and experts. I presume that you’re a busy professional who is using MySQL, not managing it, and that you need the most results for the least effort. That’s not laziness, that’s efficiency. To that end, this book is direct and to the point. And by the end, you will be able to achieve remarkable MySQL performance.
Indexes and Indexing
Context
This is why you need to learn MySQL indexes and indexing:
MySQL leverages hardware, optimizations, and indexes to achieve performance when accessing data. Hardware is an obvious leverage because MySQL runs on hardware: the faster the hardware, the better the performance. Less obvious and perhaps more surprising is that hardware provides the least leverage. I explain why in a moment. Optimizations refer to the numerous techniques, algorithms, and data structures that enable MySQL to utilize hardware efficiently. Optimizations bring the power of hardware into focus. And focus is the difference between a light bulb and a laser. Consequently, optimizations provide more leverage than hardware. If databases were small, hardware and optimizations would be sufficient. But increasing data size deleverages the benefits of hardware and optimizations. Without indexes, performance is severely limited.
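To make that leverage concrete, here is a minimal sketch that uses EXPLAIN to see how many rows MySQL estimates it must examine; the table, column, index, and connection details are all hypothetical, and it assumes the mysql-connector-python package:

```python
# A sketch of measuring index leverage with EXPLAIN: the "rows" column is
# MySQL's estimate of how many rows it must examine. All names are hypothetical.
import mysql.connector

cnx = mysql.connector.connect(
    host="127.0.0.1", user="app", password="...", database="shop")
cur = cnx.cursor(dictionary=True)

cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
plan = cur.fetchone()

# Without an index on customer_id: type is ALL (full table scan) and rows is
# roughly the table size. After CREATE INDEX idx_cust ON orders (customer_id),
# type becomes ref and rows drops to the number of matching rows.
print(plan["type"], plan["rows"])
```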
Data and Access Patterns
Context
This is why you need to learn about MySQL data storage and access patterns:
Imagine that your job is to move rocks, and you have three piles of different sized rocks. The first pile contains pebbles: very light, no larger than your thumbnail. The second pile contains cobbles: heavy but light enough to pick up, no larger than your head. The third pile contains boulders: too large and heavy to pick up; you need leverage or a machine to move them. Your job is to move one pile from the bottom of a hill to the top. Which pile do you choose?
Sharding
Context
This is why you need to learn about sharding MySQL:
Sharding a database is the common and widely used technique of scaling out (or, horizontal scaling): increasing performance by distributing the workload across multiple databases. (By contrast, scaling up, or vertical scaling, increases performance by increasing hardware capacity.) Sharding divides one database into many databases. Each database is a shard, and each shard is typically stored on a separate MySQL instance running on separate hardware. Shards are physically separate but logically the same (very large) database.
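As a rough illustration of that physically-separate-but-logically-one split, here is a minimal sketch of hash-based shard routing; the shard map, hostnames, and the choice of customer_id as the shard key are all hypothetical:

```python
# A sketch of hash-based shard routing: one logical database, four physical
# shards. The shard map and the customer_id shard key are hypothetical.
import hashlib

SHARDS = {
    0: "mysql-shard-0.example.com",
    1: "mysql-shard-1.example.com",
    2: "mysql-shard-2.example.com",
    3: "mysql-shard-3.example.com",
}

def shard_for(customer_id: int) -> str:
    """Map a customer_id to the MySQL instance that stores its rows."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# All queries for customer 42 route to the same shard, which is what keeps
# the shards logically one database:
print(shard_for(42))
```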
Server Metrics and InnoDB
Context
This is why you need to learn about MySQL server metrics, especially InnoDB metrics:
That view is not unreasonable (or uncommon) because MySQL metrics are often discussed but never taught. Even in my career with MySQL, I have never read or heard an exposition of MySQL metrics—and I have worked with people who created them. The lack of pedagogy for MySQL metrics is due to a false presumption that metrics do not require understanding or interpretation because their meaning is self-evident. That presumption has a semblance of truth when considering a single metric in isolation, like Threads_running: the number of threads running—what more is there to know? But isolation is the fallacy: MySQL performance is revealed through a spectrum of MySQL metrics.
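As a small illustration of reading metrics as a spectrum rather than in isolation, here is a sketch (assuming the mysql-connector-python package and placeholder credentials) that samples two related metrics together and derives a rate:

```python
# A sketch of sampling metrics as a spectrum: a counter (Queries) only means
# something as a rate over time, and a gauge (Threads_running) only means
# something alongside it. Credentials are placeholders.
import time
import mysql.connector

def sample(cur):
    cur.execute(
        "SHOW GLOBAL STATUS WHERE Variable_name IN ('Queries', 'Threads_running')")
    return {name: int(value) for name, value in cur.fetchall()}

cnx = mysql.connector.connect(host="127.0.0.1", user="monitor", password="...")
cur = cnx.cursor()

before = sample(cur)
time.sleep(10)
after = sample(cur)

qps = (after["Queries"] - before["Queries"]) / 10
# High QPS with low Threads_running is healthy; low QPS with high
# Threads_running suggests queries are stalled, not absent.
print(f"QPS: {qps:.0f}  Threads_running: {after['Threads_running']}")
```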
Replication
Context
This is why you need to learn the basics of MySQL replication, especially the cause of replication lag:
Replication lag is data loss. Seriously. Do not dismiss replication lag.
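One way to internalize lag as potential data loss is to measure it directly. Below is a minimal sketch of the heartbeat technique that tools like pt-heartbeat automate; the test.heartbeat table (id INT PRIMARY KEY, ts DOUBLE) and connection details are hypothetical:

```python
# A sketch of the heartbeat technique: write a timestamp on the source, read
# it on the replica; how far the replica's copy trails the source clock is
# true replication lag. Table and connection details are hypothetical.
import time
import mysql.connector

source = mysql.connector.connect(host="source.example.com", user="app", password="...")
replica = mysql.connector.connect(host="replica.example.com", user="app", password="...")

cur = source.cursor()
cur.execute("REPLACE INTO test.heartbeat (id, ts) VALUES (1, %s)", (time.time(),))
source.commit()

time.sleep(1)  # give the event time to replicate

rcur = replica.cursor()
rcur.execute("SELECT ts FROM test.heartbeat WHERE id = 1")
(ts,) = rcur.fetchone()
lag = time.time() - float(ts)
# With a continuous writer (as pt-heartbeat runs), this difference tracks
# actual lag; in this one-shot sketch it includes the one-second sleep.
print(f"Replication lag: ~{lag:.1f}s")
```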
Transactions and Data Locks
Context
This is why you need to learn about MySQL transactions and data locks, which are closely related:
From our point of view as programmers, transactions appear conceptual: BEGIN, execute queries, and COMMIT. Then we trust MySQL (and InnoDB) to uphold the ACID properties: atomicity, consistency, isolation, and durability. When the application workload—queries, indexes, data, and access patterns—is well optimized, transactions are a nonissue with respect to performance. (Most database topics are a nonissue when the workload is well optimized.) But behind the scenes, transactions invoke a whole new world of considerations because upholding ACID properties while maintaining performance is not an easy feat. Fortunately, MySQL shines at executing transactions.
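To ground the conceptual view, here is a minimal sketch of an explicit transaction that acquires row locks, assuming a hypothetical accounts table and the mysql-connector-python package; the gap between SELECT ... FOR UPDATE and COMMIT is exactly where data locks meet performance:

```python
# A sketch of an explicit transaction with row locking. SELECT ... FOR UPDATE
# takes an exclusive row lock that is held until COMMIT or ROLLBACK.
import mysql.connector

cnx = mysql.connector.connect(
    host="127.0.0.1", user="app", password="...", database="bank")
cur = cnx.cursor()
try:
    cnx.start_transaction()  # BEGIN
    cur.execute("SELECT balance FROM accounts WHERE id = 1 FOR UPDATE")
    (balance,) = cur.fetchone()  # row id=1 is now exclusively locked
    cur.execute("UPDATE accounts SET balance = %s WHERE id = 1", (balance - 10,))
    cnx.commit()  # locks are released only here
except Exception:
    cnx.rollback()  # never leave a transaction (and its locks) dangling
    raise
```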
Common Challenges
Context
This is why you need to learn about common challenges when using MySQL:
Key Points
- Split-brain occurs when two or more MySQL instances in the same replication topology are written to
- Split-brain is a detriment to data integrity—the data can no longer be trusted
- Data drift occurs when a replica becomes out of sync with the source
- Data drift is real but the origin of the drift is virtually impossible to pinpoint
- Data drift can be detected with pt-table-checksum
- ORMs can generate very poor queries, which drags down overall performance
- Schemas always change, so an online schema change (OSC) tool is a must-have
- There are three popular OSC tools: pt-online-schema-change, gh-ost, and Spirit
- MySQL has non-standard SQL statements, options, and clauses
- Applications do not fail gracefully by default; it takes effort to fail gracefully
- High performance MySQL is difficult
Pitfalls
- Not taking into account the key points above
Hack MySQL Articles
Additional Resources
| Resource | Type | About |
|---|---|---|
| InnoDB and Online DDL | MySQL manual | Since schema changes are routine for developers, it’s important to understand how MySQL (and InnoDB) handle them because there are many edge cases depending on the specific type of change. |
| Spirit | Open source tool | The fastest, most advanced online schema change (OSC) tool available. |
| gh-ost | Open source tool | The predecessor of Spirit. Both are solid tools, but use Spirit if possible. |
| pt-table-checksum | Open source tool | Still the only tool that can detect data drift on replicas. |
Before gh-ost and Spirit, pt-online-schema-change (pt-osc) was the de facto standard tool. It's still widely used and referenced today, but as the author of pt-osc I strongly advise that you use Spirit instead. The only reason to use pt-osc is for rare (and risky) edge cases that Spirit (and usually gh-ost) do not support for good reason.
Cloud Performance
Context
This is why you need to learn about the reality of using MySQL in the cloud:
I wish it were as simple as “optimize the workload and you’re done”, but MySQL in the cloud raises unique considerations. The goal is to know and mitigate these cloud considerations so that you can focus on MySQL, not the cloud. After all, the cloud is nothing special: behind the proverbial curtain, it’s physical servers in a data center running programs like MySQL.
May 14, 2025
Create a unit testing framework for PostgreSQL using the pgTAP extension
Chapter 2: Serializability Theory (Concurrency Control Book)
Chapter 2 of Concurrency Control and Recovery in Database Systems (1987) by Bernstein, Hadzilacos, and Goodman is a foundational treatment of serializability theory. It is precise, formal, yet simple and elegant, a rare combination for foundational theory in a systems domain. Databases got lucky here: serializability theory is both powerful and clean.
The chapter builds up the theory step by step, introducing:
- Histories
- Serializable histories
- The Serializability Theorem
- Recoverability and its variants
- Generalized operations beyond reads/writes
- View equivalence
Each section motivates definitions clearly, presents tight formalism, and illustrates ideas with well-chosen examples.
2.1 Histories
This section lays the groundwork. It starts slow and doesn't do anything fancy. It first defines what it means for the operations within a transaction to form a well-founded partial order. This intra-transaction ordering extends naturally across transactions: a history's order must be a superset of every transaction's order. Building on this, the book then defines a history as a partial order over operations from a set of transactions. Operations include reads (r₁[x]), writes (w₁[x]), commits (c₁), and aborts (a₁); the subscripts denote that these operations all belong to transaction T₁. A history models the interleaving of these operations from different transactions, respecting the per-transaction order and ensuring all conflicting operations are ordered.
The abstraction here is elegant. The model omits values, conditionals, assignments, anything not visible to the scheduler. Only the structure of dependencies matters. This minimalism is a strength: it gives just enough to reason about correctness. The section is also careful to handle incomplete histories (i.e., prefixes), which is crucial for modeling crashes and recovery.
The discussion is very scheduler-centric. Since the scheduler is the component responsible for enforcing serializability, the model is tailored to what the scheduler can observe and act on. But this leads to missed opportunities as I mention below in Section 2.6 View equivalence.
2.2 Serializable Histories
This is where things get real. The goal is to define when a concurrent execution (history) is “as good as” some serial execution. The section has a simple game plan: define equivalence between histories (conflict equivalence), define serial histories, and finally say a history is serializable if it is equivalent to some serial history.
The chapter introduces the serialization graph (SG). Nodes are committed transactions. Edges capture conflicts: if an operation of T_i precedes and conflicts with an operation of T_j (two operations on the same item x, at least one of them a write), we add an edge T_i → T_j.
The formalism is tight, but not heavy. A good example is Fig 2-1 (p. 30), which compares equivalent and non-equivalent histories. (Minor typo: second w₁[x] in H₃ should be w₁[y].)
Theorem 2.1 (Serializability Theorem): A history is serializable iff its serialization graph is acyclic.
That's a beautiful punchline: serializability reduces to a cycle check in the conflict graph.
Credit where credit is due, and in this case it is Jim Gray again! At the end of the section there is an acknowledgment stating that the definition of equivalence and serializability used here and the Serializability Theorem are from: Gray, J.N., Lorie, R.A., Putzolu, G.R., Traiger, I.L. Granularity of Locks and Degrees of Consistency in a Shared Database. Research Report, IBM, September, 1975.
2.3 The Serializability Theorem
This section proves the big theorem about equivalence to a serial execution.
Intuitively, if SG(H) is acyclic, we can topologically sort it into a serial history equivalent to H. And if SG(H) has a cycle, then H cannot be serialized.
The book illustrates this with a figure: edges in SG(H) constrain any valid serial order, so a cycle means contradictory constraints, and hence no valid serial order.
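To make the punchline concrete, here is a toy sketch (my own, not the book's) that builds SG(H) from a totally ordered history of (transaction, operation, item) triples; a topological sort doubles as the cycle check and as the constructor of an equivalent serial order:

```python
# A toy version of the theorem: build SG(H), then topologically sort it.
# graphlib raises CycleError exactly when H is not conflict serializable.
from graphlib import TopologicalSorter, CycleError

# H: r1[x] w1[x] r2[x] w2[y], all transactions assumed committed.
history = [(1, "r", "x"), (1, "w", "x"), (2, "r", "x"), (2, "w", "y")]

def conflicts(op1, op2):
    # Two ops on the same item conflict iff at least one is a write.
    return "w" in (op1, op2)

def serialization_graph(history):
    edges = set()
    for i, (ti, oi, xi) in enumerate(history):
        for tj, oj, xj in history[i + 1:]:
            if ti != tj and xi == xj and conflicts(oi, oj):
                edges.add((ti, tj))  # Ti precedes and conflicts with Tj
    return edges

# TopologicalSorter wants node -> set of predecessors.
preds = {}
for ti, tj in serialization_graph(history):
    preds.setdefault(tj, set()).add(ti)

try:
    serial = list(TopologicalSorter(preds).static_order())
    print("SR, equivalent serial history:", serial)  # [1, 2]
except CycleError:
    print("Not serializable: SG(H) has a cycle")
```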
2.4 Recoverable Histories
Serializability ensures correctness in a crash-free world. But in practice, systems crash, transactions abort, and partial executions matter. To account for faults, this section defines increasingly strict notions:
Recoverable (RC): If T_j reads from T_i, then T_i must commit before T_j commits.
Avoids Cascading Aborts (ACA): T_j can only read from committed transactions.
Strict (ST): Reads and overwrites can happen only after the previous writer commits or aborts.
Theorem 2.2: ST ⊂ ACA ⊂ RC
The inclusion hierarchy is illustrated in Fig 2-2. The intersection of these properties with SR defines realistic schedules that can tolerate failures and support recovery.
Note that RC, ACA, and ST are prefix commit-closed properties: if they hold for a history H, they also hold for any prefix of H. Nice and clean.
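As a toy illustration (again my own, not the book's), here is a recoverability check over a totally ordered history; RC reduces to comparing commit positions along the reads-from relation:

```python
# A toy recoverability (RC) check over a history of (transaction, op, item)
# triples with op in {r, w, c}: if Tj reads an item from Ti, then Ti must
# commit before Tj does.
def reads_from(history):
    """Yield (writer, reader) pairs: reader read an item last written by writer."""
    last_writer = {}
    for tx, op, item in history:
        if op == "w":
            last_writer[item] = tx
        elif op == "r" and item in last_writer and last_writer[item] != tx:
            yield last_writer[item], tx

def is_recoverable(history):
    commits = {tx: i for i, (tx, op, _) in enumerate(history) if op == "c"}
    inf = float("inf")
    return all(commits.get(w, inf) < commits.get(r, inf)
               for w, r in reads_from(history))

# w1[x] r2[x] c2 c1 is not RC: T2 read from T1 but committed first, so an
# abort of T1 after c2 would mean T2 committed a read of dirty data.
h = [(1, "w", "x"), (2, "r", "x"), (2, "c", None), (1, "c", None)]
print(is_recoverable(h))  # False
```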
2.5 Operations Beyond Reads and Writes
I was happy to see this section. The authors show how the theory generalizes to operations beyond reads and writes, as long as we redefine the notion of conflict appropriately.
Two operations conflict if the order in which they execute affects: the final state of the database, or the values returned by operations. Fig. 2-3 gives the compatibility matrix to capture conflict relations between Read/Write/Inc/Dec. Increment(x) adds 1 to data item x and Decrement(x) subtracts 1 from x. An Inc or Dec does not return a value to the transaction that issues it.
The example history H₁₁ shows how to construct SG(H₁₁) even with these new operations. The theory lifts cleanly. Since SG(H₁₁) is acyclic, the generalized Serializability Theorem says that H₁₁ is SR. It is equivalent to the serial history T₁ T₃ T₂ T₄, which can be obtained by topologically sorting SG(H₁₁). DAGs FTW!
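Here is the conflict relation from this section's definition written out as a table (my reading of Fig. 2-3, so treat it as a sketch); swapping it into a serialization-graph builder like the one sketched earlier lifts the theorem unchanged:

```python
# The generalized conflict relation: Inc and Dec commute with each other and
# return no values, so only Read and Write introduce conflicts.
NO_CONFLICT = {
    ("r", "r"),
    ("inc", "inc"), ("inc", "dec"), ("dec", "inc"), ("dec", "dec"),
}

def conflicts(op1, op2):
    # Order matters iff it can change the final state or a returned value.
    return (op1, op2) not in NO_CONFLICT

# Dropping this conflicts() into the serialization-graph sketch above extends
# the Serializability Theorem to Read/Write/Inc/Dec histories.
```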
Still, this is a limited discussion. It's not expressive enough to model more general commutativity. But the seeds of CRDTs are here: in CRDTs, you model operations based on whether they commute under all interleavings. There is an ongoing research direction trying to build a nice theory around more general operations and monotonicity concepts.
2.6 View Equivalence
This section introduces view serializability (VSR), a more permissive alternative to conflict serializability. Two histories are view equivalent if:
- Every read in one history reads from the same write as in the other.
- The final write to each data item is the same in both histories.
VSR allows more schedules than CSR. Consider the standard masking example, reconstructed in the sketch below: T3 overwriting both x and y saves the day and masks the conflicts between T1 and T2. Interesting special case!
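Here is a brute-force sketch (my own) that checks view serializability by comparing the reads-from relation and final writes of H against every serial order; the exhaustive search over permutations hints at why this does not scale:

```python
# A brute-force view-serializability check. Histories are total orders of
# (transaction, op, item) triples; h reconstructs the masking example above.
from itertools import permutations

def view(history):
    """Observable behavior: reads-from triples plus final writer per item."""
    last_writer, rf = {}, set()
    for tx, op, item in history:
        if op == "w":
            last_writer[item] = tx
        elif op == "r":
            rf.add((last_writer.get(item), tx, item))
    return rf, last_writer

def is_view_serializable(history):
    txs = {tx for tx, _, _ in history}
    for order in permutations(txs):
        serial = [e for tx in order for e in history if e[0] == tx]
        if view(serial) == view(history):
            return True
    return False

# T1 and T2 conflict cyclically on x and y, but T3 writes both items last:
h = [(1, "w", "x"), (2, "w", "x"), (2, "w", "y"),
     (1, "w", "y"), (3, "w", "x"), (3, "w", "y")]
print(is_view_serializable(h))  # True, even though SG(h) is cyclic
```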
The book says that testing for VSR is NP-complete, and that kills its usefulness for schedulers. The authors argue that all practical schedulers use conflict serializability, and view serializability is too expensive to enforce. Yet they acknowledge it’s conceptually useful, especially for reasoning about multicopy databases (Chapters 5 and 8).
Reading this section, I had an interesting question. Is view serializability the same as client-centric consistency? They seem very similar. Client-centric consistency, as discussed in Crooks et al., PODC 2017, defines correctness in terms of what reads observe. VSR is similarly observational.
So while VSR wasn’t practical for scheduling, it found a second life in defining observational consistency in distributed systems. I think there's more fruit on this tree.
Configuring PgBouncer auth_type with Trust and HBA: Examples and Known Issues
Announcing the CedarDB Community Edition
Almost a year ago, we launched our company with our “Ode to Postgres”, with the goal of making the fruits of the highly successful Umbra research project available to you and everyone else. Umbra was created to incorporate all the lessons learned from the first 40 years of database systems into a new system built from the ground up to take advantage of modern hardware such as multi-core CPUs, fast SSDs, and cloud storage.