January 20, 2022
Supabase Beta December 2021
January 18, 2022
How our Rails test suite runs in 1 minute on Buildkite
January 16, 2022
Configuring MySQL Query Metrics
Editors and technical reviewers suggested that I cover how to configure MySQL query metrics in chapter 1 of Efficient MySQL Performance, but I deferred because it was out of scope for the book, which focuses on engineers using MySQL, not DBAs. As such, there’s only a note in chapter 1 that says: “Ask your DBA or read the MySQL manual.” But I’ll cover the topic here because that’s what this blog post series is for: behind the book.
Are you sure you want to use MMAP in your database management system?
Many database management systems carefully manage disk I/O operations and explicitly cache pages in main memory. Operating systems implement a page cache to speed up recurring disk accesses as well, and even allow transparent access to disk files through the mmap system call. Why, then, do most database systems implement their own I/O handling and caching component if the OS provides these features through mmap? Andrew Pavlo, Andrew Crotty, and I tried to answer this question in a CIDR 2022 paper. This is quite a contentious question, as the Hacker News discussion of the paper shows.
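To make the two access styles concrete, here is a minimal sketch in Python (the paper itself is not about Python; this is only an illustration). The 4 KiB `PAGE_SIZE`, the file name `pages.db`, and the helper names are assumptions made for the example. `os.pread` is only available on Unix-like systems.

```python
import mmap
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size, for illustration only


def read_page_explicit(fd: int, page_no: int) -> bytes:
    """Explicit I/O: the caller decides exactly when the read is issued."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def read_page_mmap(mapped: mmap.mmap, page_no: int) -> bytes:
    """mmap: the slice looks like a memory access; the OS may have to
    fault the page in from disk behind the scenes."""
    start = page_no * PAGE_SIZE
    return mapped[start:start + PAGE_SIZE]


def demo() -> bool:
    """Write four pages to a scratch file and read page 2 both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pages.db")  # hypothetical file name
        with open(path, "wb") as f:
            for i in range(4):
                f.write(bytes([i]) * PAGE_SIZE)
        fd = os.open(path, os.O_RDONLY)
        try:
            with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
                return read_page_explicit(fd, 2) == read_page_mmap(mapped, 2)
        finally:
            os.close(fd)
```

Both functions return the same bytes. The difference is who controls when the disk read happens and what is kept in memory afterwards.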
The paper argues that using mmap in database systems is almost always a bad idea. To implement transactions and crash recovery with mmap, the DBMS has to write any change out-of-place because there is no way to prevent write back of a particular page. This makes it impossible to implement classical ARIES-style transactions. Furthermore, data access through mmap can take a handful of nanoseconds (if the data is in the CPU cache) or milliseconds (if it's on disk). If a page is not cached, it will be read through a synchronous page fault and there is no interface for asynchronous I/O. I/O errors, on the other hand, are communicated through signals rather than a local error code. These problems are caused by mmap's interface, which is too high-level and does not give the database system enough control.
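As a rough illustration of what "out-of-place" can mean, the sketch below modifies a private copy-on-write mapping and then publishes the result as a new file. This is one possible shape under stated assumptions, not the method of any particular system: the function name `update_out_of_place`, the `.shadow` suffix, and the whole-file copy are all hypothetical simplifications.

```python
import mmap
import os
import tempfile


def update_out_of_place(path: str, offset: int, data: bytes) -> None:
    """Apply a change without ever dirtying the original file's pages."""
    with open(path, "r+b") as f:
        # ACCESS_COPY is a private copy-on-write mapping: writes stay in
        # memory, so the OS cannot write a half-finished change back.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as private:
            private[offset:offset + len(data)] = data
            new_image = private[:]

    shadow = path + ".shadow"  # hypothetical name for the new copy
    with open(shadow, "wb") as out:
        out.write(new_image)
        out.flush()
        os.fsync(out.fileno())  # make the new copy durable first
    os.replace(shadow, path)    # then switch over to it in one step


def demo() -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.db")
        with open(path, "wb") as f:
            f.write(b"a" * 8192)
        update_out_of_place(path, 4096, b"hello")
        with open(path, "rb") as f:
            f.seek(4096)
            return f.read(5)
```

Copying the whole file for every change is obviously too expensive for a real system. The point is only that the original pages are never modified in place.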
In addition to discussing these interface problems, the paper also shows that Linux's page cache and mmap implementation cannot achieve the bandwidth of modern storage devices. One PCIe 4.0 NVMe SSD can read over 6 GB/s and upcoming PCIe 5.0 SSDs will almost double this number. To achieve this performance, one needs to schedule hundreds or even thousands (if one has multiple SSDs) of concurrent I/O requests. Doing this in a synchronous fashion by starting hundreds of threads will not work well. Other kernel-level performance issues are single-threaded page eviction and TLB shootdowns. Overall, this is an example of OS evolution lagging behind hardware evolution.
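A back-of-the-envelope estimate shows where "hundreds" comes from. By Little's law, the number of requests in flight equals throughput times latency. The 4 KiB request size and the 100 µs read latency below are assumptions chosen for illustration, not figures from the paper.

```python
def required_queue_depth(bandwidth_bytes_per_s: float,
                         request_bytes: int,
                         latency_s: float) -> float:
    """Little's law: requests in flight = requests per second * latency."""
    requests_per_second = bandwidth_bytes_per_s / request_bytes
    return requests_per_second * latency_s


BANDWIDTH = 6e9          # 6 GB/s, the PCIe 4.0 figure from the text
REQUEST_SIZE = 4096      # assumed 4 KiB page reads
LATENCY = 100e-6         # assumed 100 microseconds per random read

one_ssd = required_queue_depth(BANDWIDTH, REQUEST_SIZE, LATENCY)
four_ssds = required_queue_depth(4 * BANDWIDTH, REQUEST_SIZE, LATENCY)
```

Under these assumptions, one SSD needs roughly 146 requests in flight to stay saturated, and four SSDs need close to 600. With synchronous page faults, each in-flight request costs one blocked thread.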
The OS has one big advantage over the DBMS though: it has control over the page table. Once a page is mapped, accessing it becomes transparent and as fast as ordinary memory. Any manually implemented buffer manager, in contrast, will have some form of indirection, which causes some overhead. Pointer swizzling as implemented in LeanStore and Umbra is a fast alternative but is also more difficult to implement than a traditional buffer manager and only supports tree-like data structures. Therefore, an interesting question is whether it would be possible to have an mmap-like interface, but with more control and better performance. Generally, I believe this kind of research across different areas should be more common.
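The indirection in a traditional buffer manager can be sketched in a few lines. This is a deliberately simplified toy, not how LeanStore, Umbra, or any real system is implemented: the `BufferPool` class, its `fix` method, and the LRU policy are assumptions for the example, and dirty pages, pinning, and concurrency are ignored.

```python
from collections import OrderedDict
from typing import Callable


class BufferPool:
    """Toy buffer manager: every access goes through a page-id lookup."""

    def __init__(self, capacity: int, read_page: Callable[[int], bytes]):
        self.capacity = capacity
        self.read_page = read_page          # stands in for a disk read
        self.frames: "OrderedDict[int, bytearray]" = OrderedDict()
        self.misses = 0

    def fix(self, page_id: int) -> bytearray:
        frame = self.frames.get(page_id)    # the indirection on every access
        if frame is None:
            self.misses += 1
            if len(self.frames) >= self.capacity:
                self.frames.popitem(last=False)  # evict least recently used
            frame = bytearray(self.read_page(page_id))
            self.frames[page_id] = frame
        else:
            self.frames.move_to_end(page_id)     # mark as recently used
        return frame


# Usage: a pool of two frames over a fake "disk".
pool = BufferPool(capacity=2, read_page=lambda pid: bytes([pid]) * 4)
for pid in (1, 2, 1, 3):
    pool.fix(pid)
```

Every call to `fix` pays for a hash lookup even when the page is already in memory. With mmap, the page table does that translation in hardware, which is the advantage described above.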
January 11, 2022
dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
From Kafka streams to data products
Vitess Schema Tracking
January 06, 2022
Analyzing large JSON files via partial JSON parsing
January 05, 2022
The year in books: 11 to recommend in 2021
Last year (2021) I finished 17 books, a five-year low. But that's OK! 4 fiction and 13 non-fiction. Another 30 started but not finished.
Non-fiction
It seems I was pretty focused on business history books and the history of tech. The 9 non-fiction books I liked the most:
- Designing Data-Intensive Applications, a must-read for anyone interacting with a database
- My Years with General Motors, the business school classic; truly a good read. But it's sad to know that shortly after it was written, GM succumbed to Japanese and South Korean competition
- No Rules Rules: Netflix and the Culture of Reinvention
- Working Backwards: Insights, Stories, and Secrets from Inside Amazon
- Working in Public: The Making and Maintenance of Open Source Software, my review here
- The Intel Trinity, an early history of Intel
- The HP Way
- Play Nice But Win, the story of Dell computers
- West with the Night, beautiful memoir recommended by Ernest Hemingway and written in a similar style. Much more enjoyable than the other more popular colonial-African memoir, Out of Africa.
The rest
- Pour Your Heart Into It: How Starbucks Built a Company One Cup at a Time
- Jump-Starting America: How Breakthrough Science Can Revive Economic Growth and the American Dream
- ReWork
- Russia and the Russians: A History
Fiction
The 3 fiction books I liked the most:
- A Very British Coup, hilarious and depressing. A great companion to the TV show "Yes, Minister"
- Mort, Terry Pratchett is a very funny author
- Selected Stories of Philip K Dick, depressing and dystopian but very well written. I would not read again because it's too depressing
The rest
- There and NEVER, EVER BACK AGAIN: A Dark Lord's Diary, I was looking for more parodies like Bored of the Rings (which itself wasn't great). This was worse
2022
This year I'm interested in continuing to find good business books and good books on the history of tech. I'm also getting into more American history to make up for all the years of not paying attention in high school.
I'm continuing to try to find good memoirs and fiction by non-English authors.
Starting the blog-year off gently with my recap of 2021 in books. I spent too much time watching TV and trying new video games to keep up with past years 😅 https://t.co/5mfXbBnihk
Phil Eaton (@phil_eaton), January 5, 2022