October 14, 2025
October 13, 2025
Postgres 18.0 vs sysbench on a 32-core server
This is yet another great result for Postgres 18.0 vs sysbench. This time I used a 32-core server. Results for a 24-core server are here. The goal for this benchmark is to check for regressions from new CPU overhead and mutex contention.
I repeated the benchmark twice because I had some uncertainty about platform variance (HW and SW) on the first run.
tl;dr, from Postgres 17.6 to 18.0
- There might be regressions from 17.6 to 18.0 but they are small (usually <= 3%)
tl;dr, from Postgres 12.22 through 18.0
- the hot-points test is almost 2X faster starting in 17.6
- scan is ~1.2X faster starting in 14.19
- all write tests are much faster starting in 17.6
For 18.0 I tried 3 configuration files:
- conf.diff.cx10b_c32r128 (x10b) - uses io_method=sync
- conf.diff.cx10c_c32r128 (x10c) - uses io_method=worker
- conf.diff.cx10d_c32r128 (x10d) - uses io_method=io_uring
Benchmark
The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.
The benchmark is run with 24 clients and 8 tables with 10M rows per table. The purpose is to search for regressions from new CPU overhead and mutex contention.
In the second run, all results were collected within 7 days and I am less concerned about variance there.
I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for base version)
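As a small illustration of that definition, here is a sketch in Python. The function name and the QPS numbers are hypothetical and are not measurements from this benchmark.

```python
def relative_qps(qps_by_version, base_version):
    """Return QPS of each version divided by QPS of the base version."""
    base = qps_by_version[base_version]
    if base <= 0:
        raise ValueError("base QPS must be positive")
    return {version: qps / base for version, qps in qps_by_version.items()}


# Hypothetical numbers for illustration only, not benchmark results.
example = {"17.6": 1000.0, "18.0": 970.0}
rel = relative_qps(example, "17.6")
for version, value in rel.items():
    print(f"{version}: {value:.2f}")
```

A relative QPS below 1.0 means the version is slower than the base, so 0.97 would correspond to a 3% regression.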
I present results for:
- versions 12 through 18 using 12.22 as the base version
- versions 17.6 and 18.0 using 17.6 as the base version
- 18.0 looks better relative to 17.6 in the second run and I explain my uncertainty about the first run above
- But I am skeptical about the great result for 18.0 on the full scan test (scan_range=100) in the second run. That might be variance induced by vacuum.
- There might be regressions from 17.6 to 18.0 but they are small (usually <= 3%)
- The small regression in read-only_range=10 might be from new optimizer overhead, because it doesn't reproduce when the length of the range query is increased -- see read-only_range=100 and read-only_range=10000.
- the hot-points test is almost 2X faster starting in 17.6
- scan is ~1.2X faster starting in 14.19
- all write tests are much faster starting in 17.6
Copy-and-Patch: How It Works
Copy-and-Patch: A Copy-and-Patch Tutorial
October 11, 2025
Geoblocking Multiple Localities With Nginx
A few months back I wound up concluding, based on conversations with Ofcom, that aphyr.com might be illegal in the UK due to the UK Online Safety Act. I wrote a short tutorial on geoblocking a single country using Nginx on Debian.
Now Mississippi’s 2024 HB 1126 has made it illegal for essentially any web site to know a user’s e-mail address, or other “personal identifying information”, unless that site also takes steps to “verify the age of the person creating an account”. Bluesky wound up geoblocking Mississippi. Over on a small forum I help run, we paid our lawyers to look into HB 1126, and the conclusion was that we were likely in the same boat. Collecting email addresses put us in scope of the bill, and it wasn’t clear whether the LLC would shield officers (hi) from personal liability.
This blog has the same problem: people use email addresses to post and confirm their comments. I think my personal blog is probably at low risk, but a.) I’d like to draw attention to this legislation, and b.) my risk is elevated by being gay online, and having written and called a whole bunch of Mississippi legislators about HB 1126. Long story short, I’d like to block both a country and an individual state. Here’s how:
First, set up geoipupdate as before. Then, in /etc/nginx/conf.d/geoblock.conf, pull in the country and city databases, and map the countries and states you’d like to block to short strings explaining the applicable law. This creates variables $geoblock_country_law and $geoblock_state_law.
geoip2 /var/lib/GeoIP/GeoLite2-Country.mmdb {
  $geoip2_data_country_iso_code country iso_code;
}

geoip2 /var/lib/GeoIP/GeoLite2-City.mmdb {
  $geoip2_data_state_name subdivisions 0 names en;
}

map $geoip2_data_country_iso_code $geoblock_country_law {
  GB      "the UK Online Safety Act";
  default "";
}

map $geoip2_data_state_name $geoblock_state_law {
  Mississippi "Mississippi HB 1126";
  default     "";
}
Create an HTML page to show to geoblocked IPs. I’ve put mine in /var/www/custom_errors/451.html. The special comments here are Server-Side Include (SSI) directives; they’ll insert the contents of the $geoblock_law variable from nginx, which we’ll set shortly.
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Unavailable Due to
      <!--# echo var="geoblock_law" default=""-->
    </title>
  </head>
  <body>
    <h1>Unavailable Due to
      <!--# echo var="geoblock_law" default=""-->
    </h1>
  </body>
</html>
Then, in /etc/nginx/sites-enabled/whatever.conf, add an error page for status code 451 (unavailable for legal reasons). In the main location block, check the $geoblock_country_law and $geoblock_state_law variables, and use them to return status 451, and set the $geoblock_law variable for the SSI template:
server {
  ...
  # Status 451 renders this page
  error_page 451 /451.html;
  location /451.html {
    ssi on;
    internal;
    root /var/www/custom_errors/;
  }

  location / {
    # If either geoblock variable is set, return status 451
    if ($geoblock_state_law != "") {
      set $geoblock_law $geoblock_state_law;
      return 451;
    }
    if ($geoblock_country_law != "") {
      set $geoblock_law $geoblock_country_law;
      return 451;
    }
  }
}
Test with nginx -t, and reload with service nginx reload, as usual.
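To sanity-check the precedence before touching a live server, the decision made by the two map blocks and the two if blocks can be modeled in a few lines of Python. This is only a sketch of the logic in the config above, not something nginx runs; the function names are hypothetical, and 200 stands in for "handle the request normally".

```python
# Mirrors the two nginx map blocks: region -> short description of the law.
COUNTRY_LAWS = {"GB": "the UK Online Safety Act"}
STATE_LAWS = {"Mississippi": "Mississippi HB 1126"}


def geoblock_law(country_iso_code, state_name):
    """Return the law shown on the 451 page, or "" if the request is allowed.

    The state check runs first, matching the order of the if blocks
    in the main location block.
    """
    state_law = STATE_LAWS.get(state_name, "")
    if state_law != "":
        return state_law
    return COUNTRY_LAWS.get(country_iso_code, "")


def status_for(country_iso_code, state_name):
    """Return 451 for a blocked lookup, else 200 as a stand-in for normal handling."""
    return 451 if geoblock_law(country_iso_code, state_name) else 200


print(status_for("US", "Mississippi"))    # blocked by the state map
print(status_for("GB", "England"))        # blocked by the country map
print(status_for("NL", "North Holland"))  # allowed
```

Adding another locality is then a one-line change to the matching map, which is the same shape as the change in the nginx config.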
Geoblocking is a bad experience in general. In Amsterdam and Frankfurt, I’ve seen my cell phone’s 5G connection and hotel WiFi improperly identified as being in the UK. I’m certain this is going to block people who aren’t in Mississippi, too. If you don’t want to live in this world either, start calling your representatives to demand better legislation.
October 10, 2025
Academic chat: On PhD
This week, Aleksey and I met not to dissect a research paper, but to chat about "the process of PhD". I had recently written a post titled "The Invisible Curriculum of Research", where I framed research as an iceberg, with the small visible parts (papers, conferences) resting on the hidden 5 Cs:
- Curiosity/Taste: what problems are worth solving.
- Clarity: how to ask precise and abstracting questions.
- Craft: writing, experimentation, presentation.
- Community: collaboration and contribution.
- Courage: resilience through setbacks.
Above is the video of our chat, with a lot of personal anecdotes and a few rants. But if you want to cut to the chase, the highlight reel is below.
What a PhD Really Produces
The real product of a PhD is not the thesis, but you, the researcher! The thesis is just the residue of this long internal transformation. Like martial arts, the training breaks you and rebuilds you into someone who sees and thinks differently. This transformation cannot be faked and you should take your time to grow your wings. But you can be effective about it.
Curiosity and taste
Taste blends curiosity, creativity, and judgment of what's important. Curiosity alone can lure you into many bottomless technical rabbit holes. Taste filters what matters and channels curiosity into focus. And you definitely need passion to sustain yourself through the ups and downs of the arduous PhD journey.
The serendipitous path to research
Many researchers stumble into research through chance encounters, unexpected opportunities, and detours. So it is worth keeping an open mind, noticing what sparks your curiosity and suits you best, and following it. Aleksey shares his own unlikely path to a PhD, which is well worth watching. I have written before about how I started, but here I go deeper into where those interests first took root.
Growing through friction and mentorship
Taste, curiosity, and confidence grow through friction. The best labs are loud, with debates spilling into hallways. When Aleksey, Aili, and I worked together, neighboring faculty sometimes complained about the noise, wondering why we were always arguing. But intellectual sparring sharpens your ideas. Research maturity comes from questioning, defending, refining. In this type of hands-on, messy mentorship, taste, passion, and craft all rub off.
Asking good questions and abstracting well
Abstraction is the art of asking the right questions. The best questions cut away accidental complexity and get to the essence of the problem. Leslie Lamport's genius was exactly knowing what to ignore/abstract-away. "Craft is knowing how to work, and art is knowing when to stop."
By finding the right question/framing/abstraction, you can pivot a project that is not getting any lift into an impactful hit! (This I believe.)
The Craft of Research
Most research is unglamorous: debugging, writing, revising, rejections. But you gotta do the craft, and do it well, for your ideas to lift off. You need routines and ritual to keep you steady and improving. Aleksey's productivity routine involves daily 90-minute walks as his "thinking time". Thinking, for him, is a physical process. We used to walk a lot when we worked together, but somehow I have fallen off that wagon. My thinking time now comes through freewriting on Emacs or on my tablet, and arguing with myself on the page. We both agree, though, that talking/arguing with collaborators forces clarity and generates ideas.
On Courage and Resilience
Every researcher fails as much as (if not more than) they succeed. The researcher needs to endure through failures and rejections. You need to keep showing up to write the next draft, rerun the next experiment, submit again. Passion helps; without it, survival in research is unlikely. But you also need to make a habit of endurance. Courage also means questioning norms and pursuing ideas that may not yet be fashionable but feel true.
But, sometimes (ah Retroscope) you have to take the loss, cut your losses, and move on. Maybe you can return later at a more opportune time.
Top skills/qualities for a PhD
We discussed our picks for the top three skills needed for a successful PhD. For me, it is writing/communication, asking the right questions, and metacognition (knowing when to stop, reframe, or abstract; seeing the essence rather than surface detail). Reading skills came up very high in our discussion too. You can't outsource that to ChatGPT. People skills also matter: work well with your collaborators. Conferences and brutal rankings in academia can feel like Squid Game at times, but what truly matters is people, mentorship, and the craft itself.
What makes a bad researcher
Bad research habits are easy to spot: over-competition, turf-guarding, incremental work, rigidity, and a lack of intellectual flexibility. Bad science follows bad incentives such as benchmarks over ideas, and performance over understanding. These days the pressure to run endless evaluations has distorted the research and publishing process. Too many papers now stage elaborate experiments to impress reviewers instead of illuminating them with insights. Historically, the best work always stood on its own, by its simplicity and clarity.
Onboarding and Departmental Support
Advisor fit is crucial, and students should be free to explore before committing. Early rotations and cohort boot camps, which Aleksey mentioned are common in biomedical programs, help build both skills and faculty connections. Unfortunately, computer science still lacks this scaffolding. Industry treats onboarding as an investment, with structured mentorship, regular check-ins, and clear expectations. Academia, by contrast, seems to treat the absence of onboarding as a filtering mechanism. New PhD students are frequently left on their own for months, without direction, feedback, or a sense of belonging. Even small rituals (weekly meetings, mentorship pairings, consistent feedback) could change this and catch struggling or blocked students early rather than years later.
Open Source Is Not Just Code: It’s Integrity
From Text to Token: How Tokenization Pipelines Work
Master ClickHouse® array functions for data manipulation and analysis
A guide to ClickHouse® deployment options
How to perform case-insensitive string matching using ILIKE in ClickHouse®
Follow these steps to optimize your ClickHouse® cluster for peak performance
How to stream Kafka topics to ClickHouse® in real-time
How to set up your ClickHouse® config.xml file (with examples)
October 09, 2025
Advanced observability and troubleshooting with Amazon RDS event monitoring pipelines
A Guide to Redis Performance Best Practices
October 08, 2025
These are the best cloud-based managed ClickHouse® services in 2025
OLAP databases: what's new and what's best in 2025
A quick review of different ClickHouse® MCP servers
ClickHouse® schema migrations: how to avoid data loss in production
What is the fastest database for analytics? (2025 update)
OLTP vs OLAP: when to use each (and when to use both)
Triplit joins Supabase
Tiga: Accelerating Geo-Distributed Transactions with Synchronized Clocks
This paper (to appear at SOSP'25) is one of the latest efforts exploring the dream of a one-round commit for geo-replicated databases. TAPIR tried to fuse concurrency control and consensus into one layer. Tempo and Detock went further using dependency graphs.
Aleksey and I did our usual thing. We recorded our first blind read of the paper. I also annotated a copy while reading, which you can access here.
We liked the paper overall. This is a thoughtful piece of engineering, not a conceptual breakthrough. It uses future timestamps to align replicas in a slightly new way, and the results are solid. But the presentation needs refinement and stronger formalization. (See our livereading video about how these problems manifested themselves.) Another study to add to my survey, showing how, with modern clocks, time itself is becoming a coordination primitive.
The Big Idea
Tiga claims to do strictly serializable, fault-tolerant transactions in one wide-area round trip (1-WRTT) most of the time by predicting/tracking the future commit times of the transactions. Instead of waiting for messages to arrive and then ordering them, Tiga assigns each transaction a future timestamp at submission.
If all goes well, the transaction arrives before that timestamp at all replicas, waits until the local clock catches up, and then executes in order.
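The "wait until the local clock catches up, then execute in order" behavior can be sketched with a priority queue. This is a minimal illustration under my own assumptions, not the paper's implementation: the class and method names are hypothetical, timestamps are plain integers, and ties are broken by transaction id.

```python
import heapq


class DeadlineBuffer:
    """Holds transactions until the local clock reaches their timestamp."""

    def __init__(self):
        self._heap = []  # entries are (timestamp, txn_id)

    def add(self, timestamp, txn_id):
        heapq.heappush(self._heap, (timestamp, txn_id))

    def release_ready(self, local_clock):
        """Pop every transaction with timestamp <= local_clock, in timestamp order."""
        ready = []
        while self._heap and self._heap[0][0] <= local_clock:
            ready.append(heapq.heappop(self._heap)[1])
        return ready


buf = DeadlineBuffer()
# Arrival order differs from timestamp order.
buf.add(120, "T2")
buf.add(110, "T1")
buf.add(150, "T3")

early = buf.release_ready(100)  # nothing is due yet
due = buf.release_ready(130)    # T1 and T2 are released, T1 first
print(early, due)
```

Two replicas that receive the same transactions in different arrival orders release them in the same order, as long as each transaction arrives before its timestamp. The sketch does not handle a transaction that arrives after its timestamp has already passed.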
There is no dependency graph to track. Synchronized clocks and flight-of-message prediction promise to still get us strict serializability with 1-WRTT for most cases. Well, at least for more cases than the competition. You don't need to outrun the bear, but just the other campers.
This is essentially the Deadline-Ordered Multicast (DOM) idea from the Nezha paper. Figures 1–2 in the paper show the contrast with Tapir. Tapir commits optimistically and fails when transactions arrive in different orders at different regions. Tiga fixes this by giving both transactions predetermined timestamps: all servers delay execution until their clocks reach those timestamps, ensuring consistent order.
Tiga also merges consensus and concurrency control into a single timestamp-based protocol. I think "Unanimous 2PC: Fault-tolerant Distributed Transactions Can be Fast and Simple" (U2PC) is a very relevant protocol to compare with here, but unfortunately Tiga fails to cite it.
Algorithm in a Nutshell
In the best case (steps 1-3), Tiga commits a transaction in 1-WRTT, essentially by predicting the correct global order instead of discovering it. If the prediction falters, steps 4-6 reconcile timestamps and logs, recovering correctness at the cost of another half to one full round trip.
1. Timestamp Initialization: The coordinator uses the measured one-way delays (OWDs) to each replica to predict when the transaction should arrive everywhere. It assigns the transaction a future timestamp t = send_time + max_OWD + Δ, where Δ is a small safety headroom (≈10 ms). This t represents the intended global serialization time. The coordinator then multicasts the transaction T and its timestamp to all shards.
2. Optimistic Execution: Upon receipt, each server buffers T in a priority queue sorted by timestamp. When the local clock reaches t, followers simply release T (they do not execute yet) while leaders execute T optimistically, assuming their local timestamp ordering will hold. The green bars in Figure 3 mark this optimistic execution phase.
3. Quorum Check of Fast Path: The coordinator collects fast-replies from a super quorum on each shard (the leader + f + ⌈f / 2⌉ followers). If the replies agree on the same log hash and timestamp, T is fast-committed. This completes the ideal 1-WRTT commit: half a round from coordinator to replicas, half back. (The other leader-inclusive paper I remember is Nezha, prior work to this one.)
4. Timestamp Agreement: Sometimes leaders execute with slightly different timestamps due to delays or clock drift. They then exchange their local timestamps to compute a common agreed value (the maximum). If all timestamps already match, the process costs 0.5 WRTT. If some leaders lag, another half round (total 1-WRTT) ensures alignment. If any executed with an older timestamp, that execution is revoked and T is re-executed at the new agreed time (slow path). This phase corresponds to the curved inter-leader arrows in the figure.
5. Log Synchronization: After leaders finalize timestamps, they propagate the consistent log to their followers. Followers update their logs to match the leader’s view and advance their sync-point. This ensures replicas are consistent before commit acknowledgment. The figure shows this as another 0.5 WRTT of leader-to-follower synchronization.
6. Quorum Check of Slow Path: Finally, the coordinator verifies that enough followers (≥ f) have acknowledged the synchronized logs. Once that quorum is reached, T is committed via the slow path. Even in this fallback case, the total latency stays within 1.5–2 WRTTs.
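The arithmetic in steps 1, 3, and 4 can be written down in a few lines of Python. This is a sketch based only on the description above, not the paper's code: the function names, the millisecond units, and the reply format of (log hash, timestamp) pairs keyed by replica id are my assumptions.

```python
import math


def future_timestamp(send_time_ms, owd_ms_by_replica, headroom_ms=10):
    """Step 1: t = send_time + max_OWD + delta (delta is about 10 ms)."""
    return send_time_ms + max(owd_ms_by_replica.values()) + headroom_ms


def super_quorum_size(f):
    """Step 3: the leader plus f + ceil(f / 2) followers."""
    return 1 + f + math.ceil(f / 2)


def fast_path_commits(replies, f, leader_id):
    """Step 3: fast commit needs a super quorum that includes the leader and
    whose replies all carry the same (log_hash, timestamp) pair."""
    if leader_id not in replies:
        return False
    leader_reply = replies[leader_id]
    matching = sum(1 for reply in replies.values() if reply == leader_reply)
    return matching >= super_quorum_size(f)


def agree_timestamp(leader_timestamps):
    """Step 4: leaders agree on the maximum timestamp. Leaders that executed
    with a smaller timestamp must revoke and re-execute."""
    agreed = max(leader_timestamps.values())
    redo = sorted(s for s, ts in leader_timestamps.items() if ts < agreed)
    return agreed, redo


t = future_timestamp(1000, {"us": 12, "eu": 45, "asia": 80})
replies = {"L": ("h1", t), "F1": ("h1", t), "F2": ("h1", t)}
print(t, super_quorum_size(1), fast_path_commits(replies, 1, "L"))
print(agree_timestamp({"shardA": 1090, "shardB": 1095}))
```

With f = 1 the super quorum is 3 replicas, and with f = 2 it is 4, so a single follower that disagrees on the log hash is enough to force the slow path when f = 1.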
I am skipping details and optimizations. Colocating the leaders of many shards in the same datacenter/AZ is an optimization to improve the latency of timestamp agreement (which this paper seems to have borrowed from the recent OSDI'25 Mako paper). This then opens the door for a preventive flavor of the Tiga workflow, as shown in Figure 6.
Evaluation Highlights
Running on Google Cloud across three regions, Tiga outperforms Janus, Tapir, and Calvin+ with 1.3–7x higher throughput and 1.4–4x lower latency. In low-contention microbenchmarks, it easily sustains 1-WRTT commits. Under high contention, Calvin+ catches up somewhat but with 30% higher latency. Calvin+ replaces Calvin's Paxos-based consensus layer with Nezha, saving at least 1-WRTT in committing transactions. A lot of work must have gone into these evaluation results.