December 05, 2024
High Performance Disk
Notes On: Disaggregated OLTP Systems
December 04, 2024
Supabase Cron
Supabase CLI v2: Config as Code
December 03, 2024
Building Real-Time Live Sports Viewer Analytics with Tinybird and AWS
Supabase Edge Functions: Introducing Background Tasks, Ephemeral Storage, and WebSockets
The History of the Decline and Fall of In-Memory Database Systems
The History of the Decline and Fall of In-Memory Database Systems
In the early 2010s, the drop in memory prices combined with an overall increase in the reliability of computer hardware fueled a minor revolution in the world of database systems. Traditionally, slow but durable magnetic disk storage was the source of truth for a database system. Only when data needed to be analyzed or updated would it be briefly cached in memory by the buffer manager. And as memory got bigger and faster, the access latency of magnetic disks quickly became a bottleneck for many systems.
December 02, 2024
Supabase AI Assistant v2
December 01, 2024
Threads Won't Take You South of Market
In June 2023, when Threads announced their plans to federate with other Fediverse instances, there was a good deal of debate around whether smaller instances should allow federation or block it pre-emptively. As one of the admins of woof.group, I wrote about some of the potential risks and rewards of federating with Threads. We decided to wait and see.
In my queer and leather circles, Facebook and Instagram have been generally understood as hostile environments for over a decade. In 2014, their “Real Name” policy made life particularly difficult for trans people, drag queens, sex workers, and people who, for various reasons, needed to keep their real name disconnected from their queer life. My friends have been repeatedly suspended from both platforms for showing too much skin, or using the peach emoji. Meta’s moderation has been aggressive, opaque, and wildly inconsistent: sometimes full nudity is fine; other times a kiss or swimsuit is beyond the line. In some circles, maintaining a series of backup accounts in advance of one’s ban became de rigueur.
I’d hoped that federation between Threads and the broader Fediverse might allow a more nuanced spectrum of moderation norms. Threads might opt for a more conservative environment locally, but through federation, allow their users to interact with friends on instances with more liberal norms. Conversely, most of my real-life friends are still on Meta services—I’d love to see their posts and chat with them again. Threads could communicate with Gay Fedi (using the term in the broadest sense), and de-rank or hide content they don’t like on a per-post or per-account basis.
This world seems technically feasible. Meta reports 275 million Monthly Active Users (MAUs), and over three billion accross other Meta services. Fediverse has something like one million MAUs across various instances. This is not a large jump in processing or storage; nor would it seem to require a large increase in moderation staff. Threads has already committed to doing the requisite engineering, user experience, and legal work to allow federation across a broad range of instances. Meta is swimming in cash.
All this seems a moot point. A year and a half later, Threads is barely half federated. It publishes Threads posts to the world, but only if you dig in to the settings and check the “Fediverse Sharing” box. Threads users can see replies to their posts, but can’t talk back. Threads users can’t mention others, see mentions from other people, or follow anyone outside Threads. This may work for syndication, but is essentially unusable for conversation.
Despite the fact that Threads users can’t follow or see mentions from people on other instances, Threads has already opted to block a slew of instances where gay & leather people congregate. Threads blocks hypno.social, rubber.social, 4bear.com, nsfw.lgbt, kinkyelephant.com, kinktroet.social, barkclub.xyz, mastobate.social, and kinky.business. They also block the (now-defunct) instances bear.community, gaybdsm.group, and gearheads.social. They block more general queer-friendly instances like bark.lgbt, super-gay.co, gay.camera, and gaygeek.social. They block sex-positive instances like nsfwphotography.social, nsfw.social, and net4sw.com. All these instances are blocked for having “violated our Community Standards or Terms of Use”. Others like fisting.social, mastodon.hypnoguys.com, abdl.link, qaf.men, and social.rubber.family, are blocked for having “no publicly accessible feed”. I don’t know what this means: hypnoguys.social, for instance, has the usual Mastodon publically accessible local feed.
It’s not like these instances are hotbeds of spam, hate speech, or harassment: woof.group federates heavily with most of the servers I mentioned above, and we rarely have problems with their moderation. Most have reasonable and enforced media policies requiring sensitive-media flags for genitals, heavy play, and so on. Those policies are generally speaking looser than Threads (woof.group, for instance, allows butts!) but there are plenty of accounts and posts on these instances which would be anodyne under Threads’ rules.
I am shocked that woof.group is not on Threads’ blocklist yet. We have similar users who post similar things. Our content policies are broadly similar—several of the instances Threads blocks actually adopted woof.group’s specific policy language. I doubt it’s our size: Threads blocks several instances with less than ten MAUs, and woof.group has over seven hundred.
I’ve been out of the valley for nearly a decade, and I don’t have insight into Meta’s policies or decision-making. I’m sure Threads has their reasons. Whatever they are, Threads, like all of Meta’s services, feels distinctly uncomfortable with sex, and sexual expression is a vibrant aspect of gay culture.
This is part of why I started woof.group: we deserve spaces moderated with our subculture in mind. But I also hoped that by designing a moderation policy which compromised with normative sensibilities, we might retain connections to a broader set of publics. This particular leather bar need not be an invite-only clubhouse; it can be a part of a walkable neighborhood. For nearly five years we’ve kept that balance, retaining open federation with most all the Fediverse. I get the sense that Threads intends to wall its users off from our world altogether—to make “bad gays” invisible. If Threads were a taxi service, it wouldn’t take you South of Market.
OrioleDB Public Alpha
November 30, 2024
Building a distributed log using S3 (under 150 lines of Go)
RocksDB on a big server: LRU vs hyperclock, v2
This post show that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a large server to show the speedup from the hyperclock block cache implementation for different concurrency levels with RocksDB 9.6. Here I share results from the same server and different (old and new) RocksDB releases.
Results are amazing on a large (48 cores) server with 40 client threads
- ~2X more QPS for range queries with hyperclock
- ~3X more QPS for point queries with hyperclock
I used RocksDB versions 6.0.2, 6.29.5, 7.0.4, 7.6.0, 7.7.8, 8.5.4, 8.6.7, 9.0.1, 9.1.2, 9.3.2, 9.5.2, 9.7.4 and 9.9.0. Everything was compiled with gcc 11.4.0.
The --cache_type argument selected the block cache implementation:
- lru_cache was used for versions 7.6 and earlier. Because some of the oldest releases don't support --cache_type I also used --undef_params=...,cache_type
- hyper_clock_cache was used for versions 7.7 through 8.5
- auth_hyper_clock_cache was used for versions 8.5+
Hardware
The server is an ax162-s from Hetzner with an AMD EPYC 9454P processor, 48 cores, AMD SMT disabled and 128G RAM. The OS is Ubuntu 22.04. Storage is 2 NVMe devices with SW RAID 1 and ext4.
Benchmark
Overviews on how I use db_bench are here and here.
All of my tests here use a CPU-bound workload with a database that is cached by RocksDB and the benchmark is run for 40 threads.
I focus on the read-heavy benchmark steps:
- revrangeww (reverse range while writing) - this does short reverse range scans
- fwdrangeww (forward range while writing) - this does short forward range scans
- readww (read while writing) - this does point queries
For each of these there is a fixed rate for writes done in the background and performance is reported for the reads. I prefer to measure read performance when there are concurrent writes because read-only benchmarks with an LSM suffer from non-determinism as the state (shape) of the LSM tree has a large impact on CPU overhead and throughput.
Results
All results are in this spreadsheet and the performance summary is here.
The graph below shows relative QPS which is: (QPS for my version / QPS for RocksDB 6.0.2) and the results are amazing:
- ~2X more QPS for range queries with hyperclock
- ~3X more QPS for point queries with hyperclock
The average values for vmstat metrics provide more detail on why hyperclock is so good for performance. The context switch rate drops dramatically when it is enabled because there is much less mutex contention. The user CPU utilization increases by ~1.6X because more useful work can get done when there is less mutex contention.
* cs - context switches per second per vmstat
* us - user CPU utilization per vmstat
* sy - system CPU utilization per vmstat
* id - idle CPU utilization per vmstat
* wa - wait CPU utilization per vmstat
* version - RocksDB version
cs us sy us+sy id wa version
1495325 50.3 14.0 64.3 18.5 0.1 7.6.0
2360 82.7 14.0 96.7 16.6 0.1 9.9.0
November 28, 2024
1 million page views
I was delighted to notice this morning that this site has recently passed 1M page views. And since Murat wrote about his 1M page view accomplishment at the time, I felt compelled to now too.
I started regularly blogging in 2018. For some reason I decided to write a blog post every month. And while I have definitely skipped a month or two here or there, on average I've written 2 posts per month.
Tooling
Since at least 2018 this site has been built with a static site generator. I might have used a 3rd-party generator at one point, but for as long as I can remember most of this site has been built with a little Python script I wrote.
I used to get so pissed when static site generators would pointlessly change their APIs and I'd have to make pointless changes. I have not had to make any significant changes to my build code in many years.
I hosted the site itself on GitHub Pages for many years. But I wanted more flexibility with subdomains (ultimately not something I liked) and the ability to view server-side logs (ultimately not something I ever do).
I think this site is hosted on an OVH machine now. But at this point it is inertia keeping me there. If you have no strong feelings otherwise, GitHub Pages is perfect.
I used to use Google Analytics but then they shut down the old version. The new version was incredibly confusing to use. I could not find some very basic information. So I moved to Fathom which has been great.
I used to track all subscribers in a Google Form and bcc them but this became untenable eventually after 1000 subscribers due to GMail rate limits. I currently use MailerLite for subscriptions and sending email about new posts. But this is an absolutely terrible service. They proxy all links behind a domain that adblockers hate and they also visually shorten the URL so you can't copy the text of the URL.
I just want a service that has a hosted form for collecting
subscribers and a <textarea>
that lets me dump raw HTML and send
that as an email to my subscribers. No branding, no watermarks, no
link proxying. This apparently doesn't exist. I am too lazy to
figure out Amazon SES so I stick with MailerLite for now.
Evolution
In the beginning I talked about little interpreters in JavaScript, about programming languages, about Scheme. I was into functional programming. Over time I moved into little emulators and bytecode VMs. And for the last four years I became obsessed with databases and distributed systems.
I have almost always written about little projects to teach myself a concept. Writing a bytecode VM in Rust, emulating a subset of x86 in Go, implementing Raft in Go, implementing MVCC isolation levels in Go, and so on.
So many times when I tried to learn a concept I would find blog posts with only partial code. The post would link to a GitHub repo that, by the time I got to the post, had evolved significantly beyond what was described in the post. The repo code had by then become too complex for me to follow. So I was motivated to write minimal implementations and walk through the code in its entirety.
I have also had a blast writing survey posts such as how various databases execute expressions, analyzing non-V8 JavaScript implementations, how various programming language implementations parse code, and how various database systems build on top of key-value databases.
The last two posts have even each been cited in a research paper (here and here).
Editing
In terms of quality, my single greatest trick is to read the post out loud. Multiple times. Notice parts that are awkward or unclear and rewrite them.
My second greatest trick is to ask friends for review. Some posts like an intuition for distributed consensus and a write-ahead log is not a universal part of durability would simply not have been correct or credible without my fantastic reviewers. And I'm proud to have played that part a few times in turn.
We also have a fantastic #writing-and-drafts channel on the Software Internals Discord where folks (myself occasionally included) come for post review.
Context
I've lost count of the total number of times that these posts have been on the front page of Hacker News or that a tweet announcing a post has reached triple digits likes. I think I've had 9 posts on the front of HN this year. I do know that my single best year for HN was 12 months between 2022-2023 where 20 of my posts or projects were on the front page.
Every time a post does well there's a part of me that worries that I've peaked. But the way to deal with this has been to ignore that little voice and to just keep learning new things. I haven't stopped finding things confusing yet, and confusion is a phenomenal muse.
And also to, like, go out and meet friends for dinner, run meetups, run book clubs, chat with you fascinating internet strangers, play volleyball, and so on.
It's always been about cultivating healthy obsessions.
Benediction
In parting, I'll remind you:
- It is definitely worth writing about, whatever "it" is
- You're not writing enough
- And some ideas for posts I want to hear about if you write about them
I wrote a little reflection on writing after noticing I passed 1M page views this morning.https://t.co/eIlMDVHNht pic.twitter.com/EKSiiDUz5G
— Phil Eaton (@eatonphil) November 28, 2024