a curated list of database news from authoritative sources

December 05, 2024

Supabase Queues

Durable Message Queues with Guaranteed Delivery in Postgres

High Performance Disk

Store up to 60 TB of data with 100x improved durability and 5x more IOPS

December 04, 2024

Supabase Cron

Schedule Recurring Jobs in Postgres

December 03, 2024

Building Real-Time Live Sports Viewer Analytics with Tinybird and AWS

Ever tried to show millions of viewers real-time stats about how many other people like them are watching the same event? It's a bit like trying to count grains of sand while they're being poured into your bucket. Fun times! Let's look at how to build this without breaking the bank (or your sanity).

The Challenge: Fan Engagement at Massive Scale

Imagine you're streaming a major live event and want to show each viewer some engaging stats:

  • How many people in their state are watching?
  • How …
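The underlying technique is a windowed aggregation over viewer heartbeat events. As a minimal, hypothetical sketch of counting concurrent viewers per state (the 30-second window and the event shape are my assumptions, not Tinybird's actual pipeline):

    from collections import defaultdict
    import time

    WINDOW_SECONDS = 30  # assumed: a viewer counts as "live" if seen in the last 30s

    # state -> {viewer_id -> last heartbeat timestamp}
    heartbeats = defaultdict(dict)

    def record_heartbeat(state, viewer_id, now=None):
        """Record that a viewer is still watching."""
        heartbeats[state][viewer_id] = now if now is not None else time.time()

    def concurrent_viewers(state, now=None):
        """Count viewers whose last heartbeat falls inside the window."""
        now = now if now is not None else time.time()
        cutoff = now - WINDOW_SECONDS
        live = {v: t for v, t in heartbeats[state].items() if t >= cutoff}
        heartbeats[state] = live  # evict stale entries while we're here
        return len(live)

    record_heartbeat("CA", "viewer-1")
    record_heartbeat("CA", "viewer-2")
    print(concurrent_viewers("CA"))  # -> 2

At real scale the per-state counts would come from a streaming aggregation rather than one process's dictionary, but the window-and-evict logic is the same idea.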

The History of the Decline and Fall of In-Memory Database Systems

In the early 2010s, the drop in memory prices combined with an overall increase in the reliability of computer hardware fueled a minor revolution in the world of database systems. Traditionally, slow but durable magnetic disk storage was the source of truth for a database system. Only when data needed to be analyzed or updated would it be briefly cached in memory by the buffer manager. And as memory got bigger and faster, the access latency of magnetic disks quickly became a bottleneck for many systems.
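As a toy illustration of the traditional design the article describes (my own sketch, not from the article): the buffer manager is essentially a read-through, LRU-evicting page cache in front of durable disk pages.

    from collections import OrderedDict

    class BufferManager:
        """Toy read-through page cache: disk is the source of truth; pages
        live in memory only while they are being read or updated."""
        def __init__(self, disk, capacity=2):
            self.disk = disk            # page_id -> bytes, the durable store
            self.capacity = capacity
            self.cache = OrderedDict()  # in-memory pages in LRU order

        def get_page(self, page_id):
            if page_id in self.cache:
                self.cache.move_to_end(page_id)  # mark as recently used
                return self.cache[page_id]
            page = self.disk[page_id]            # slow path: read from disk
            self.cache[page_id] = page
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used
            return page

    disk = {1: b"page-1", 2: b"page-2", 3: b"page-3"}
    bm = BufferManager(disk)
    bm.get_page(1); bm.get_page(2); bm.get_page(3)  # page 1 gets evicted
    assert 1 not in bm.cache

The in-memory systems of the 2010s inverted this picture: memory became the source of truth, with disk relegated to logging and checkpoints.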

December 02, 2024

Convex: The Database that Made Me Switch Careers

Today is my first day working for Convex šŸŽ‰, so I thought I'd take the opportunity to share my thoughts on why I decided to join and what excites me about the technology.

December 01, 2024

Threads Won't Take You South of Market

In June 2023, when Threads announced their plans to federate with other Fediverse instances, there was a good deal of debate around whether smaller instances should allow federation or block it pre-emptively. As one of the admins of woof.group, I wrote about some of the potential risks and rewards of federating with Threads. We decided to wait and see.

In my queer and leather circles, Facebook and Instagram have been generally understood as hostile environments for over a decade. In 2014, their "Real Name" policy made life particularly difficult for trans people, drag queens, sex workers, and people who, for various reasons, needed to keep their real name disconnected from their queer life. My friends have been repeatedly suspended from both platforms for showing too much skin, or using the peach emoji. Meta's moderation has been aggressive, opaque, and wildly inconsistent: sometimes full nudity is fine; other times a kiss or swimsuit is beyond the line. In some circles, maintaining a series of backup accounts in advance of one's ban became de rigueur.

I'd hoped that federation between Threads and the broader Fediverse might allow a more nuanced spectrum of moderation norms. Threads might opt for a more conservative environment locally, but through federation, allow their users to interact with friends on instances with more liberal norms. Conversely, most of my real-life friends are still on Meta services; I'd love to see their posts and chat with them again. Threads could communicate with Gay Fedi (using the term in the broadest sense), and de-rank or hide content they don't like on a per-post or per-account basis.

This world seems technically feasible. Meta reports 275 million Monthly Active Users (MAUs), and over three billion across other Meta services. The Fediverse has something like one million MAUs across various instances. This is not a large jump in processing or storage; nor would it seem to require a large increase in moderation staff. Threads has already committed to doing the requisite engineering, user experience, and legal work to allow federation across a broad range of instances. Meta is swimming in cash.

All this seems a moot point. A year and a half later, Threads is barely half federated. It publishes Threads posts to the world, but only if you dig into the settings and check the "Fediverse Sharing" box. Threads users can see replies to their posts, but can't talk back. Threads users can't mention others, see mentions from other people, or follow anyone outside Threads. This may work for syndication, but is essentially unusable for conversation.

Despite the fact that Threads users can't follow or see mentions from people on other instances, Threads has already opted to block a slew of instances where gay & leather people congregate. Threads blocks hypno.social, rubber.social, 4bear.com, nsfw.lgbt, kinkyelephant.com, kinktroet.social, barkclub.xyz, mastobate.social, and kinky.business. They also block the (now-defunct) instances bear.community, gaybdsm.group, and gearheads.social. They block more general queer-friendly instances like bark.lgbt, super-gay.co, gay.camera, and gaygeek.social. They block sex-positive instances like nsfwphotography.social, nsfw.social, and net4sw.com. All these instances are blocked for having "violated our Community Standards or Terms of Use". Others, like fisting.social, mastodon.hypnoguys.com, abdl.link, qaf.men, and social.rubber.family, are blocked for having "no publicly accessible feed". I don't know what this means: hypnoguys.social, for instance, has the usual Mastodon publicly accessible local feed.

It's not like these instances are hotbeds of spam, hate speech, or harassment: woof.group federates heavily with most of the servers I mentioned above, and we rarely have problems with their moderation. Most have reasonable and enforced media policies requiring sensitive-media flags for genitals, heavy play, and so on. Those policies are, generally speaking, looser than Threads' (woof.group, for instance, allows butts!), but there are plenty of accounts and posts on these instances which would be anodyne under Threads' rules.

I am shocked that woof.group is not on Threads' blocklist yet. We have similar users who post similar things. Our content policies are broadly similar; several of the instances Threads blocks actually adopted woof.group's specific policy language. I doubt it's our size: Threads blocks several instances with fewer than ten MAUs, and woof.group has over seven hundred.

I've been out of the valley for nearly a decade, and I don't have insight into Meta's policies or decision-making. I'm sure Threads has their reasons. Whatever they are, Threads, like all of Meta's services, feels distinctly uncomfortable with sex, and sexual expression is a vibrant aspect of gay culture.

This is part of why I started woof.group: we deserve spaces moderated with our subculture in mind. But I also hoped that by designing a moderation policy which compromised with normative sensibilities, we might retain connections to a broader set of publics. This particular leather bar need not be an invite-only clubhouse; it can be a part of a walkable neighborhood. For nearly five years we've kept that balance, retaining open federation with nearly all of the Fediverse. I get the sense that Threads intends to wall its users off from our world altogether: to make "bad gays" invisible. If Threads were a taxi service, it wouldn't take you South of Market.

November 30, 2024

RocksDB on a big server: LRU vs hyperclock, v2

This post shows that RocksDB has gotten much faster over time for the read-heavy benchmarks that I use. I recently shared results from a large server to show the speedup from the hyperclock block cache implementation for different concurrency levels with RocksDB 9.6. Here I share results from the same server and different (old and new) RocksDB releases.

Results are amazing on a large (48-core) server with 40 client threads

  • ~2X more QPS for range queries with hyperclock
  • ~3X more QPS for point queries with hyperclock

Software

I used RocksDB versions 6.0.2, 6.29.5, 7.0.4, 7.6.0, 7.7.8, 8.5.4, 8.6.7, 9.0.1, 9.1.2, 9.3.2, 9.5.2, 9.7.4 and 9.9.0. Everything was compiled with gcc 11.4.0.

The --cache_type argument selected the block cache implementation:

  • lru_cache was used for versions 7.6 and earlier. Because some of the oldest releases don't support --cache_type, I also used --undef_params=...,cache_type
  • hyper_clock_cache was used for versions 7.7 through 8.5
  • auto_hyper_clock_cache was used for versions 8.6+ (see the sketch after this list)
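A minimal sketch of that version-to-cache mapping (my own illustration, not from the post's benchmark scripts), assuming version strings like those listed above:

    # Map a RocksDB version to the --cache_type value used in these tests:
    # lru_cache for <= 7.6, hyper_clock_cache for 7.7 through 8.5,
    # auto_hyper_clock_cache for 8.6 and later.
    def cache_type_for(version: str) -> str:
        major, minor = (int(x) for x in version.split(".")[:2])
        if (major, minor) <= (7, 6):
            return "lru_cache"
        if (major, minor) <= (8, 5):
            return "hyper_clock_cache"
        return "auto_hyper_clock_cache"

    assert cache_type_for("6.0.2") == "lru_cache"
    assert cache_type_for("7.7.8") == "hyper_clock_cache"
    assert cache_type_for("9.9.0") == "auto_hyper_clock_cache"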

Hardware

The server is an ax162-s from Hetzner with an AMD EPYC 9454P processor (48 cores, AMD SMT disabled) and 128GB of RAM. The OS is Ubuntu 22.04. Storage is 2 NVMe devices with SW RAID 1 and ext4.

Benchmark

Overviews on how I use db_bench are here and here.

All of my tests here use a CPU-bound workload with a database that is cached by RocksDB, and each benchmark step is run with 40 client threads.

I focus on the read-heavy benchmark steps:

  • revrangeww (reverse range while writing) - this does short reverse range scans
  • fwdrangeww (forward range while writing) - this does short forward range scans
  • readww (read while writing) - this does point queries

For each of these there is a fixed rate for writes done in the background and performance is reported for the reads. I prefer to measure read performance when there are concurrent writes because read-only benchmarks with an LSM suffer from non-determinism as the state (shape) of the LSM tree has a large impact on CPU overhead and throughput.
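For a concrete picture, here is a hypothetical invocation of one read-while-writing step, driven from Python. The readwhilewriting benchmark is db_bench's built-in point-queries-under-writes test (corresponding to readww above); the thread count matches the post, but the duration and write rate values are placeholders, not the author's settings:

    import subprocess

    # Run point queries from 40 threads while a background writer runs at
    # a fixed rate. Values below are illustrative, not the post's settings.
    subprocess.run([
        "./db_bench",
        "--benchmarks=readwhilewriting",  # point reads with one background writer
        "--threads=40",                   # 40 client threads, as in the post
        "--cache_type=auto_hyper_clock_cache",
        "--use_existing_db=1",            # reuse the previously loaded database
        "--duration=300",                 # seconds; placeholder
        "--benchmark_write_rate_limit=2097152",  # bytes/s for the writer; placeholder
    ], check=True)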

Results

All results are in this spreadsheet and the performance summary is here.

The graph below shows relative QPS, which is (QPS for a given version / QPS for RocksDB 6.0.2), and the results are amazing:

  • ~2X more QPS for range queries with hyperclock
  • ~3X more QPS for point queries with hyperclock
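Relative QPS is just a ratio against the 6.0.2 baseline. A toy calculation with made-up QPS values (the real numbers are in the linked spreadsheet):

    # relative QPS = QPS for a given version / QPS for RocksDB 6.0.2
    baseline_qps = 100_000        # hypothetical QPS for 6.0.2
    qps_9_9_0 = 300_000           # hypothetical QPS for 9.9.0 with hyperclock
    print(f"{qps_9_9_0 / baseline_qps:.1f}X")  # -> 3.0X, i.e. ~3X more QPS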

The average values for vmstat metrics provide more detail on why hyperclock is so good for performance. The context switch rate drops dramatically when it is enabled because there is much less mutex contention. The user CPU utilization increases by ~1.6X because more useful work can get done when there is less mutex contention.

Legend:

  • cs - context switches per second per vmstat
  • us - user CPU utilization per vmstat
  • sy - system CPU utilization per vmstat
  • id - idle CPU utilization per vmstat
  • wa - wait CPU utilization per vmstat
  • version - RocksDB version

    cs        us     sy     us+sy   id     wa    version
    1495325   50.3   14.0   64.3    18.5   0.1   7.6.0
    2360      82.7   14.0   96.7    16.6   0.1   9.9.0
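Plugging the table's values in directly confirms the claims above (the collapse in context switches and the ~1.6X growth in user CPU):

    # vmstat averages from the table above
    cs_lru, cs_hyper = 1_495_325, 2_360  # context switches/s: 7.6.0 vs 9.9.0
    us_lru, us_hyper = 50.3, 82.7        # user CPU utilization

    print(f"context switches drop by ~{cs_lru / cs_hyper:.0f}X")  # ~634X fewer
    print(f"user CPU grows by ~{us_hyper / us_lru:.2f}X")         # ~1.64X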