a curated list of database news from authoritative sources

December 12, 2023

December 11, 2023

December 08, 2023

How design works at Supabase

The transformative journey of Supabase's Design team and its unique culture to enhance the output and quality of the entire company.

December 05, 2023

Why is Jepsen Written in Clojure?

People keep asking why Jepsen is written in Clojure, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp?

Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it easier to write correct concurrent programs, and the language and runtime have excellent concurrency support: real threads, promises, futures, atoms, locks, queues, cyclic barriers, all of java.util.concurrent, etc. I also considered languages (like Haskell) with more rigorous control over side effects, but decided that Clojure’s less-dogmatic approach was preferable.

Because Jepsen tests databases, it needs broad client support. Almost every database has a JVM client, typically written in Java, and Clojure has decent Java interop.

Because testing is experimental work, I needed a language which was concise, adaptable, and well-suited to prototyping. Clojure is terse, and its syntactic flexibility–in particular, its macro system–work well for that. In particular the threading macros make chained transformations readable, and macros enable re-usable error handling and easy control of resource scopes. The Clojure REPL is really handy for exploring the data a test run produces.

Tests involve representing, transforming, and inspecting complex, nested data structures. Clojure’s data structures and standard library functions are possibly the best I’ve ever seen. I also print a lot of structures to the console and files: Clojure’s data syntax (EDN) is fantastic for this.

Because tests involve manipulating a decent, but not huge, chunk of data, I needed a language with “good enough” performance. Clojure’s certainly not the fastest language out there, but idiomatic Clojure is usually within an order of magnitude or two of Java, and I can shave off the difference where critical. The JVM has excellent profiling tools, and these work well with Clojure.

Jepsen’s (gosh) about a decade old now: I wanted a language with a mature core and emphasis on stability. Clojure is remarkably stable, both in terms of JVM target and the language itself. Libraries don’t “rot” anywhere near as quickly as in Scala or Ruby.

Clojure does have significant drawbacks. It has a small engineering community and no (broadly-accepted, successful) static typing system. Both of these would constrain a large team, but Jepsen’s maintained and used by only 1-3 people at a time. Working with JVM primitives can be frustrating without dropping to Java; I do this on occasion. Some aspects of the polymorphism system are lacking, but these can be worked around with libraries. The error messages are terrible. I have no apologetics for this. ;-)

I prototyped Jepsen in a few different languages before settling on Clojure. A decade in, I think it was a pretty good tradeoff.

December 01, 2023

What is HTAP?

Learn what HTAP is, how HTAP compares to OLAP and OLTP, and some pros and cons of HTAP.

Automatic CLI login

Explore the technical implementation and security measures behind CLI's new automatic login feature.

November 28, 2023

The Infamous ORDER BY LIMIT Query Optimizer Bug

Which is faster: LIMIT 1 or LIMIT 20? Presumably, fetching less rows is faster than fetching more rows. But for 16 years (since 2007) the MySQL query optimizer has had a “bug”† that not only makes LIMIT 1 slower than LIMIT 20 but can also make the former a table scan, which tends to cause problems. This happened last week where I work, and although MySQL DBAs are familiar with this bug, I’m writing this blog post for developers to more clearly illustrate and explain what’s going on and why because it’s really counterintuitive.

Introducing Insights Anomalies

This new update to PlanetScale Insights introduces smart query monitoring to detect slower than expected queries in your database.

Tinybird expands self-service real-time analytics to AWS

Starting today, our self-service customers can select Amazon Web Services as their deployment environment when creating a new Tinybird Workspace. Tinybird is already trusted by AWS customers, including FanDuel, Factorial, and Canva.

November 21, 2023

Are Vector Databases Dead?

This year vector databases have sprung up like mushrooms to enable applications to retrieve context based on semantic search. A large portion of these applications have used the retrieved context to augment the ability of large language models (LLMs) in a pattern known as RAG. On November 7th OpenAI released its Assistants API, enabling the implementation of AI chat interfaces with context retrieval without needing a separate message store or vector database. Does this new API make vector databases obsolete?

Build AI Chat with Convex Vector Search

Convex is a full-stack development platform and cloud database, including built-in vector search. In this third post in our [series](https://stack.convex.dev/ai-chat-using-openai-assistants-api), we’ll build an AI-powered chat interface using Convex, with our own message storage and context retrieval.

Build AI Chat with OpenAI's Assistants API

On November 7th OpenAI released its Assistants API, enabling chat bot with context retrieval implementations without needing a messages or vector database. In this post, we’ll cover how to leverage this API to build a fully functioning AI chat interface.

Webhook security: a hands-on guide

Learn what went into building PlanetScale webhooks from a security perspective. This article covers SSRF, webhook validation, DDoS, and more.