July 29, 2024
Building Data Pipelines With Vitess
July 25, 2024
The Hidden Cost of Data Movement
Recently, Mark Raasveldt of DuckDB wrote an excellent post about why memory management is crucial for efficient data processing. In his post, he focuses on the cost of having data on disk and moving it to memory. After all, everyone knows that having data in memory is what you want. As Jim Gray famously said in 2006:
Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King
July 23, 2024
The State of Online Schema Migrations in MySQL
July 22, 2024
Optimizing aggregation in the Vitess query planner
An Interesting Optimization
July 19, 2024
Behind the scenes of Tinybird's big frontend refactor
July 16, 2024
Best practices for timestamps and time zones in databases
Announcing Supabase on JSR
Why German Strings are Everywhere
German Strings
Strings are conceptually very simple: a string is essentially just a sequence of characters, right? Why, then, does every programming language have its own slightly different string implementation? It turns out that there is a lot more to a string than “just a sequence of characters”.
We’re no different: we built our own custom string type that is highly optimized for data processing. Although we didn’t expect it when we first wrote about it in our inaugural Umbra research paper, many newer systems have adopted our format. It is now implemented in DuckDB, Apache Arrow, Polars, and Facebook Velox.