A curated list of database news from authoritative sources

November 18, 2020

Postgres Views

Creating and using a view in PostgreSQL.

November 11, 2020

The impact of management teams as a decision-making group in startups and enterprises

Ambitious companies form management teams at every level above you, sometimes including you. Management teams meet periodically and have private chat rooms. They discuss customers, product, and organizational direction. Sometimes those discussions are well documented and occasionally made public. Sometimes decisions are poorly telegraphed.

Management teams do no inherent harm in a company with customers; employees outside the management team can unearth customer usage data to discover meaningful places to contribute. For example, graphing historic server logs to find the slowest requests, figure out why they are slow, and fix them. Or simply paying attention to the questions sales asks most frequently and finding ways to clarify. (All of this under the assumption that even when there is solid product direction, good employees tend to have extra time at work and want to make good use of it.)

For the first few years, even in a well-funded startup with solid founders, there are few customers. Even under a solid product team, the product direction is not yet completely clear. The management team includes founders and non-engineering executives. As a decision-making group, they are opaque. Employees outside the management team face a barrier in finding ways to meaningfully contribute. Ambitious, dedicated folks outside the team leave.

So what?

It is not clear to me how the natural (and not inherently bad) concept of management teams can attract and retain ambitious, dedicated non-founders at small companies. Maybe this disenfranchisement is not important, or is even necessary.

Or maybe management teams as a decision-making group are too easily a substitute for developing a grassroots culture of collaboration and trust between marketing, sales, product and development.

November 09, 2020

Vitess Operator for Kubernetes

In this post, I would like to explore the Vitess Operator for Kubernetes and demonstrate a sample implementation of Vitess in a Kubernetes topology. I also explore common DBA tasks by demonstrating how they are handled in the Vitess ecosystem. Out of the box, Vitess comes with many of the tools and utilities that one would otherwise have to adopt or develop to manage a MySQL topology. Let's take a look at Vitess's capabilities in these areas and demonstrate how these tasks are performed under the operator.

November 03, 2020

Streaming Vitess at Bolt

Traditionally, MySQL has been used to power most of the backend services at Bolt. We've designed our schemas in such a way that they're sharded into different MySQL clusters. Each MySQL cluster contains a subset of data and consists of one primary and multiple replica nodes. Once data is persisted to the database, we use the Debezium MySQL Connector to capture data change events and send them to Kafka.

November 02, 2020

Query Metrics Requirements for MySQL

Let’s answer a question which, to my knowledge, has never been systematically addressed: What are the requirements for a production-ready query metrics app at scale? I am uniquely qualified to answer that because I have written four query metrics apps, two of which are the open-source standard for MySQL: pt-query-digest and the original (v1) code behind the query metrics/analysis part of Percona Monitoring and Management. I’ve also published a couple of packages related to query metrics: go-mysql/slowlog and go-mysql/query.

October 30, 2020

C++ Concurrency Model on x86 for Dummies

Since C++11, multi-threaded C++ code has been governed by a rigorous memory model. The model allows implementing concurrent code such as low-level synchronization primitives or lock-free data structures in a portable fashion. To use the memory model, programmers need to do two things: first, use the std::atomic type for concurrently-accessed memory locations; second, each atomic operation takes a memory order argument with six options determining the concurrency semantics, i.e., which reorderings are allowed. (Some operations even allow specifying two memory orders!)

While there have been a number of attempts to describe the model, I have always found the full semantics very hard to understand, and consequently concurrent code hard to write and reason about. And since we are talking about low-level concurrent code, making a mistake (like picking the wrong memory order) can have disastrous consequences.

Luckily, at least on x86, a small subset of the full C++11 memory model is sufficient. In this post, I'll present such a subset that is sufficient to write high-performance concurrent code on x86. This simplification has the advantage that the resulting code is much more likely to be correct, without leaving any performance on the table. (On non-x86 platforms like ARM, code written based on this simplified model will still be correct, but might potentially be slightly slower than necessary.) 

There are only six things one needs to know to write high-performance concurrent code on x86.

1. Data races are undefined

If a data race occurs in C++, the behavior of the program is undefined. Let's unpack that statement. A data race can be defined as two or more threads concurrently accessing the same memory location, with at least one of the accesses being a write. By default (i.e., without using std::atomic), the compiler may assume that no other thread is concurrently modifying memory. This allows the compiler to optimize the code, for example by reordering or optimizing away memory accesses. Consider the following example:

void wait(bool* flag) {
    while (*flag); // spin until another thread clears *flag -- a data race
}


Because data races are undefined, the compiler may assume that the value behind the pointer flag is not concurrently modified by another thread. Under this assumption, gcc translates the code to a return statement, while clang translates it to an infinite loop if flag is initially true. Both translations are likely not what the code intends. To avoid undefined behavior, it is necessary to use std::atomic for variables where such races may occur:

void wait(std::atomic<bool>* flag) {
    while (*flag); // same as while(flag->load(std::memory_order_seq_cst));
}


*flag is equivalent to flag->load(std::memory_order_seq_cst), i.e., the default memory order is sequential consistency. Sequential consistency is the strongest memory order, guaranteeing that atomic operations appear to be executed in program order. The compiler is not allowed to reorder memory operations or optimize them away.

2. Sequentially-consistent loads are fast


Making the flag atomic may seem expensive, but luckily atomic loads are cheap on x86. Indeed, our wait function is translated to a simple loop around a plain MOV instruction, without any barrier or fence instruction. This is great: on x86, an atomic, sequentially-consistent load can be just as fast as a normal load. It also means that on x86 there is no performance benefit to using a weaker memory order for atomic loads; all load memory orders are simply translated to MOV.
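
For illustration, here is a small sketch of my own (not from the original post): even an explicitly relaxed load compiles to the same plain MOV on x86, so weakening the memory order buys nothing for loads.

#include <atomic>

bool peek(std::atomic<bool>* flag) {
    // gcc and clang emit the same MOV here as for the
    // default sequentially-consistent load in wait() above
    return flag->load(std::memory_order_relaxed);
}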

3. Sequentially-consistent stores are slow


While sequentially-consistent atomic loads are as cheap as normal loads, this is not the case for stores, as can be observed from the following example:

void unwait(std::atomic<bool>* flag) {
    *flag = false; // same as flag->store(false, std::memory_order_seq_cst);
}


As with atomic loads, atomic stores are sequentially consistent if no explicit memory order is specified. In clang and gcc 10, the store translates to an XCHG instruction rather than a MOV instruction (older gcc versions translate it to a MOV plus an MFENCE). XCHG and MFENCE are fairly expensive instructions, but they are required for sequentially-consistent stores on x86. (The CPU's store buffer must be flushed to the L1 cache to make the write visible to other threads through cache coherency.)

4. Stores that can be delayed can benefit from the release memory order


Because sequentially-consistent stores are fairly expensive, there are situations where a weaker memory order can improve performance. A common case is when the effect of a store can be delayed. The classical example is unlocking a mutex: the unlocking thread does not have to wait for the unlock to become visible to other threads, but can continue executing subsequent instructions. Another way of saying this is that it is correct to move instructions into the critical section, but not out of it. In C++, this weaker form of store consistency is available through the release memory order.

In our unwait example, the store can indeed be delayed, which is why we can use the release memory order for the store:

void unwait(std::atomic<bool>* flag) {
    flag->store(false, std::memory_order_release);
}


This code is translated to a simple MOV instruction, which means it can be as efficient as a non-atomic store.
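
Putting the two functions together, here is a usage sketch of my own (reusing the wait and unwait definitions above; not from the original post):

#include <atomic>
#include <thread>

std::atomic<bool> flag{true};

int main() {
    std::thread waiter([] { wait(&flag); }); // spins on cheap MOV loads while flag is true
    unwait(&flag); // single MOV store; the waiter's next load observes false
    waiter.join();
}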

5. Some atomic operations are always sequentially consistent


Besides loads and stores, std::atomic also offers the high-level read-modify-write operations exchange, compare_exchange, fetch_add, fetch_sub, fetch_and, fetch_or, and fetch_xor. On x86, these are always translated to sequentially-consistent CPU instructions (LOCK-prefixed instructions such as LOCK XADD and LOCK CMPXCHG). This means that there is no performance benefit from specifying a weaker memory order for any of these operations. (An atomic increment with only release semantics would be useful in many situations, but x86 offers nothing cheaper than the sequentially-consistent version.)
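
To make this concrete, here is a minimal test-and-test-and-set spinlock sketch of my own (not from the original post). It combines the observations so far: exchange, which is sequentially consistent anyway, acquires the lock; waiting uses cheap sequentially-consistent loads; and the unlock is a release store as in point 4.

#include <atomic>

struct SpinLock {
    std::atomic<bool> locked{false};

    void lock() {
        // LOCK XCHG; loop until we are the thread that flips false -> true
        while (locked.exchange(true)) {
            // spin on plain MOV loads to avoid hammering the bus with XCHG
            while (locked.load());
        }
    }

    void unlock() {
        // the unlock can be delayed, so a release store (plain MOV) suffices
        locked.store(false, std::memory_order_release);
    }
};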

6. Scalable code should avoid cache invalidations


I mentioned above that sequentially-consistent loads and release stores may be as cheap as non-atomic loads and stores. Unfortunately, this is not always the case. Because CPUs have per-core caches, the performance of concurrent programs depends on the dynamic access pattern: every store must invalidate all copies of the affected cache line in the caches of other cores. This can cause parallel implementations to be slower than single-threaded code. Therefore, to write scalable code, it is important to minimize the number of writes to shared memory locations. A positive way of saying this is that as long as the program does not frequently write to memory locations that are frequently read or written by other threads, it will scale very well. (Optimistic lock coupling, for example, is a general-purpose concurrency scheme for synchronizing data structures that exploits this observation.)
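
As a sketch of this principle (my own illustration; the PaddedCounter type and the shard count of 8 are hypothetical), a counter incremented by many threads can be sharded so that each thread writes to its own cache line, and only the rare read of the total touches all shards:

#include <atomic>
#include <cstddef>

// Each shard is aligned to 64 bytes so that no two shards share a
// cache line; increments never invalidate another core's cached shard.
struct alignas(64) PaddedCounter {
    std::atomic<std::size_t> value{0};
};

PaddedCounter counters[8]; // one shard per worker thread

void increment(int thread_id) {
    // a weaker order gives no codegen benefit on x86 (LOCK ADD either way)
    counters[thread_id].value.fetch_add(1, std::memory_order_relaxed);
}

std::size_t total() {
    std::size_t sum = 0;
    for (auto& c : counters) sum += c.value.load();
    return sum;
}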

Summary


The full C++ memory model is notoriously hard to understand. x86, on the other hand, has a fairly strong memory model (x86-TSO) that is quite intuitive: basically everything is in-order, except for writes, which are delayed by the write buffer. Exploiting x86's memory model, I presented a simplified subset of the C++ memory model that is sufficient to write scalable, high-performance concurrent programs on x86.

Supabase.js 1.0

We're releasing a new version of our Supabase client with some awesome new improvements.

October 27, 2020

Announcing Vitess 8

On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 8. Major Themes: In this release, we have continued to make important improvements to the Vitess project, with over 200 PRs in several areas. Some of the major bug fixes and changes in behavior are documented in the Release Notes. Please read them carefully and report any issues via GitHub. We would like to highlight the following themes for this release.