a curated list of database news from authoritative sources

January 20, 2025

Building a Resend analytics dashboard

Resend is a developer platform for sending transactional and marketing emails. If you don’t do much with email, well done, you’ve won in life. But if you do, Resend is probably going to save you a bunch of headaches. Once you’re set up to send with Resend, how do you know you’re “doing email right”? Resend captures events about the status of your emails - when they're sent, delivered, bounced, etc. - and offers webhooks so you can push these events elsewhere. Sending these webhooks to Tinybird’s E

January 17, 2025

Outgrowing Postgres: Handling growing data volumes

Managing terabyte-scale data in Postgres? From basic maintenance to advanced techniques like partitioning and materialized views, learn how to scale your database effectively. Get practical advice on optimizing performance and knowing when it's time to explore other options.

January 14, 2025

Use of Time in Distributed Databases (part 5): Lessons learned

This concludes our series on the use of time in distributed databases, where we explored how time in distributed systems evolved from a simple ordering mechanism into a sophisticated tool for coordination and performance optimization.

A key takeaway is that time serves as a shared reference frame that enables nodes to make consistent decisions without constant communication. While the AI community grapples with alignment challenges, in distributed systems we have long confronted our own fundamental alignment problem. When nodes operate independently, they essentially exist in their own temporal universes. Synchronized time provides the global reference frame that bridges these isolated worlds, allowing nodes to align their events and states coherently.

At its core, synchronized time serves as an alignment mechanism in distributed systems. As explored in Part 1, synchronized clocks enable nodes to establish "common knowledge" through a shared time reference, which is powerful for conveying the presence (not just absence) of information like lease validity without requiring complex two-way communication protocols to account for message delays and asymmetry.

This alignment manifests in several powerful ways. When nodes need to agree on a consistent snapshot of the database, time provides a natural coordination point, so a big benefit of aligning timelines is effortless, consistent MVCC snapshots. Commit timestamping creates a transaction ordering, version management uses timestamps to track the different versions of data in MVCC systems, and snapshot selection uses timestamps to identify consistent read points at each node independently. Many of the systems we reviewed, including Spanner, CockroachDB, Aurora Limitless, and DSQL, demonstrate the power of consistent global reads using timestamps.
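
To make snapshot selection concrete, here is a minimal MVCC sketch in Python (hypothetical names, not any particular system's API). Any node holding the same versions answers a read at a given timestamp identically, with no coordination:

import bisect

class MVCCStore:
    def __init__(self):
        # key -> parallel, sorted lists of commit timestamps and values
        self.commit_ts = {}
        self.values = {}

    def write(self, key, value, ts):
        i = bisect.bisect_right(self.commit_ts.setdefault(key, []), ts)
        self.commit_ts[key].insert(i, ts)
        self.values.setdefault(key, []).insert(i, value)

    def snapshot_read(self, key, snapshot_ts):
        # Newest version with commit_ts <= snapshot_ts: a consistent
        # point-in-time read chosen independently at each node.
        i = bisect.bisect_right(self.commit_ts.get(key, []), snapshot_ts)
        return self.values[key][i - 1] if i else None

store = MVCCStore()
store.write("x", "v1", ts=10)
store.write("x", "v2", ts=20)
assert store.snapshot_read("x", snapshot_ts=15) == "v1"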

Alignment also enables effective conflict detection. By comparing timestamps, systems can detect concurrent modifications—a foundational element of distributed transactions. This particularly benefits OCC-based systems and those using snapshot isolation, including DynamoDB, MongoDB, Aurora Limitless, and DSQL.
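
As a toy illustration of timestamp-based conflict detection (hypothetical names, in the spirit of OCC validation rather than any specific system): a transaction records the commit timestamp of every version it read, and at commit time it aborts if any of those keys has since been rewritten.

def validate(read_set, latest_commit_ts):
    """read_set: {key: commit_ts of the version the txn read}
    latest_commit_ts: {key: newest commit_ts in the store}"""
    for key, seen_ts in read_set.items():
        if latest_commit_ts.get(key, 0) > seen_ts:
            return False  # concurrent modification detected -> abort and retry
    return True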

Another benefit of alignment is to serve as a fencing mechanism, effectively creating "one-way doors" that prevent stale operations from causing inconsistencies. The simplest realization is a lease. Aurora Limitless uses consistency leases that expire after a few seconds to prevent stale reads from zombie nodes. This is a classic example of time-based fencing: once a node's lease expires, it cannot serve reads until it obtains a new lease. Aurora DSQL employs time for fencing its adjudicator service. Instead of using Paxos for leader election, adjudicators use time ranges to fence off their authority over key ranges. When an adjudicator takes over responsibility for a key range, it establishes a time-based fence that prevents older adjudicators from making decisions about that range. These applications all stem from time's fundamental role as a shared reference frame, separating the settled past from the frothiness of the present.
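
A minimal sketch of lease-based fencing (hypothetical names; real systems also subtract a clock-uncertainty margin before trusting the local clock):

import time

class LeaseHolder:
    def __init__(self):
        self.lease_expiry = 0.0  # wall-clock expiry granted by the lessor

    def renew(self, granted_expiry):
        # In Aurora Limitless, leases expire after a few seconds.
        self.lease_expiry = granted_expiry

    def serve_read(self, handler):
        # One-way door: once the lease expires, a zombie node refuses
        # reads until it successfully renews.
        if time.time() >= self.lease_expiry:
            raise RuntimeError("lease expired; refusing possibly stale read")
        return handler()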

Another benefit of alignment is speculative execution, which achieves better performance in the common case while still maintaining consistency/correctness across all scenarios. Modern systems increasingly embrace time-based speculation. Nezha introduced the Deadline-Ordered Multicast (DOM) primitive, which assigns deadline timestamps to messages and only delivers them after the deadline is reached. This creates a temporal fence ensuring consistent ordering across receivers while providing a speculation window for execution preparation. Aurora DSQL uses predictive commit timing, where it sets commit timestamps slightly in the future. This is a form of speculation that maintains read performance by ensuring reads are never blocked by writes while still preserving consistency guarantees.
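
A toy sketch of the DOM idea (hypothetical names; the real primitive also handles clock uncertainty and late messages): each receiver buffers messages until their deadline has passed, then delivers them in deadline order, so receivers with synchronized clocks agree on ordering without communicating.

import heapq
import time

class DeadlineOrderedQueue:
    def __init__(self):
        self.buffer = []

    def receive(self, deadline_ts, sender_id, msg):
        # Ties on deadline are broken deterministically by sender id so
        # that every receiver delivers the same order.
        heapq.heappush(self.buffer, (deadline_ts, sender_id, msg))

    def deliverable(self):
        # Deliver only messages whose deadline has already passed.
        out = []
        while self.buffer and self.buffer[0][0] <= time.time():
            out.append(heapq.heappop(self.buffer)[2])
        return out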

The key in both cases is that time serves as a performance improvement without bearing the burden of correctness. Ultimately, we still need information transfer to ensure that the correct scenario played out and there were no omitted messages, failures, etc. When our speculation fails, a post-hoc validation/recovery/cleanup mechanism kicks in. This approach introduces some metastability risk, so careful implementation is required.


Correctness

That brings us to correctness. Barbara Liskov's 1991 principle remains fundamental: leverage time for performance optimization, not correctness. This separation of concerns appears consistently in modern systems. Google's Spanner uses TrueTime for performance optimization while relying on traditional locking and consensus for correctness. DynamoDB leverages timestamps to streamline transaction processing but doesn't depend on perfect synchronization for correctness. Similarly, Accord and Nezha employ synchronized clocks for fast-path execution while maintaining correctness through fallback mechanisms.

A critical aspect when using time is the requirement for monotonic clocks, as clock regression would break causality guarantees for reads. Hybrid Logical Clocks (HLC) provide good insurance in NTP-based systems: by combining physical clock time with the strengths of Lamport logical clocks, HLC provides this monotonicity. So rather than relying solely on physical time, most contemporary databases use HLC to merge physical and logical time and get the benefits of both.
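
A minimal HLC sketch in Python (following the Kulkarni et al. algorithm; the names are mine): timestamps are (l, c) pairs that never go backwards even if the physical clock stalls or regresses, yet stay close to physical time.

import time

class HLC:
    def __init__(self):
        self.l = 0  # highest physical time observed so far
        self.c = 0  # logical counter to break ties

    def now(self):
        """Timestamp for a local or send event."""
        pt = int(time.time() * 1_000_000)  # physical clock, microseconds
        if pt > self.l:
            self.l, self.c = pt, 0
        else:
            self.c += 1  # physical clock did not advance; stay monotonic
        return (self.l, self.c)

    def update(self, l_msg, c_msg):
        """Merge the timestamp carried on a received message."""
        pt = int(time.time() * 1_000_000)
        l_new = max(self.l, l_msg, pt)
        if l_new == self.l == l_msg:
            self.c = max(self.c, c_msg) + 1
        elif l_new == self.l:
            self.c += 1
        elif l_new == l_msg:
            self.c = c_msg + 1
        else:
            self.c = 0
        self.l = l_new
        return (self.l, self.c)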

Modern clocks provide dynamically bounded uncertainty, exposing lower and upper bounds on the current time to account for imperfect synchronization. These clockbound APIs also help prevent placing unfounded confidence in a given timestamp. When using tightly synchronized clocks with clockbound APIs, two things need to go wrong for correctness to be violated (see the sketch after this list):

  • the clock/timestamp needs to go wrong, and
  • the clockbound API must also keep reporting high confidence in that wrong clock, rather than widening its bounds.
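
Here is a sketch of how such an API is typically consumed (the clock_bounds function is hypothetical, loosely modeled on clockbound-style interfaces like AWS ClockBound, not any real client library): a writer picks a commit timestamp at the upper bound and then waits out the uncertainty, Spanner-style, so no later reader can observe a time before it.

import time

ERROR_BOUND = 0.001  # assume a 1 ms bound; real APIs report this dynamically

def clock_bounds():
    """Hypothetical clockbound call: true time lies in [earliest, latest]."""
    now = time.time()
    return now - ERROR_BOUND, now + ERROR_BOUND

def commit_wait(commit_ts):
    # Block until even the earliest possible true time has passed commit_ts,
    # so the commit is safely in every observer's past.
    while clock_bounds()[0] <= commit_ts:
        time.sleep(ERROR_BOUND / 2)

earliest, latest = clock_bounds()
commit_ts = latest      # guaranteed not to be in the past
commit_wait(commit_ts)  # uncertainty waited out before acknowledging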

Ok, let's say these HLC or clockbound safeguards fail: what could go wrong? Obviously, you could read a stale value from an MVCC system if you demand to read at a timestamp in the past. On the other hand, correctness on the write path of distributed databases is well preserved by locks. A full study of correctness is needed for a more comprehensive understanding, but my understanding is that all of these systems maintain data integrity even when clock synchronization falters. Performance decreases, but write-path correctness persists.


Future trends

Several patterns are emerging. There is definitely an accelerating trend of adoption of synchronized clocks in distributed databases, as our survey shows. We also see that distributed systems and databases are becoming more sophisticated about their use of time. Early systems leveraged synchronized clocks primarily for read-only transactions and snapshots, reaping low-hanging fruit. Over time, these systems evolved to tackle read-write transactions and employ more advanced techniques, with synchronized clocks taking on increasingly critical roles in database design. By aligning nodes to a shared reference frame, synchronized time eliminates the need for some coordination message exchanges across nodes/partitions, helping cut down latency and boost throughput in distributed multi-key operations.

We also see time-based speculation gaining traction in distributed databases. Maybe as another level of improvement, we will see systems automatically adjusting their use of time-based speculation depending on the smoothness of the operating conditions. The more a system relies on synchronized time for common-case optimization, the more vulnerable it becomes to failure of the common-case scenario or the degradation of clock synchronization. So we need smoother transitions between modes for speculation-based systems. 

More research is needed on the tradeoffs between time synchronization precision and isolation guarantees. Stricter isolation levels like external consistency require tighter clock synchronization and longer commit wait times. This raises important questions about the value proposition of strict-serializability. Snapshot isolation hits a really nice tradeoff in the isolation space, and it would be interesting to put more research into isolation-performance tradeoffs using synchronized clocks. 

On the infrastructure front, cloud time services like AWS TimeSync grow increasingly crucial as databases migrate to cloud environments. These services offer better precision and availability to support the use of time in distributed systems.

January 13, 2025

Evaluating vector indexes in MariaDB and pgvector: part 2

This post has results from the ann-benchmarks with the fashion-mnist-784-euclidean dataset for MariaDB and Postgres (pgvector) with concurrent queries (--batch). My previous post has results when not using concurrent queries. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.

I compare MariaDB with pgvector because I respect the work that the Postgres community has done to support vector search. And I am happy to report that MariaDB has also done a great job on this. While I don't know the full story of the development effort, this feature came from the MariaDB Foundation and the community and it is wonderful to see that collaboration.

Performance for MariaDB is excellent
  • peak QPS for a given recall target is much better than pgvector
  • time to create the index to achieve a given recall target is almost always much better than pgvector

Benchmark

Part 1 has all of the details.

I am using the ann-benchmarks via my fork of a fork of a fork at this commit. I added --batch to the command lines listed in part 1 to repeat the benchmark using concurrent queries. Note that parallel index create was disabled for Postgres and isn't (yet) supported by MariaDB.

With --batch there is one concurrent query per CPU core and my server has 8 cores.
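
For reference, these are the part 1 command lines with --batch appended, e.g. for MariaDB:

MARIADB_CONN_ARGS=root:pw:127.0.0.1:3306 MARIADB_DB_NAME=test \
    python3 -u run.py --algorithm mariadb --dataset fashion-mnist-784-euclidean --local --batch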

Files related to these tests are archived here.

Results: QPS vs recall graphs

The recall vs QPS graph is created by: python3 plot.py --dataset fashion-mnist-784-euclidean --batch. This chart is very similar to the chart in part 1. One difference is that the peak QPS for MariaDB and Postgres are ~6000/s and ~4000/s there vs ~22000/s and ~15000/s here.

The results below for MariaDB are excellent. It gets more QPS than pgvector at a given recall target.


Results: create index

I am still trying to figure out how to present this data. All of the numbers are here for the time to create an index and the size of the index. The database configs for Postgres and MariaDB are shared above, and parallel index create is disabled by the config for Postgres (and not supported yet by MariaDB). The summary is:
  • Indexes have a similar size with MariaDB and Postgres with halfvec. The Postgres index without halfvec is about 2X larger.
  • Time to create the index for Postgres is similar with and without halfvec
  • Time to create the index for MariaDB is less than for pgvector. Perhaps the best way to compare this is the time to create the index for a similar point on the QPS/recall graph (see the last section of this blog post)

Results: best QPS for a given recall

Many benchmark results are marketed via peak performance (max throughput or min response time) but these are usually constrained optimization problems -- determine peak performance that satisfies some SLA. And the SLA might be response time or efficiency (cost).

With ann-benchmarks the constraint might be recall where you seek the best QPS that satisfies a recall target. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms (MariaDB, pgvector with float32, pgvector with float16).

For all cases below except the first (recall = 1.000) the time to create the index is less with MariaDB.

For all cases below the best QPS at a given recall target is much better, sometimes 3X better, with MariaDB than with pgvector.

Legend
* recall & QPS - results from the benchmark
* isecs - number of seconds to create the index for M and ef_construction (ef_cons)

Best QPS with recall = 1.000
recall  QPS     isecs   algorithm
1.000    4727   115.9   PGVector(m=16, ef_cons=256, ef_search=120)
1.000    5479    98.6   PGVector_halfvec(m=16, ef_cons=192, ef_search=120)
1.000   13828   108.5   MariaDB(m=32, ef_search=10)

Best QPS with recall >= 0.99
recall  QPS     isecs   algorithm
0.990   10704    71.2   PGVector(m=16, ef_cons=96, ef_search=20)
0.991   11377    90.1   PGVector_halfvec(m=16, ef_cons=256, ef_search=20)
0.995   17642    24.5   MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.98
recall  QPS     isecs   algorithm
0.985   10843    51.0   PGVector(m=16, ef_cons=32, ef_search=20)
0.984   11749    44.5   PGVector_halfvec(m=16, ef_cons=32, ef_search=20)
0.995   17642    24.5   MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.97
recall  QPS     isecs   algorithm
0.970   12836    35.1   PGVector(m=8, ef_cons=64, ef_search=20)
0.973   13742    33.8   PGVector_halfvec(m=8, ef_cons=96, ef_search=20)
0.995   17642    24.5   MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.96
recall  QPS     isecs   algorithm
0.970   12836    35.1   PGVector(m=8, ef_cons=64, ef_search=20)
0.973   13742    33.8   PGVector_halfvec(m=8, ef_cons=96, ef_search=20)
0.995   17642    24.5   MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.95
recall  QPS     isecs   algorithm
0.970   12836    35.1   PGVector(m=8, ef_cons=64, ef_search=20)
0.973   13742    33.8   PGVector_halfvec(m=8, ef_cons=96, ef_search=20)
0.995   17642    24.5   MariaDB(m=12, ef_search=10)

January 12, 2025

Evaluating vector indexes in MariaDB and pgvector: part 1

This post has results for vector index support in MariaDB and Postgres. I am new to vector indexes so I will start small and over time add more results.  This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.

I compare MariaDB with pgvector because I respect the work that the Postgres community has done to support vector search workloads. And I am happy to report that MariaDB has also done a great job on this. While I don't know the full story of the development effort, this feature came from the MariaDB Foundation and the community and it is wonderful to see that collaboration.

tl;dr

  • Performance for MariaDB is excellent
    • peak QPS for a given recall target is much better than pgvector
    • time to create the index to achieve a given recall target is almost always much better than pgvector
  • MariaDB is easier to tune than pgvector
  • MariaDB needs more documentation

The good and the less good

The good for MariaDB starts with performance. The results I get are great and match the results in this blog post from upstream. Another good thing about MariaDB is that it is easier to tune. With pgvector I need to set M and ef_construction while creating an index and then ef_search while running a query. With MariaDB there is no option to set ef_construction. And evaluations are easier when there are fewer options to tune. Note that the M with pgvector isn't the same thing as the M with MariaDB, but they are similar (at least in spirit).
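
For concreteness, here is how those pgvector knobs are set (real pgvector syntax, driven via psycopg, which is installed below; the items table and embedding column are hypothetical):

import psycopg

with psycopg.connect("dbname=ib user=root") as conn:
    # Build-time knobs: m and ef_construction are fixed at index creation.
    conn.execute("""CREATE INDEX ON items
                    USING hnsw (embedding vector_l2_ops)
                    WITH (m = 16, ef_construction = 64)""")
    # Query-time knob: ef_search is set per session (or per transaction).
    conn.execute("SET hnsw.ef_search = 40")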

The less good for MariaDB is the documentation. We need more, but that is easy to fix. I have been using:

Things that need more documentation
  • the algorithm is described as a variation of HNSW (MHNSW) and the community will benefit from more details on what has been changed. For example, I can't set ef_construction and it always uses float16 in the index. But with pgvector I can use either float32 (default) or float16 (via halfvec).
  • What transaction isolation level is supported? I get that workloads will be read-heavy but I assume that some won't be read-only, so I want to know whether repeatable read and read committed are provided.
  • What concurrent operations are supported? Obviously, reads can be concurrent with other reads. But can writes be concurrent with reads or with other writes on the same table?

Hardware

The hardware is a Beelink SER7 7840HS with a Ryzen 7 7840HS CPU, 32G of RAM and Ubuntu 22.04.

While I already installed some dependencies on this server long ago, to run this benchmark I did:

sudo apt install libmariadb3 libmariadb-dev

pip3 install mariadb

pip3 install pgvector psycopg


Database software

For Postgres I compiled version 17.2 and pgvector 0.8.0 from source. Files that I used include:
For MariaDB I compiled version 11.7.1 from source. Files that I used include:

Benchmark

I am using the ann-benchmarks via my fork of a fork of a fork at this commit. These forks have changes to run the benchmark for MariaDB and pgvector without using Docker containers. A request I have for anyone writing a benchmark client is to limit dependencies, or at least make them optional. I just want to point the benchmark client at an existing installation.

The ann-benchmarks configuration files are here for MariaDB and for pgvector. I am open to feedback that I should try different parameters. I added support to use float16 (halfvec) for the pgvector index (but only the index, not for the vector stored in the table).

In this post I use the fashion-mnist-784-euclidean dataset in non-batch mode, where non-batch mode doesn't run concurrent queries and batch mode does. Note that I don't set --count when running the benchmark, which means that all of the queries use LIMIT 10.

Files related to these tests are archived here.

The command lines for non-batch mode are:
POSTGRES_CONN_ARGS=root:pw:127.0.0.1:5432 POSTGRES_DB_NAME=ib \
    python3 -u run.py  --algorithm pgvector --dataset fashion-mnist-784-euclidean --local

POSTGRES_CONN_ARGS=root:pw:127.0.0.1:5432 POSTGRES_DB_NAME=ib \
    python3 -u run.py  --algorithm pgvector_halfvec --dataset fashion-mnist-784-euclidean --local

MARIADB_CONN_ARGS=root:pw:127.0.0.1:3306 MARIADB_DB_NAME=test \
    python3 -u run.py  --algorithm mariadb --dataset fashion-mnist-784-euclidean --local

Results: QPS vs recall graphs

The recall vs QPS graph is created by: python3 plot.py --dataset fashion-mnist-784-euclidean

The results below for MariaDB are excellent. It gets more QPS than pgvector at a given recall target.


Results: create index

I am still trying to figure out how to present this data. All of the numbers are here for the time to create an index and the size of the index. The database configs for Postgres and MariaDB are shared above, and parallel index create is disabled by the config for Postgres (and not supported yet by MariaDB). The summary is:
  • Indexes have a similar size with MariaDB and Postgres with halfvec. The Postgres index without halfvec is about 2X larger.
  • Time to create the index for Postgres is similar with and without halfvec
  • Time to create the index for MariaDB is less than for pgvector. Perhaps the best way to compare this is the time to create the index for a similar point on the QPS/recall graph (see the last section of this blog post)

Results: best QPS for a given recall

Many benchmark results are marketed via peak performance (max throughput or min response time) but these are usually constrained optimization problems -- determine peak performance that satisfies some SLA. And the SLA might be response time or efficiency (cost).

With ann-benchmarks the constraint might be recall where you seek the best QPS that satisfies a recall target. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms (MariaDB, pgvector with float32, pgvector with float16).

For all cases below except the first (recall = 1.000) the time to create the index is about 2X or more longer with pgvector than with MariaDB.

For all cases below the best QPS at a given recall target is much better, sometimes 3X better, with MariaDB than with pgvector.

Legend
* recall & QPS - results from the benchmark
* isecs - number of seconds to create the index for M and ef_construction (ef_cons)

Best QPS with recall >= 1.000
recall  QPS  isecs  algorithm
1.000   920  97.4   PGVector(m=16, ef_cons=192, ef_search=120)
1.000   991  89.8   PGVector_halfvec(m=16, ef_cons=256, ef_search=120)
1.000  3226 111.8   MariaDB(m=32, ef_search=10)

Best QPS with recall >= 0.99
recall  QPS  isecs  algorithm
0.990  2337  70.7   PGVector(m=16, ef_cons=96, ef_search=20)
0.991  2558  78.3   PGVector_halfvec(m=16, ef_cons=192, ef_search=20)
0.995  4745  27.4   MariaDB(m=12, ef_search=10)

Best QPS with recall >= 0.98
recall  QPS  isecs  algorithm
0.985  2383  50.4   PGVector(m=16, ef_cons=32, ef_search=20)
0.988  2608  52.8   PGVector_halfvec(m=16, ef_cons=64, ef_search=20)
0.984  5351  18.1   MariaDB(m=8, ef_search=10)

Best QPS with recall >= 0.97
recall  QPS  isecs  algorithm
0.972  3033  42.3   PGVector(m=8, ef_cons=96, ef_search=20)
0.973  3185  89.8   PGVector_halfvec(m=16, ef_cons=256, ef_search=10)
0.984  5351  18.1   MariaDB(m=8, ef_search=10)

Best QPS with recall >= 0.96
recall  QPS  isecs  algorithm
0.961  3044   50.4  PGVector(m=16, ef_cons=32, ef_search=10)
0.967  3250   52.8  PGVector_halfvec(m=16, ef_cons=64, ef_search=10)
0.961  5691   14.8  MariaDB(m=6, ef_search=10)

Best QPS with recall >= 0.95
recall  QPS  isecs  algorithm
0.961  3044  50.4   PGVector(m=16, ef_cons=32, ef_search=10)
0.967  3250  52.8   PGVector_halfvec(m=16, ef_cons=64, ef_search=10)
0.961  5691  14.8   MariaDB(m=6, ef_search=10)

January 11, 2025