a curated list of database news from authoritative sources

October 10, 2023

October 08, 2023

October 06, 2023

October 05, 2023

PlanetScale vs. Amazon Aurora

Compare PlanetScale’s multi-cloud MySQL platform built on Vitess, with Amazon Aurora, one of AWS’s managed relational database services. Discover the strengths, features, and benefits of each platform to choose the best fit for your database needs.

PlanetScale vs. Amazon RDS

Compare PlanetScale’s multi-cloud MySQL platform built on Vitess, with Amazon RDS, AWS’s managed relational database service. Discover the strengths, features, and benefits of each platform to choose the best fit for your database needs.

Watch Out For This Use Case When Using Read Committed Snapshot Isolation

Takeaway: If you want to extract rows from a table periodically as part of an ETL operation and if you use Read Committed Snapshot Isolation (RCSI), be very careful or you may miss some rows. Yesterday, Kendra Little talked a bit about Lost Updates under RCSI. It’s a minor issue that can pop up after […]

The post Watch Out For This Use Case When Using Read Committed Snapshot Isolation first appeared on Michael J. Swart.

Go database driver overhead on insert-heavy workloads

The most popular SQLite and PostgreSQL database drivers in Go are (roughly) 20-76% slower than alternative Go drivers on insert-heavy benchmarks of mine. So if you are bulk-inserting data with Go (and potentially also bulk-retrieving data with Go), you may want to consider the driver carefully. And you may want to consider avoiding database/sql.

Some driver authors have noted and benchmarked issues with database/sql.

So it may be the case that database/sql is responsible for some of this overhead. And indeed the variations between drivers in this post will be demonstrated by using database/sql and avoiding it. This post won't specifically prove that the variation is due to the database/sql interface. But that doesn't change the premise.

Not covered in this post but something to consider: JetBrains has suggested that other frontends like sqlc, sqlx, and GORM do worse than database/sql.

This post is built on the workload, environment, libraries, and methodology in my databases-intuition repo on GitHub. See the repo for details that will help you reproduce or correct me.

INSERT workload

In this workload, the data is random and there are no indexes. Neither of these aspects matter for this post though because we're comparing behavior within the same database among different drivers. This was just a workload I already had.

Two different data sizes are tested:

  1. 10M rows with 16 columns, each column is 32 bytes
  2. 10M rows with 3 columns, each column is 8 bytes

Each test is run 10 times and we record median, standard deviation, min, max and throughput.

SQLite

Both variations presented here load 10M rows using a single prepared statement called for each row within a single transaction.

The most popular driver is mattn/go-sqlite3.

It is roughly 20-40% slower than another driver that avoids database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 56.53 ± 1.26s, Min: 55.05s, Max: 59.62s
Throughput: 176,893.65 ± 3,853.90 rows/s, Min: 167,719.97 rows/s, Max: 181,646.02 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 15.92 ± 0.25s, Min: 15.69s, Max: 16.67s
Throughput: 628,044.37 ± 9,703.92 rows/s, Min: 599,852.91 rows/s, Max: 637,435.60 rows/s

The other driver I tested is my own fork of bvinc/go-sqlite-lite called eatonphil/gosqlite. I forked it because it is unmaintained and I wanted to bring it up-to-date for tests like this.

10M Rows, 16 columns, each column 32 bytes:

Timing: 45.51 ± 0.70s, Min: 43.72s, Max: 45.93s
Throughput: 219,729.65 ± 3,447.56 rows/s, Min: 217,742.98 rows/s, Max: 228,711.51 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 10.44 ± 0.20s, Min: 10.02s, Max: 10.68s
Throughput: 957,939.60 ± 18,879.43 rows/s, Min: 936,114.60 rows/s, Max: 998,426.62 rows/s

PostgreSQL

Both variations presented use PostgreSQL's COPY FROM support. This is significantly faster for PostgreSQL than doing the prepared statement we do in SQLite. (Here are my results for doing prepared statement INSERTs in PostgreSQL if you are curious.)

The most popular PostgreSQL driver is lib/pq. The performance issues with lib/pq are well-known, and the repo itself is marked as no longer developed.

It is roughly 44-76% slower than an alternative driver that avoids database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 104.53 ± 2.40s, Min: 102.57s, Max: 110.08s
Throughput: 95,665.37 ± 2,129.25 rows/s, Min: 90,847.08 rows/s, Max: 97,490.96 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 8.16 ± 0.43s, Min: 7.44s, Max: 8.80s
Throughput: 1,225,986.47 ± 66,631.53 rows/s, Min: 1,136,581.82 rows/s, Max: 1,343,441.37 rows

The other driver I tested is jackc/pgx, without database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 46.54 ± 1.60s, Min: 44.09s, Max: 49.51s
Throughput: 214,869.42 ± 7,265.10 rows/s, Min: 201,991.37 rows/s, Max: 226,801.07 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 5.20 ± 0.44s, Min: 4.71s, Max: 5.96s
Throughput: 1,923,722.79 ± 156,820.46 rows/s, Min: 1,676,894.32 rows/s, Max: 2,124,966.60 rows/

The discrepancies here are even greater than with the different SQLite drivers.

Workloads with small resultset

I won't go into as much detail but if you're doing queries that don't return many rows, the difference between drivers is negligible.

See here for details.

Conclusion

If you are doing INSERT-heavy workloads, or you are processing large number of rows returned from your SQL database, you might want to try benchmarking the same workload with different drivers.

And specifically, there is likely no good reason to use lib/pq anymore for accessing PostgreSQL from Go. Just use jackc/pgx.

October 04, 2023

Supabase Beta September 2023

September was packed with so many new features and releases that it might have seemed like another Launch Week, but it wasn't!

October 03, 2023

October 02, 2023