database.news | database technology news aggregator

October 11, 2023

Tinybird Engineering Blog

How to query Google Sheets with SQL in real time

Want to access real-time data from Google Sheets? Follow this tutorial to query your Google Sheets data with a real-time SQL engine.

October 10, 2023

PlanetScale Blog

What is MySQL partitioning?

Learn the basics of MySQL partitioning, including partitioning with range, list, and key strategies, as well as how partitioning relates to database sharding.

PlanetScale Blog

MySQL High Availability: Connection handling and concurrency

Deep dive into MySQL’s connection handling mechanisms for optimal connection pooling and improved concurrency.

Supabase Blog

pgvector vs Pinecone: cost and performance

Direct performance comparison between pgvector and Pinecone.

October 08, 2023

Supabase Blog

Offline-first React Native Apps with Expo, WatermelonDB, and Supabase

Store your data locally and sync it with Postgres using WatermelonDB!

October 06, 2023

PlanetScale Blog

Personalizing your onboarding with Markdoc

Learn how we utilized Markdoc to create custom, extendable product onboarding at PlanetScale.

October 05, 2023

PlanetScale Blog

Compare PlanetScale’s multi-cloud MySQL platform built on Vitess, with Amazon Aurora, one of AWS’s managed relational database services. Discover the strengths, features, and benefits of each platform to choose the best fit for your database needs.

PlanetScale Blog

PlanetScale vs. Amazon RDS

Compare PlanetScale’s multi-cloud MySQL platform built on Vitess, with Amazon RDS, AWS’s managed relational database service. Discover the strengths, features, and benefits of each platform to choose the best fit for your database needs.

Michael J. Swart - Database Whisperer

Watch Out For This Use Case When Using Read Committed Snapshot Isolation

Takeaway: If you want to extract rows from a table periodically as part of an ETL operation and if you use Read Committed Snapshot Isolation (RCSI), be very careful or you may miss some rows. Yesterday, Kendra Little talked a bit about Lost Updates under RCSI. It’s a minor issue that can pop up after […]

The post Watch Out For This Use Case When Using Read Committed Snapshot Isolation first appeared on Michael J. Swart.

by Michael J. Swart

Phil Eaton

Go database driver overhead on insert-heavy workloads

The most popular SQLite and PostgreSQL database drivers in Go are (roughly) 20-76% slower than alternative Go drivers on insert-heavy benchmarks of mine. So if you are bulk-inserting data with Go (and potentially also bulk-retrieving data with Go), you may want to consider the driver carefully. And you may want to consider avoiding database/sql.

Some driver authors have noted and benchmarked issues with database/sql.

So it may be the case that database/sql is responsible for some of this overhead. And indeed the variations between drivers in this post will be demonstrated by using database/sql and avoiding it. This post won't specifically prove that the variation is due to the database/sql interface. But that doesn't change the premise.

Not covered in this post but something to consider: JetBrains has suggested that other frontends like sqlc, sqlx, and GORM do worse than database/sql.

This post is built on the workload, environment, libraries, and methodology in my databases-intuition repo on GitHub. See the repo for details that will help you reproduce or correct me.

INSERT workload

In this workload, the data is random and there are no indexes. Neither of these aspects matter for this post though because we're comparing behavior within the same database among different drivers. This was just a workload I already had.

Two different data sizes are tested:

10M rows with 16 columns, each column is 32 bytes
10M rows with 3 columns, each column is 8 bytes

Each test is run 10 times and we record median, standard deviation, min, max and throughput.

SQLite

Both variations presented here load 10M rows using a single prepared statement called for each row within a single transaction.

The most popular driver is mattn/go-sqlite3.

It is roughly 20-40% slower than another driver that avoids database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 56.53 ± 1.26s, Min: 55.05s, Max: 59.62s
Throughput: 176,893.65 ± 3,853.90 rows/s, Min: 167,719.97 rows/s, Max: 181,646.02 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 15.92 ± 0.25s, Min: 15.69s, Max: 16.67s
Throughput: 628,044.37 ± 9,703.92 rows/s, Min: 599,852.91 rows/s, Max: 637,435.60 rows/s

The other driver I tested is my own fork of bvinc/go-sqlite-lite called eatonphil/gosqlite. I forked it because it is unmaintained and I wanted to bring it up-to-date for tests like this.

10M Rows, 16 columns, each column 32 bytes:

Timing: 45.51 ± 0.70s, Min: 43.72s, Max: 45.93s
Throughput: 219,729.65 ± 3,447.56 rows/s, Min: 217,742.98 rows/s, Max: 228,711.51 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 10.44 ± 0.20s, Min: 10.02s, Max: 10.68s
Throughput: 957,939.60 ± 18,879.43 rows/s, Min: 936,114.60 rows/s, Max: 998,426.62 rows/s

PostgreSQL

Both variations presented use PostgreSQL's COPY FROM support. This is significantly faster for PostgreSQL than doing the prepared statement we do in SQLite. (Here are my results for doing prepared statement INSERTs in PostgreSQL if you are curious.)

The most popular PostgreSQL driver is lib/pq. The performance issues with lib/pq are well-known, and the repo itself is marked as no longer developed.

It is roughly 44-76% slower than an alternative driver that avoids database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 104.53 ± 2.40s, Min: 102.57s, Max: 110.08s
Throughput: 95,665.37 ± 2,129.25 rows/s, Min: 90,847.08 rows/s, Max: 97,490.96 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 8.16 ± 0.43s, Min: 7.44s, Max: 8.80s
Throughput: 1,225,986.47 ± 66,631.53 rows/s, Min: 1,136,581.82 rows/s, Max: 1,343,441.37 rows

The other driver I tested is jackc/pgx, without database/sql.

10M Rows, 16 columns, each column 32 bytes:

Timing: 46.54 ± 1.60s, Min: 44.09s, Max: 49.51s
Throughput: 214,869.42 ± 7,265.10 rows/s, Min: 201,991.37 rows/s, Max: 226,801.07 rows/s

10M Rows, 3 columns, each column 8 bytes:

Timing: 5.20 ± 0.44s, Min: 4.71s, Max: 5.96s
Throughput: 1,923,722.79 ± 156,820.46 rows/s, Min: 1,676,894.32 rows/s, Max: 2,124,966.60 rows/

The discrepancies here are even greater than with the different SQLite drivers.

Workloads with small resultset

I won't go into as much detail but if you're doing queries that don't return many rows, the difference between drivers is negligible.

See here for details.

Conclusion

If you are doing INSERT-heavy workloads, or you are processing large number of rows returned from your SQL database, you might want to try benchmarking the same workload with different drivers.

And specifically, there is likely no good reason to use lib/pq anymore for accessing PostgreSQL from Go. Just use jackc/pgx.

For INSERT-heavy workloads in Go, you may want to switch database drivers. For PostgreSQL and SQLite, the popular drivers are 20-76% slower for this workload in my tests.

Some driver developers have reported issues with database/sql as an interface.https://t.co/NLVp0P2uiV pic.twitter.com/RxTbgMZ1MG
— Phil Eaton (@eatonphil) October 6, 2023

October 04, 2023

Supabase Blog

Supabase Beta September 2023

September was packed with so many new features and releases that it might have seemed like another Launch Week, but it wasn't!

October 03, 2023

PlanetScale Blog

PlanetScale is bringing vector search and storage to MySQL

We are adding vector storage and search to MySQL, enabling you to use PlanetScale for your AI use cases.

Supabase Blog

Dynamic Table Partitioning in Postgres

Learn how to scale large postgres tables in place and increase query performance.