a curated list of database news from authoritative sources

October 23, 2025

Barbarians at the Gate: How AI is Upending Systems Research

This recent paper from the Berkeley Sky Computing Lab has been making waves in systems community. Of course, Aleksey and I did our live blind read of it, which you can watch below. My annotated copy of the paper is also available here.

This is a fascinating and timely paper. It raises deep questions about how LLMs will shape the research process, and how that could look like. Below, I start with a short technical review, then move to the broader discussion topics.


Technical review

The paper introduces AI-Driven Research for Systems (ADRS) framework. By leveraging the OpenEvolve framework,  ADRS integrates LLMs directly into the systems research workflow to automate much of the solution-tweaking and evaluation process. As shown in Figure 3, ADRS operates as a closed feedback loop in which the LLM ensemble iteratively proposes, tests, and refines solutions to a given systems problem. This automation targets the two most labor-intensive stages of the research cycle, solution tweaking and evaluation, leaving the creative areas (problem formulation, interpreting results, and coming up with insights) untouched.

Within the inner loop, four key components work together. The Prompt Generator creates context-rich prompts that seed the LLM ensemble (Solution Generator), which outputs candidate designs or algorithms. These are then assessed by the Evaluator, a simulator or benchmark written by humans, for gathering quantitative feedback. The Solution Selector identifies the most promising variants, which are stored along with their scores in the Storage module to inform subsequent iterations. This automated loop runs rapidly and at scale, and enables exploration of large design spaces within hours rather than weeks! They applied ADRS to several systems problems, including cloud job scheduling, load balancing, transaction scheduling, and LLM inference optimization. In each case, the AI improved on prior human-designed algorithms, often within a few hours of automated search. Reported gains include up to 5x faster performance or 30–50% cost reductions compared to published baselines, which are achieved in a fraction of the time and cost of traditional research cycles.

Outside the optimization loop, the creative and difficult work happens. The scientist identifies the research problem, directs the search, and decides which hills are worth climbing. Machines handle the iterative grunt work of tweaking and testing solutions, while humans deal with abstraction, framing, and insight.

There are several other important limitations for the framework's effectiveness as well. The paper's examples mostly involve trivial correctness, and also no concurrency, security, or fault-tolerance concerns. These domains require reasoning beyond performance tuning. Another limitation is that these LLMs focus/update one component only, and can't handle system-wide interactions yet.

Simulator-based evaluation makes this approach feasible, but the systems field undervalues simulation work and this leads to limited infrastructure for automated testing. Similarly, evaluators also pose risks: poorly designed ones invite reward hacking, where LLMs exploit loopholes rather than learn real improvements. If AI-driven research is to scale, we need richer evaluators, stronger specifications, and broader respect for simulation as a first-class research tool.


Discussion topics

Here I wax philosophical on many interesting questions this work raises.


LLMs provide breadth, but research demands depth

LLMs excel at high-throughput mediocrity. By design, they replicate what has already been done, and optimize across the surface of knowledge. Research, however, advances through novelty, depth, and high-value insight.

"Research is to see what everybody else has seen, and to think what nobody else has thought."

-- Albert Szent-Györgyi (Nobel laureate)

In this sense, LLMs are not as dangerous as the "Barbarians" at the gates. They are more like "Barbies" at the gates, with gloss, confidence, and some hollowness. They may dazzle with presentation but they will lack the inner substance/insights/value that mastery, curiosity, and struggle bring.


LLMs address only the tip of the iceberg

LLMs operate on the visible tip of the research iceberg I described earlier. They cannot handle the deep layers that matter: Curiosity, Clarity, Craft, Community, Courage.

Worse, they may even erode those qualities. The danger in the short-term is not invasion, but imitation: the replacement of thought with performance, and depth with polish. We risk mistaking synthetic polish with genuine understanding.

In the long term though, I am not worried. In the long term, we are all dead.

I'm kidding, ok. In the long term, we may be screwed as well. The 2004 movie "Idiocracy" rings more true every day. I am worried that due to the inherent laziness of our nature, we may end up leaning more and more on AI to navigate literature, frame questions, or spin hypotheses, that we may not get enough chances to exercise our curiosity or improve our clarity of understanding.


LLMs are bad researchers, but can they still make good collaborators?

In our academic chat follow-up to the iceberg post, I wrote about what makes a bad researcher:

Bad research habits are easy to spot: over-competition, turf-guarding, incremental work, rigidity, and a lack of intellectual flexibility. Bad science follows bad incentives such as benchmarks over ideas, and performance over understanding. These days the pressure to run endless evaluations has distorted the research and publishing process. Too many papers now stage elaborate experiments to impress reviewers instead of illuminating them with insights. Historically, the best work always stood on its own, by its simplicity and clarity. 

LLMs are bad researchers. The shoe fits. 

But can they still be good collaborators? Is it still worth working with them? The hierarchy is simple:

Good collaborators  >  No  collaborators  >  Bad collaborators

Used wisely, LLMs can climb high enough to reach the lowest range of the good collaborator category. If you give them bite-sized well defined work, they can reduce friction, preserve your momentum, and speed up parts of your work significantly. In a sense, they can make you technically fearless. I believe that when used for rapid prototyping, LLMs can help improve the design. And, through faster iteration, you may uncover some high-value insights.

But speed cuts both ways, because premature optimization is the root of all evil. If doing evaluations and optimizations becomes very cheap and effortless, we will more readily jump to this step, without nothing forcing us to think harder. Human brains are lazy by design. They don't want to think hard, and they will take the quick superficial route out, and we don't get to go deep. 

So, we need to tread carefully here as well.


Can we scale human oversight?

The worst time I ever had as an advisor was when I had to manage 6-7 (six-seveeeen!) PhD students at once. I would much rather work with 2 sharp creative students I support myself than 50 mediocre ones handed to me for free. The former process of working is more productive and it results in deep work and valuable research. Focus is the key, and it does not scale. 

The same holds for LLM-augmented research. Validation (via human focus) remains as the bottleneck. They can generate endless results, but without distilling those results into insight or wisdom, they all remain as AI slop in abundance.


Can clear insights distill without dust, tear, and sweat?

One may argue that with machines handling the grunt work, the researchers would finally get more time for thinking. Our brains are --what?-- yes, lazy. Left idle, they will scroll Reddit/Twitter rather than solve concurrency bugs.

I suspect we need some friction/irritation to nudge us to think in the background. And I suspect this is what happens when we are doing the boring work and working in the trenches. While writing a similar code snippet for the fifth time in our codebase, an optimization opportunity or an abstraction would occur to us. Very hard problems are impossible to tackle head on. Doing the legwork, I suspect we approach the problem sideways, and have a chance to make some leeway.

Yes, doing evaluation work sucks. But it is often necessary to generate the friction and space to get you think about the performance, and more importantly the logic/point of your system.  Through that suffering, you gradually get transformed and enlightened. Working in the trenches, you may even realize your entire setup is flawed, and your measurements are garbage due to using closed loop clients instead of open loop ones.

What happens when we stop getting our hands dirty? We risk distilling nothing at all. Insights don't bubble up while we are sitting in comfort and scrolling cat videos. In an earlier post, Looming Liability Machines (LLMs), I argued that offloading root-cause analysis to AI misses the point. RCA isn't about assigning blame to a component. It is an opportunity to think about the system holistically, and understand it better, and improve. Outsourcing this to LLMs strike me as a very stupid thing to do. We need to keep exercising those muscles, otherwise they would atrophy alongside our understanding of the system.


What will happen to the publication process?

In his insightful blog post on this paper, Brooker concludes:

Which leads systems to a tough spot. More bottlenecked than ever on the most difficult things to do. In some sense, this is a great problem to have, because it opens the doors for higher quality with less effort. But it also opens the doors for higher volumes of meaningless hill climbing and less insight (much of which we’re already seeing play out in more directly AI-related research). Conference organizers, program committees, funding bodies, and lab leaders will all be part of setting the tone for the next decade. If that goes well, we could be in for the best decade of systems research ever. If it goes badly, we could be in for 100x more papers and 10x less insight.

Given my firm belief in human laziness, I would bet on the latter. I have been predicting the collapse of the publishing system for a decade, and the flood of LLM-aided research may finally finish the job. That might not be a bad outcome either. We are due for a better model/process anyways.

October 22, 2025

Advanced Query Capabilities 👉🏻 aggregation pipelines

Although MongoDB has supported ACID transactions and sophisticated aggregation features for years, certain publications still promote outdated misconceptions, claiming that only SQL databases provide robust data consistency and powerful querying capabilities. The “Benefits of Migrating” section in a spreadsheet company’s article is a recent example. It's yet another chance to learn from—and correct—misleading claims.

The claims ignore MongoDB’s advanced querying and multi-document transaction support. Written to market migration tools, this overlooks that MongoDB’s simple CRUD API is efficient for single-document tasks, and as a general-purpose database, it also offers explicit transactions and strong aggregation queries like SQL.

Enhanced Data Consistency and Reliability

The migration tool company justifies migrating by stating:

PostgreSQL’s ACID compliance ensures that all transactions are processed reliably, maintaining data integrity even in the event of system failures. This is particularly important for applications that require strong consistency, such as financial systems or inventory management.

Yes, PostgreSQL does provide ACID transactions and strong consistency, but this is mainly true for single-node deployments. In high-availability and sharded settings, achieving strong consistency and ACID properties is more complicated (see an example, and another example).

Therefore, highlighting ACID compliance as a reason to migrate from another database—when that alternative also supports ACID transactions—is not correct. For instance, single-node MongoDB has offered ACID compliance for years, and since v4.2, it supports multi-document transactions across replica sets and sharded clusters. Let's provide some syntax examples for the domains they mentioned.

Example: Financial system

Transfer $100 from Alice’s account to Bob’s account

// Initialize data  
db.accounts.insertMany([  
  { account_id: "A123", name: "Alice", balance: 500 },  
  { account_id: "B456", name: "Bob", balance: 300 }  
]);  

// Start a transaciton in a session
const session = db.getMongo().startSession();

try {
  accounts = session.getDatabase(db.getName()).accounts
  session.startTransaction();

  // Deduct $100 from Alice
  accounts.updateOne(
    { account_id: "A123" },
    { $inc: { balance: -100 } }
  );

  // Add $100 to Bob
  accounts.updateOne(
    { account_id: "B456" },
    { $inc: { balance: 100 } }
  );

  session.commitTransaction();
} catch (error) {
  session.abortTransaction();
  console.error("Transaction aborted due to error:", error);
} finally {
  session.endSession();
}

Why ACID matters in MongoDB here:

  • Atomicity: Deduct and credit, either both happen or neither happens.
  • Consistency: The total balance across accounts remains accurate.
  • Isolation: Other concurrent transfers won’t interfere mid-flight.
  • Durability: Once committed, changes survive crashes.

Example: Inventory management

Selling a product and recording that sale.


try {
  inventory = session.getDatabase(db.getName()).inventory
  session.startTransaction();

  // Reduce inventory count
  inventory.updateOne(
    { product_id: "P100" },
    { $inc: { quantity: -1 } }
  );

  // Add a record of the sale
  sales.insertOne(
    { product_id: "P100", sale_date: new Date(), quantity: 1 }
  );

  session.commitTransaction();
} catch (error) {
  session.abortTransaction();
  console.error("Transaction aborted due to error:", error);
} finally {
  session.endSession();
}

ACID guarantees in MongoDB:

  • No partial updates
  • Inventory stays synchronized with sales records
  • Safe for concurrent orders
  • Durable once committed

Advanced Query Capabilities

The migration tool vendor justifies migrating by stating:

PostgreSQL offers powerful querying capabilities, including:

  • Complex joins across multiple tables
  • Advanced aggregations and window functions
  • Full-text search with features like ranking and highlighting
  • Support for geospatial data and queries These allow for more sophisticated data analysis and reporting compared to MongoDB’s more limited querying capabilities.

This completely overlooks MongoDB’s aggregation pipeline.

Complex joins

MongoDB’s $lookup stage joins collections, even multiple times if you want.

Example: Join orders with customers to get customer names.

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer_info"
    }
  },
  { $unwind: "$customer_info" },
  {
    $project: {
      order_id: 1,
      product: 1,
      "customer_info.name": 1
    }
  }
]);

Advanced aggregations

Operators like $group, $sum, $avg, $count handle numeric calculations with ease.

Example: Total sales amount per product.

db.sales.aggregate([
  {
    $group: {
      _id: "$product_id",
      totalRevenue: { $sum: "$amount" },
      avgRevenue: { $avg: "$amount" }
    }
  },
  { $sort: { totalRevenue: -1 } }
]);

Window-like functions

MongoDB has $setWindowFields for operations akin to SQL window functions.

Running total of sales, sorted by date:

db.sales.aggregate([
  { $sort: { sale_date: 1 } },
  {
    $setWindowFields: {
      sortBy: { sale_date: 1 },
      output: {
        runningTotal: {
          $sum: "$amount",
          window: { documents: ["unbounded", "current"] }
        }
      }
    }
  }
]);

Full-text search with ranking & highlighting

MongoDB supports both simple text indexes and Atlas Search (powered by Apache Lucene).

Example with Atlas Search: Search in articles and highlight matches.

db.articles.aggregate([
  {
    $search: {
      index: "default",
      text: {
        query: "machine learning",
        path: ["title", "body"]
      },
      highlight: { path: "body" }
    }
  },
  {
    $project: {
      title: "1,"
      score: { $meta: "searchScore" },
      highlights: { $meta: "searchHighlights" }
    }
  }
]);

Geospatial queries

Native geospatial indexing with operators like $near.

Example: Find restaurants within 1 km of a point.

db.restaurants.createIndex({ location: "2dsphere" });

db.restaurants.find({
  location: {
    $near: {
      $geometry: { type: "Point", coordinates: [-73.97, 40.77] },
      $maxDistance: 1000
    }
  }
});

Conclusion

MongoDB and PostgreSQL have equivalent capabilities for ACID transactions and “advanced” queries — the difference lies in syntax and data model.

MongoDB transactions don’t rely on blocking locks. They detect conflicts and let the application wait and retry if necessary.

And instead of SQL in text strings sent to the database server to be interpreted at runtime, MongoDB uses a staged aggregation pipeline, fully integrated in your application language.

Migrating to PostgreSQL doesn’t magically grant you ACID or advanced analytics — if you’re already using MongoDB’s features, you already have them.

Customizing the New MongoDB Concurrency Algorithm

On some occasions, we realize the necessity of throttling the number of requests that MongoDB tries to execute per second, be it due to resource saturation remediation, machine change planning, or performance tests. The most direct way of doing this is by tuning the WiredTiger transaction ticket parameters. Applying this throttle provides more controlled and […]

October 21, 2025

October 20, 2025

Determine how much concurrency to use on a benchmark for small, medium and large servers

What I describe here works for me given my goal, which is to find performance regressions. A benchmark run at low concurrency is used to find regressions from CPU overhead. A benchmark run at high concurrency is used to find regressions from mutex contention. A benchmark run at medium concurrency might help find both.

My informal way for classifying servers by size is:

  • small - has less than 10 cores
  • medium - has between 10 and 20 cores
  • large - has more than 20 cores
How much concurrency?

I almost always co-locate benchmark clients and the DBMS on the same server. This comes at a cost (less CPU and RAM is available for the DBMS) and might have odd artifacts because clients in the real world are usually not co-located. But it has benefits that matter to me. First, I don't worry about variance from changes in network latency. Second, this is much easier to setup.

I try to not oversubscribe the CPU when I run a benchmark. For benchmarks where there are few waits for reads from or writes to storage, then I will limit the number of benchmark users so that the concurrent connection count is less than the number of CPU cores (cores, not VPUs) and I almost always use servers with Intel Hyperthreads and AMD SMT disabled. I do this because DBMS performance suffers when the CPU is oversubscribed and back when I was closer to production we did our best to avoid that state.

Even for benchmarks that have some benchmark steps where the workload will have IO waits, I will still limit the amount of concurrency unless all benchmark steps that I measure will have IO waits.

Assuming a benchmark is composed of a sequence of steps (at minimum: load, query) then I consider the number of concurrent connections per benchmark user. For sysbench, the number of concurrent connections is the same as the number of users, although sysbench uses the --threads argument to set the number of users. I am just getting started with TPROC-C via HammerDB and that appears to be like sysbench with one concurrent connection per virtual user (VU).

For the Insert Benchmark the number of concurrent connections is 2X the number of users on the l.i1 and l.i2 steps and then 3X the number of users on the range-query read-write steps (qr*) and the point-query read-write steps (qp*). And whether or not there are IO-waits for these users is complicated, so I tend to configure the benchmark so that the number of users is no more than half the number of CPU cores.

Finally, I usually set the benchmark concurrency level to be less than the number of CPU cores because I want to leave some cores for the DBMS to do the important background work, which is mostly MVCC garbage collection -- MyRocks compaction, InnoDB purge and dirty page writeback, Postgres vacuum.

October 16, 2025

Why is RocksDB spending so much time handling page faults?

This week I was running benchmarks to understand how fast RocksDB could do IO, and then compared that to fio to understand the CPU overhead added by RocksDB. While looking at flamegraphs taken during the benchmark I was confused that about 20% of the samples were from page fault handling. This confused me at first.

The lesson here is to run your benchmark long enough to reach a steady state before you measure things or there will be confusion. And I was definitely confused when I first saw this. Perhaps my post saves time for the next person who spots this.

The workload is db_bench with a database size that is much larger than memory and read-only microbenchmarks for point lookups and range scans.

Then I wondered if this was a transient issue that occurs while RocksDB is warming up the block cache and growing process RSS until the block cache has been fully allocated.

While b-trees as used by Postgres and MySQL will do a large allocation at process start, RocksDB does an allocation per block read, and when the block is evicted then the allocation is free'd. This can be a stress test for a memory allocator which is why jemalloc and tcmalloc work better than glibc malloc for RocksDB. I revisit the mallocator topic every few years and my most recent post is here.

In this case I use RocksDB with jemalloc. Even though per-block allocations are transient, the memory used by jemalloc is mostly not transient. While there are cases where jemalloc an return memory to the OS, with my usage that is unlikely to happen.

Were I to let the benchmark run for a long enough time, then eventually jemalloc would finish getting memory from the OS. However, my tests were running for about 10 minutes and doing about 10,000 block reads per second while I had configured RocksDB to use a block cache that was at least 36G and the block size was 8kb. So my tests weren't running long enough for the block cache to fill, which means that during the measurement period:

  • jemalloc was still asking for memory
  • block cache eviction wasn't needed and after each block read a new entry was added to the block cache
The result in this example is 22.69% of the samples are from page fault handling. That is the second large stack from the left. The RocksDB code where it happens is rocksdb::BlockFetcher::ReadBlockContents.

When I run the benchmark for more time, the CPU overhead from page fault handling goes away.




October 14, 2025

Security Advisory: CVE Affecting Percona Monitoring and Management (PMM)

A critical security vulnerability has been identified in the following software that Percona has made available and that you may be using:  PMM 3.x installations (that is, 3.0 and forward). The Common Vulnerabilities and Exposures (CVE) identifier for this issue is on request from mitre.org. Vulnerability details We were notified via an external report that […]

Amazon Aurora MySQL zero-ETL integration with Amazon SageMaker Lakehouse

In this post, we explore how zero-ETL integration works, the key benefits it delivers for data-driven teams, and how it aligns with the broader zero-ETL strategy in AWS services. You'll learn how this integration can enhance your data workflows, whether you're building predictive models, entering interactive SQL queries, or visualizing business trends. By eliminating traditional extract, transform, and load (ETL) processes, this solution enables real-time intelligence securely and at scale to help you make faster, data-driven decisions.

Is it time for TPC-BLOB?

If you want to store vectors in your database then what you store as a row, KV pair or document is likely to be larger than the fixed-page size (when your DBMS uses fixed-page sizes) and you will soon care about efficient and performant support for large objects. I assume this support hasn't been the top priority for many DBMS implementations and there will be some performance bugs.

In a SQL DBMS, support for large objects will use the plumbing created to handle LOB (Large OBject) datatypes. We should define what the L in LOB means here and I will wave my hands and claim larger than a fixed-page in your favorite DBMS but smaller than 512kb because I limit my focus to online workloads.

Perhaps now is the time for industry standard benchmarks for workloads with large objects. Should it be TPC-LOB or TPC-BLOB?

Most popular DBMS use fixed-size pages whether that storage is index-organized via an update-in-place b-tree (InnoDB) or heap-organized (Postgres, Oracle). For rows that are larger than the page size, which is usually between 4kb and 16kb, the entire row or largest columns will be stored out of line and likely split across several pages in the out of line storage. When the row is read, additional reads will be done to gather all of the too-large parts from the out of line locations.

This approach is far from optimal as there will be more CPU overhead, more random IO and might be more wasted space. But this was good enough because support for LOBs wasn't a priority for these DBMS as their focus was on OLTP where rows were likely to be smaller than a fixed-size page.

Perhaps by luck, perhaps it was fate, but WiredTiger is a great fit for MongoDB because it is more flexible about page sizes. And it is more flexible because it isn't an update-in-place b-tree, instead it is a copy-on-write random (CoW-R) b-tree that doesn't need or use out-of-line storage, although for extra large documents there might be a benefit from out-of-line.

MyRocks, and other LSM-based DBMS, also don't require out-of-line storage but they can benefit from it as shown by WiscKey and other engines that do key-value separation. Even the mighty RocksDB has an implementation of key-value separation via BlobDB.

Benchmarking Postgres 17 vs 18

Postgres 18 brings a significant improvement to read performance via async I/O and I/O worker threads. Here we compare its performance to Postgres 17.

Here are some ClickHouse® Cloud alternatives to consider

Explore alternatives to ClickHouse Cloud including managed ClickHouse providers, cloud data warehouses, and real-time OLAP engines. Compare performance, cost, developer experience, and learn when to choose each platform.

ClickHouse® vs BigQuery for real-time analytics

Compare ClickHouse and BigQuery for real-time analytics across architecture, query latency, cost models, and streaming ingestion. Learn when to choose each platform and how managed services simplify deployment.

ClickHouse® vs Firebolt for real-time data warehousing

Compare ClickHouse and Firebolt across architecture, real-time ingestion, query performance, and operational complexity. Learn when to choose each platform for your analytics workloads.

ClickHouse® vs PostgreSQL (with extensions)

Compare ClickHouse and PostgreSQL across storage models, performance benchmarks, scalability approaches, and popular extensions. Learn when to choose each database and how to migrate from PostgreSQL to ClickHouse for analytics.

October 13, 2025

Postgres 18.0 vs sysbench on a 32-core server

This is yet another great result for Postgres 18.0 vs sysbench. This time I used a 32-core server. Results for a 24-core server are here. The goal for this benchmark is to check for regressions from new CPU overhead and mutex contention.

I repeated the benchmark twice because I had some uncertainty about platform variance (HW and SW) on the first run.

tl;dr, from Postgres 17.6 to 18.0

  • There might be regressions from 17.6 to 18.0 but they are small (usually <= 3%)

tl;dr, from Postgres 12.22 through 18.0

  • the hot-points test is almost 2X faster starting in 17.6
  • scan is ~1.2X faster starting in 14.19
  • all write tests are much faster staring in 17.6
Builds, configuration and hardware

I compiled Postgres from source for versions 12.22, 13.22, 14.19, 15.14, 16.10, 17.6, and 18.0.

The server is a Dell Precision 7865 Tower Workstation with 1 socket, 128G RAM and an AMD Ryzen Threadripper PRO 5975WX with 32-Cores. The OS is Ubuntu 24.04 and storage is a 2TB m.2 SSD with ext-4 and discard enabled.

Prior to 18.0, the configuration file was named conf.diff.cx10a_c32r128 and is here for 12.2213.2214.1915.1416.10 and 17.6.

For 18.0 I tried 3 configuration files:

Benchmark

I used sysbench and my usage is explained here. To save time I only run 32 of the 42 microbenchmarks 
and most test only 1 type of SQL statement. Benchmarks are run with the database cached by Postgres.

The read-heavy microbenchmarks run for 600 seconds and the write-heavy for 900 seconds.

The benchmark is run with 24 clients and 8 tables with 10M rows per table. The purpose is to search for regressions from new CPU overhead and mutex contention.

I ran the benchmark twice. In the first run, there was several weeks between getting results for the older Postgres releases and Postgres 18.0 so I am less certain about variance from the hardware and softare. One concern is changes in daily temperature because I don't have a climate-controlled server room. Another concern is changes from updating my OS install.

In the second run, all results were collected within 7 days and I am less concerned about variance there.

Results

The microbenchmarks are split into 4 groups -- 1 for point queries, 2 for range queries, 1 for writes. For the range query microbenchmarks, part 1 has queries that don't do aggregation while part 2 has queries that do aggregation. 

I provide charts below with relative QPS. The relative QPS is the following:
(QPS for some version) / (QPS for base version)
When the relative QPS is > 1 then some version is faster than base version.  When it is < 1 then there might be a regression. Values from iostat and vmstat divided by QPS are also provided here. These can help to explain why something is faster or slower because it shows how much HW is used per request.

I present results for:
  • versions 12 through 18 using 12.22 as the base version
  • versions 17.6 and 18.0 using 17.6 as the base version
Results: Postgres 17.6 and 18.0

All files are here.

Results per microbenchmark from vmstat and iostat are here for the first and second run.

Some comments:
  • 18.0 looks better relative to 17.6 in the second run and I explain my uncertainty about the first run above
  • But I am skeptical about the great result for 18.0 on the full scan test (scan_range=100) in the second run. That might be variance induced by vacuum.
  • There might be regressions from 17.6 to 18.0 but they are small (usually <= 3%)
  • The small regression in read-only_range=10 might be from new optimizer overhead, because it doesn't reproduce when the length of the range query is increased -- see read-only_range=100 and read-only_range=10000.
Relative to: 17.6
col-1 : 18.0 with the x10b config that uses io_method=sync
col-2 : 18.0 with the x10c config that uses io_method=worker
col-3 : 18.0 with the x10d config that uses io_method=io_uring

col-1   col-2   col-3   point queries, first run
0.97    0.99    0.94    hot-points_range=100
0.97    0.98    0.96    point-query_range=100
1.00    0.99    0.99    points-covered-pk_range=100
0.99    1.00    1.00    points-covered-si_range=100
0.98    0.99    0.98    points-notcovered-pk_range=100
0.99    0.99    0.99    points-notcovered-si_range=100
1.00    1.00    0.99    random-points_range=1000
0.98    0.98    0.98    random-points_range=100
0.99    0.98    0.99    random-points_range=10

col-1   col-2   col-3   point queries, second run
0.98    1.00    0.99    hot-points_range=100
1.00    1.00    0.99    point-query_range=100
1.01    1.01    1.01    points-covered-pk_range=100
1.00    1.01    1.00    points-covered-si_range=100
1.00    0.98    1.00    points-notcovered-pk_range=100
1.00    1.00    1.01    points-notcovered-si_range=100
1.00    1.01    1.01    random-points_range=1000
1.00    0.99    1.01    random-points_range=100
0.99    0.99    1.00    random-points_range=10

col-1   col-2   col-3   range queries without aggregation, first run
0.97    0.98    0.95    range-covered-pk_range=100
0.97    0.97    0.94    range-covered-si_range=100
0.98    0.98    0.97    range-notcovered-pk_range=100
0.99    0.99    0.98    range-notcovered-si_range=100
0.97    0.99    0.96    scan_range=100

col-1   col-2   col-3   range queries without aggregation, second run
0.99    0.99    0.98    range-covered-pk_range=100
0.99    0.99    0.99    range-covered-si_range=100
0.98    0.99    0.98    range-notcovered-pk_range=100
0.99    1.00    1.00    range-notcovered-si_range=100
1.24    1.24    1.22    scan_range=100

col-1   col-2   col-3   range queries with aggregation, first run
0.99    1.00    1.00    read-only-count_range=1000
1.01    1.01    1.01    read-only-distinct_range=1000
1.01    1.01    1.00    read-only-order_range=1000
1.04    1.04    1.04    read-only_range=10000
0.99    0.99    0.98    read-only_range=100
0.97    0.98    0.97    read-only_range=10
0.99    0.98    0.98    read-only-simple_range=1000
0.99    0.99    0.99    read-only-sum_range=1000

col-1   col-2   col-3   range queries with aggregation, second run
0.99    1.00    1.00    read-only-count_range=1000
1.01    1.01    1.00    read-only-distinct_range=1000
0.99    0.99    1.00    read-only-order_range=1000
1.02    1.03    1.03    read-only_range=10000
0.99    0.99    0.99    read-only_range=100
0.99    0.99    0.98    read-only_range=10
0.99    1.00    1.01    read-only-simple_range=1000
1.00    1.00    1.00    read-only-sum_range=1000

col-1   col-2   col-3   writes, first run
0.99    0.98    0.96    delete_range=100
0.99    0.96    0.98    insert_range=100
1.00    0.99    0.98    read-write_range=100
0.99    0.98    0.98    read-write_range=10
1.00    0.99    1.00    update-index_range=100
1.03    0.95    1.01    update-inlist_range=100
0.99    0.99    1.00    update-nonindex_range=100
1.00    1.00    1.01    update-one_range=100
0.98    0.99    1.00    update-zipf_range=100
0.97    0.97    0.99    write-only_range=10000

col-1   col-2   col-3   writes, second run
0.97    0.97    0.98    delete_range=100
0.99    0.99    1.00    insert_range=100
0.99    0.99    0.98    read-write_range=100
0.98    0.98    0.98    read-write_range=10
0.97    0.98    0.97    update-index_range=100
0.98    0.99    1.04    update-inlist_range=100
0.98    0.99    0.98    update-nonindex_range=100
0.99    0.99    0.98    update-one_range=100
0.98    0.99    0.98    update-zipf_range=100
0.99    0.97    0.95    write-only_range=10000

Results: Postgres 12 to 18

All files are here.

Results per microbenchmark from vmstat and iostat are here for the first and second run.
The data below with a larger font is here.

Some comments:
  • the hot-points test is almost 2X faster starting in 17.6
  • scan is ~1.2X faster starting in 14.19
  • all write tests are much faster staring in 17.6
Relative to: 12.22
col-1 : 13.22
col-2 : 14.19
col-3 : 15.14
col-4 : 16.10
col-5 : 17.6
col-6 : 18.0 with the x10b config
col-7 : 18.0 with the x10c config
col-8 : 18.0 with the x10d config

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   point queries, first run
1.02    1.00    1.01    1.00    1.94    1.87    1.91    1.82    hot-points_range=100
1.01    1.02    1.02    1.00    1.02    0.99    1.00    0.98    point-query_range=100
1.02    1.02    1.01    1.03    1.01    1.01    1.00    1.00    points-covered-pk_range=100
1.01    1.04    1.03    1.05    1.03    1.02    1.03    1.03    points-covered-si_range=100
1.01    1.01    1.01    1.02    1.02    1.00    1.00    1.00    points-notcovered-pk_range=100
1.00    1.03    1.02    1.03    1.02    1.01    1.01    1.02    points-notcovered-si_range=100
1.01    1.02    1.02    1.03    1.00    1.00    1.00    0.99    random-points_range=1000
1.01    1.02    1.02    1.02    1.02    1.00    1.00    1.00    random-points_range=100
1.02    1.03    1.02    1.02    1.01    1.00    1.00    1.00    random-points_range=10

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   point queries, second run
1.00    0.98    0.99    1.00    1.94    1.90    1.93    1.92    hot-points_range=100
1.00    1.01    1.02    1.03    1.03    1.02    1.02    1.02    point-query_range=100
1.02    1.01    1.00    1.04    0.99    1.00    1.00    0.99    points-covered-pk_range=100
1.01    1.04    1.03    1.07    1.03    1.03    1.05    1.04    points-covered-si_range=100
1.01    1.02    1.03    1.04    1.01    1.00    0.99    1.01    points-notcovered-pk_range=100
1.02    1.05    1.05    1.05    1.03    1.03    1.03    1.04    points-notcovered-si_range=100
1.01    1.02    1.03    1.03    0.99    0.99    1.00    1.00    random-points_range=1000
1.02    1.02    1.03    1.04    1.01    1.01    1.00    1.01    random-points_range=100
1.02    1.02    1.02    1.03    1.02    1.01    1.01    1.02    random-points_range=10

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   range queries without aggregation, first run
1.00    1.02    1.02    1.01    1.00    0.97    0.98    0.95    range-covered-pk_range=100
1.00    1.02    1.02    1.01    1.00    0.97    0.97    0.94    range-covered-si_range=100
1.01    1.00    1.00    1.00    0.99    0.97    0.97    0.97    range-notcovered-pk_range=100
0.99    1.00    1.00    0.99    1.01    1.00    1.00    0.99    range-notcovered-si_range=100
0.98    1.24    1.11    1.13    1.16    1.12    1.14    1.11    scan_range=100

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   range queries without aggregation, second run
1.01    1.02    1.02    1.02    1.01    1.00    1.00    0.99    range-covered-pk_range=100
1.01    1.03    1.02    1.02    1.01    1.00    1.01    1.00    range-covered-si_range=100
1.00    0.99    1.00    1.00    0.99    0.97    0.98    0.98    range-notcovered-pk_range=100
1.00    1.00    1.00    0.98    1.01    1.00    1.01    1.01    range-notcovered-si_range=100
1.00    1.27    1.15    1.15    0.97    1.20    1.20    1.18    scan_range=100

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   range queries with aggregation, first run
1.02    1.00    1.00    1.01    0.97    0.96    0.97    0.97    read-only-count_range=1000
1.00    1.00    1.02    1.02    0.98    0.99    0.99    0.99    read-only-distinct_range=1000
1.01    1.00    1.03    1.03    1.00    1.01    1.01    1.01    read-only-order_range=1000
1.00    0.98    1.00    1.06    0.95    0.99    0.99    0.99    read-only_range=10000
1.00    1.00    1.00    1.00    1.00    0.98    0.98    0.98    read-only_range=100
1.00    1.01    1.01    1.00    1.01    0.98    0.99    0.98    read-only_range=10
1.01    1.00    1.02    1.01    1.00    0.99    0.98    0.98    read-only-simple_range=1000
1.00    1.00    1.01    1.00    0.99    0.98    0.98    0.98    read-only-sum_range=1000

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   range queries with aggregation, second run
1.03    1.02    1.02    1.03    0.97    0.97    0.97    0.98    read-only-count_range=1000
1.00    0.99    1.02    1.02    0.98    0.99    0.99    0.99    read-only-distinct_range=1000
1.00    0.99    1.02    1.04    1.02    1.01    1.01    1.02    read-only-order_range=1000
1.01    1.03    1.03    1.06    0.97    0.99    0.99    0.99    read-only_range=10000
0.99    1.00    1.00    1.01    1.00    0.99    0.99    0.99    read-only_range=100
0.99    1.00    1.00    1.00    1.01    0.99    1.00    0.99    read-only_range=10
1.00    0.99    1.01    1.00    0.99    0.98    0.98    0.99    read-only-simple_range=1000
1.00    1.00    1.01    1.01    0.99    0.98    0.98    0.98    read-only-sum_range=1000

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   writes, first run
1.00    1.08    1.08    1.05    1.25    1.24    1.23    1.20    delete_range=100
1.01    1.05    1.04    1.03    1.07    1.06    1.02    1.05    insert_range=100
1.00    1.06    1.07    1.07    1.10    1.09    1.08    1.07    read-write_range=100
1.00    1.07    1.08    1.07    1.13    1.13    1.11    1.11    read-write_range=10
0.99    1.04    1.04    0.90    1.43    1.43    1.41    1.43    update-index_range=100
1.00    1.09    1.08    1.08    1.11    1.15    1.06    1.12    update-inlist_range=100
1.00    1.05    1.05    1.04    1.35    1.34    1.34    1.35    update-nonindex_range=100
1.02    0.95    0.96    0.93    1.19    1.19    1.19    1.20    update-one_range=100
1.00    1.05    1.08    1.07    1.23    1.21    1.22    1.23    update-zipf_range=100
1.01    1.06    1.05    1.01    1.25    1.22    1.20    1.24    write-only_range=10000

col-1   col-2   col-3   col-4   col-5   col-6   col-7   col-8   writes, second run
1.00    1.06    1.07    1.07    1.26    1.23    1.23    1.24    delete_range=100
1.03    1.07    1.05    1.05    1.09    1.07    1.08    1.09    insert_range=100
1.01    1.07    1.08    1.07    1.11    1.10    1.10    1.09    read-write_range=100
0.99    1.04    1.06    1.07    1.13    1.11    1.11    1.12    read-write_range=10
0.99    1.02    1.04    0.87    1.44    1.40    1.41    1.40    update-index_range=100
1.00    1.11    1.12    1.09    1.17    1.14    1.16    1.22    update-inlist_range=100
1.01    1.04    1.06    1.03    1.36    1.33    1.35    1.34    update-nonindex_range=100
1.01    0.95    0.98    0.94    1.22    1.21    1.21    1.20    update-one_range=100
0.99    1.05    1.07    1.07    1.24    1.21    1.22    1.21    update-zipf_range=100
1.02    1.06    1.06    1.02    1.27    1.25    1.23    1.21    write-only_range=10000