February 17, 2025
My Time at MIT
Twenty years ago, in 2004-2005, I spent a year at MIT’s Computer Science department as a postdoc working with Professor Nancy Lynch. It was an extraordinary experience. Life at MIT felt like paradise, and leaving felt like being cast out.
MIT Culture
MIT’s Stata Center was the best CS building in the world at the time. Designed by Frank Gehry, it was a striking masterpiece of abstract architecture (although, like all abstractions, it was a bit leaky). Furniture from Herman Miller complemented this design. I remember seeing price tags of $400 on simple yellow chairs.
The building buzzed with activity. Every two weeks, postdocs were invited to the faculty lunch on Thursdays, and alternating weeks we had group lunches. Free food seemed to materialize somewhere in the building almost daily, and the food trucks outside were also good. MIT thrived on constant research discussions, collaborations, and talks. Research talks were advertised on posters at the urinals, as a practical touch of MIT's hacker culture I guess.
Our research group occupied the 6th floor, which was home to theory and algorithms. From there, I would see Tim Berners-Lee meeting with colleagues on the floor below. The building’s open spaces and spiral staircases connected every pair of floors to foster interaction. The place radiated strong academic energy. One evening, I saw Piotr Indyk discussing something in front of one of the many whiteboards on the 6th floor. The next morning, he was still there, having spent the night working toward a paper deadline. Eric Demaine was on the same floor too. Once, I accidentally sent a long print job (a PhD thesis) to his office printer, and he was angry about the wasted paper.
Nancy Lynch set a great example for us. She is a very detail-oriented person, able to find even the tiniest mistakes in papers with ease. She once told me that her mind worked like a debugger when reading a paper, and these bugs jumped out at her. The way she worked with students was to dedicate herself solely to one student and paper for the duration of an entire week. That week, she would avoid thinking about or listening to other work and students, even when she wanted to participate. This was because she wanted to immerse herself, keep every parameter of the paper she was working on in her mind, and grok it.
People in Nancy's group were also incredibly sharp—Seth Gilbert, Rui Fan, Gregory Chockler, Cal Newport, and many other students and visiting researchers. Yes, that Cal Newport of "Deep Work" fame was a fresh PhD student back then. Looking back, I regret not making more friends, and not forging deeper connections.
Lessons Learned
Reflecting on my time at MIT, I wish I had been more intentional, more present, and more engaged. The experience was a gift, but I see now how much more I could have made of it.
I was young, naive, and plagued by impostor syndrome. I held back instead of exploring more, engaging more deeply, and seeking out more challenges. I allowed myself to be carried along by the current, rather than actively charting my own course. Youth is wasted on the young.
Why pretend to be smart and play it safe? True understanding is rare and hard-won, so why claim it before you are sure of it? Isn't it more advantageous to embrace your stupidity/ignorance and be underestimated? In research and academia, success often goes not to the one who understands first, but to the one who understands best. Even when speed matters, the real advantage comes from the deep, foundational insights that lead there.
When you approach work with humility and curiosity, you learn more and participate more fully. Good collaborators value these qualities. A beginner’s mind is an asset. Staying close to your authentic self helps you find your true calling.
February 15, 2025
Vector indexes, large server, dbpedia-openai dataset: MariaDB, Qdrant and pgvector
My previous post has results for MariaDB and pgvector on the dbpedia-openai dataset. This post adds results from Qdrant, using ann-benchmarks to compare MariaDB, Qdrant and Postgres (pgvector) with a larger dataset, dbpedia-openai at 500k rows. The dataset has 1536 dimensions and uses angular (cosine) as the distance metric. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.
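As a reminder of what the angular metric computes, here is a minimal sketch of cosine distance in Python (my illustration, not the benchmark's actual code):

```python
import math

def cosine_distance(a, b):
    # angular (cosine) distance: 1 - cosine similarity
    # 0 for vectors pointing the same way, 1 for orthogonal, 2 for opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)
```

The real systems compute this over 1536-dimensional vectors, with SIMD rather than a Python loop.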
tl;dr
- I am new to Qdrant so the chance that I made a mistake is larger than for MariaDB or Postgres
- If you already run MariaDB or Postgres then I suggest you also use them for vector indexes
- MariaDB usually gets ~2X more QPS than pgvector and ~1.5X more than Qdrant
- Production is expensive -- you have to worry about security, backups, operational support
- A new DBMS is expensive -- you have to spend time to learn how to use it
So I decided to try the Docker container they provide. I ended up not changing the Qdrant configuration provided in the Docker container. I spent some time doing performance debugging and didn't see anything to indicate that a config change was needed. For example, I didn't see disk IO during queries. But the performance debugging was harder because that Docker container image doesn't come with my favorite debug tools installed. Some of the tools were easy to install, others (perf) were not.
This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit.
The ann-benchmarks config files are here for MariaDB, Postgres and Qdrant. For Postgres I specify values for both M and ef_construction, but MariaDB doesn't support ef_construction so I only specify the M values. While pgvector requires ef_construction to be >= 2*M, I do not know whether Qdrant has a similar requirement. Regardless, I only test cases where that constraint holds.
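The constraint is easy to express as a filter over the parameter grid; a sketch with made-up values (not the exact grids from the config files):

```python
# hypothetical HNSW parameter grid, for illustration only
Ms = [8, 16, 32, 48]
ef_constructions = [32, 64, 96, 192]

# keep only the combinations that satisfy pgvector's rule: ef_construction >= 2*M
valid = [(m, efc) for m in Ms for efc in ef_constructions if efc >= 2 * m]
```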
- MariaDB uses 16-bit integers rather than float32
- pgvector uses float32, pgvector halfvec uses float16
- For Qdrant I used none (float32) and scalar (int8)
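As a rough sketch of what scalar (int8) quantization does to a vector (my simplification; Qdrant's actual implementation differs in its details):

```python
def scalar_quantize(vec):
    # map each float to an int in [-128, 127] using a per-vector linear scale
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255.0 or 1.0  # guard against constant vectors
    q = [round((x - lo) / scale) - 128 for x in vec]
    return q, lo, scale

def dequantize(q, lo, scale):
    # approximate reconstruction; per-component error is at most scale/2
    return [(v + 128) * scale + lo for v in q]
```

This trades a 4X reduction in vector storage for a small loss in precision, which is why recall can drop with quantization.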
The command lines to run the benchmark using my helper scripts are:
These charts show the best QPS for a given recall. MariaDB gets more QPS than Qdrant and pgvector but that is harder to see as the recall approaches 1, so the next section has a table for best QPS per DBMS at a given recall.
Results: create index
- index sizes are similar between MariaDB and pgvector with halfvec
- time to create the index varies a lot and it is better to consider this in the context of recall, which is done in the next section. But Qdrant creates indexes a lot faster than MariaDB or pgvector.
- I did not find an accurate way to determine index size for Qdrant. There is a default method in ann-benchmarks that a DBMS can override. The default just compares process RSS before and after creating an index which isn't accurate for small indexes. The MariaDB and Postgres code override the default and query the data dictionary to get a more accurate estimate.
More details on index size and index create time for MariaDB and Postgres are in my previous post.
With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms -- MariaDB, pgvector with float32 and float16/halfvec, Qdrant with no and scalar quantization.
- Qdrant with scalar quantization does not get a result for recall=1.0 for the values of M, ef_construction and ef_search I used
- MariaDB usually gets ~2X more QPS than pgvector and ~1.5X more than Qdrant
- Index create time was much less for Qdrant (described above)
- recall, QPS - best QPS at that recall
- rel2ma - (QPS for the DBMS / QPS for MariaDB)
- m= is the value for M when creating the index
- ef_cons= is the value for ef_construction when creating the index
- ef_search= is the value for ef_search when running queries
- quant= is the quantization used by Qdrant
- dbms
- MariaDB - MariaDB, there is no option for quantization
- PGVector - Postgres with pgvector and float32
- PGVector_halfvec - Postgres with pgvector and halfvec (float16)
- Qdrant(..., quant=none) - Qdrant with no quantization
- Qdrant(..., quant=scalar) - Qdrant with scalar quantization
From web developer to database developer in 10 years
Last month I completed my first year at EnterpriseDB. I'm on the team that built and maintains pglogical and who, over the years, contributed a good chunk of the logical replication functionality that exists in community Postgres. Most of my work, our work, is in C and Rust with tests in Perl and Python. Our focus these days is a descendant of pglogical called Postgres Distributed which supports replicating DDL, tunable consistency across the cluster, etc.
This post is about how I got here.
Black boxes
I was a web developer from 2014-2021†. I wrote JavaScript and HTML and CSS and whatever server-side language: Python or Go or PHP. I was a hands-on engineering manager from 2017-2021. I was pretty clueless about databases and indeed database knowledge was not a serious part of any interview I did.
Throughout that time (2014-2021) I wanted to move my career forward as quickly as possible so I spent much of my free time doing educational projects and writing about them on this blog (or previous incarnations of it). I learned how to write primitive HTTP servers, how to write little parsers and interpreters and compilers. It was a virtuous cycle because the internet (Hacker News anyway) liked reading these posts and I wanted to learn how the black boxes worked.
But I shied away from data structures and algorithms (DSA) because they seemed complicated and useless to the work that I did. That is, until 2020 when an inbox page I built started loading more and more slowly as the inbox grew. My coworker pointed me at Use The Index, Luke and the DSA scales fell from my eyes. I wanted to understand this new black box so I built a little in-memory SQL database with support for indexes.
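The difference an index makes is easy to see in miniature; a toy illustration (nothing to do with the actual inbox code):

```python
import bisect

# a "table" of (user_id, message) rows
rows = [(i % 1000, f"message {i}") for i in range(10_000)]

# full scan: touch every row to find one user's messages
scanned = [msg for uid, msg in rows if uid == 42]

# index: row positions sorted by key, then binary-search the matching range
index = sorted((uid, pos) for pos, (uid, _) in enumerate(rows))
keys = [uid for uid, _ in index]
lo = bisect.bisect_left(keys, 42)
hi = bisect.bisect_right(keys, 42)
indexed = [rows[pos][1] for _, pos in index[lo:hi]]
```

The scan costs O(rows) per query while the index lookup costs O(log rows + matches), which is the whole story behind a page that slows down as the data grows.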
I'm a college dropout so even while I was interested in compilers and interpreters earlier in my career I never dreamed I could get a job working on them. Only geniuses and PhDs did that work and I was neither. The idea of working on a database felt the same. However, I could work on little database side projects like I had done before on other topics, so I did. Or a series of explorations of Raft implementations, others' and my own.
Startups
From 2021-2023 I tried to start a company and when that didn't pan out I joined TigerBeetle as a cofounder to work on marketing and community. It was during this time I started the Software Internals Discord and /r/databasedevelopment which have since kind of exploded in popularity among professionals and academics in database and distributed systems.
TigerBeetle was my first job at a database company, and while I contributed bits of code I was not a developer there. It was a way into the space. And indeed it was an incredible learning experience both on the cofounder side and on the database side. I wrote articles with King and Joran that helped teach and affirm for myself the basics of databases and consensus-based distributed systems.
Holding out
When I left TigerBeetle in 2023 I was still not sure if I could get a job as an actual database developer. My network had exploded since 2021 (when I started my own company that didn't pan out) so I had no trouble getting referrals at database companies.
But my background kept leading hiring managers to suggest putting me on cloud teams doing orchestration in Go around a database rather than working on the database itself.
I was unhappy with this type-casting so I held out while unemployed and continued to write posts and host virtual hackweeks messing with Postgres and MySQL. I started the first incarnation of the Software Internals Book Club during this time, reading Designing Data Intensive Applications with 5-10 other developers in Bryant Park. During this time I also started the NYC Systems Coffee Club.
Postgres
After about four months of searching I ended up with three good offers, all to do C and Rust development on Postgres (extensions) as an individual contributor. Working on extensions might sound like the definition of not-sexy, but Postgres APIs are so loosely abstracted it's really as if you're working on Postgres itself.
You can mess with almost anything in Postgres so you have to be very aware of what you're doing. And when you can't mess with something in Postgres because an API doesn't yet exist, companies have the tendency to just fork Postgres so they can. (This tendency isn't specific to Postgres, almost every open-source database company seems to have a long-running internal fork or two of the database.)
EnterpriseDB
Two of the three offers were from early-stage startups and after more than 3 years being part of the earliest stages of startups I was happy for a break. But the third offer was from one of the biggest contributors to Postgres, a 20-year-old company called EnterpriseDB. (You can probably come up with different rankings of companies using different metrics so I'm only saying EnterpriseDB is one of the biggest contributors.)
It seemed like the best place to be to learn a lot and contribute something meaningful.
My coworkers are a mix of Postgres veterans (people who contributed the WAL to Postgres, who contributed MVCC to Postgres, who contributed logical decoding and logical replication, who contributed parallel queries; the list goes on and on) but also my developer-coworkers are people who started at EnterpriseDB on technical support, or who were previously Postgres administrators.
It's quite a mix. Relatively few geniuses or PhDs, despite what I used to think, but they certainly work hard and have hard-earned experience.
Anyway, I've now been working at EnterpriseDB for over a year so I wanted to share this retrospective. I also wanted to cover what it's like coming from engineering management and founding companies to going back to being an individual contributor. (Spoiler: incredibly enjoyable.) But it has been hard enough to make myself write this much so I'm calling it a day. :)
I wrote a post about the winding path I took from web developer to database developer over 10 years. pic.twitter.com/tf8bUDRzjV
— Phil Eaton (@eatonphil) February 15, 2025
† From 2011-2014 I also did contract web development but this was part-time while I was in school.
February 14, 2025
Orchestrator (for Managing MySQL) High Availability Using Raft
February 13, 2025
Build a data-intensive Next.js app with Tinybird and Cursor
Ship data as you ship code: Deploy changes with a single command
February 11, 2025
MongoDB Equality, Sort, Range (ESR) without Equality (SR): add an unbounded range predicate on the indexed sort field
In a previous post, I applied the ESR rule to multiple databases. I used a simplified example with no equality predicate to focus on covering the Sort and Range with the index. However, for the demo in MongoDB, I added an Equality predicate that doesn't filter anything, {$gt:MinKey}, to get a full Equality, Sort, Range.
On my first day as a Developer advocate for MongoDB, I discussed this with the product managers, and the reason and possible optimization are tracked in the improvement idea Tighten index bounds and allow compound index to be chosen when predicate on leading field is not provided.
In this article, I show the problem, along with a simple workaround.
A table to experiment with the ESR rule
Here is a collection with fields that I'll use for queries with Equality, Sort, Range:
mdb> db.demoesr.drop();
mdb> db.demoesr.insertMany([
{ e:42, s:"a", r:10 },
{ e:42, s:"b", r:20 },
{ e:42, s:"b", r:10 },
{ e:42, s:"d", r:30 },
{ e:42, r:40 } // add one doc with no sort value
]);
Index and query on Equality, Sort, Range (ESR)
I create a compound index on fields "e", "s", and "r", in the order recommended by the ESR rule:
mdb> // index on Equality, Sort, Range
mdb> db.demoesr.createIndex(
{ e: 1 , s: 1, r : 1 }
);
e_1_s_1_r_1
I execute a query with an equality predicate on "e", a range predicate on "r", and a sort on "s", along with a projection of only indexed fields to ensure the query is fully covered by the index (the "_id" must be explicitly excluded from the projection with "_id":0):
mdb> db.demoesr.find(
{ e:{$eq: 42} ,r:{$gt:10} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1});
[
{ e: 42, s: null, r: 40 },
{ e: 42, s: 'b', r: 20 },
{ e: 42, s: 'd', r: 30 }
]
I know that the result comes from the index entry directly, without fetching the document, because I can see s: null
for the document where "s" doesn't exist. The index has a value for all index key fields, with a null value when it does not exist in the document.
I confirm from the execution plan that there's no FETCH and no documents were read:
mdb> db.demoesr.find(
{ e:{$eq: 42} ,r:{$gt:10} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1}).explain("executionStats").executionStats;
{
nReturned: 3,
totalKeysExamined: 5,
totalDocsExamined: 0,
executionStages: {
stage: 'PROJECTION_COVERED',
nReturned: 3,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'IXSCAN',
nReturned: 3,
keyPattern: { e: 1, s: 1, r: 1 },
indexName: 'e_1_s_1_r_1',
direction: 'forward',
indexBounds: {
e: [ '[42, 42]' ],
s: [ '[MinKey, MaxKey]' ],
r: [ '(10, inf.0]' ]
},
keysExamined: 5,
seeks: 3,
}
}
This is the perfect execution plan for such queries, as Equality, Sort, and Range were pushed down to the IXSCAN:
- e: [ '[42, 42]' ] is first in the index bounds and returns a single range for the filter on "e". The range holds keysExamined: 5 index entries.
- s: [ '[MinKey, MaxKey]' ] is second in the index bounds and returns this range ordered by "s", still returning keysExamined: 5 index entries.
- r: [ '(10, inf.0]' ] cannot use a single seek to get this range because there are multiple values in the preceding "s", but the filter is applied to the index entries and returns nReturned: 3 index entries.
Additionally, because all fields in the projection are in the index entries, no documents had to be fetched: totalDocsExamined: 0
This is a perfect example of the ESR rule with a covering index where avoiding a sort is the goal.
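The mechanics can be modeled in a few lines of Python (a toy illustration of the index, not MongoDB internals):

```python
# the example documents as (e, s, r) index entries; None stands for the
# missing "s" and, like null, sorts before the string values
entries = [
    (42, "a", 10), (42, "b", 20), (42, "b", 10), (42, "d", 30), (42, None, 40),
]
index = sorted(entries, key=lambda t: (t[0], t[1] is not None, t[1] or "", t[2]))

# Equality on "e" selects one contiguous range that is already ordered by "s",
# and the Range predicate on "r" is applied while scanning the index entries
result = [t for t in index if t[0] == 42 and t[2] > 10]
```

Here result comes back already ordered by "s" with no extra sort step, matching the covered query output above.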
In my example, there is only one value for "e". Let's remove the equality condition.
Index and query on Sort, Range (SR)
I want to run the same query without an equality predicate. To avoid a sort operation, the index starts with the "s" column:
mdb> // index on Sort, Range
mdb> db.demoesr.createIndex(
{ s: 1, r : 1 }
);
s_1_r_1
Here is a query with only a Range and Sort:
mdb> db.demoesr.find(
{ r:{$gt:10} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1});
[
{ e: 42, r: 40 },
{ e: 42, s: 'b', r: 20 },
{ e: 42, s: 'd', r: 30 }
]
The non-existent "s" is absent in the result, instead of being shown as null
, which indicates that what is displayed was retrieved from the document rather than the index entry.
I verify this with the execution plan:
mdb> db.demoesr.find(
{ r:{$gt:10} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1}).explain("executionStats").executionStats;
{
nReturned: 3,
totalKeysExamined: 5,
totalDocsExamined: 5,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 3,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'FETCH',
filter: { r: { '$gt': 10 } },
nReturned: 3,
docsExamined: 5,
inputStage: {
stage: 'IXSCAN',
nReturned: 5,
keyPattern: { s: 1, r: 1 },
indexName: 's_1_r_1',
direction: 'forward',
indexBounds: { s: [ '[MinKey, MaxKey]' ], r: [ '[MinKey, MaxKey]' ] },
keysExamined: 5,
seeks: 1,
This is not good because no filtering has been applied to the IXSCAN: the index bounds are [ '[MinKey, MaxKey]' ] on both fields, and docsExamined: 5 documents have been fetched, more than what we need for the result: nReturned: 3.
Adding an unbounded range on the sort field
The workaround is to get back to an ESR query with a predicate on the first column, which covers the sort. I can't use an equality predicate, because "s" may have multiple values, so I use MinKey or MaxKey to get a full range on this column:
mdb> db.demoesr.find(
{ s:{$gte:MinKey}, r:{$gt:10} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1});
[
{ e: 42, r: 40 },
{ e: 42, s: 'b', r: 20 },
{ e: 42, s: 'd', r: 30 }
]
Unlike my earlier observation, where the fully covered query showed s: null coming from the index entry, here the documents were retrieved from the collection. This is necessary because "e" is not in the index used here, so the document must be fetched and examined to obtain its value for the projection.
I need to display the execution plan to grasp the benefits of adding a dummy filter on the sort field when it is the first field of the index:
mdb> db.demoesr.find(
{ s:{$gte:MinKey}, r:{$gt:10} } , {e:1,s:1,r:1,"_id":0 }
).sort({s:1}).explain("executionStats").executionStats;
{
nReturned: 3,
executionTimeMillis: 0,
totalKeysExamined: 5,
totalDocsExamined: 3,
executionStages: {
stage: 'PROJECTION_SIMPLE',
nReturned: 3,
transformBy: { e: 1, s: 1, r: 1, _id: 0 },
inputStage: {
stage: 'FETCH',
nReturned: 3,
docsExamined: 3,
inputStage: {
stage: 'IXSCAN',
nReturned: 3,
keyPattern: { s: 1, r: 1 },
indexName: 's_1_r_1',
direction: 'forward',
indexBounds: { s: [ '[MinKey, MaxKey]' ], r: [ '(10, inf.0]' ] },
keysExamined: 5,
seeks: 3,
The index bound on the first column used for Sort is the same: s: [ '[MinKey, MaxKey]' ], which confirms that adding s:{$gte:MinKey} didn't change the number of index entries (keysExamined: 5). What has changed is that the range predicate is now pushed down to the IXSCAN as r: [ '(10, inf.0]' ], and only docsExamined: 3 documents have been fetched to get the nReturned: 3 documents for the result.
Even though I've added a range predicate using MinKey or MaxKey, it is equivalent to the Equality in the ESR rule because what matters is that it returns a single range, which guarantees that the scan is sorted on the following field of the composite index.
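The effect of the dummy bound can be modeled in a few lines of Python (a sketch of the idea, not of MongoDB's planner):

```python
# toy model of the s_1_r_1 index: (s, r) entries kept sorted,
# with None standing for the missing "s"
index = sorted(
    [("a", 10), ("b", 20), ("b", 10), ("d", 30), (None, 40)],
    key=lambda t: (t[0] is not None, t[0] or "", t[1]),
)

# s: [MinKey, MaxKey] is a single range covering the whole index,
# so the scan stays ordered by "s" and every entry is examined
keys_examined = len(index)

# with that bound in place, r > 10 filters index entries before any fetch,
# so only the matching documents are read from the collection
to_fetch = [t for t in index if t[1] > 10]
docs_examined = len(to_fetch)
```

This mirrors the plan above: keysExamined stays at 5 while docsExamined drops from 5 to 3.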
In summary, the ESR (Equality, Sort, Range) Rule outlined by MongoDB is an excellent framework for designing your composite indexes. However, it's essential to review the execution plan to ensure you meet your objective: avoiding the retrieval of too many documents.
Vector indexes, MariaDB & pgvector, large server, dbpedia-openai dataset
This post has results from ann-benchmarks to compare MariaDB and Postgres with a larger dataset, dbpedia-openai at 100k, 500k and 1M rows. It has 1536 dimensions and uses angular (cosine) as the distance metric. By larger I mean by the standards of what is in ann-benchmarks. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.
tl;dr
- Index create time was much less for MariaDB in all cases except the result for recall >= 0.95
- For a given recall, MariaDB gets between 2.1X and 2.7X more QPS than Postgres
This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit. The ann-benchmarks config files are here for MariaDB and for Postgres.
The command lines to run the benchmark using my helper scripts are:
bash rall.batch.sh v1 dbpedia-openai-100k-angular c32r128
These charts show the best QPS for a given recall. MariaDB gets about 2X more QPS than Postgres for a specific recall level.
With 100k rows
With 500k rows
With 1M rows
Results: create index
- index sizes are similar between MariaDB and pgvector with halfvec
- time to create the index varies a lot and it is better to consider this in the context of recall, which is done in the next section
- M - value for M when creating the index
- cons - value for ef_construction when creating the index
- secs - time in seconds to create the index
- size(MB) - index size in MB
With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms (MariaDB, pgvector with float32, pgvector with float16/halfvec).
- Postgres does not get recall=1.0 for the values of M, ef_construction and ef_search I used
- Index create time was much less for MariaDB in all cases except the result for recall >= 0.95
- For a given recall target, MariaDB gets between 2.1X and 2.7X more QPS than Postgres
- recall, QPS - best QPS at that recall
- isecs - time to create the index in seconds
- m= - value for M when creating the index
- ef_cons= - value for ef_construction when creating the index
- ef_search= - value for ef_search when running queries
February 09, 2025
Aurora DSQL is different than Aurora, but Aurora DSQL belongs to Aurora (which belongs to RDS)
The term "Aurora DSQL" can be somewhat confusing, as it shares its name with another RDS database called "Aurora". This lack of distinction complicates discussions about the two. How would you refer to the original Aurora when it is neither "serverless" nor "limitless"?
The "non-DSQL" Aurora is PostgreSQL-compatible (APG, for Aurora PostgreSQL) but is not a distributed database. It operates with a single instance that handles your reads and writes and experiences downtime during failover or major upgrades.
In contrast, Aurora DSQL is distributed but offers minimal compatibility with PostgreSQL, as shown in the previous posts of this series.
From a user's perspective, these two databases are quite different. Technically, DSQL storage does not use the multi-AZ storage, internally called "Grover", or simply "Aurora storage".
However, from a managed service perspective, it makes complete sense for AWS to offer DSQL under the umbrella of Amazon Aurora. I'll explain why, but remember that while the other posts in this series are technical and fact-based, this one is subjective and represents an opinion.
All Amazon RDS (the relational database services of AWS) databases are compatible with databases outside of Amazon. Aurora is compatible with MySQL or PostgreSQL. From an application standpoint, there is no difference.
Finding customers for a new database that behaves differently than any existing one would be tough. Amazon DynamoDB and Google Spanner are exceptions to this. Still, they succeeded because they were used internally by the cloud provider: DynamoDB powers Amazon (it all started with the shopping cart), and Google's critical applications rely on Spanner (AdWords, YouTube). Many AWS customers also use DynamoDB because it is unique for specific use cases, but Spanner has never gained significant traction among Google customers. Customers prefer the freedom to migrate their applications to other cloud vendors without re-designing and re-coding them.
Amazon DSQL combines the advantages of DynamoDB (restricting user actions to ensure predictable performance and scalability) and Spanner (integrating some SQL features like ACID transactions and relational data modeling). However, it is not equivalent to any existing database, employing Optimistic Concurrency Control with some SQL features (transactions, tables, and joins) but lacking others (no foreign keys, no long transactions). No customer would place their core business data in a database with no equivalent on-premises or at another cloud vendor. But they can if it is Aurora, previously known as a PostgreSQL alternative.
By cataloging DSQL under the Aurora umbrella, the issue is resolved. A multi-region, resilient, and elastic database attracts customers. They can begin building their applications on it and present it as PostgreSQL. Quickly, they will encounter one of its many limitations. They can either accept the necessary workarounds in application code to continue using the distributed database or voice concerns about its lack of PostgreSQL compatibility. At this point, AWS can redirect them to the non-DSQL alternatives: Aurora (APG), Aurora Serverless, or Aurora Limitless.
As a distinct database service, the low adoption of DSQL would make the service unprofitable. However, as a variation of the Aurora engine, it can still be profitable despite low or short-term adoption, as it can direct users to the non-DSQL Aurora whenever they require PostgreSQL-compatible behavior. Developers can experiment with building new application services on Aurora DSQL and decide whether to remain with DSQL or switch to another version of Aurora later. Managers can consider AWS as an alternative to Google, with a Spanner equivalent. Ultimately, it’s about choosing the cloud infrastructure vendor, regardless of which database engine goes to production. It can also be managed by an AWS partner, like YugabyteDB, which is distributed, functions like PostgreSQL, and is multi-cloud, but the customer who started on AWS can remain on Amazon cloud infrastructure.
Aurora DSQL is technically different from the existing Aurora database engine. Still, from a user point of view, and with good marketing, it can be seen as a switch in the Aurora database service that reduces PostgreSQL compatibility to provide horizontal scalability, like the switch to serverless, when cost is more critical than performance predictability, to limitless, when a sharding key is possible for all use cases, or to compatibility extensions, like Babelfish. Amazon RDS is a marketplace for relational databases, and Amazon Aurora is a dedicated stall offering different flavors of PostgreSQL.