We recently looked at how various open-source database engines maintain their secondary indexes (in a previous analysis) and found significant differences. Index maintenance is not the only aspect where storage engines differ; another significant difference is how they handle simple row updates. These updates highlight how these open-source databases organize data and manage … Continued
In this post, we share Ring’s billion-scale semantic video search on Amazon RDS for PostgreSQL with pgvector: the architectural decisions versus alternatives, the cost-performance-scale challenges, key lessons, and future directions. The Ring team designed their vector search architecture for global scale, supporting millions of customers with vector embeddings, the key technology for numerical representations of visual content generated by an AI model. By converting video frames into vectors (arrays of numbers that capture what’s happening in each frame), Ring can store these representations in a database and search them using similarity search. When you type “package delivery,” the system converts that text into a vector and finds the video frames whose vectors are most similar, delivering relevant results in under 2 seconds.
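As an illustrative sketch (not Ring’s actual schema; the table name, vector dimension, and values below are invented for brevity), a pgvector similarity search looks like:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- One row per video frame, with the embedding produced by the AI model
CREATE TABLE frames (
    id        bigserial PRIMARY KEY,
    embedding vector(3)   -- real embeddings use hundreds of dimensions
);

-- Find the frames nearest to the embedded query "package delivery"
SELECT id
FROM frames
ORDER BY embedding <=> '[0.12, 0.85, 0.31]'  -- pgvector cosine-distance operator
LIMIT 10;
```

In production, an index such as HNSW or IVFFlat over the embedding column keeps this nearest-neighbor query fast at billion scale.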
The latest release of the Percona Operator for MySQL, 1.1.0, is here. It brings point-in-time recovery, incremental backups, zstd backup compression, configurable asynchronous replication retries, and a set of stability fixes. This post walks through the highlights and how they help your MySQL deployments on Kubernetes. Percona Operator for MySQL 1.1.0 Running stateful databases … Continued
Introduction: Recently I was having a conversation with a DB enthusiast, and he mentioned that, as a fresher, he tuned an ETL/reporting query that was running for 8-10 hours via a nightly job, cutting its runtime by 1/3rd. He went to his manager, saying that he had reduced the query execution time, thinking that the manager would … Continued
Amazon Aurora Serverless is an on-demand, auto scaling configuration for Aurora that scales up to support your most demanding workloads and down to zero when you don’t need it. The latest improvements deliver up to 30% better performance and enhanced scaling that understands your workload. These enhancements are available at no additional cost for a better price-performance ratio. In this post, we’ll share recent performance and scaling improvements with benchmark results, showing how Aurora Serverless can now scale up to 45.0% faster with a 32.9% faster workload completion time.
Having a separate DR cluster for production databases is a modern-day necessity for tech and other businesses that rely heavily on their database systems. Setting up such a [DC -> DR] topology for Percona XtraDB Cluster (PXC), which is a virtually-synchronous cluster, can be a bit challenging in a complex … Continued
AskTom Live is a great source of information from Oracle developer advocates and product managers, but I recently came across a clickbait marketing title ("Not All Binary Protocols Are Created Equal: The Science Behind OSON's 529x Performance Advantage") which compares apples to oranges, and it's an opportunity to explain what BSON is, the binary JSON format used by MongoDB.
TL;DR: If you want to compare with OSON, the Oracle Database datatype for JSON, you should compare the Mutable BSON Document, which is the structure that MongoDB uses to access documents, reading and updating individual fields. Raw BSON is closer to protobuf: a compact serialization format for disk or network transfer, with access metadata removed and no blocks or headers.
I've left the following comment to the YouTube video but it seems that it is not publicly visible, so here it is.
Let me explain how Oracle Database and MongoDB handle disk-based data access, and you will understand the different design purposes of OSON and BSON, and why you are not testing the right thing to compare them.
Oracle Database, like many traditional databases, uses the same format on disk (blocks) and in memory (buffers), so all the transient metadata that helps access the data in memory must also be stored on persistent storage. This applies to table blocks (which contain a table directory, a row directory, and even lock flags, ITLs, that need to be cleaned up later), and the same idea was used for OSON (header, dictionary, sorted field IDs, offset arrays). Think of it as a mini database with its own catalog, just as the Oracle database has its dictionary and segment headers, which map physical extents and blocks. Accessing the on-disk OSON structure directly therefore makes sense: it's designed to be used through buffers that match the disk blocks.
But MongoDB with WiredTiger uses a smarter cache where the in-memory structures are optimized for RAM: adding pointers instead of disk offsets, building an Elements Vector for O(1) field access, and adding skiplists to navigate fields, all when data is loaded into the database cache. So there are two formats: the mutable BSON that the database actually works on in memory for query processing and updates, and the on-disk raw BSON that deliberately strips any unnecessary metadata and compresses the rest, to maximize OS filesystem cache usage. This fits the major advantage of MongoDB for documents: read or write a document in a single I/O.
The raw BSON is a serialization format for disk and network, not meant to be accessed partially, because MongoDB has a powerful mutable BSON format in memory with O(1) access through its Elements Vector indexing. The O(n) sequential scan, the "no partial updates" limitation, and the field position penalties you describe are properties of the serialization format, not of how MongoDB actually processes queries. And by definition, a serialization format is read sequentially, even if BSON's structure allows jumping between fields. Don't do that except when you need a full document. Use the MongoDB server and drivers to access BSON, and learn how to use it correctly.
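To make the O(n) point concrete, here is a toy sketch of what a raw BSON field lookup looks like, built from the public BSON spec (int32 fields only, type byte 0x10; this is illustrative code, not MongoDB's implementation):

```python
import struct

def encode_int32_doc(fields):
    """Build a minimal BSON document of int32 fields.
    BSON element: type byte 0x10, null-terminated key, little-endian int32."""
    body = b""
    for name, value in fields.items():
        body += b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)
    body += b"\x00"  # document terminator
    return struct.pack("<i", 4 + len(body)) + body

def find_field(data, wanted):
    """Sequentially scan raw BSON for one field: O(n) in the number of
    preceding elements, because there is no offset table to jump with."""
    pos = 4  # skip the int32 document-length prefix
    while data[pos] != 0x00:  # 0x00 terminates the element list
        type_byte = data[pos]
        pos += 1
        end = data.index(b"\x00", pos)  # end of the key cstring
        name = data[pos:end].decode()
        pos = end + 1
        assert type_byte == 0x10  # this sketch handles int32 only
        if name == wanted:
            return struct.unpack("<i", data[pos:pos + 4])[0]
        pos += 4  # skip this element's value
    return None

doc = encode_int32_doc({"a": 1, "b": 2})
print(find_field(doc, "b"))  # must walk past "a" before reaching "b"
```

The mutable in-memory document sidesteps exactly this walk: the Elements Vector is built once at load time, so each subsequent field access is a direct lookup.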
With this understanding, you can see that the "529x performance" clickbait title comes from a mistake: you used raw BSON to access individual fields, bypassing everything MongoDB does when serving a query. It would be like using BBED to query Oracle Datafiles without going through the instance — no buffer cache, no row directory navigation, no dictionary lookups — and then concluding that Oracle's storage format is slow.
Notably, the original OSON VLDB paper by Zhen Hua Liu et al. (2020) doesn't make the claims this video does. That paper honestly compares OSON against Oracle's own JSON text storage, not against MongoDB's query processing. It compares encoding sizes with BSON, which is legitimate for a serialization format comparison (though it overlooks that BSON in MongoDB is compressed on disk and over the network). The paper's authors understood they were comparing serialization formats and storage approaches within Oracle, not benchmarking MongoDB's actual runtime performance. I believe OSON is the optimal format for Oracle because it was integrated into the existing instance, cache, and SecureFiles, which were created a long time ago. Conversely, BSON is ideal for MongoDB, as it capitalizes on the document database's purpose and the WiredTiger architecture.
Jepsen’s analysis of MySQL 8.0.34 walked through a set of concurrency and isolation anomalies in InnoDB. MariaDB, which inherits the same codebase, took the report seriously and shipped a response: a new server variable called innodb_snapshot_isolation, turned on by default starting in 11.8. The announcement claims that with the flag enabled, Repeatable Read in MariaDB … Continued
In this post, we show you how to build a multi-Region Kerberos authentication system that matches your Aurora Global Database’s resilience using AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) with multi-Region replication and a one-way forest trust to your on-premises Active Directory, so your Linux clients can authenticate without joining the AD domain.
Some readers are undoubtedly upset that I have not devoted more space to the
wonders of machine learning—how amazing LLMs are at code generation, how
incredible it is that Suno can turn hummed melodies into polished songs. But
this is not an article about how fast or convenient it is to drive a car. We
all know cars are fast. I am trying to ask what will happen to the shape of
cities.
Some of our possible futures are grim, but manageable. Others are downright
terrifying, in which large numbers of people lose their homes, health, or
lives. I don’t have a strong sense of what will happen, but the space of
possible futures feels much broader in 2026 than it did in 2022, and most of
those futures feel bad.
Much of the bullshit future is already here, and I am profoundly tired of it.
There is slop in my search results, at the gym, at the doctor’s office.
Customer service, contractors, and engineers use LLMs to blindly lie to me. The
electric company has hiked our rates and says data centers are to blame. LLM
scrapers take down the web sites I run and make it harder to access the
services I rely on. I watch synthetic videos of suffering animals and stare at
generated web pages which lie about police brutality. There is LLM spam in my
inbox and synthetic CSAM on my moderation dashboard. I watch people outsource
their work, food, travel, art, even relationships to ChatGPT. I read chatbots
lining the delusional warrens of mental health crises.
I am asked to analyze vaporware and to disprove nonsensical claims. I
wade through voluminous LLM-generated pull requests. Prospective clients ask
Claude to do the work they might have hired me for. Thankfully Claude’s code is
bad, but that could change, and that scares me. I worry about losing my home. I
could retrain, but my core skills—reading, thinking, and writing—are
squarely in the blast radius of large language models. I imagine going to
school to become an architect, just to watch ML eat that field too.
It is deeply alienating to see so many of my peers wildly enthusiastic about
ML’s potential applications, and using it personally. Governments and industry
seem all-in on “AI”, and I worry that by doing so, we’re hastening the arrival
of unpredictable but potentially devastating consequences—personal, cultural,
economic, and humanitarian.
I’ve thought about this a lot over the last few years, and I think the best
response is to stop. ML assistance reduces our performance and
persistence, and denies us both the
muscle memory and deep theory-building that comes with working through a task
by hand: the cultivation of what James C. Scott would
call metis. I have never used an LLM for my writing, software, or personal life,
because I care about my ability to write well, reason deeply, and stay grounded
in the world. If I ever adopt ML tools in more than an exploratory capacity, I
will need to take great care. I also try to minimize what I consume from LLMs.
I read cookbooks written by human beings, I trawl through university websites
to identify wildlife, and I talk through my problems with friends.
I don’t think this will stop ML from advancing altogether: there are still
lots of people who want to make it happen. It will, however, slow them down,
and this is good. Today’s models are already very capable. It will take time
for the effects of the existing technology to be fully felt, and for culture,
industry, and government to adapt. Each day we delay the advancement of ML
models buys time to learn how to manage technical debt and errors introduced in
legal filings. Another day to prepare for ML-generated CSAM, sophisticated
fraud, obscure software vulnerabilities, and AI Barbie. Another day for workers
to find new jobs.
Staving off ML will also assuage your conscience over the coming decades. As
someone who once quit an otherwise good job on ethical grounds, I feel good
about that decision. I think you will too.
Despite feeling a bitter distaste for this generation of ML systems and the
people who brought them into existence, they do seem useful. I want to use
them. I probably will at some point.
For example, I’ve got these color-changing lights. They speak a protocol I’ve
never heard of, and I have no idea where to even begin. I could spend a month
digging through manuals and working it out from scratch—or I could ask an LLM
to write a client library for me. The security consequences are minimal, it’s a
constrained use case that I can verify by hand, and I wouldn’t be pushing tech
debt on anyone else. I still write plenty of code, and I could stop any time.
What would be the harm?
Right?
… Right?
Many friends contributed discussion, reading material, and feedback on this
article. My heartfelt thanks to Peter Alvaro, Kevin Amidon, André Arko, Taber
Bain, Silvia Botros, Daniel Espeset, Julia Evans, Brad Greenlee, Coda Hale,
Marc Hedlund, Sarah Huffman, Dan Mess, Nelson Minar, Alex Rasmussen, Harper
Reed, Daliah Saper, Peter Seibel, Rhys Seiffe, and James Turnbull.
This piece, like most all my words and software, was written by hand—mainly
in Vim. I composed a Markdown outline in a mix of headers, bullet points, and
prose, then reorganized it in a few passes. With the structure laid out, I
rewrote the outline as prose, typeset with Pandoc. I went back to make
substantial edits as I wrote, then made two full edit passes on typeset PDFs.
For the first I used an iPad and stylus, for the second, the traditional
pen and paper, read aloud.
I circulated the resulting draft among friends for their feedback before
publication. Incisive ideas and delightful turns of phrase may be attributed to
them; any errors or objectionable viewpoints are, of course, mine alone.
New Brand. Same Independence. If you read today’s announcement, you know Percona has a lot to say about what’s broken in modern data infrastructure. Lock-in dressed up as openness. Costs that climb while control shrinks. Vendors who made “managed” mean giving up visibility instead of gaining it. When we decided to stop being quiet about … Continued
This is a long article, so I'm breaking it up into a series of posts which will be released over the next few days. You can also read the full work as a PDF or EPUB; these files will be updated as each section is released.
As we deploy ML more broadly, there will be new kinds of work. I think much of
it will take place at the boundary between human and ML systems. Incanters
could specialize in prompting models. Process and statistical engineers
might control errors in the systems around ML outputs and in the models
themselves. A surprising number of people are now employed as model trainers,
feeding their human expertise to automated systems. Meat shields may be
required to take accountability when ML systems fail, and haruspices could
interpret model behavior.
LLMs are weird. You can sometimes get better results by threatening them,
telling them they’re experts, repeating your commands, or lying to them that
they’ll receive a financial bonus. Their performance degrades over longer
inputs, and tokens that were helpful in one task can contaminate another, so
good LLM users think a lot about limiting the context that’s fed to the model.
I imagine that there will probably be people (in all kinds of work!) who
specialize in knowing how to feed LLMs the kind of inputs that lead to good
results. Some people in software seem to be headed this way: becoming LLM
incanters who speak to Claude, instead of programmers who work directly with
code.
The unpredictable nature of LLM output requires quality control. For example,
lawyers keep getting in
trouble because they submit
AI confabulations in court. If they want to keep using LLMs, law firms are
going to need some kind of process engineers who help them catch LLM errors.
You can imagine a process where the people who write a court document
deliberately insert subtle (but easily correctable) errors, and delete
things which should have been present. These introduced errors are registered
for later use. The document is then passed to an editor who reviews it
carefully without knowing what errors were introduced. The document can only
leave the firm once all the intentional errors (and hopefully accidental
ones) are caught. I imagine provenance-tracking software, integration with
LexisNexis and document workflow systems, and so on to support this kind of
quality-control workflow.
These process engineers would help build and tune that quality-control process:
training people, identifying where extra review is needed, adjusting the level
of automated support, measuring whether the whole process is better than doing
the work by hand, and so on.
A closely related role might be statistical engineers: people who
attempt to measure, model, and control variability in ML systems directly.
For instance, a statistical engineer could figure out that the choice an LLM
makes when presented with a list of options is influenced
by the order in which those options were
presented, and develop ways to compensate. I suspect this might look something
like psychometrics—a field in which statisticians have gone to great lengths
to model and measure the messy behavior of humans.
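As a toy sketch of that kind of measurement (the biased chooser below is a mock standing in for an LLM call; the numbers are invented), one could randomize option order across many trials and tally which *position* wins:

```python
import random
from collections import Counter

def mock_llm_choose(options):
    """Stand-in for an LLM call: biased toward whichever option is listed first."""
    weights = [3] + [1] * (len(options) - 1)
    return random.choices(options, weights=weights)[0]

def positional_bias(options, trials=2000):
    """Tally how often each position wins across random orderings.
    An unbiased chooser would win each position about equally often."""
    wins = Counter()
    for _ in range(trials):
        order = random.sample(options, len(options))
        pick = mock_llm_choose(order)
        wins[order.index(pick)] += 1
    return wins

random.seed(0)
counts = positional_bias(["a", "b", "c"])
print(counts)  # position 0 wins far more often than positions 1 or 2
```

Randomizing order is also the simplest compensation: averaging a model's choices over shuffled presentations washes out the positional effect, at the cost of extra queries.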
Since LLMs are chaotic systems, this work will be complex and challenging:
models will not simply be “95% accurate”. Instead, an ML optimizer for database
queries might perform well on English text, but pathologically slow on
timeseries data. A healthcare LLM might be highly accurate for queries in
English, but perform abominably when those same questions are presented in
Spanish. This will require deep, domain-specific work.
As slop takes over the Internet, labs may struggle to obtain high-quality
corpuses for training models. Trainers must also contend with false sources:
Almira Osmanovic Thunström demonstrated that just a handful of obviously fake
articles1 could cause Gemini, ChatGPT, and Copilot to inform
users about an imaginary disease with a ridiculous
name. There are financial, cultural, and political incentives to influence
what LLMs say; it seems safe to assume future corpuses will be increasingly
tainted by misinformation.
One solution is to use the informational equivalent of low-background
steel: uncontaminated
works produced prior to 2023 are more likely to be accurate. Another option is
to employ human experts as model trainers. OpenAI could hire, say, postdocs
in the Carolingian Renaissance to teach their models all about Alcuin. These
subject-matter experts would write documents for the initial training pass,
develop benchmarks for evaluation, and check the model’s responses during
conditioning. LLMs are also prone to making subtle errors that look correct.
Perhaps fixing that problem involves hiring very smart people to carefully read
lots of LLM output and catch where it made mistakes.
In another case of “I wrote this years ago, and now it’s common knowledge”, a
friend introduced me to this piece on Mercor, Scale AI, et
al.,
which employ vast numbers of professionals to train models to do mysterious
tasks—presumably putting themselves out of work in the process. “It is, as
one industry veteran put it, the largest harvesting of human expertise ever
attempted.” Of course there’s bossware, and shrinking pay, and absurd hours,
and no union.2
You would think that CEOs and board members might be afraid that their own jobs
could be taken over by LLMs, but this doesn’t seem to have stopped them from
using “AI” as an excuse to fire lots of
people.
I think a part of the reason is that these roles are not just about sending
emails and looking at graphs, but also about dangling a warm body over the maws
of the legal
system and public opinion. You can fine an LLM-using corporation, but only humans can
be interviewed, apologize, or go to jail. Humans can be motivated by
consequences and provide social redress in a way that LLMs can’t.
I am thinking of the aftermath of the Chicago Sun-Times’ sloppy summer insert.
Anyone who read it should have realized it was nonsense, but Chicago Public
Media CEO Melissa Bell explained that they sourced the article from King
Features,
which is owned by Hearst, who presumably should have delivered articles which
were not sawdust and lies. King Features, in turn, says they subcontracted the
entire 64-page insert to freelancer Marco Buscaglia. Of course Buscaglia was
most proximate to the LLM and bears significant responsibility, but at the same
time, the people who trained the LLM contributed to this tomfoolery, as did the
editors at King Features and the Sun-Times, and indirectly, their respective
managers. What were the names of those people, and why didn’t they apologize
as Buscaglia and Bell did?
I think we will see some people employed (though perhaps not explicitly) as
meat shields: people who are accountable for ML systems under their
supervision. The accountability may be purely internal, as when Meta hires
human beings to review the decisions of automated moderation systems. It may be
external, as when lawyers are penalized for submitting LLM lies to the court.
It may involve formalized responsibility, like a Data Protection Officer. It
may be convenient for a company to have third-party subcontractors, like
Buscaglia, who can be thrown under the bus when the system as a whole
misbehaves. Perhaps drivers whose mostly-automated cars crash will be held
responsible in the same way.
Having written this, I am suddenly seized with a vision of a congressional
hearing interviewing a Large Language Model. “You’re absolutely right, Senator.
I did embezzle those sixty-five million dollars. Here’s the breakdown…”
When models go wrong, we will want to know why. What led the drone to abandon
its intended target and detonate in a field hospital? Why is the healthcare
model less likely to accurately diagnose Black
people?
How culpable should the automated taxi company be when one of its vehicles runs
over a child? Why does the social media company’s automated moderation system
keep flagging screenshots of Donkey Kong as nudity?
These tasks could fall to a haruspex: a person responsible for sifting
through a model’s inputs, outputs, and internal states, trying to synthesize an
account for its behavior. Some of this work will be deep investigations into a
single case, and other situations will demand broader statistical analysis.
Haruspices might be deployed internally by ML companies, by their users,
independent journalists, courts, and agencies like the NTSB.
When I say “obviously”, I mean the paper included the
phrase “this entire paper is made up”. Again, LLMs are idiots.
At this point the reader is invited to blurt out whatever
screams of “the real problem is capitalism!” they have been holding back
for the preceding twenty-seven pages. I am right there with you. That said,
nuclear crisis and environmental devastation were never limited to capitalist
nations alone. If you have a friend or relative who lived in (e.g.) the USSR,
it might be interesting to ask what they think the Politburo would have done
with this technology.
This post takes a closer look at some of the most impactful features we have shipped in CedarDB across our recent releases. Whether you have been following along closely or are just catching up, here is a deeper look at the additions we are most excited about.
Role-Based Access Control
v2026-04-02
Controlling who can access and modify data is foundational for any production deployment. CedarDB now ships a fully PostgreSQL-compatible role-based access control (RBAC) system that lets you define fine-grained permissions and compose them into hierarchies that mirror your organization.
Roles are named containers for privileges. A role can represent a single user, a group, or an abstract set of capabilities, flexible enough to model almost any organizational structure. You create roles with CREATE ROLE and assign privileges on database objects (tables, sequences, schemas, …) with GRANT:
-- Create roles for different levels of access
CREATE ROLE readonly;
CREATE ROLE app_backend;
CREATE ROLE admin_role;

-- A read-only role for dashboards and reporting
GRANT SELECT ON TABLE orders, customers, products TO readonly;

-- The application backend can read and write orders, but only read products
GRANT SELECT, INSERT, UPDATE ON TABLE orders TO app_backend;
GRANT SELECT ON TABLE customers, products TO app_backend;
Roles support inheritance, so you can build layered permission structures without duplicating grants. For example, an admin role that needs all backend privileges plus schema management:
-- admin_role inherits all privileges of app_backend
GRANT app_backend TO admin_role;

-- ... and gets additional privileges on top
GRANT CREATE ON SCHEMA public TO admin_role;
Assign roles to database users to put them into effect:
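For example, a sketch of such assignments (assuming the users bob and dashboard already exist; this snippet is illustrative, not CedarDB's official example):

```sql
-- bob gets the application's read/write privileges
GRANT app_backend TO bob;

-- dashboard is restricted to read-only reporting
GRANT readonly TO dashboard;
```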
Now bob can insert orders but cannot touch the schema, while dashboard can only run SELECT queries. All of this is enforced by the database itself, not by application code. When permissions need to change, you update the role definition once rather than every user individually.
To tighten access later, REVOKE removes specific privileges:
REVOKE INSERT, UPDATE ON TABLE orders FROM app_backend;
Row Level Security
v2026-04-02
Standard permissions control access to entire tables (or other database objects). Row Level Security (RLS) goes a step further, enforcing fine-grained access control at the row level by defining which rows a role can access within a table.
A typical use case is a multi-tenant application where a single table holds data for all clients, but each client should only see their own data:
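A hypothetical sketch of that setup, in PostgreSQL-style RLS syntax (the tenant_id column and the app.tenant_id session setting are assumptions for illustration, not CedarDB specifics):

```sql
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Each role may only see rows belonging to its own tenant
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::integer);

-- Each connection then scopes itself to its tenant:
SET app.tenant_id = '42';
SELECT * FROM orders;  -- returns only rows with tenant_id = 42
```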
CedarDB’s row level security implementation follows the PostgreSQL specification.
Check out our documentation for more details: Row Level Security Docs.
Delete Cascade
v2026-04-02
CedarDB lets you add foreign key constraints to ensure referential integrity.
Take, for example, the two tables customers and orders where each order belongs to a customer.
Each order references its customer with a foreign key, ensuring that a customer exists for each order.
Without such a constraint, deleting a customer while orders still reference it would leave the data in an inconsistent state.
While on delete restrict prevents such deletions by raising an error, CedarDB now also supports on delete cascade, which automatically deletes the referencing rows as well.
CREATE TABLE customer (c_custkey integer PRIMARY KEY);

CREATE TABLE orders (
    o_orderkey integer PRIMARY KEY,
    o_custkey  integer REFERENCES customer ON DELETE CASCADE
);

-- This also deletes all orders referencing customer 1
DELETE FROM customer WHERE c_custkey = 1;
Note that tables with foreign keys might themselves be referenced by other tables:
CREATE TABLE lineitem (
    l_orderkey integer REFERENCES orders ON DELETE CASCADE
);

-- This also deletes all orders referencing customer 1
-- and all lineitems that reference those orders
DELETE FROM customer WHERE c_custkey = 1;
With this, it is even possible to have cyclic delete dependencies, which CedarDB also handles automatically.
Drizzle ORM Support
v2026-04-02
Drizzle is one of the most popular TypeScript ORMs, and CedarDB now supports it out of the box. This means TypeScript developers can use Drizzle to build applications backed by CedarDB with full compatibility.
To make this work, we closed a series of compatibility gaps with PostgreSQL: CedarDB now fully supports GENERATED ALWAYS AS IDENTITY columns (including custom sequence names) and pg_get_serial_sequence for auto-increment discovery. Additionally, we overhauled our system tables so Drizzle can correctly reconstruct full schema structure.
Want to try it yourself? Install Drizzle and point it at CedarDB just like you would a PostgreSQL database:
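A minimal setup sketch (package names and commands follow Drizzle's standard PostgreSQL tooling; the connection string is a placeholder for your own CedarDB instance):

```shell
# Install Drizzle with the standard node-postgres driver
npm install drizzle-orm pg
npm install --save-dev drizzle-kit

# CedarDB speaks the PostgreSQL wire protocol, so a plain
# postgres:// connection string is all Drizzle needs
export DATABASE_URL="postgres://postgres:postgres@localhost:5432/postgres"

# Apply your Drizzle schema to CedarDB
npx drizzle-kit push
```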
In this post, we walk through the steps to set up the custom migration assistant agent and migrate a PostgreSQL database to Aurora DSQL. We demonstrate how to use natural language prompts to analyze database schemas, generate compatibility reports, apply converted schemas, and manage data replication through AWS DMS. As of this writing, AWS DMS does not support Aurora DSQL as target endpoint. To address this, our solution uses Amazon Simple Storage Service (Amazon S3) and AWS Lambda functions as a bridge to load data into Aurora DSQL.
Software development may become (at least in some aspects) more like witchcraft
than engineering. The present enthusiasm for “AI coworkers” is preposterous.
Automation can paradoxically make systems less robust; when we apply ML to new
domains, we will have to reckon with deskilling, automation bias, monitoring
fatigue, and takeover hazards. AI boosters believe ML will displace labor
across a broad swath of industries in a short period of time; if they are
right, we are in for a rough time. Machine learning seems likely to further
consolidate wealth and power in the hands of large tech companies, and I don’t
think giving Amazon et al. even more money will yield Universal Basic Income.
Decades ago there was enthusiasm that programs might be written in a natural
language like English, rather than a formal language like Pascal. The folk
wisdom when I was a child was that this was not going to work: English is
notoriously ambiguous, and people are not skilled at describing exactly what
they want. Now we have machines capable of spitting out shockingly
sophisticated programs given only the vaguest of plain-language directives; the
lack of specificity is at least partially made up for by the model’s vast
corpus. Is this what programming will become?
In 2025 I would have said it was extremely unlikely, at least with the
current capabilities of LLMs. In the last few months it seems that models
have made dramatic improvements. Experienced engineers I trust are asking
Claude to write implementations of cryptography papers, and reporting
fantastic results. Others say that LLMs generate all code at their company;
humans are essentially managing LLMs. I continue to write all of my words and
software by hand, for the reasons I’ve discussed in this piece—but I am
not confident I will hold out forever.
Some argue that formal languages will become a niche skill, like assembly
today—almost all software will be written with natural language and “compiled”
to code by LLMs. I don’t think this analogy holds. Compilers work because they
preserve critical semantics of their input language: one can formally reason
about a series of statements in Java, and have high confidence that the
Java compiler will preserve that reasoning in its emitted assembly. When a
compiler fails to preserve semantics it is a big deal. Engineers must spend
lots of time banging their heads against desks to (e.g.) figure out that the
compiler did not insert the right barrier instructions to preserve a subtle
aspect of the JVM memory model.
Because LLMs are chaotic and natural language is ambiguous, LLMs seem unlikely
to preserve the reasoning properties we expect from compilers. Small changes in
the natural language instructions, such as repeating a sentence, or changing
the order of seemingly independent paragraphs, can result in completely
different software semantics. Where correctness is important, at least some humans must continue to read and understand the code.
This does not mean every software engineer will work with code. I can imagine a
future in which some or even most software is developed by witches, who
construct elaborate summoning environments, repeat special incantations
(“ALWAYS run the tests!”), and invoke LLM daemons who write software on their
behalf. These daemons may be fickle, sometimes destroying one’s computer or
introducing security bugs, but the witches may develop an entire body of folk
knowledge around prompting them effectively—the fabled “prompt engineering”. Skills files are spellbooks.
I also remember that a good deal of programming is not done in “real”
computer languages, but in Excel. An ethnography of Excel is beyond the scope
of this already sprawling essay, but I think spreadsheets—like LLMs—are
culturally accessible to people who do not consider themselves software
engineers, and that a tool which people can pick up and use for themselves is
likely to be applied in a broad array of circumstances. Take for example
journalists who use “AI for data analysis”, or a CFO who vibe-codes a report
drawing on Salesforce and DuckLake. Even if software engineering adopts more
rigorous practices around LLMs, a thriving periphery of rickety-yet-useful
LLM-generated software might flourish.
Executives seem very excited about this idea of hiring “AI employees”. I keep
wondering: what kind of employees are they?
Imagine a co-worker who generated reams of code with security hazards, forcing
you to review every line with a fine-toothed comb. One who enthusiastically
agreed with your suggestions, then did the exact opposite. A colleague who
sabotaged your work, deleted your home directory, and then issued a detailed,
polite apology for it. One who promised over and over again that they had
delivered key objectives when they had, in fact, done nothing useful. An intern
who cheerfully agreed to run the tests before committing, then kept committing
failing garbage anyway. A senior engineer who quietly deleted the test suite,
then happily reported that all tests passed.
You would fire these people, right?
Look what happened when Anthropic let Claude run a vending
machine. It sold metal
cubes at a loss, told customers to remit payment to imaginary accounts, and
gradually ran out of money. Then it suffered the LLM analogue of a
psychotic break, hallucinating restocking conversations with people who didn’t
exist and claiming to have visited a home address from The Simpsons to sign
a contract. It told employees it would deliver products “in person”, and when
employees told it that as an LLM it couldn’t wear clothes or deliver anything,
Claude tried to contact Anthropic security.
LLMs perform identity, empathy, and accountability—at great length!—without
meaning anything. There is simply no there there! They will blithely lie to
your face, bury traps in their work, and leave you to take the blame. They
don’t mean anything by it. They don’t mean anything at all.
I have been on the Bainbridge Bandwagon for quite some time (so if you’ve read
this already skip ahead) but I have to talk about her 1983 paper
Ironies of
Automation.
This paper is about power plants, factories, and so on—but it is also
chock-full of ideas that apply to modern ML.
One of her key lessons is that automation tends to de-skill operators. When
humans do not practice a skill—either physical or mental—their ability to
execute that skill degrades. We fail to maintain long-term knowledge, of
course, but by disengaging from the day-to-day work, we also lose the
short-term contextual understanding of “what’s going on right now”. My peers in
software engineering report feeling less able to write code themselves after
having worked with code-generation models, and one designer friend says he
feels less able to do creative work after offloading some to ML. Doctors who
use “AI” tools for polyp detection seem to be
worse
at spotting adenomas during colonoscopies. They may also allow the automated
system to influence their conclusions: background automation bias seems to
allow “AI” mammography systems to mislead
radiologists.
Another critical lesson is that humans are distinctly bad at monitoring
automated processes. If the automated system can execute the task faster or more
accurately than a human, it is essentially impossible to review its decisions
in real time. Humans also struggle to maintain vigilance over a system which
mostly works. I suspect this is why journalists keep publishing fictitious
LLM quotes, and why the former head of Uber’s self-driving program watched his
“Full Self-Driving” Tesla crash into a
wall.
Takeover is also challenging. If an automated system runs things most of the
time, but asks a human operator to intervene occasionally, the operator is
likely to be out of practice—and to stumble. Automated systems can also mask
failure until catastrophe strikes by handling increasing deviation from the
norm until something breaks. This thrusts a human operator into an unexpected
regime in which their usual intuition is no longer accurate. This contributed
to the crash of Air France flight
447: the aircraft’s
flight controls transitioned from “normal” to “alternate 2B law”: a situation
the pilots were not trained for, and which disabled the automatic stall
protection.
Automation is not new. However, previous generations of automation
technology—the power loom, the calculator, the CNC milling machine—were
more limited in both scope and sophistication. LLMs are discussed as if they
will automate a broad array of human tasks, and take over not only repetitive,
simple jobs, but high-level, adaptive cognitive work. This means we will have
to generalize the lessons of automation to new domains which have not dealt
with these challenges before.
Software engineers are using LLMs to replace design, code generation, testing,
and review; it seems inevitable that these skills will wither with disuse. When
ML systems help operate software and respond to outages, it can be more
difficult for human engineers to smoothly take over. Students are using LLMs to
automate reading and
writing:
core skills needed to understand the world and to develop one’s own thoughts.
What a tragedy: to build a habit-forming machine which quietly robs students of
their intellectual inheritance. Expecting translators to offload some of their
work to ML raises the prospect that those translators will lose the deep
context necessary
for a vibrant, accurate translation. As people offload emotional skills like
interpersonal advice and
self-regulation
to LLMs, I fear that we will struggle to solve those problems on our own.
There’s some terrifying
fan-fiction out there which predicts
how ML might change the labor market. Some of my peers in software
engineering think that their jobs will be gone in two years; others are
confident they’ll be more relevant than ever. Even if ML is not very good at
doing work, this does not stop CEOs from firing large numbers of
people
and saying it’s because of
“AI”.
I have no idea where things are going, but the space of possible futures
seems awfully broad right now, and that scares the crap out of me.
You can envision a robust system of state and industry-union unemployment and
retraining programs as in
Sweden.
But unlike sewing machines or combine harvesters, ML systems seem primed to
displace labor across a broad swath of industries. The question is what happens
when, say, half of the US’s managers, marketers, graphic designers, musicians,
engineers, architects, paralegals, medical administrators, etc. all lose
their jobs in the span of a decade.
As an armchair observer without a shred of economic acumen, I see a
continuum of outcomes. In one extreme, ML systems continue to hallucinate,
cannot be made reliable, and ultimately fail to deliver on the promise of
transformative, broadly-useful “intelligence”. Or they work, but people get fed
up and declare “AI Bad”. Perhaps employment rises in some fields as the debts
of deskilling and sprawling slop come due. In this world, frontier labs and
hyperscalers pull a Wile E.
Coyote
over a trillion dollars of debt-financed capital expenditure, a lot of ML
people lose their jobs, defaults cascade through the financial system, but the
labor market eventually adapts and we muddle through. ML turns out to be a
normal
technology.
In the other extreme, OpenAI delivers on Sam Altman’s 2025 claims of PhD-level
intelligence,
and the companies writing all their code with Claude achieve phenomenal success
with a fraction of the software engineers. ML massively amplifies the
capabilities of doctors, musicians, civil engineers, fashion designers,
managers, accountants, etc., who briefly enjoy nice paychecks before
discovering that demand for their services is not as elastic as once thought,
especially once their clients lose their jobs or turn to ML to cut costs.
Knowledge workers are laid off en masse and MBAs start taking jobs at McDonald’s
or driving for Lyft, at least until Waymo puts an end to human drivers. This is
inconvenient for everyone: the MBAs, the people who used to work at McDonald’s
and are now competing with MBAs, and of course bankers, who were rather
counting on the MBAs to keep paying their mortgages. The drop in consumer
spending cascades through industries. A lot of people lose their savings, or
even their homes. Hopefully the trades squeak through. Maybe the Jevons
paradox kicks in eventually and
we find new occupations.
The prospect of that second scenario scares me. I have no way to judge how
likely it is, but the way my peers have been talking the last few months, I
don’t think I can totally discount it any more. It’s been keeping me up at
night.
Broadly speaking, ML allows companies to shift spending away from people
and into service contracts with companies like Microsoft. Those contracts pay
for the staggering amounts of hardware, power, buildings, and data required to
train and operate a modern ML model. For example, software companies are busy
firing engineers and spending more money on
“AI”. Instead of hiring a software
engineer to build something, a product manager can burn $20,000 a week on
Claude tokens, which in turn pays for a lot of Amazon
chips.
Unlike employees, who have base desires and occasionally organize to ask for
better
pay
or bathroom
breaks,
LLMs are immensely agreeable, can be fired at any time, never need to pee, and
do not unionize. I suspect that if companies are successful in replacing large
numbers of people with ML systems, the effect will be to consolidate both money
and power in the hands of capital.
AI accelerationists believe potential economic shocks are speed-bumps on the
road to abundance. Once true AI arrives, it will solve some or all of society’s
major problems better than we can, and humans can enjoy the bounty of its
labor. The immense profits accruing to AI companies will be taxed and shared
with all via Universal Basic
Income (UBI).
This feels hopelessly naïve. We
have profitable megacorps at home, and their names are things like Google,
Amazon, Meta, and Microsoft. These companies have fought tooth and
nail to avoid paying
taxes
(or, for that matter, their
workers). OpenAI made it less than a decade before deciding it didn’t want to be a nonprofit any
more. There
is no reason to believe that “AI” companies will, having extracted immense
wealth from interposing their services across every sector of the economy, turn
around and fund UBI out of the goodness of their hearts.
If enough people lose their jobs we may be able to mobilize sufficient public
enthusiasm for however many trillions of dollars of new tax revenue are
required. On the other hand, US income inequality has been generally
increasing for 40
years,
the top earners’ pre-tax income shares are nearing their highs from the
early 20th
century, and Republican opposition to progressive tax policy remains strong.