BugBash'26 Afternoon of Day 1
These are my notes from the afternoon sessions of BugBash'26. We had a 75-minute lunch break. Nice lunch, but there were no vegetarian entrées, which made Peter Alvaro hangry. I don't blame him; I would be too.
Informal methods
Ben Eggers, Member of Technical Staff @ OpenAI
This was a fun and also thought-provoking talk. The premise is "Nothing has changed about software development". Really? After LLMs have been eating software like a wildfire, and particularly rocking at code generation in the last 3 months?? And this is coming from an OpenAI infrastructure engineer, who was at one point the 8th highest 7-day token user. How come?
The talk has two parts:
- writing code was where the hard parts surfaced
- agents move the work, but do not obviate it
Ok, now it makes more sense. Both of these are sensible statements. Ben followed with a couple of disclaimers: he is talking about deep, narrow systems, not broad, high-surface-area systems, because he has experience in the former and didn't want to make claims about the latter. Ben is a funny guy, and refreshingly honest. He made fun of LLMs' mistakes in code generation through 3 pop quizzes throughout his talk.
Part 1: Writing code has always been where the hard part surfaced
Ben harkened back to Will Wilson's claim that, before LLMs, 50% of the time was spent writing code and 50% testing. Ben asked: hey, what happened to design, discovery, integration, and correctness? He made the case that these were the hard parts, and writing code forces people to address them:
- decide intent
- discover the shape of the problem
- turn boundaries into contracts
- notice weirdness before anyone else does
- fill in the code
This is a long, arduous process. The slowness of writing code was load-bearing! When you noticed your code starting to cross boundaries, that exposed bad interfaces in your components. Wiring paths end-to-end helped expose missing cases. Writing tests (yes, this was the first thing we lost to the LLM wildfire) forced expected behavior to become explicit (test your interfaces).
Before LLMs, how fast code appeared matched how fast humans could reason about it. And yes, LLMs broke this part! But maybe not so drastically. Tech leads already managed stochastic work: they knew how to break down the problem and manage junior engineers and interns writing code. The job was always narrowing the distribution, turning a large spectrum of possible outcomes into a tighter, reliable band.
Part 2: Agents move the work, but don't make it disappear
Models crossed a usefulness threshold ~3 months ago. (This point kept coming up in many of the talks. Some people claimed it happened in November-December, but everyone--except Gary Marcus--agreed a corner was turned.)
But the code you get back is proportional to the legwork you do. Models write better code when you do the leading: tell them what success means, give them the shape of the solution, determine the behavior up front.
You need to be incredibly specific in your prompt, and in the limit prompts become math-like! (You mean TLA+ specs?) So Ben recommends:
- design: do the hardest parts yourself
- write your schemas by hand
- write your APIs and interfaces by hand
- correctness: give it rails
Unit testing is kinda dead; LLMs do a great job of it. But always implement tests in a different context. Write the interfaces, write the tests, tell the LLM not to touch the tests, and ask it to write the code. This is... exactly like managing an army of interns.
So, Ben claims, nothing changed overall. Code got cheap, but correctness did not. You can outsource your coding, but you can't outsource your thinking/understanding.
Now more than ever: building reliable software in the age of agents
Ron Minsky, Co-head of Technology @ Jane Street
I didn't take many notes in this talk, and took some headspace time in the second part.
Protocol-aware deterministic simulation testing
Chaitanya Bhandari, Distributed Systems Engineer @ TigerBeetle
Chaitanya is really smart and gave a decent talk. Again I didn't take many notes, and hit the hallway track. Waking up at 4am to fly in the same day from Buffalo to DC took its toll on me.
Fast and fault-tolerant: pick two
Matt Barrett, Founder & CEO @ Adaptive
Adaptive helps move a lot of cash around the globe. Stressful work. Matt talked about what he claims is the world's fastest (lowest-latency) Raft implementation. It was built 8 years ago. Antithesis (which tested it a couple of months ago) said it is one of the most reliable Raft implementations they have seen. It's used for trading systems/infrastructure. It supports 100K transactions/sec and provides low double-digit microsecond latency with low variance.
Aeron Cluster, their fast and fault-tolerant Raft, builds on open-source Aeron as a low-latency, high-throughput messaging layer. It is based on individual byte replication, not message replication! They moved from a message index to a byte index, with natural batching at all levels. Matt said business logic runs in the cluster for low latency; I don't know what he means exactly by that.
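Here is how I understood the byte-index idea. This is my own reconstruction with hypothetical names (`ByteLog`, `ack`, `committed`), not Aeron's actual API: followers acknowledge a byte offset in an append-only replicated log rather than individual message indices, so a single ack can cover any number of appended messages, and batching falls out for free.

```python
# Sketch of byte-offset replication; the real Aeron Cluster is far more involved.

class ByteLog:
    def __init__(self) -> None:
        self.buf = bytearray()   # append-only replicated log
        self.acked = {}          # follower -> highest acked byte offset

    def append(self, msg: bytes) -> int:
        """Append a length-prefixed message; return the new end-of-log offset."""
        self.buf += len(msg).to_bytes(4, "big") + msg
        return len(self.buf)

    def ack(self, follower: str, offset: int) -> None:
        # One ack can cover many appended messages: natural batching.
        self.acked[follower] = max(self.acked.get(follower, 0), offset)

    def committed(self, quorum: int) -> int:
        """Highest byte offset acked by at least `quorum` followers."""
        offsets = sorted(self.acked.values(), reverse=True)
        return offsets[quorum - 1] if len(offsets) >= quorum else 0


log = ByteLog()
log.append(b"tx-1")
log.append(b"tx-2")
end = log.append(b"tx-3")
log.ack("f1", end)   # a single ack covers all three messages
log.ack("f2", end)
assert log.committed(quorum=2) == end
```

Compared to tracking a per-message commit index, the leader never needs to know where message boundaries fall to advance the commit point, which is presumably where the low variance comes from.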
But why didn't we hear about this Raft implementation before? Also, the talk did not mention any protocol innovations. It looks like there isn't much innovation at the distributed-systems protocol or algorithmic level; the innovation may be at the lower layers, in data handling and the networking implementation. I still don't have a good idea of their Raft implementation after the talk.
Making high performance storage boring
Corwin Coburn, Uber Tech Lead, Parallel File Systems @ Google
The point is you want to keep storage boring. Storage is about writes and reads. It is a utility and, hence, boring. Nobody calls the plumber when things are fine. At Google they built the fastest Lustre filesystem in the world, at 10 TB/s.
Parallel file systems in the cloud require capacity, performance, availability, security, and ease of use. They put a lot of effort into keeping the storage boring: building on reliability infrastructure and proven software (Lustre), strict tenancy isolation, providing limited configurations, and achievable SLOs.
This part is important: do not overpromise, and do not overdeliver! If you overdeliver and customers get accustomed to it (Hyrum's law), when you go back to normal, you break your customers.
Since this conference cares a lot about testing, what about testing at scale?
- try to avoid this
- but test each component at large scale
- and test e2e at small scale
- do the math
- and finally do limited test at full scale
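My reading of "do the math": extrapolate from the component test and the small-scale e2e test before paying for a full-scale run. A toy version of that arithmetic, with numbers I made up for illustration:

```python
# Back-of-the-envelope scale check; all numbers are made up.

# Component test: one storage node sustained this throughput in isolation.
per_node_write_gbps = 12.0

# Small-scale e2e test: 10 nodes together reached this aggregate, so we can
# measure how much coordination overhead eats relative to the ideal.
e2e_nodes = 10
e2e_aggregate_gbps = 100.0
scaling_efficiency = e2e_aggregate_gbps / (e2e_nodes * per_node_write_gbps)

# Extrapolate to the full deployment; the limited full-scale test then only
# has to confirm this prediction, not discover it.
full_nodes = 1000
predicted_gbps = full_nodes * per_node_write_gbps * scaling_efficiency
print(f"efficiency ~{scaling_efficiency:.0%}, predicted ~{predicted_gbps:,.0f} GB/s")
```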
Innovation << Reliability << Inertia
This is also a big tenet of keeping it boring. Nobody rewrites their applications to use your uber/super API. It has to be boring; remember, a utility is boring.
A tip for developers: when you don't use storage properly, it performs badly. Most developers don't know how to use storage properly. One important thing: don't use filesystem metadata for query-intensive operations.
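My reading of that tip: answering a question like "what is the newest file?" by stat-walking millions of files hammers the metadata servers of a parallel filesystem, so keep your own index instead. A sketch of the contrast, with a hypothetical SQLite index as the stand-in:

```python
import os
import sqlite3

def newest_mtime_via_stat_walk(root: str) -> float:
    """Anti-pattern: every file costs a metadata RPC on a parallel FS."""
    newest = 0.0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            newest = max(newest, os.stat(os.path.join(dirpath, name)).st_mtime)
    return newest

def newest_mtime_via_index(db: sqlite3.Connection) -> float:
    """Pattern: answer queries from an index you maintain at write time."""
    row = db.execute("SELECT MAX(mtime) FROM files").fetchone()
    return row[0] or 0.0

# Maintaining the index: record metadata once, when the file is written,
# instead of rediscovering it with stat() calls at query time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE files (path TEXT PRIMARY KEY, mtime REAL)")
db.execute("INSERT INTO files VALUES (?, ?)", ("/data/a.bin", 1700000000.0))
db.execute("INSERT INTO files VALUES (?, ?)", ("/data/b.bin", 1700000500.0))
assert newest_mtime_via_index(db) == 1700000500.0
```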