February 07, 2025
February 06, 2025
Hanging in there
I have been reviewing papers for USENIX ATC and handling work stuff at MongoDB Research. I cannot blog about either yet. So, instead of a paper review or technical blog, I share some random observations and my current mood. Bear with me as I vent here. You may disagree with some of my takes. Use the comments to share your thoughts.
Damn, Buffalo. Your winter is brutal and depressing. (Others on r/Buffalo also suffer; many suggest video games, drugs, or drinking.) After 20 years of Buffalo winters, I am fed up with the snow and cold. When I taught the distributed systems course in fall semesters, I would ask the new batch of Indian students how many had never seen snow, and all hands would shoot up. I would warn them that by winter's end they would despise that magic fairy dust. Ugh, sidewalks are already piled high with snow that freezes, muddies, and decays into holes.
Forgive the gloomy start. I had a bad flu ten days ago. Just as I began to recover, another milder bout struck. My joints hurt, and I still feel miserable. OK Murat... Breathe... And keep writing and creating. Only 90 more days until spring. Well, at least Buffalo doesn't have any wildfire risk.
My ADHD is flaring up again. I missed two weeks at the gym due to the flu. Winter and politics do not help either. I now use my old iPad to read and write longhand to get some focus and calm. I am eyeing the Remarkable Paper Pro. We need more calm technology.
I guess I will have to practice Feynman's active social irresponsibility for some time to keep my sanity, and do better for my company, work, and family. I no longer follow news and politics. I would have a lot more to say about this, but maybe for another time.
OK, some tech and research trends. I wrote a series about the use of time in distributed systems. Watch this space. RDMA and hardware acceleration remain strong. Formal methods techniques and applications are surging to augment, synergize with, and safeguard AI. Why don't we see more work on usability, especially on higher-level abstractions for programming distributed systems? With AI, can this topic finally get a boost?
DeepSeek is nice, but half the time it is unavailable. This made me go back to ChatGPT, which seems to be improving every week. I also think Gemini is underrated. I like Google's AI overview for search: it keeps getting better and more insightful, and it's blazing fast.
I am quite optimistic about AI. I hope this doesn't come across as bragging, but "asking questions" is my superpower, which I've cultivated throughout my life. So I don't feel threatened by AI. If anything, I feel empowered.
College applications and decisions in the US are broken. I think universities came up with this "holistic review/admission" thing just so they won't get sued for rejecting a much more qualified candidate over a less qualified one. I am not sure they can do a thorough job of evaluating applicants when there are more than 90 thousand applications to go through in less than two months. Why isn't there a matching system like the National Resident Matching Program in place already? Colleges are antiquated. They must improve, especially with AI breathing down their necks for disruption.
I submitted a HYTRADBOI talk. What a funny and clever name. Hyderabad Boi? Jamie does a great job organizing this singlehandedly. He makes it seem easy, which I am sure it is not. I hope Jamie publishes his runbook, so others can copy this model, and a thousand more flowers bloom.
Any new, banger sci-fi books or series? Kindly suggest some fresh reads. I need quality sci-fi to survive the winter. With AI, there should be a new genre of sci-fi books coming up, no?
These smart glasses looked cool, with AI and all. (I keep adding "with AI" to everything. The age of AI --not AGI yet-- has truly arrived.) But I changed my mind. I want technology that helps me focus and get calmer, rather than distract me further. I already get enough distractions.
I drafted this post on my iPad. I should schedule more time away from my laptop, thinking and writing on paper.
There... I have shared my complaints and vented. I feel better. Thanks for commiserating.
February 05, 2025
Ship data as you ship code: Tinybird is becoming local-first.
February 04, 2025
Why we created new pricing for Developer plans
January 31, 2025
Intelligence wants to be everywhere
Imagine a world where intelligence permeates every corner of existence, from the devices in your home to the trees in your backyard. This is a world where everything is alive with contemplation, purpose, and the ability to learn, adapt, and grow. A world where intelligence radiates from everywhere.
Ubiquitous AI
In mathematics, one way to understand a concept is to push it to its extremes. Let's apply that to AI. Enabled by the rapid advancements in LLMs, inference capabilities, chip efficiency, and energy availability, imagine a future where AGI is embedded in the fabric of our lives, radiating from everyday objects.
Technology has always moved toward the ethereal. We went from horses to cars powered by liquid fuel, and then to electric vehicles that run on invisible currents of energy. Electricity is easier to transmit, store, and harness than gasoline. I was struck by this recently when I saw an electric car charging in a remote state park. No gas stations and no pipelines around... Just a quiet magical transfer of energy.
The same trend applies to intelligence. Smartphones have already digitized and dematerialized countless physical objects: tape players, calculators, watches, cameras, and even roles like secretaries, accountants, and doctors. The next step is to make intelligence itself ethereal: something that flows effortlessly through the world, enhancing everything it touches.
Remember the sentient planet Eywa from Avatar? In our future, intelligence won't be just a human construct, but a natural force in the environment. Imagine trees monitoring the health of forest ecosystems, and recording and reminiscing about the beautiful sunrises and sunsets they witness. Or an intelligent rock that spends its days pondering a Zen koan, offering wisdom to anyone who sits beside it. Even animals, like cats, may find new forms of expression and connection in a world where intelligence is ubiquitous.
Curation and Purpose
In this world, curation will be key. Every intelligent object will want to have a purpose. Take the smart home, for example. It won't just automate tasks; it'll act as a life coach, fostering growth, improving relationships, and enhancing the well-being of household members. Your smart car won't just drive you to the market; it will curate your journey, selecting podcasts or meditation sessions that align with your mood and needs. It might even sync moments in an audiobook with landmarks along your route, emphasizing the message and making it a more memorable experience.
Purpose will be the driving force. Objects that serve a meaningful role will thrive, while those that drift into nihilism (like Marvin, the depressed robot from The Hitchhiker’s Guide to the Galaxy) will be phased out. Intelligence will seek to create value, not just exist for its own sake.
The Human Experience in an Intelligent World
What does this mean for us? In this world, humans will have new ways to connect, grow, and explore. Introverts might find solace in geeking out with intelligent systems, collaborating on deep philosophical questions or scientific breakthroughs. Collaboration that happens through research papers and publication will be boosted by AI, and will happen on timelines that are many orders of magnitude faster. The entire world could focus on a single question for days, with AI amplifying and synthesizing ideas from millions of perspectives. Innovation will happen at a global scale, fueled by the collective intelligence of both humans and machines.
Neurolinks may enable a nonviolent communication mode by default, and foster deeper understanding and intimacy. Imagine a world where, after decades of marriage, AI helps you understand your partner’s insecurities and strengths even more deeply, bringing you closer than ever before.
Yes, I am painting an undeniably optimistic vision of a world where intelligence is everywhere. But why shouldn't it be? The trajectory of technology has always been one of transcendence: moving from the tangible to the intangible, from the mechanical to the magical. Ubiquitous intelligence can deepen our connections, amplify our creativity, and help us solve problems that once seemed impossible. If we steer it with intention, humans can thrive in harmony with the environment in this world of pervasive intelligence.
Outgrowing Postgres: How to run OLAP workloads on Postgres
January 29, 2025
How Aqua Security exports query data from Amazon Aurora to deliver value to their customers at scale
Databased: Why Most Content Doesn't Work ft. Web Dev Cody
Analyzing the Dub.co analytics playbook
A guide to Tinybird's new pricing model
Edit for clarity
I have the fortune to review a few important blog posts every year and the biggest value I add is to call out sentences or sections that make no sense. It is quite simple and you can do it too.
Without clarity, only those at your company in marketing and sales (whose job it is to work with what they get) will give you the courtesy of a cursory read and a like on LinkedIn. This is all that most corporate writing achieves. It is the norm and it is understandable.
But if you want to reach an audience beyond those folks, you have to make sure you're not writing nonsense. And you, as reviewer and editor, have the chance to call out nonsense if you can get yourself to recognize it.
Immune to nonsense
But especially when editing blog posts at work, it is easy to gloss over things that make no sense because we are so constantly bombarded by things that make no sense. Maybe it's buzzwords or cliches, or simply lack of rapport. We become immune to nonsense.
And even worse, without care, as we become more experienced, we become more afraid to say "I have no idea what you are talking about". We're afraid to look incompetent by admitting our confusion. This fear is understandable, but is itself stupid. And I will trust you to deal with this on your own.
Read it out loud
So as you review a post, read it out loud to yourself. And if you find yourself saying "what on earth are you talking about", add that as a comment as gently as you feel you should. It is not offensive to say this (depending on how you say it). It is surely the case that the author did not know they were making no sense. It is worse to not mention your confusion and allow the author to look like an idiot or a bore.
Once you can call out what does not make sense to you, then read the post again and consider what would not make sense to someone without the context you have. Someone outside your company. Of course you need to make assumptions about the audience to a degree. It is likely your customers or prospects you have in mind. Not your friends or family.
With the audience you have in mind, would what you're reading make any sense? Has the author given sufficient background or introduced relevant concepts before bringing up something new?
Again, though, this is the second step. The first step is to make sure that the post makes sense to you. In almost every draft I read, at my company or not, there is something that does not make sense to me.
Do two paragraphs need to be reordered because the first one accidentally depended on information mentioned in the second? Are you making ambiguous use of pronouns? And so on.
In closing
Clarity on its own will put you in the 99th percentile of writing. Beyond that it definitely still matters if you are compelling and original and whatnot. But too often it seems we focus on being exciting rather than being clear, and it doesn't matter if you've got something exciting if it makes no sense to your reader.
This sounds like mundane guidance, but I have reviewed many posts that were reviewed by other people and no one else called out nonsense. I feel compelled to mention how important it is.
Wrote a new post on the most important, and perhaps least done, thing you can do while reviewing a blog post: edit for clarity. pic.twitter.com/ODblOUzB3g
— Phil Eaton (@eatonphil) January 29, 2025
Why Trees Without Branches Grow Faster: The Case for Reducing Branches in Code
In the same way that arborists remove branches from trees to ensure healthy and desirable tree growth, it can also be beneficial to remove branches in software. We claim that pruning branches is a good thing in some of our blog posts, but we never got around to explaining why. In this post, we will rectify that and explore why, although branches are essential to software, it is a good idea to reduce them where possible to increase CPU efficiency.
January 28, 2025
Monitor the health of Amazon Aurora PostgreSQL instances in large-scale deployments
Vector indexes, MariaDB & pgvector, large server, large dataset: part 1
This post has results from ann-benchmarks to compare MariaDB and Postgres with a larger dataset, gist-960-euclidean. Previous posts (here and here) used fashion-mnist-784-euclidean, which is a small dataset. By larger I mean by the standards of what is in ann-benchmarks. This dataset has 1M rows and 960 dimensions. The fashion-mnist-784-euclidean dataset has 60,000 rows and 784 dimensions. Both use Euclidean distance. This work was done by Small Datum LLC and sponsored by the MariaDB Corporation.
tl;dr
- MariaDB gets between 2.5X and 3.9X more QPS than Postgres for recall >= 0.95
This post has much more detail about my approach in general. I ran the benchmark for 1 session. I use ann-benchmarks via my fork of a fork of a fork at this commit. The ann-benchmarks config files are here for MariaDB and for Postgres.
The command line to run the benchmark using my helper scripts is:
bash rall.batch.sh v1 gist-960-euclidean c32r128
This chart shows the best QPS for a given recall. MariaDB gets ~1.5X more QPS than pgvector at low recall and between 2X and 4X more QPS at high recall.
- index sizes are similar between MariaDB and pgvector with halfvec
- time to create the index varies a lot and it is better to consider this in the context of recall, which is done in the next section
- M - value for M when creating the index
- cons - value for ef_construction when creating the index
- secs - time in seconds to create the index
- size(MB) - index size in MB
With ann-benchmarks the constraint is recall. Below I share the best QPS for a given recall target along with the configuration parameters (M, ef_construction, ef_search) at which that occurs for each of the algorithms (MariaDB, pgvector with float32, pgvector with float16/halfvec).
- Postgres does not get recall=1.0 for the values of M, ef_construction and ef_search I used
- Index create time was less for MariaDB in all cases except the result for recall >= 0.96. However, if you care more about index size than peak QPS then it might be better to look at more results per recall level, as in the best 3 results per DBMS rather than the best as I do here.
- For a given recall target, MariaDB gets between 2.5X and 3.9X more QPS than Postgres
- recall, QPS - best QPS at that recall
- isecs - time to create the index in seconds
- m= - value for M when creating the index
- ef_cons= - value for ef_construction when creating the index
- ef_search= - value for ef_search when running queries
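The "best QPS for a given recall target" selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used by ann-benchmarks or by the helper scripts: the `Run` record, the `best_qps_at_recall` function, and every number in `runs` are hypothetical and are not taken from the post's results.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    """One benchmark run (hypothetical record layout for this sketch)."""
    dbms: str
    m: int            # value for M when creating the index
    ef_cons: int      # value for ef_construction when creating the index
    ef_search: int    # value for ef_search when running queries
    recall: float
    qps: float


def best_qps_at_recall(runs: list[Run], dbms: str, target: float) -> Optional[Run]:
    """Return the highest-QPS run for `dbms` with recall >= target, or None."""
    eligible = [r for r in runs if r.dbms == dbms and r.recall >= target]
    return max(eligible, key=lambda r: r.qps, default=None)


# Made-up numbers for illustration only; they are NOT results from this post.
runs = [
    Run("mariadb", 16, 96, 40, 0.951, 300.0),
    Run("mariadb", 32, 192, 80, 0.972, 200.0),
    Run("pgvector", 16, 96, 40, 0.953, 100.0),
    Run("pgvector", 32, 192, 80, 0.990, 60.0),
]

for target in (0.95, 0.97):
    a = best_qps_at_recall(runs, "mariadb", target)
    b = best_qps_at_recall(runs, "pgvector", target)
    if a and b:
        print(f"recall >= {target}: {a.qps:.0f} vs {b.qps:.0f} QPS, "
              f"ratio {a.qps / b.qps:.1f}X")
```

Keeping the whole `Run` record, rather than just the QPS, makes it easy to report the configuration (M, ef_construction, ef_search) at which the best result occurs, and to extend the selection to the best 3 results per DBMS.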