A curated list of database news from authoritative sources

July 08, 2021

Announcing Vitess Arewefastyet

Benchmarking is a critical technique for delivering high-performance software. The basic idea behind benchmarking is measuring and comparing the performance of one software version against another. Over the years, many benchmarking techniques have emerged, but we can broadly separate them into two categories: micro- and macro-benchmarks. Micro-benchmarks measure a small part of the codebase, usually by isolating a single function call and invoking it repeatedly, whereas macro-benchmarks measure the performance of the codebase as a whole and run in an environment similar to what end users experience.
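
As a minimal sketch of the micro side (illustrative only, not from the Vitess post; parse_row is a hypothetical function under test), Python's built-in timeit module captures the isolate-and-repeat pattern:

```python
import timeit

def parse_row(line: str) -> list:
    """Hypothetical function under test: split a CSV-ish line."""
    return line.split(",")

# Micro-benchmark: isolate the single call and invoke it repeatedly.
# Taking the best of several repeats reduces noise from the environment.
best = min(timeit.repeat(lambda: parse_row("a,b,c,d"), number=100_000, repeat=5))
print(f"best of 5 runs: {best:.4f} s per 100k calls")
```

A macro-benchmark, by contrast, would drive the whole system end to end, for example by replaying production-like query traffic against a full deployment.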

July 04, 2021

AWS EC2 Hardware Trends: 2015-2021


Over the past decade, AWS EC2 has introduced many new instance types with different hardware configurations and prices. This hardware zoo can make it hard to keep track of what is available. In this post we will look at how the EC2 hardware landscape has changed since 2015, which will hopefully make it easier to pick the best option for a given task.

In the cloud, one can trade money for hardware resources. It therefore makes sense to take an economic perspective and normalize each hardware resource by the instance price. For example, instead of looking at absolute network bandwidth, we will use network bandwidth per dollar. Such metrics also allow us to ignore virtualized slices, reducing the number of instances relevant to the analysis from hundreds to dozens. For example, c5n.9xlarge is a virtualized slice of c5n.18xlarge with half the network bandwidth and half the cost, so its per-dollar metrics are identical.
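
As a minimal sketch of that normalization (bandwidth and price for c5n.18xlarge come from the table below; the c5n.9xlarge figures are just the halved slice described above):

```python
# Network bandwidth per dollar: a virtualized slice with half the
# bandwidth at half the price scores exactly the same, so it can be
# dropped from the analysis.
instances = {
    "c5n.18xlarge": (100.0, 3.89),      # Gbit/s, $/h (from the table)
    "c5n.9xlarge":  (50.0,  3.89 / 2),  # half the bandwidth, half the cost
}

for name, (gbit, price) in instances.items():
    print(f"{name}: {gbit / price:.1f} Gbit/s per $/h")  # both ~25.7
```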

Data Set

We use historical data from https://instances.vantage.sh/ and only consider current-generation Intel machines without GPUs. All prices are for us-east-1 Linux instances. Using these constraints, in July 2021 we can pick from the following instances:

| name | vCPU | memory [GB] | network [Gbit/s] | storage | price [$/h] |
| --- | --- | --- | --- | --- | --- |
| m4.16x | 64 | 256 | 25 | - | 3.20 |
| h1.16x | 64 | 256 | 25 | 8x2TB disk | 3.74 |
| c5n.18x | 72 | 192 | 100 | - | 3.89 |
| d3.8x | 32 | 256 | 25 | 24x2TB disk | 4.00 |
| c5.24x | 96 | 192 | 25 | - | 4.08 |
| r4.16x | 64 | 488 | 25 | - | 4.26 |
| m5.24x | 96 | 384 | 25 | - | 4.61 |
| c5d.24x | 96 | 192 | 25 | 4x0.9TB NVMe | 4.61 |
| i3.16x | 64 | 488 | 25 | 8x1.9TB NVMe | 5.00 |
| m5d.24x | 96 | 384 | 25 | 4x0.9TB NVMe | 5.42 |
| d2.8x | 36 | 244 | 10 | 24x2TB disk | 5.52 |
| m5n.24x | 96 | 384 | 100 | - | 5.71 |
| r5.24x | 96 | 768 | 25 | - | 6.05 |
| d3en.12x | 48 | 192 | 75 | 24x14TB disk | 6.31 |
| m5dn.24x | 96 | 384 | 100 | 4x0.9TB NVMe | 6.52 |
| r5d.24x | 96 | 768 | 25 | 4x0.9TB NVMe | 6.91 |
| r5n.24x | 96 | 768 | 100 | - | 7.15 |
| r5b.24x | 96 | 768 | 25 | - | 7.15 |
| r5dn.24x | 96 | 768 | 100 | 4x0.9TB NVMe | 8.02 |
| i3en.24x | 96 | 768 | 100 | 8x7.5TB NVMe | 10.85 |
| x1e.32x | 128 | 3904 | 25 | 2x1.9TB SATA | 26.69 |
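
As a quick illustration of putting the table to work, here is a sketch ranking a few memory-heavy rows by DRAM capacity per dollar (all numbers copied from the table above):

```python
# DRAM capacity per dollar, in GB per ($/h).
rows = [
    ("m4.16x",  256,  3.20),
    ("r5.24x",  768,  6.05),
    ("r5b.24x", 768,  7.15),
    ("x1e.32x", 3904, 26.69),
]

for name, mem_gb, price in sorted(rows, key=lambda r: r[1] / r[2], reverse=True):
    print(f"{name}: {mem_gb / price:.0f} GB per $/h")
# x1e.32x leads with ~146 GB per $/h, ahead of r5.24x at ~127 and
# m4.16x at ~80, consistent with the DRAM section below.
```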


Compute

Using our six-year data set, let's first look at the cost of compute:


It is quite remarkable that from 2015 to 2021, the cost of compute barely changed. During that six-year time frame, the number of cores per server CPU grew significantly, which may imply that Intel compute power is currently overpriced in EC2. In the last couple of years EC2 has introduced cheaper AMD and ARM instances, but it is still surprising that AWS chose to keep Intel CPU prices fixed.

DRAM Capacity

For DRAM, the picture is also quite stagnant:



The introduction of the x1e instances improved the situation a bit, but there has been stagnation since 2018. However, this is less surprising than the CPU situation because commodity DRAM prices in general have not moved much.

Instance Storage

Let's look at instance storage next. EC2 offers instances with disks (about 0.2 GB/s of bandwidth each), SATA SSDs (about 0.5 GB/s), and NVMe SSDs (about 2 GB/s). The introduction of instances with up to 8 NVMe SSDs in 2017 clearly disrupted IO bandwidth, as the chart below shows (its y-axis unit may look odd for bandwidth, but it is correct once we normalize by hourly cost).
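
To unpack that unit, here is a rough sketch (the per-device bandwidths are the approximate figures from the text above; it assumes aggregate bandwidth scales linearly with drive count):

```python
# Aggregate instance-storage bandwidth per dollar, in (GB/s) per ($/h).
PER_DEVICE_GBPS = {"disk": 0.2, "sata": 0.5, "nvme": 2.0}

instances = [
    ("h1.16x",   8, "disk", 3.74),   # 8x2TB disk
    ("i3.16x",   8, "nvme", 5.00),   # 8x1.9TB NVMe
    ("i3en.24x", 8, "nvme", 10.85),  # 8x7.5TB NVMe
    ("x1e.32x",  2, "sata", 26.69),  # 2x1.9TB SATA
]

for name, drives, kind, price in instances:
    total_gbps = drives * PER_DEVICE_GBPS[kind]
    print(f"{name}: {total_gbps / price:.2f} (GB/s) per ($/h)")
# i3.16x reaches ~3.2, roughly 7.5x the disk-based h1.16x at ~0.43,
# which is the 2017 NVMe disruption visible in the chart.
```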



In terms of capacity per dollar, disk is still king, and the d3en instance (introduced in December 2020) totally changed the game:


Network Bandwidth

For network bandwidth, we see another major disruption, this time from the introduction of instances with 100 Gbit/s networking:



The c5n instances, in particular, are clearly a game changer: c5n is only marginally more expensive than c5, but its network is four times faster.
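
A quick back-of-the-envelope check of that claim, using numbers from the table above:

```python
# Network bandwidth per dollar, c5n.18x vs c5.24x (from the table).
c5n = 100 / 3.89  # ~25.7 Gbit/s per $/h
c5 = 25 / 4.08    # ~6.1 Gbit/s per $/h
print(f"c5n delivers {c5n / c5:.1f}x the network bandwidth per dollar")  # ~4.2x
```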

Conclusions

These results show that the hardware landscape is very fluid, and we regularly see major changes like the introduction of NVMe SSDs or 100 Gbit/s networking. Truisms like "in distributed systems, network bandwidth is the bottleneck" can become false! (Network latency is, of course, a different beast.) High-performance systems must therefore take hardware trends into account and adapt to the ever-evolving hardware landscape.



June 25, 2021

Starting with Kafka

I just want to share my thoughts on Kafka after using it for a few months, from a practical point of view. I don't know much beyond the basics ...

June 22, 2021

Leaders, you need to share organization success stories more frequently

This post goes out to anyone who leads a team: managers, directors, VPs, executives. You need to share organization success stories with your organization on a regular and frequent basis. Talk about sales wins, talk about new services released, talk about the positive impact of a recent organizational change. Just get in front of your entire organization and tell them how the organization is making a positive difference.

Do this at least every other week.

And in case it's not clear, by "success stories" I don't mean nonsense or opinions. I mean concrete, measurable things that moved the organization forward.

Everyone in your organization is contributing to these stories, and it's your job to feed them back.

Leaders have a tendency to hear about successes but don't always remember to propagate the stories down. I've been guilty of this myself. This post is your (and my own) friendly reminder.

If you don't keep reminding your folks that their organization is making a positive impact, they're going to forget. You'll miss a freely available chance to reassure your best people.

Talented folks want to be invested in an organization that is succeeding.