a curated list of database news from authoritative sources

August 02, 2019

July 24, 2019

Cuckoo Filters with arbitrarily sized tables

Cuckoo Filters are an interesting alternative to Bloom filters. Instead of maintaining a filter bitmap, they maintain a small (cuckoo-)hash table of key signatures, which has several good properties. For example is stores just the signature of a key instead of the key itself, but is nevertheless able to move an element to a different position in the case of conflicts.

This conflict resolution mechanism is quite interesting: Just like regular cuckoo hash tables each element has two potential positions where is be placed, a primary position i1 and a secondary position i2. These can be computed as follows:

i1 = hash(x)
i2 = i1 xor hash(signature(x))

Remember that the cuckoo filter stores only the (small) signature(x), not x itself. Thus, when we encounter a value, we cannot know if it is at its position i1 or position i2. However, we can nevertheless alternate between positions because the following holds

i1 = i2 xor hash(signature(x))

and we have the signature stored in the table. Thus, we can just use the self-inverse xor hash(signature(x)) to switch between i1 and i2, regardless of whether are currently at i1 or i2. Which is a neat little trick. This allows is to switch between positions, which is used in the cuckoo filter conflict resolution logic.

However all this hold only because the original cuckoo filters use power-of-two hash tables. If our hash table size is not a power of 2, the xor can place the alternative position beyond the size of the hash table, which breaks the filter. Thus cuckoo filter tables always had to be powers of two, even if that wasted a lot of memory.

In more recent work Lang et al. proposed using cuckoo filters with size C, where C did not have to be a power of two, offering much better space utilization. They achieved this by using a different self-inverse function:

i1 = hash(x) mod C
i2 = -(i1 + hash(signature(x)) mod C

Note that the modulo computation can be made reasonable efficient by using magic numbers, which can be precomputed when allocating the filter.

A slightly different way to formulate this is to introduce a switch function f, which switches between positions:

f(i,sig,C) = -(i + hash(sig)) mod C
i1 = hash(x) mod C
i2 = f(i1, signature(x), C)
i1 = f(i2, signature(x), C)
All this works because f is self-inverse, i.e.,

i = f(f(i, signature(x), C), signature(x), C)

for all C>0, i between 0 and C-1, and signature(x)>0.

The only problem is: Is this true? In a purely mathematical sense it is, as you can convince yourself by expanding the formula, but the cuckoo filters are not executed on abstract machines but on real CPUs. And there something unpleasant happens: We can get numerical overflows of our integer registers, which implicitly introduces a modulo 2^32 into our computation. Which breaks the self-inverseness of f in some cases, except when C is power of two itself.

Andreas Kipf noticed this problem when using the cuckoo filters with real world data. Which teaches us not to trust in formulas without additional extensive empirical validation... Fortunately we can repair the function f by using proper modular arithmetic. In pseudo-code this looks like this

f(i,sig,C)
    x=(C-1)-(hash(sig) mod C)
    if (x>=i)
       return (x-i);
    // The natural formula would be C-(i-x), but we prefer this one...
    return C+(x-i);

This computes the correct wrap-around module C, at the cost of one additional if. We can avoid the if by using predication, as shown below

f(i,sig,C)
    x=(C-1)-(hash(sig) mod C)
    m = (x<i)*(~0)
    return (m&C)+(x-i);


which can be attractive for SSE implementations where the comparison produces a bit mask anyway.

We have validated that this new f function is now self-inverse for all possible values of i, sig, and C. And we did this by not just looking at the formula, but by trying out all values programmatically. Which is a good way to get confidence in your approach; there is only a finite number of combinations, and we can test them all.
With this small fix, we can now enjoy Cuckoo Filters with arbitrarily sized tables.

Edit:  The original post did not mirror the hash space correctly (using C-... instead of (C-1)-...), thanks to Andreas Kipf for pointing this out.

July 20, 2019

Writing an x86 emulator from scratch in JavaScript: 2. system calls

Previously in emulator basics: <! forgive me, for I have sinned >
1. a stack and register machine

In this post we'll extend x86e to support the exit and write Linux system calls, or syscalls. A syscall is a function handled by the kernel that allows the process to interact with data outside of its memory. The SYSCALL instruction takes arguments in the same order that the regular CALL instruction does. But SYSCALL additionally requires the RAX register to contain the integer number of the syscall.

Historically, there have been a number of different ways to make syscalls. All methods perform variations on a software interrupt. Before AMD64, on x86 processors, there was the SYSENTER instruction. And before that there was only INT 80h to trigger the interrupt with the syscall handler (since interrupts can be used for more than just syscalls). The various instructions around interrupts have been added for efficiency as the processors and use by operating systems evolved.

Since this is a general need and AMD64 processors are among the most common today, you'll see similar code in every modern operating system such as FreeBSD, OpenBSD, NetBSD, macOS, and Linux. (I have no background in Windows.) The calling convention may differ (e.g. which arguments are in which registers) and the syscall numbers differ. Even within Linux both the calling convention and the syscall numbers differ between x86 (32-bit) and AMD64/x86_64 (64-bit) versions.

See this StackOverflow post for some more detail.

Code for this post in full is available as a Gist.

Exit

The exit syscall is how a child process communicates with the process that spawned it (its parent) when the child is finished running. Exit takes one argument, called the exit code or status code. It is an arbitrary signed 8-bit integer. If the high bit is set (i.e. the number is negative), this is interpreted to mean the process exited abnormally such as due to a segfault. Shells additionally interpret any non-zero exit code as a "failure". Otherwise, and ignoring these two common conventions, it can be used to mean anything the programmer wants.

The wait syscall is how the parent process can block until exit is called by the child and receive its exit code.

On AMD64 Linux the syscall number is 60. For example:

  MOV RDI, 0
  MOV RAX, 60
  SYSCALL

This calls exit with a status code of 0.

Write

The write syscall is how a process can send data to file descriptors, which are integers representing some file-like object. By default, a Linux process is given access to three file descriptors with consistent integer values: stdin is 0, stdout is 1, and stderr is 2. Write takes three arguments: the file descriptor integer to write to, a starting address to memory that is interpreted as a byte array, and the number of bytes to write to the file descriptor beginning at the start address.

On AMD64 Linux the syscall number is 1. For example:

  MOV RDI, 1   ; stdout
  MOV RSI, R12 ; address of string
  MOV RDX, 8   ; 8 bytes to write
  MOV RAX, 1   ; write
  SYSCALL

This writes 8 bytes to stdout starting from the string whose address is in R12.

Implementing syscalls

Our emulator is simplistic and is currently only implementing process emulation, not full CPU emulation. So the syscalls themselves will be handled in JavaScript. First we'll write out stubs for the two syscalls we are adding. And we'll provide a map from syscall id to the syscall.

const SYSCALLS_BY_ID = {
  1: function sys_write(process) {},
  60: function sys_exit(process) {},
};

We need to add an instruction handler to our instruction switch. In doing so we must convert the value in RAX from a BigInt to a regular Number so we can look it up in the syscall map.

      case 'syscall': {
        const idNumber = Number(process.registers.RAX);
        SYSCALLS_BY_ID[idNumber](process);
        process.registers.RIP++;
        break;
      }

Exit

Exit is really simple. It will be implemented by calling Node's global.process.exit(). Again we'll need to convert the register's BigInt value to a Number.

const SYSCALLS_BY_ID = {
  1: function sys_write(process) {},
  60: function sys_exit(process) {
    global.process.exit(Number(process.registers.RDI));
  },
};

Write

Write will be implemented by iterating over the process memory as bytes and by calling write() on the relevant file descriptor. We'll store a map of these on the process object and supply stdout, stderr, and stdin proxies on startup.

function main(file) {
  ...

  const process = {
    registers,
    memory,
    instructions,
    labels,
    fd: {
      // stdout
      1: global.process.stdout,
    }
  };

  ...
}

The base address is stored in RSI, the number of bytes to write are stored in RDX. And the file descriptor to write to is stored in RDI.

const SYSCALLS_BY_ID = {
  1: function sys_write(process) {
    const msg = BigInt(process.registers.RSI);
    const bytes = Number(process.registers.RDX);
    for (let i = 0; i < bytes; i++) {
      const byte = readMemoryBytes(process, msg + BigInt(i), 1);
      const char = String.fromCharCode(Number(byte));
      process.fd[Number(process.registers.RDI)].write(char);
    }
  },
  ...

All together

$ cat exit3.asm
main:
  MOV RDI, 1
  MOV RSI, 2
  ADD RDI, RSI

  MOV RAX, 60 ; exit
  SYSCALL
$ node emulator.js exit3.asm
$ echo $?
3
$ cat hello.asm
main:
  PUSH 10  ; \n
  PUSH 33  ; !
  PUSH 111 ; o
  PUSH 108 ; l
  PUSH 108 ; l
  PUSH 101 ; e
  PUSH 72  ; H

  MOV RDI, 1   ; stdout
  MOV RSI, RSP ; address of string
  MOV RDX, 56  ; 7 8-bit characters in the string but PUSH acts on 64-bit integers
  MOV RAX, 1   ; write
  SYSCALL

  MOV RDI, 0
  MOV RAX, 60
  SYSCALL
$ node emulator.js hello.asm
Hello!
$

Next steps

We still aren't setting flags appropriately to support conditionals, so that's low-hanging fruit. There are some other fun syscalls to implement that would also give us access to an emulated VGA card so we could render graphics. Syntactic support for string constants would also be convenient and more efficient.

June 28, 2019

Try out Tinybird's closed beta

Working with large amounts of data is challenging, but we believe it should not be complex. Everybody is waking up to data and its possibilities. We constant...

June 22, 2019

Writing a lisp compiler from scratch in JavaScript: 6. LLVM system calls

Previously in compiler basics: <! forgive me, for I have sinned >
1. lisp to assembly
2. user-defined functions and variables
3. LLVM
4. LLVM conditionals and compiling fibonacci
Next in compiler basics:
5. an x86 upgrade

In this post we'll extend the ulisp compiler's LLVM backend to support printing integers to stdout.

Exit code limitations

Until now we've validated program state by setting the exit code to the result of the program computation. But the exit code is an eight bit integer. What if we want to validate a computation that produces a result larger than 255?

To do this we need a way to print integers. This is challenging because printing normally deals with byte arrays. libc's printf, for example, takes a byte array as its first argument.

The shortest path forward is to add support for system calls so we can print one character at a time. Here's a version of a print form that hacks around not having arrays to send each integer in a number to stdout.

(def print-char (c)
     ; First argument is stdout
     ; Second argument is a pointer to a char array (of length one)
     ; Third argument is the length of the char array
     (syscall/sys_write 1 &c 1))

(def print (n)
     (if (> n 9)
         (print (/ n 10)))

     ; 48 is the ASCII code for '0'
     (print-char (+ 48 (% n 10))))

In order to support this we need to add the syscall/sys_write, >, %, and / builtin forms. We'll also need to add support for taking the address of a variable.

All code is available on Github as is the particular commit related to this post.

References

The sys_write syscall requires us to pass the memory address of the byte array to write. We don't support arrays, but we can treat an individual variable as an array of length one by passing the variable's address.

If we were compiling to C we could just pass the address of a local variable. But LLVM doesn't allow us to take the address of variables directly. We need to push the variable onto the LLVM stack to get an address.

Under the hood LLVM will likely optimize this into a local variable reference instead of first pushing to the stack.

Since LLVM IR is typed, the value representing the address of a local variable will be a pointer type. We'll need to refer to all uses of this value as a pointer. So we'll need to modify ulisp to track local types rather than hard-coding i64 everywhere.

Scope

To begin we'll modify the Scope class to track types. We only need to do this on registration. Afterward, we'll have to find all uses of local variables to make sure they use the local's value and type fields appropriately.

class Scope {

  ...

  register(local) {
    let copy = local.replace('-', '_');
    let n = 1;
    while (this.locals[copy]) {
      copy = local + n++;
    }

    this.locals[local] = {
      value: copy,
      type: 'i64',
    };
    return this.locals[local];
  }

  ...

}

We won't go through every use of a Scope variable in this post, but you can find it in the related commit to ulisp.

Reference

The long-term approach for handling a reference syntactically is probably to rewrite &x to (& x) in the parser. The lazy approach we'll take for now is to handle a reference as a special kind of identifier in compileExpression.

We'll use the LLVM alloca instruction to create space on the stack. This will return a pointer and will turn the destination variable into a pointer type. Then we'll use store to set the value at the address to the current value of the variable being referenced.

class Compiler {

  ...

  compileExpression(exp, destination, context) {

    ...

    // Is a reference, push onto the stack and return its address
    if (exp.startsWith('&')) {
      const symbol = exp.substring(1);
      const tmp = context.scope.symbol();
      this.compileExpression(symbol, tmp, context);
      this.emit(1, `%${destination.value} = alloca ${tmp.type}, align 4`);
      destination.type = tmp.type + '*';
      this.emit(1, `store ${tmp.type} %${tmp.value}, ${destination.type} %${destination.value}, align 4`);
      return;
    }

    ...

  }

  ...

}

And now we're set to take the address of any code.

System calls

LLVM IR provides no high-level means for making system calls. The only way is to use inline assembly. This syntax is based on GCC inline assembly and is confusing, with few explained examples, and unhelpful error messages.

Thankfully the assembly code needed for a syscall is only one line, one word: the syscall assembly instruction. We use inline assembly variable-to-register mapping functionality to line up all the parameters for the syscall. Here is an example:

%result = call i64 asm sideeffect "syscall", "=r,{rax},{rdi},{rsi},{rdx}" (i64 %raxArg, i64 %rdiArg, i64 %rsiArg, i64 %rdxArg)

This says to execute the inline assembly string, "syscall". The sideeffect flag means that this assembly should always be run even if the result isn't used. =r means the inline assembly returns a value, and the rest of the string is the list of registers that arguments should be mapped to. Finally we call the function with all the LLVM variables we want to be mapped.

Eventually we should also use the inline assembly syntax to list registers that are modified so that LLVM can know to save them before and after.

Code

We'll add a mapping for syscall/sys_write and a helper function for generating syscall code using the example above as a template. We'll suport 64-bit Darwin and Linux kernels.

const SYSCALL_TABLE = {
  darwin: {
    sys_write: 0x2000004,
    sys_exit: 0x2000001,
  },
  linux: {
    sys_write: 1,
    sys_exit: 60,
  },
}[process.platform];

class Compiler {
  constructor() {
    this.outBuffer = [];
    this.primitiveFunctions = {
      def: this.compileDefine.bind(this),
      begin: this.compileBegin.bind(this),
      'if': this.compileIf.bind(this),
      '+': this.compileOp('add'),
      '-': this.compileOp('sub'),
      '*': this.compileOp('mul'),
      '%': this.compileOp('urem'),
      '<': this.compileOp('icmp slt'),
      '=': this.compileOp('icmp eq'),
      'syscall/sys_write': this.compileSyscall(SYSCALL_TABLE.sys_write),
    };
  }

  ...

  compileSyscall(id) {
    return (args, destination, context) => {
      const argTmps = args.map((arg) => {
          const tmp = context.scope.symbol();
          this.compileExpression(arg, tmp, context);
          return tmp.type + ' %' + tmp.value;
      }).join(', ');
      const regs = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9'];
      const params = args.map((arg, i) => `{${regs[i]}}`).join(',');
      const idTmp = context.scope.symbol().value;
      this.emit(1, `%${idTmp} = add i64 ${id}, 0`)
      this.emit(1, `%${destination.value} = call ${destination.type} asm sideeffect "syscall", "=r,{rax},${params},~{dirflag},~{fpsr},~{flags}" (i64 %${idTmp}, ${argTmps})`);
    }
  }
}

>, /

Finally, we have a few new operations to add support for. But they'll be pretty simple using the compileOp helper function.

class Compiler {
  constructor() {
    this.outBuffer = [];
    this.primitiveFunctions = {
      def: this.compileDefine.bind(this),
      begin: this.compileBegin.bind(this),
      'if': this.compileIf.bind(this),
      '+': this.compileOp('add'),
      '-': this.compileOp('sub'),
      '*': this.compileOp('mul'),
      '/': this.compileOp('udiv'),
      '%': this.compileOp('urem'),
      '<': this.compileOp('icmp slt'),
      '>': this.compileOp('icmp sgt'),
      '=': this.compileOp('icmp eq'),
      'syscall/sys_write': this.compileSyscall(SYSCALL_TABLE.sys_write),
    };
  }

  ...

}

print

We're ready to give our print function a shot.

$ cat test.lisp
(def print-char (c)
     ; First argument is stdout
     ; Second argument is a pointer to a char array (of length one)
     ; Third argument is the length of the char array
     (syscall/sys_write 1 &c 1))

(def print (n)
     (if (> n 9)
         (print (/ n 10)))

     ; 48 is the ASCII code for '0'
     (print-char (+ 48 (% n 10))))

(def main ()
     (print 1234)
     0)
$ node ulisp.js test.lisp
$ ./build/a.out
1234

Looks good! In the next post we'll talk about tail call elimination.

June 17, 2019

The Benefits of Unsharded Vitess

For many large companies seeking help with horizontal scaling, Vitess' value proposition is easily understood; running stateful workloads at astronomical scale is a hard problem that Vitess has boldly solved in the past. However, for businesses that aren't hitting the performance limitations of standard MySQL, it may seem difficult to justify placing seemingly complex middleware in your data path with no immediate reward. I'm here to show you why unsharded Vitess is not just a pre-optimization for future horizontal scaling - it provides many upgrades to the MySQL experience.

June 04, 2019

Tinybird Changelog: Faster CSV Imports

During the last couple of weeks, we’ve made major improvements to our CSV import process, increasing its speed and reliability, getting almost a 2x performance.