a curated list of database news from authoritative sources

June 01, 2020

May 31, 2020

May 26, 2020

orchestrator on DB AMA: show notes

Earlier today I presented orchestrator on DB AMA. Thank you to the organizers Morgan Tocker, Liz van Dijk and Frédéric Descamps for hosting me, and thank you to all who participated! This was a no-slides, all command-line walkthrough of some of orchestrator‘s capabilities, highlighting refactoring, topology analysis, takeovers and failovers, and discussing a bit of […]

Vitess: The Cross Cell Connection

This post explains about how VTGate handles cross-cell operations and how to configure CellAlias for cross-cell read operations. If you are new to Vitess, it is recommended to read this blog post to get more familiar with the various components and their configuration in Vitess. To understand CellAlias, first let's get familiar with what a cell means in Vitess. A cell is a group of servers and network infrastructure collocated in an area, and isolated from failures in other cells.

May 16, 2020

RFCs and asynchronous-first culture

I hated writing documentation before working on features. But after a while I realized I couldn't communicate well enough, even with folks I had a good connection with. It took me a number of mistaken deliveries to get the message.

Sketches and mockups

Designers solve this by producing low-fidelity sketches early on in the process, iterating on feedback to produce a high-fidelity mockup. I solve this by producing short RFC (request for comment) documents. This isn't an original idea, but I see it so rarely I wanted to share.

Now as soon as I begin thinking about a technical or organizational change, I write an RFC. My RFCs are typically a page or two long and typically take me 30-60 minutes for a good first draft. I make clear in the title that it is a proposal or draft. This allows me to make crazy suggestions without upsetting folks; a draft can be easily thrown away.

RFC process

My RFCs include three key sections:

  1. What I think the problem is
  2. Pros/cons of all the solutions I considered
  3. Which solution I'm planning to go with if no one responds to the RFC

After I write the first draft I circulate it among a small group of peers I respect, my boss, etc. I request feedback at leisure and I check in every few days with a reminder. If no one responds after a while and there is little concern, I typically move forward with the proposed solution.

In addition to clarifying intent up front, this removes the need to schedule a meeting to discuss a problem. Discussion and decisions can be held asynchronously. I only schedule a meeting if there is disagreement that is unable to be resolved in writing.

After incorporating feedback, I either throw away the RFC and move on or feel reasonably confident about the proposal. I send it out to a wider group of relevant participants. Final meetings are held as needed.

The other option

In contrast, synchronous-first and undocumented proposals make some sense when you've got a small team in the same timezone with a similar schedule. Otherwise, you repeatedly reschedule meetings to accommodate everyone. You spend your first few meetings simply coming to understand and agree on the problem.

Spending 30-60 minutes to draft a proposal is almost always easier. It makes the decision-making process faster and produces more accurate results.

May 11, 2020

orchestrator: what’s new in CI, testing & development

Recent focus on development & testing yielded with new orchestrator environments and offerings for developers and with increased reliability and trust. This post illustrates the new changes, and see Developers section on the official documentation for more details. Testing In the past four years orchestrator was developed at GitHub, and using GitHub’s environments for testing. […]

May 10, 2020

Writing a SQL database from scratch in Go: 4. a database/sql driver

Previously in database basics: <! forgive me, for I have sinned >
1. SELECT, INSERT, CREATE and a REPL
2. binary expressions and WHERE filters
3. indexes

In this post, we'll extend gosql to implement the database/sql driver interface. This will allow us to interact with gosql the same way we would interact with any other database.

Here is an example familiar program (stored in cmd/sqlexample/main.go) we'll be able to run:

package main

import (
    "database/sql"
    "fmt"

    _ "github.com/eatonphil/gosql"
)

func main() {
    db, err := sql.Open("postgres", "")
    if err != nil {
        panic(err)
    }
    defer db.Close()

    _, err = db.Query("CREATE TABLE users (name TEXT, age INT);")
    if err != nil {
        panic(err)
    }

    _, err = db.Query("INSERT INTO users VALUES ('Terry', 45);")
    if err != nil {
        panic(err)
    }

    _, err = db.Query("INSERT INTO users VALUES ('Anette', 57);")
    if err != nil {
        panic(err)
    }

    rows, err := db.Query("SELECT name, age FROM users;")
    if err != nil {
        panic(err)
    }

    var name string
    var age uint64
    defer rows.Close()
    for rows.Next() {
        err := rows.Scan(&name, &age)
        if err != nil {
            panic(err)
        }

        fmt.Printf("Name: %s, Age: %d\n", name, age)
    }

    if err = rows.Err(); err != nil {
        panic(err)
    }
}

Our gosql driver will use a single instance of the Backend for all connections.

Aside from that, it is a simple matter of wrapping our existing APIs in structs that implement the database/sql/driver.Driver interface.

This post is largely a discussion of this commit.

Implementing the driver

A driver is registered by calling sql.Register with a driver instance.

We'll add the registration code to an init function in a new file, driver.go:

struct Driver {
    bkd Backend
}

func init() {
    sql.Register("postgres", &Driver{NewMemoryBackend()})
}

According to the Driver interface, we need only implement Open to return an connection instance that implements the database/sql/driver.Conn interface.

type Driver struct {
    bkd Backend
}

func (d *Driver) Open(name string) (driver.Conn, error) {
    return &Conn{d.bkd}, nil
}

func init() {
    sql.Register("postgres", &Driver{NewMemoryBackend()})
}

Implementing the connection

According to the Conn interface, we must implement:

  • Prepare(query string) (driver.Stmt, error) to handle prepared statements
  • Close to handle cleanup
  • and Begin to start a transaction

The connection can also optionally implement Query and Exec.

To simplify things we'll panic on Prepare and on Begin (we don't have transactions yet). There's no cleanup required so we'll do nothing in Close.

type Conn struct {
    bkd Backend
}

func (dc *Conn) Prepare(query string) (driver.Stmt, error) {
    panic("Prepare not implemented")
}

func (dc *Conn) Begin() (driver.Tx, error) {
    panic("Begin not implemented")
}

func (dc *Conn) Close() error {
    return nil
}

The only method we actually need, Query, is not required by the interface. It takes a query string and array of query parameters, returning an instance implementing the database/sql/driver.Rows interface.

To implement Query, we basically copy the logic we had in the cmd/main.go REPL. The only change is that when we return results when handling SELECT, we'll return a struct that implements the database/sql/driver.Rows interface.

database/sql/driver.Rows is not the same type as database/sql.Rows, which may sound more familiar. database/sql/driver.Rows is a simpler, lower-level interface.

If we receive parameterized query arguments, we'll ignore them for now. And if the query involves multiple statements, we'll process only the first statement.

func (dc *Conn) Query(query string, args []driver.Value) (driver.Rows, error) {
    if len(args) > 0 {
        // TODO: support parameterization
        panic("Parameterization not supported")
    }

    parser := Parser{}
    ast, err := parser.Parse(query)
    if err != nil {
        return nil, fmt.Errorf("Error while parsing: %s", err)
    }

    // NOTE: ignorning all but the first statement
    stmt := ast.Statements[0]
    switch stmt.Kind {
    case CreateIndexKind:
        err = dc.bkd.CreateIndex(stmt.CreateIndexStatement)
        if err != nil {
            return nil, fmt.Errorf("Error adding index on table: %s", err)
        }
    case CreateTableKind:
        err = dc.bkd.CreateTable(stmt.CreateTableStatement)
        if err != nil {
            return nil, fmt.Errorf("Error creating table: %s", err)
        }
    case DropTableKind:
        err = dc.bkd.DropTable(stmt.DropTableStatement)
        if err != nil {
            return nil, fmt.Errorf("Error dropping table: %s", err)
        }
    case InsertKind:
        err = dc.bkd.Insert(stmt.InsertStatement)
        if err != nil {
            return nil, fmt.Errorf("Error inserting values: %s", err)
        }
    case SelectKind:
        results, err := dc.bkd.Select(stmt.SelectStatement)
        if err != nil {
            return nil, err
        }

        return &Rows{
            rows:    results.Rows,
            columns: results.Columns,
            index:   0,
        }, nil
    }

    return nil, nil
}

Implementing results

According to the Rows interface we must implement:

  • Columns() []string to return an array of columns names
  • Next(dest []Value) error to populate an row array with the next row's worth of cells
  • and Close() error

Our Rows struct will contain the rows and colums as returned from Backend, and will also contain an index field we can use in Next to populate the next row of cells.

type Rows struct {
    columns []ResultColumn
    index   uint64
    rows    [][]Cell
}

func (r *Rows) Columns() []string {}

func (r *Rows) Close() error {}

func (r *Rows) Next(dest []driver.Value) error {}

For Columns we simply need to extract and return the column names from ResultColumn.

func (r *Rows) Columns() []string {
    columns := []string{}
    for _, c := range r.columns {
        columns = append(columns, c.Name)
    }

    return columns
}

For Next we need to iterate over each cell in the current row and retrieve its Go value, storing it in dest. The dest argument is simply a fixed-length array of interface{}, so we'll need no manual conversion.

Once we've reached the last row, the Next contract is to return an io.EOF.

func (r *Rows) Next(dest []driver.Value) error {
    if r.index >= uint64(len(r.rows)) {
        return io.EOF
    }

    row := r.rows[r.index]

    for idx, cell := range row {
        typ := r.columns[idx].Type
        switch typ {
        case IntType:
            i := cell.AsInt()
            if i == nil {
                dest[idx] = i
            } else {
                dest[idx] = *i
            }
        case TextType:
            s := cell.AsText()
            if s == nil {
                dest[idx] = s
            } else {
                dest[idx] = *s
            }
        case BoolType:
            b := cell.AsBool()
            if b == nil {
                dest[idx] = b
            } else {
                dest[idx] = b
            }
        }
    }

    r.index++
    return nil
}

Finally in Close we'll set index higher than the number of rows to force Next to only ever return io.EOF.

func (r *Rows) Close() error {
    r.index = uint64(len(r.rows))
    return nil
}

And that's all the changes needed to implement a database/sql driver! See here for driver.go in full.

Running the example

With the driver in place we can try out the example:

$ go build ./cmd/sqlexample/main.go
$ ./main
Name: Terry, Age: 45
Name: Anette, Age: 57

May 07, 2020

A new dashboard for Tinybird Analytics

A key part of running an effective Analytics Platform within an organization is being able to keep a tight control over usage and performance, ingestion jobs...