database.news | database technology news aggregator

January 26, 2021

Tinybird Engineering Blog

Clickhouse Tips #1: Calculating Aggregations After a Given Date

Tips and recipes to learn how to make the most of ClickHouse, curated weekly by the Tinybird team.

Vitess Blog

Announcing Vitess 9

On behalf of the Vitess maintainers team, I am pleased to announce the general availability of Vitess 9. Major Themes # In this release, we have focused on making Vitess more stable after the successful release of Version 8. There have been no major issues reported. So there were no patches released for Version 8. This has allowed us to push further on compatibility and adoption of common frameworks as priorities.

Tinybird Engineering Blog

Clickhouse Tips #1: Calculating Aggregations After a Given Date

Tips and recipes to learn how to make the most of ClickHouse, curated weekly by the Tinybird team. Part 1.

by Javi Santana

January 23, 2021

Phil Eaton

Extending gosql to supporting LIMIT and OFFSET

It's been a few months since I picked up gosql and I wanted to use it to prototype a SQL interface for data stored in S3. But one missing critical feature in gosql is LIMIT and OFFSET support. This post walks through the few key changes to gosql to support LIMIT and OFFSET.

You can find this commit in full on Github.

This post builds on top of a series on building a SQL database from scratch in Golang. <! forgive me, for I have sinned >
1. SELECT, INSERT, CREATE and a REPL
2. binary expressions and WHERE filters
3. indexes
4. a database/sql driver

Lexing

The first step is to update the lexer to know about the LIMIT and OFFSET keywords. Since we already have a generalized method of lexing any keywords from an array (see lexer.go:lexKeyword), this is really easy. Just add a new Keyword:

@@ -37,6 +37,8 @@ const (
        OnKeyword         Keyword = "on"
        PrimarykeyKeyword Keyword = "primary key"
        NullKeyword       Keyword = "null"
+       LimitKeyword      Keyword = "limit"
+       OffsetKeyword     Keyword = "offset"
 )

And then add these two new enums to the list of Keywords to lex:

@@ -261,6 +263,8 @@ func lexKeyword(source string, ic cursor) (*Token, cursor, bool) {
                OnKeyword,
                PrimarykeyKeyword,
                NullKeyword,
+               LimitKeyword,
+               OffsetKeyword,
        }

        var options []string

That's it for the lexer.

Parsing

Before we can parse limit and offset into the AST, we have to modify our AST struct to support these two fields in ast.go:

@@ -54,9 +54,11 @@ type SelectItem struct {
 }

 type SelectStatement struct {
-       Item  *[]*SelectItem
-       From  *Token
-       Where *Expression
+       Item   *[]*SelectItem
+       From   *Token
+       Where  *Expression
+       Limit  *Expression
+       Offset *Expression
 }

And to be a good citizen, we'll fix up the GenerateCode helper function (for pretty-printing the AST) to show LIMIT and OFFSET.

@@ -73,17 +75,24 @@ func (ss SelectStatement) GenerateCode() string {
                item = append(item, s)
        }

-       from := ""
+       code := "SELECT\n" + strings.Join(item, ",\n")
        if ss.From != nil {
-               from = fmt.Sprintf("\nFROM\n\t\"%s\"", ss.From.Value)
+               code += fmt.Sprintf("\nFROM\n\t\"%s\"", ss.From.Value)
        }

-       where := ""
        if ss.Where != nil {
-               where = fmt.Sprintf("\nWHERE\n\t%s", ss.Where.GenerateCode())
+               code += "\nWHERE\n\t" + ss.Where.GenerateCode()
        }

-       return fmt.Sprintf("SELECT\n%s%s%s;", strings.Join(item, ",\n"), from, where)
+       if ss.Limit != nil {
+               code += "\nLIMIT\n\t" + ss.Limit.GenerateCode()
+       }
+
+       if ss.Offset != nil {
+               code += "\nOFFSET\n\t" + ss.Limit.GenerateCode()
+       }
+
+       return code + ";"
 }

 type ColumnDefinition struct {

That's it for modifying the AST itself. Now we can modify the select statement parser to look for these two new sections. It's pretty simple: for both LIMIT and OFFSET first check if they exist in the current statement and then try to parse the expression after them, in parser.go:

@@ -285,6 +288,30 @@ func (p Parser) parseSelectStatement(tokens []*Token, initialCursor uint, delimi
                cursor = newCursor
        }

+       _, cursor, ok = p.parseToken(tokens, cursor, limitToken)
+       if ok {
+               limit, newCursor, ok := p.parseExpression(tokens, cursor, []Token{offsetToken, delimiter}, 0)
+               if !ok {
+                       p.helpMessage(tokens, cursor, "Expected LIMIT value")
+                       return nil, initialCursor, false
+               }
+
+               slct.Limit = limit
+               cursor = newCursor
+       }
+
+       _, cursor, ok = p.parseToken(tokens, cursor, offsetToken)
+       if ok {
+               offset, newCursor, ok := p.parseExpression(tokens, cursor, []Token{delimiter}, 0)
+               if !ok {
+                       p.helpMessage(tokens, cursor, "Expected OFFSET value")
+                       return nil, initialCursor, false
+               }
+
+               slct.Offset = offset
+               cursor = newCursor
+       }
+
        return &slct, cursor, true
 }

And the last tricky bit is to make sure that previous optional parseExpression know that they can be delimited by OFFSET and LIMIT (this delimiter awareness is just how the parser works):

@@ -273,9 +273,12 @@ func (p Parser) parseSelectStatement(tokens []*Token, initialCursor uint, delimi
                cursor = newCursor
        }

+       limitToken := tokenFromKeyword(LimitKeyword)
+       offsetToken := tokenFromKeyword(OffsetKeyword)
+
        _, cursor, ok = p.parseToken(tokens, cursor, whereToken)
        if ok {
-               where, newCursor, ok := p.parseExpression(tokens, cursor, []Token{delimiter}, 0)
+               where, newCursor, ok := p.parseExpression(tokens, cursor, []Token{limitToken, offsetToken, delimiter}, 0)
                if !ok {
                        p.helpMessage(tokens, cursor, "Expected WHERE conditionals")
                        return nil, initialCursor, false

That's it for parsing!

Runtime

Gosql has just one storage backend currently: an in-memory store. To support LIMIT and OFFSET we need to evaluate both expressions if they exist. Then while we're iterating through table rows, after testing whether each row passes the WHERE filter, we'll check if the number of rows passing the WHERE filter falls within the range of OFFSET and LIMIT + OFFSET otherwise we'll skip the row, in memory.go:

@@ -587,6 +587,33 @@ func (mb *MemoryBackend) Select(slct *SelectStatement) (*Results, error) {
                }
        }

+       limit := len(t.rows)
+       if slct.Limit != nil {
+               v, _, _, err := t.evaluateCell(0, *slct.Limit)
+               if err != nil {
+                       return nil, err
+               }
+
+               limit = int(*v.AsInt())
+       }
+       if limit < 0 {
+               return nil, fmt.Errorf("Invalid, negative limit")
+       }
+
+       offset := 0
+       if slct.Offset != nil {
+               v, _, _, err := t.evaluateCell(0, *slct.Offset)
+               if err != nil {
+                       return nil, err
+               }
+
+               offset = int(*v.AsInt())
+       }
+       if offset < 0 {
+               return nil, fmt.Errorf("Invalid, negative limit")
+       }
+
+       rowIndex := -1
        for i := range t.rows {
                result := []Cell{}
                isFirstRow := len(results) == 0
@@ -602,6 +629,13 @@ func (mb *MemoryBackend) Select(slct *SelectStatement) (*Results, error) {
                        }
                }

+               rowIndex++
+               if rowIndex < offset {
+                       continue
+               } else if rowIndex > offset+limit-1 {
+                       break
+               }
+
                for _, col := range finalItems {
                        value, columnName, columnType, err := t.evaluateCell(uint(i), *col.Exp)
                        if err != nil {

Just to call out explicitly, with LIMIT and OFFSET we still have to check every single row in the table (at least until we've reached the offset). This should clearly illustrate why paginating based on LIMIT and OFFSET is not a great idea for big datasets compared to index-based pagination.

That's all!

Trying it out

$ go build cmd/main.go
$ ./main
Welcome to gosql.
# create table user (name text, age int);
ok
# insert into user values ('meg', 2);
ok
# insert into user values ('jerry', 2);
ok
# insert into user values ('phil', 1);
ok
# select * from user;
  name  | age
--------+------
  meg   |   2
  jerry |   2
  phil  |   1
(3 results)
ok
# select * from user limit 1;
  name | age
-------+------
  meg  |   2
(1 result)
ok
# select * from user where age=1 limit 1;
  name | age
-------+------
  phil |   1
(1 result)
ok
# select * from user where age=1 limit 4;
  name | age
-------+------
  phil |   1
(1 result)
ok
# select * from user where age=2 limit 1;
  name | age
-------+------
  meg  |   2
(1 result)
ok
# select * from user where age=2 limit 1 offset 1;
  name  | age
--------+------
  jerry |   2
(1 result)
ok

Not so hard to hack is it? Make sure to include some tests!

Working on a prototype SQL-based explorer for data stored in S3 and I needed OFFSET/LIMIT support in the gosql parser. Wrote up a short post on how you can hack in additional syntax and functionality into this SQL engine written in Go.https://t.co/PyVozTPZ5S
— Phil Eaton (@phil_eaton) January 24, 2021

January 22, 2021

Tinybird Engineering Blog

Product design: how our SaaS integrates with git

Analytics data projects are code and code should be in a repo.

January 19, 2021

Tinybird Engineering Blog

Querying large CSVs online with SQL

Use Tinybird to query large CSVs using SQL, right from your browser.

January 02, 2021

Supabase Blog

Supabase Beta December 2020

Ten months of building.

December 28, 2020

Hack MySQL

MySQL Password Rotation with AWS Secrets Manager and Lambda

How to set up MySQL password rotation with AWS Secrets Manager, AWS Lambda, and Terraform

by Daniel Nichter

December 27, 2020

Phil Eaton

The year in books: 20 to recommend in 2020

This year I finished 47 books, up from last year but not a personal best. The breakdown was 17 non-fiction and 30 fiction. Another 20-30 remain started but unfinished this year.

Non-fiction

The 8 non-fiction books I most recommend are:

Fashionapolis: The Price of Fast Fashion and the Future of Clothes (Must read)
Effective Python: 90 Specific Ways to Write Better Python (Must read; truly excellent for Python programmers, I recommend this to anyone I work with)
The Machine that Changed the World (Must read)
Europe: The Struggle for Supremacy from 1453 to the Present
Wind, Sand and Stars
American Colussus: The Triumph of Capitalism, 1865-1900
Making Common Sense of Japan
The German Genius

The 3 books I recommend you not to waste time on are: "The Two Koreas", "The Price of Inequality", and "Ninety Percent of Everything: Inside Shipping".

The whole list

The Two Koreas by Don Oberdorfer
- Interesting but not a huge fan, seemed pretty biased against South Korea somehow
Forbidden Nation: A History of Taiwan by Jonathan Manthorpe
Effective Python: 90 Specific Ways to Write Better Python by Brett Slatkin
A Philosophy of Software Design by John Ousterhout
- Came as a recommendation from someone on Twitter, ultimately not a huge fan. Still looking for high quality books on software design
The Price of Inequality by Joseph E. Stiglitz
- Agreed with the premise but the book was incoherent and too self-assuring
Paris Reborn: Napoléon III, Baron Haussmann, and the Quest to Build a Modern City by Stephane Kirkland
Fashionapolis: The Price of Fast Fashion and the Future of Clothes by Dana Thomas
A Moveable Feast by Ernest Hemingway
- I normally love Hemingway's writing but this particular book was not very coherent
Europe: The Struggle for Supremacy from 1453 to the Present by Brendan Simms
- Such an excellent introduction to the continent for Americans who otherwise don't have great background
Wind, Sand and Stars by Antoine de Saint-Exupéry
- A beautiful memoir of flights by the author of The Little Prince, very similar in style to Hemingway
American Colussus: The Triumph of Capitalism, 1865-1900 by H.W. Brands
- Baby's first primer on unions, (I need more recommendations on the history of unions)
Making Common Sense of Japan by Steven R. Reed
- It can be difficult to find English translations of Korean, Japanese history by Korean and Japanese authors; this is a good one by an American professor
The Machine that Changed the World by James P. Womack
- An excellent, well-researched history of automobile manufacturing in the US, Europe and Japan from the 1900s to 1990; how Japan ate everyone's lunch
The United States of Europe by T.R. Reid
- Very light introduction to the European Union
The Soul of a New Machine by Tracy Kidder
- Overhyped by the internets, but not bad
The German Genius by Peter Watson
- Dense but excellent introduction to many famous Germans in many fields throughout time
Ninety Percent of Everything: Inside Shipping by Rose George

Fiction

I'm trying to read more from non-English authors. If you see non-English authors in the vein of these here that you can recommend, I'd love to hear from you.

The 12 fiction books I most recommend are:

Planet of the Apes (Must read, yes even if you've seen the film)
All Quiet on the Western Front (Must read)
The Mouse That Roared (Must read)
The Dead Mountaineer's Inn
The Golem and the Jinni
Neverwhere
Dubliners
Old Man's War
The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion
Fantômas
Foundation
Out of the Silent Planet

The only book I really didn't like was "Invisible Cities".

The whole list

March Violets by Philip Kerr (Scottish)
Liberty Bar by Georges Simenon (Belgian)
The Late Monsieur Gallet by Georges Simenon (Belgian)
Dubliners by James Joyce (Irish)
Tales of the City by Amistead Maupin (American)
The Third Policeman by Flann O'Brien (Irish)
44 Scotland Street by Alexander McCall Smith (British-African)
Knots and Crosses by Ian Rankin (Scottish)
I Hear Your Voice by Kim Young Ha (South Korean)
The Golem and the Jinni by Helene Wecker (American)
The Tokyo Zodiac Murders by Shimada Sōji (Japanese)
The Screwtape Letters by C.S. Lewis (English)
Neverwhere by Neil Gaiman (English)
Old Man's War by John Scalzi (American)
Tales from Earthsea by Ursula K. Le Guin (American)
Solaris by Stanisław Lem (Polish)
A Wizard of Earthsea by Ursula K. Le Guin (American)
Planet of the Apes by Pierre Boulle (French)
The Dead Mountaineer's Inn by Arkady Strugatsky (Russian)
Invisible Cities by Italo Calvino (Cuban-born Italian)
The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion by Friedrich Dürrenmatt (Swiss)
Fantômas by Marcel Allain (French)
All Quiet on the Western Front by Erich Maria Remarque (Germany)
A Crime in Holland by Georges Simenon (Belgian)
The Wonderful Adventure of Nils Holversson by Selma Lagerlöf (Swedish)
Foundation by Isaac Asimov (Russian-born American)
Out of the Silent Planet by C.S. Lewis (English)
The Spy Who Came in from the Cold by John le Carré (English)
The Bat by Jo Nesbø (Norwegian)
The Mouse That Roared by Leonard Wibberley (Irish-born American)

Out of 47 books read this year, here's the 20 I recommend to you (gave them 4/5 stars or better). I'm trying to read more non-English authors so I'd love to hear if there are authors with similar style on this list you'd recommend!https://t.co/FjHcvHpRSr
— Phil Eaton (@phil_eaton) December 27, 2020

← 1 ... 74 75 76 77 78 79 80 81 82 83 ... 87 →