January 26, 2021
Announcing Vitess 9
Clickhouse Tips #1: Calculating Aggregations After a Given Date
January 23, 2021
Extending gosql to supporting LIMIT and OFFSET
It's been a few months since I picked up
gosql and I wanted to use it to
prototype a SQL interface for data stored in S3. But one missing
critical feature in gosql is LIMIT and OFFSET support. This post walks
through the few key changes to gosql to support LIMIT and OFFSET.
You can find this commit in full on Github.
This post builds on top of a series on building a SQL database from scratch in Golang.
<! forgive me, for I have sinned >
1. SELECT, INSERT, CREATE and a REPL
2. binary expressions and WHERE filters
3. indexes
4. a database/sql driver
Lexing
The first step is to update the lexer to know about the
LIMIT and OFFSET keywords. Since we already
have a generalized method of lexing any keywords from an array (see
lexer.go:lexKeyword), this is really easy. Just add a new
Keyword:
@@ -37,6 +37,8 @@ const (
OnKeyword Keyword = "on"
PrimarykeyKeyword Keyword = "primary key"
NullKeyword Keyword = "null"
+ LimitKeyword Keyword = "limit"
+ OffsetKeyword Keyword = "offset"
)
And then add these two new enums to the list of Keywords
to lex:
@@ -261,6 +263,8 @@ func lexKeyword(source string, ic cursor) (*Token, cursor, bool) {
OnKeyword,
PrimarykeyKeyword,
NullKeyword,
+ LimitKeyword,
+ OffsetKeyword,
}
var options []string
That's it for the lexer.
Parsing
Before we can parse limit and offset into the AST, we have to modify our AST struct to support these two fields in ast.go:
@@ -54,9 +54,11 @@ type SelectItem struct {
}
type SelectStatement struct {
- Item *[]*SelectItem
- From *Token
- Where *Expression
+ Item *[]*SelectItem
+ From *Token
+ Where *Expression
+ Limit *Expression
+ Offset *Expression
}
And to be a good citizen, we'll fix up the GenerateCode
helper function (for pretty-printing the AST) to
show LIMIT and OFFSET.
@@ -73,17 +75,24 @@ func (ss SelectStatement) GenerateCode() string {
item = append(item, s)
}
- from := ""
+ code := "SELECT\n" + strings.Join(item, ",\n")
if ss.From != nil {
- from = fmt.Sprintf("\nFROM\n\t\"%s\"", ss.From.Value)
+ code += fmt.Sprintf("\nFROM\n\t\"%s\"", ss.From.Value)
}
- where := ""
if ss.Where != nil {
- where = fmt.Sprintf("\nWHERE\n\t%s", ss.Where.GenerateCode())
+ code += "\nWHERE\n\t" + ss.Where.GenerateCode()
}
- return fmt.Sprintf("SELECT\n%s%s%s;", strings.Join(item, ",\n"), from, where)
+ if ss.Limit != nil {
+ code += "\nLIMIT\n\t" + ss.Limit.GenerateCode()
+ }
+
+ if ss.Offset != nil {
+ code += "\nOFFSET\n\t" + ss.Limit.GenerateCode()
+ }
+
+ return code + ";"
}
type ColumnDefinition struct {
That's it for modifying the AST itself. Now we can modify the select
statement parser to look for these two new sections. It's pretty
simple: for both LIMIT and OFFSET first
check if they exist in the current statement and then try to parse the
expression after them, in parser.go:
@@ -285,6 +288,30 @@ func (p Parser) parseSelectStatement(tokens []*Token, initialCursor uint, delimi
cursor = newCursor
}
+ _, cursor, ok = p.parseToken(tokens, cursor, limitToken)
+ if ok {
+ limit, newCursor, ok := p.parseExpression(tokens, cursor, []Token{offsetToken, delimiter}, 0)
+ if !ok {
+ p.helpMessage(tokens, cursor, "Expected LIMIT value")
+ return nil, initialCursor, false
+ }
+
+ slct.Limit = limit
+ cursor = newCursor
+ }
+
+ _, cursor, ok = p.parseToken(tokens, cursor, offsetToken)
+ if ok {
+ offset, newCursor, ok := p.parseExpression(tokens, cursor, []Token{delimiter}, 0)
+ if !ok {
+ p.helpMessage(tokens, cursor, "Expected OFFSET value")
+ return nil, initialCursor, false
+ }
+
+ slct.Offset = offset
+ cursor = newCursor
+ }
+
return &slct, cursor, true
}
And the last tricky bit is to make sure that previous
optional parseExpression know that they can be delimited
by OFFSET and LIMIT (this delimiter
awareness is just how the parser works):
@@ -273,9 +273,12 @@ func (p Parser) parseSelectStatement(tokens []*Token, initialCursor uint, delimi
cursor = newCursor
}
+ limitToken := tokenFromKeyword(LimitKeyword)
+ offsetToken := tokenFromKeyword(OffsetKeyword)
+
_, cursor, ok = p.parseToken(tokens, cursor, whereToken)
if ok {
- where, newCursor, ok := p.parseExpression(tokens, cursor, []Token{delimiter}, 0)
+ where, newCursor, ok := p.parseExpression(tokens, cursor, []Token{limitToken, offsetToken, delimiter}, 0)
if !ok {
p.helpMessage(tokens, cursor, "Expected WHERE conditionals")
return nil, initialCursor, false
That's it for parsing!
Runtime
Gosql has just one storage backend currently: an in-memory store. To
support LIMIT and OFFSET we need to evaluate
both expressions if they exist. Then while we're iterating through
table rows, after testing whether each row passes
the WHERE filter, we'll check if the number of rows
passing the WHERE filter falls within the range
of OFFSET and LIMIT + OFFSET otherwise we'll
skip the row, in memory.go:
@@ -587,6 +587,33 @@ func (mb *MemoryBackend) Select(slct *SelectStatement) (*Results, error) {
}
}
+ limit := len(t.rows)
+ if slct.Limit != nil {
+ v, _, _, err := t.evaluateCell(0, *slct.Limit)
+ if err != nil {
+ return nil, err
+ }
+
+ limit = int(*v.AsInt())
+ }
+ if limit < 0 {
+ return nil, fmt.Errorf("Invalid, negative limit")
+ }
+
+ offset := 0
+ if slct.Offset != nil {
+ v, _, _, err := t.evaluateCell(0, *slct.Offset)
+ if err != nil {
+ return nil, err
+ }
+
+ offset = int(*v.AsInt())
+ }
+ if offset < 0 {
+ return nil, fmt.Errorf("Invalid, negative limit")
+ }
+
+ rowIndex := -1
for i := range t.rows {
result := []Cell{}
isFirstRow := len(results) == 0
@@ -602,6 +629,13 @@ func (mb *MemoryBackend) Select(slct *SelectStatement) (*Results, error) {
}
}
+ rowIndex++
+ if rowIndex < offset {
+ continue
+ } else if rowIndex > offset+limit-1 {
+ break
+ }
+
for _, col := range finalItems {
value, columnName, columnType, err := t.evaluateCell(uint(i), *col.Exp)
if err != nil {
Just to call out explicitly, with LIMIT
and OFFSET we still have to check every single row in
the table (at least until we've reached the offset). This should
clearly illustrate why paginating based on LIMIT
and OFFSET is not a great idea for big datasets
compared
to index-based pagination.
That's all!
Trying it out
$ go build cmd/main.go
$ ./main
Welcome to gosql.
# create table user (name text, age int);
ok
# insert into user values ('meg', 2);
ok
# insert into user values ('jerry', 2);
ok
# insert into user values ('phil', 1);
ok
# select * from user;
name | age
--------+------
meg | 2
jerry | 2
phil | 1
(3 results)
ok
# select * from user limit 1;
name | age
-------+------
meg | 2
(1 result)
ok
# select * from user where age=1 limit 1;
name | age
-------+------
phil | 1
(1 result)
ok
# select * from user where age=1 limit 4;
name | age
-------+------
phil | 1
(1 result)
ok
# select * from user where age=2 limit 1;
name | age
-------+------
meg | 2
(1 result)
ok
# select * from user where age=2 limit 1 offset 1;
name | age
--------+------
jerry | 2
(1 result)
ok
Not so hard to hack is it? Make sure to include some tests!
Working on a prototype SQL-based explorer for data stored in S3 and I needed OFFSET/LIMIT support in the gosql parser. Wrote up a short post on how you can hack in additional syntax and functionality into this SQL engine written in Go.https://t.co/PyVozTPZ5S
— Phil Eaton (@phil_eaton) January 24, 2021
January 22, 2021
Product design: how our SaaS integrates with git
January 19, 2021
Querying large CSVs online with SQL
January 02, 2021
Supabase Beta December 2020
December 28, 2020
MySQL Password Rotation with AWS Secrets Manager and Lambda
December 27, 2020
The year in books: 20 to recommend in 2020
This year I finished 47 books, up from last year but not a personal best. The breakdown was 17 non-fiction and 30 fiction. Another 20-30 remain started but unfinished this year.
Non-fiction
The 8 non-fiction books I most recommend are:
- Fashionapolis: The Price of Fast Fashion and the Future of Clothes (Must read)
- Effective Python: 90 Specific Ways to Write Better Python (Must read; truly excellent for Python programmers, I recommend this to anyone I work with)
- The Machine that Changed the World (Must read)
- Europe: The Struggle for Supremacy from 1453 to the Present
- Wind, Sand and Stars
- American Colussus: The Triumph of Capitalism, 1865-1900
- Making Common Sense of Japan
- The German Genius
The 3 books I recommend you not to waste time on are: "The Two Koreas", "The Price of Inequality", and "Ninety Percent of Everything: Inside Shipping".
The whole list
- The Two Koreas by Don Oberdorfer
- Interesting but not a huge fan, seemed pretty biased against South Korea somehow
- Forbidden Nation: A History of Taiwan by Jonathan Manthorpe
- Effective Python: 90 Specific Ways to Write Better Python by Brett Slatkin
- A Philosophy of Software Design by John Ousterhout
- Came as a recommendation from someone on Twitter, ultimately not a huge fan. Still looking for high quality books on software design
- The Price of Inequality by Joseph E. Stiglitz
- Agreed with the premise but the book was incoherent and too self-assuring
- Paris Reborn: Napoléon III, Baron Haussmann, and the Quest to Build a Modern City by Stephane Kirkland
- Fashionapolis: The Price of Fast Fashion and the Future of Clothes by Dana Thomas
- A Moveable Feast by Ernest Hemingway
- I normally love Hemingway's writing but this particular book was not very coherent
- Europe: The Struggle for Supremacy from 1453 to the Present by Brendan Simms
- Such an excellent introduction to the continent for Americans who otherwise don't have great background
- Wind, Sand and Stars by Antoine de Saint-Exupéry
- A beautiful memoir of flights by the author of The Little Prince, very similar in style to Hemingway
- American Colussus: The Triumph of Capitalism, 1865-1900 by H.W. Brands
- Baby's first primer on unions, (I need more recommendations on the history of unions)
- Making Common Sense of Japan by Steven R. Reed
- It can be difficult to find English translations of Korean, Japanese history by Korean and Japanese authors; this is a good one by an American professor
- The Machine that Changed the World by James P. Womack
- An excellent, well-researched history of automobile manufacturing in the US, Europe and Japan from the 1900s to 1990; how Japan ate everyone's lunch
- The United States of Europe by T.R. Reid
- Very light introduction to the European Union
- The Soul of a New Machine by Tracy Kidder
- Overhyped by the internets, but not bad
- The German Genius by Peter Watson
- Dense but excellent introduction to many famous Germans in many fields throughout time
- Ninety Percent of Everything: Inside Shipping by Rose George
Fiction
I'm trying to read more from non-English authors. If you see non-English authors in the vein of these here that you can recommend, I'd love to hear from you.
The 12 fiction books I most recommend are:
- Planet of the Apes (Must read, yes even if you've seen the film)
- All Quiet on the Western Front (Must read)
- The Mouse That Roared (Must read)
- The Dead Mountaineer's Inn
- The Golem and the Jinni
- Neverwhere
- Dubliners
- Old Man's War
- The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion
- Fantômas
- Foundation
- Out of the Silent Planet
The only book I really didn't like was "Invisible Cities".
The whole list
- March Violets by Philip Kerr (Scottish)
- Liberty Bar by Georges Simenon (Belgian)
- The Late Monsieur Gallet by Georges Simenon (Belgian)
- Dubliners by James Joyce (Irish)
- Tales of the City by Amistead Maupin (American)
- The Third Policeman by Flann O'Brien (Irish)
- 44 Scotland Street by Alexander McCall Smith (British-African)
- Knots and Crosses by Ian Rankin (Scottish)
- I Hear Your Voice by Kim Young Ha (South Korean)
- The Golem and the Jinni by Helene Wecker (American)
- The Tokyo Zodiac Murders by Shimada Sōji (Japanese)
- The Screwtape Letters by C.S. Lewis (English)
- Neverwhere by Neil Gaiman (English)
- Old Man's War by John Scalzi (American)
- Tales from Earthsea by Ursula K. Le Guin (American)
- Solaris by Stanisław Lem (Polish)
- A Wizard of Earthsea by Ursula K. Le Guin (American)
- Planet of the Apes by Pierre Boulle (French)
- The Dead Mountaineer's Inn by Arkady Strugatsky (Russian)
- Invisible Cities by Italo Calvino (Cuban-born Italian)
- The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion by Friedrich Dürrenmatt (Swiss)
- Fantômas by Marcel Allain (French)
- All Quiet on the Western Front by Erich Maria Remarque (Germany)
- A Crime in Holland by Georges Simenon (Belgian)
- The Wonderful Adventure of Nils Holversson by Selma Lagerlöf (Swedish)
- Foundation by Isaac Asimov (Russian-born American)
- Out of the Silent Planet by C.S. Lewis (English)
- The Spy Who Came in from the Cold by John le Carré (English)
- The Bat by Jo Nesbø (Norwegian)
- The Mouse That Roared by Leonard Wibberley (Irish-born American)
Out of 47 books read this year, here's the 20 I recommend to you (gave them 4/5 stars or better). I'm trying to read more non-English authors so I'd love to hear if there are authors with similar style on this list you'd recommend!https://t.co/FjHcvHpRSr
— Phil Eaton (@phil_eaton) December 27, 2020