a curated list of database news from authoritative sources

December 08, 2023

December 05, 2023

Why is Jepsen Written in Clojure?

People keep asking why Jepsen is written in Clojure, so I figure it’s worth having a referencable answer. I’ve programmed in something like twenty languages. Why choose a Weird Lisp?

Jepsen is built for testing concurrent systems–mostly databases. Because it tests concurrent systems, the language itself needs good support for concurrency. Clojure’s immutable, persistent data structures make it easier to write correct concurrent programs, and the language and runtime have excellent concurrency support: real threads, promises, futures, atoms, locks, queues, cyclic barriers, all of java.util.concurrent, etc. I also considered languages (like Haskell) with more rigorous control over side effects, but decided that Clojure’s less-dogmatic approach was preferable.

Because Jepsen tests databases, it needs broad client support. Almost every database has a JVM client, typically written in Java, and Clojure has decent Java interop.

Because testing is experimental work, I needed a language which was concise, adaptable, and well-suited to prototyping. Clojure is terse, and its syntactic flexibility–in particular, its macro system–work well for that. In particular the threading macros make chained transformations readable, and macros enable re-usable error handling and easy control of resource scopes. The Clojure REPL is really handy for exploring the data a test run produces.

Tests involve representing, transforming, and inspecting complex, nested data structures. Clojure’s data structures and standard library functions are possibly the best I’ve ever seen. I also print a lot of structures to the console and files: Clojure’s data syntax (EDN) is fantastic for this.

Because tests involve manipulating a decent, but not huge, chunk of data, I needed a language with “good enough” performance. Clojure’s certainly not the fastest language out there, but idiomatic Clojure is usually within an order of magnitude or two of Java, and I can shave off the difference where critical. The JVM has excellent profiling tools, and these work well with Clojure.

Jepsen’s (gosh) about a decade old now: I wanted a language with a mature core and emphasis on stability. Clojure is remarkably stable, both in terms of JVM target and the language itself. Libraries don’t “rot” anywhere near as quickly as in Scala or Ruby.

Clojure does have significant drawbacks. It has a small engineering community and no (broadly-accepted, successful) static typing system. Both of these would constrain a large team, but Jepsen’s maintained and used by only 1-3 people at a time. Working with JVM primitives can be frustrating without dropping to Java; I do this on occasion. Some aspects of the polymorphism system are lacking, but these can be worked around with libraries. The error messages are terrible. I have no apologetics for this. ;-)

I prototyped Jepsen in a few different languages before settling on Clojure. A decade in, I think it was a pretty good tradeoff.

December 01, 2023

What is HTAP?

Learn what HTAP is, how HTAP compares to OLAP and OLTP, and some pros and cons of HTAP.

Automatic CLI login

Explore the technical implementation and security measures behind CLI's new automatic login feature.

November 28, 2023

The Infamous ORDER BY LIMIT Query Optimizer Bug

Which is faster: LIMIT 1 or LIMIT 20? Presumably, fetching less rows is faster than fetching more rows. But for 16 years (since 2007) the MySQL query optimizer has had a “bug”† that not only makes LIMIT 1 slower than LIMIT 20 but can also make the former a table scan, which tends to cause problems. This happened last week where I work, and although MySQL DBAs are familiar with this bug, I’m writing this blog post for developers to more clearly illustrate and explain what’s going on and why because it’s really counterintuitive.

Introducing Insights Anomalies

This new update to PlanetScale Insights introduces smart query monitoring to detect slower than expected queries in your database.

Tinybird expands self-service real-time analytics to AWS

Starting today, our self-service customers can select Amazon Web Services as their deployment environment when creating a new Tinybird Workspace. Tinybird is already trusted by AWS customers, including FanDuel, Factorial, and Canva.

November 21, 2023

Are Vector Databases Dead?

This year vector databases have sprung up like mushrooms to enable applications to retrieve context based on semantic search. A large portion of these applications have used the retrieved context to augment the ability of large language models (LLMs) in a pattern known as RAG. On November 7th OpenAI released its Assistants API, enabling the implementation of AI chat interfaces with context retrieval without needing a separate message store or vector database. Does this new API make vector databases obsolete?

Build AI Chat with Convex Vector Search

Convex is a full-stack development platform and cloud database, including built-in vector search. In this third post in our [series](https://stack.convex.dev/ai-chat-using-openai-assistants-api), we’ll build an AI-powered chat interface using Convex, with our own message storage and context retrieval.

Build AI Chat with OpenAI's Assistants API

On November 7th OpenAI released its Assistants API, enabling chat bot with context retrieval implementations without needing a messages or vector database. In this post, we’ll cover how to leverage this API to build a fully functioning AI chat interface.

Webhook security: a hands-on guide

Learn what went into building PlanetScale webhooks from a security perspective. This article covers SSRF, webhook validation, DDoS, and more.

November 20, 2023

November 19, 2023

Exploring a Postgres query plan

I learned this week that you can intercept and redirect Postgres query execution. You can hook into the execution layer so you're given a query plan and you get to decide what to do with it. What rows to return, if any, and where they come from.

That's very interesting. So I started writing code to explore execution hooks. However, I got stuck interpreting the query plan. Either there's no query plan walking infrastructure or I just didn't find it.

So this post is a digression into walking a Postgres query plan. By the end we'll be able to run psql -c 'SELECT a FROM x WHERE a > 1' and reconstruct the entire SQL string from a Postgres QueryDesc object, the query plan object Postgres builds.

With that query plan walking infrastructure in place, we'll be in a good state to not just print out the query plan while walking it but instead to translate the query plan or evaluate it in our own way (e.g. over column-wise data, or vectorized execution over row-wise data).

Code for this project is available on Github.

What is a query plan?

If you're familiar with parsers and compilers, a query plan is like an intermediate representation (IR) of a program. It is not as raw as an abstract syntax tree (AST); it has already been optimized.

If that doesn't mean anything to you, think of a query plan as a structured and optimized version of the SQL query you submit to your database. It isn't text anymore. It is probably a tree.

Check out another Justin Jaffray article on the subject for more detail.

Development environment

Before we get to walking the query plan, let's set up the infrastructure to intercept query execution where we can eventually add in our print debugging of the query plan reconstructed as a SQL string.

Once you've got Postgres build dependencies, build and install a debug version of Postgres:

$ git clone https://github.com/postgres/postgres && cd postgres
$ # Make sure you're on the same commit I'm on, just to be safe.
$ git checkout b218fbb7a35fcf31539bfad12732038fe082a2eb
$ ./configure --enable-cassert --enable-debug CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"
$ make -j8
$ # Installs to to /usr/local/pgsql/bin.
$ sudo make install

I'm not going to cover Postgres extension infrastructure in detail. I wrote a bit about it in my last post. You need only read the first half, if at all; not the actual Table Access Method implementation.

It will be even simpler in this post because Postgres hooks are extensions but not extensions you install with CREATE EXTENSION. If you want to read about the different kinds of Postgres extensions, check out this article by Steven Miller.

The minimum we need, aside from the hook code itself, is a Makefile that uses PGXS:

MODULES = pgexec

PG_CONFIG = /usr/local/pgsql/bin/pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

The MODULES value there corresponds to the C file we'll create shortly, pgexec.c.

This pg_config binary path is important because you might have different versions of Postgres installed, for example by your package manager. It is important that the extension is built against the same version of Postgres which will load the extension.

Now we're ready for some hook code.

Intercepting query execution

You can find the basic structure of a hook (and which hooks are available) in Tamika Nomara's unofficial Postgres hooks docs.

There is no official central place describing all hooks I could find in Postgres docs. Some hooks are described in various places throughout the docs though.

Based on that page, we can write a bare minimum hook that will intercept queries, log when we've done so, and pass control back to the standard execution path for the actual query. In pgexec.c:

#include "postgres.h"
#include "fmgr.h"
#include "executor/executor.h"

PG_MODULE_MAGIC;

static ExecutorRun_hook_type prev_executor_run_hook = NULL;

static void print_plan(QueryDesc* queryDesc) {
  elog(LOG, "[pgexec] HOOKED SUCCESSFULLY!");
}

static void pgexec_run_hook(
  QueryDesc* queryDesc,
  ScanDirection direction,
  uint64 count,
  bool execute_once
) {
  print_plan(queryDesc);
  return prev_executor_run_hook(queryDesc, direction, count, execute_once);
}

void _PG_init(void) {
  prev_executor_run_hook = ExecutorRun_hook;
  if (prev_executor_run_hook == NULL) {
    prev_executor_run_hook = standard_ExecutorRun;
  }
  ExecutorRun_hook = pgexec_run_hook;
}

void _PG_fini(void) {
  ExecutorRun_hook = prev_executor_run_hook;
}

You can discover the standard_ExectutorRun function from a quick git grep ExecutorRun_hook in the Postgres source which leads to src/backend/executor/execMain.c#L306:

void
ExecutorRun(QueryDesc *queryDesc,
            ScanDirection direction, uint64 count,
            bool execute_once)
{
    if (ExecutorRun_hook)
        (*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
    else
        standard_ExecutorRun(queryDesc, direction, count, execute_once);
}

So our hook will just log and pass back execution to the existing execution hook. Let's build and install the extension.

$ make
$ sudo make install

Now we'll create a new database and tell it to load the extension.

$ /usr/local/pgsql/bin/initdb test-db
$ echo "shared_preload_libraries = 'pgexec'" > test-db/postgresql.conf

Remember, hooks are not CREATE EXTENSION extensions. As far as I can tell they can't be dynamically loaded (without some additional dynamic loading infrastructure one could potentially write). So every time you make a change you need to rebuild the extension, reinstall it, and restart the Postgres server.

And start the server in the foreground:

$ /usr/local/pgsql/bin/postgres \
  --config-file=$(pwd)/test-db/postgresql.conf \
  -D $(pwd)/test-db
  -k $(pwd)/test-db
2023-11-18 19:35:16.680 GMT [3215547] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), 64-bit
2023-11-18 19:35:16.681 GMT [3215547] LOG:  listening on IPv6 address "::1", port 5432
2023-11-18 19:35:16.681 GMT [3215547] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-11-18 19:35:16.681 GMT [3215547] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-11-18 19:35:16.682 GMT [3215550] LOG:  database system was shut down at 2023-11-18 19:20:16 GMT
2023-11-18 19:35:16.684 GMT [3215547] LOG:  database system is ready to accept connections

Keep an eye on this foreground process since this is where elog(LOG, ...) calls will show up.

Now in a new window, create a test.sql script that we can use to exercise the hook:

DROP TABLE IF EXISTS x;
CREATE TABLE x (a INT);
INSERT INTO x VALUES (309);
SELECT a FROM x WHERE a > 1;

Run psql so we can trigger the hook:

$ /usr/local/pgsql/bin/psql -h localhost postgres -f test.sql
DROP TABLE
CREATE TABLE
INSERT 0 1
  a
-----
 309
(1 row)

And in the postgres foreground process you should see a log:

2023-11-19 17:42:03.045 GMT [3242321] LOG:  [pgexec] HOOKED SUCCESSFULLY!
2023-11-19 17:42:03.045 GMT [3242321] STATEMENT:  INSERT INTO x VALUES (309);
2023-11-19 17:42:03.045 GMT [3242321] LOG:  [pgexec] HOOKED SUCCESSFULLY!
2023-11-19 17:42:03.045 GMT [3242321] STATEMENT:  SELECT a FROM x WHERE a > 1;

That's our hook! Interestingly only the INSERT and SELECT statements show up, not the DROP and CREATE.

Now let's see if we can reconstruct the query text from that first argument, the QueryDesc* that pgexec_run_hook receives. And let's simplify things for ourselves and only worry about reconstructing a SELECT query.

Nodes and Datums

But first, let's talk about two fundemental ways data in Postgres (code) is organized.

Postgres code is extremely dynamic and, maybe relatedly, fairly object-oriented. Almost every entity in Postgres is a Node. While values in Postgres that are exposed to users of Postgres are Datums.

Each node has a type, NodeTag, that we can switch on to decide what to do. In contrast, Datum has no type. The type of the Datum must be known by context before using one of the transform functions like DatumGetBool to retrieve a C value from a Datum.

A table is a Node. A query plan is a Node. A sequential scan is a Node. A join is a Node. A literal in a query is a Node. The value for the literal in a query is a Datum.

Here is how The Internals of PostgreSQL book visualizes a query plan for example:

Every box in that image is a Node.

And all Nodes in code I've seen share a common definition prefix like this:

typedef struct SomeThing {
  pg_node_attr(abstract) // If the node is indeed abstract in the OOP sense.
  NodeTag type;
}

Many Nodes you'll see are abstract, like Plan. But by printing out NodeTag and checking the value printed in src/include/nodes/nodetags.h, you can find the concrete type of the Node.

src/include/nodes/nodetags.h is generated during a preprocessing step. (Don't look if regex in Perl worries you).

We'll get back to Nodes later.

What's in a QueryDesc?

Let's take a look at the QueryDesc struct:

typedef struct QueryDesc
{
    /* These fields are provided by CreateQueryDesc */
    CmdType     operation;      /* CMD_SELECT, CMD_UPDATE, etc. */
    PlannedStmt *plannedstmt;   /* planner's output (could be utility, too) */
    const char *sourceText;     /* source text of the query */
    Snapshot    snapshot;       /* snapshot to use for query */
    Snapshot    crosscheck_snapshot;    /* crosscheck for RI update/delete */
    DestReceiver *dest;         /* the destination for tuple output */
    ParamListInfo params;       /* param values being passed in */
    QueryEnvironment *queryEnv; /* query environment passed in */
    int         instrument_options; /* OR of InstrumentOption flags */

    /* These fields are set by ExecutorStart */
    TupleDesc   tupDesc;        /* descriptor for result tuples */
    EState     *estate;         /* executor's query-wide state */
    PlanState  *planstate;      /* tree of per-plan-node state */

    /* This field is set by ExecutorRun */
    bool        already_executed;   /* true if previously executed */

    /* This is always set NULL by the core system, but plugins can change it */
    struct Instrumentation *totaltime;  /* total time spent in ExecutorRun */
} QueryDesc;

The PlannedStmt field looks interesting. Let's take a look:

typedef struct PlannedStmt
{
    pg_node_attr(no_equal, no_query_jumble)

    NodeTag     type;

    CmdType     commandType;    /* select|insert|update|delete|merge|utility */

    uint64      queryId;        /* query identifier (copied from Query) */

    bool        hasReturning;   /* is it insert|update|delete RETURNING? */

    bool        hasModifyingCTE;    /* has insert|update|delete in WITH? */

    bool        canSetTag;      /* do I set the command result tag? */

    bool        transientPlan;  /* redo plan when TransactionXmin changes? */

    bool        dependsOnRole;  /* is plan specific to current role? */

    bool        parallelModeNeeded; /* parallel mode required to execute? */

    int         jitFlags;       /* which forms of JIT should be performed */

    struct Plan *planTree;      /* tree of Plan nodes */

    List       *rtable;         /* list of RangeTblEntry nodes */

    List       *permInfos;      /* list of RTEPermissionInfo nodes for rtable
                                 * entries needing one */

    /* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
    List       *resultRelations;    /* integer list of RT indexes, or NIL */

    List       *appendRelations;    /* list of AppendRelInfo nodes */

    List       *subplans;       /* Plan trees for SubPlan expressions; note
                                 * that some could be NULL */

    Bitmapset  *rewindPlanIDs;  /* indices of subplans that require REWIND */

    List       *rowMarks;       /* a list of PlanRowMark's */

    List       *relationOids;   /* OIDs of relations the plan depends on */

    List       *invalItems;     /* other dependencies, as PlanInvalItems */

    List       *paramExecTypes; /* type OIDs for PARAM_EXEC Params */

    Node       *utilityStmt;    /* non-null if this is utility stmt */

    /* statement location in source string (copied from Query) */
    int         stmt_location;  /* start location, or -1 if unknown */
    int         stmt_len;       /* length in bytes; 0 means "rest of string" */
} PlannedStmt;

The struct Plan* planTree field in there looks like what we'd want. But Plan is abstract:

typedef struct Plan
{
    pg_node_attr(abstract, no_equal, no_query_jumble)

    NodeTag     type;

So let's try printing out the planTree->type field and find the Node it is concretely. In pgexec.c change the definition of print_plan:

static void print_plan(QueryDesc* queryDesc) {
  elog(LOG, "[pgexec] HOOKED SUCCESSFULLY! %d", queryDesc->plannedstmt->planTree->type);
}

Rebuild and reinstall the extension, and restart Postgres:

$ make
$ sudo make install
$ /usr/local/pgsql/bin/postgres \
  --config-file=$(pwd)/test-db/postgresql.conf \
  -D $(pwd)/test-db
  -k $(pwd)/test-db
2023-11-18 19:35:16.680 GMT [3215547] LOG:  starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), 64-bit
2023-11-18 19:35:16.681 GMT [3215547] LOG:  listening on IPv6 address "::1", port 5432
2023-11-18 19:35:16.681 GMT [3215547] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-11-18 19:35:16.681 GMT [3215547] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-11-18 19:35:16.682 GMT [3215550] LOG:  database system was shut down at 2023-11-18 19:20:16 GMT
2023-11-18 19:35:16.684 GMT [3215547] LOG:  database system is ready to accept connections

And in another window run psql:

$ /usr/local/pgsql/bin/psql -h localhost postgres -f test.sql

And check the logs from the postgres process we just started and you should notice:

2023-11-19 17:46:18.834 GMT [3242495] LOG:  [pgexec] HOOKED SUCCESSFULLY! 322
2023-11-19 17:46:18.834 GMT [3242495] STATEMENT:  SELECT a FROM x WHERE a > 1;

So 322 is the NodeTag for the Plan. If we look that up in Postgres's src/include/nodes/nodetags.h (remember, this is generated after ./configure && make so I can't link you to it):

$ grep ' = 322' src/include/nodes/nodetags.h
        T_SeqScan = 322,

Hey, that makes sense! A SELECT without any indexes definitely sounds like a sequential scan!

Walking a sequential scan

Let's take a look at the SeqScan struct:

typedef struct SeqScan
{
    Scan        scan;
} SeqScan;

Ok, that's not very interesting. Let's look at Scan then:

typedef struct Scan
{
    pg_node_attr(abstract)

    Plan        plan;
    Index       scanrelid;      /* relid is index into the range table */
} Scan;

That's interesting! scanrelid represents the table we're scanning. I don't know what "range table" means exactly. But there was a field on the PlannedStmt called rtable that seems relevant.

rtable was described as a List of RangeTblEntry nodes. And browsing around the file where List is defined we can see some nice methods for working with Lists, like list_length().

Let's print out the scanrelid and let's check out the length of the rtable and see if it's filled out. Let's also restrict our print_plan code to only look at SeqScan nodes. In pgexec.c:

static void print_plan(QueryDesc* queryDesc) {
  SeqScan* scan = NULL;
  Plan* plan = queryDesc->plannedstmt->planTree;
  if (plan->type != T_SeqScan) {
    elog(LOG, "[pgexec] Unsupported plan type.");
    return;
  }

  scan = (SeqScan*)plan;

  elog(LOG, "[pgexec] relid: %d, rtable length: %d", scan->scan.scanrelid, list_length(queryDesc->plannedstmt->rtable));
}

Rebuild and reinstall the extension, and restart Postgres. (You can find the instructions for this above if you've forgotten.) Re-run the test.sql script. And check the Postgres server logs. You should see:

2023-11-19 18:00:34.184 GMT [3244438] LOG:  [pgexec] relid: 1, rtable length: 1
2023-11-19 18:00:34.184 GMT [3244438] STATEMENT:  SELECT a FROM x WHERE a > 1;

Awesome! So rtable does have data in it. There's only one table in this query so its length makes sense to be 1. The scanrelid being 1 also though is weird. Let's fetch the nth value from the rtable list using scanrelid-1 as the index.

For the RangeTblEntry itself, let's take a look:

typedef enum RTEKind
{
    RTE_RELATION,               /* ordinary relation reference */
    RTE_SUBQUERY,               /* subquery in FROM */
    RTE_JOIN,                   /* join */
    RTE_FUNCTION,               /* function in FROM */
    RTE_TABLEFUNC,              /* TableFunc(.., column list) */
    RTE_VALUES,                 /* VALUES (<exprlist>), (<exprlist>), ... */
    RTE_CTE,                    /* common table expr (WITH list element) */
    RTE_NAMEDTUPLESTORE,        /* tuplestore, e.g. for AFTER triggers */
    RTE_RESULT,                 /* RTE represents an empty FROM clause; such
                                 * RTEs are added by the planner, they're not
                                 * present during parsing or rewriting */
} RTEKind;

typedef struct RangeTblEntry
{
    pg_node_attr(custom_read_write, custom_query_jumble)

    NodeTag     type;

    RTEKind     rtekind;        /* see above */

    /*
     * XXX the fields applicable to only some rte kinds should be merged into
     * a union.  I didn't do this yet because the diffs would impact a lot of
     * code that is being actively worked on.  FIXME someday.
     */

    /*
     * Fields valid for a plain relation RTE (else zero):
     *
     * rellockmode is really LOCKMODE, but it's declared int to avoid having
     * to include lock-related headers here.  It must be RowExclusiveLock if
     * the RTE is an INSERT/UPDATE/DELETE/MERGE target, else RowShareLock if
     * the RTE is a SELECT FOR UPDATE/FOR SHARE target, else AccessShareLock.
     *
     * Note: in some cases, rule expansion may result in RTEs that are marked
     * with RowExclusiveLock even though they are not the target of the
     * current query; this happens if a DO ALSO rule simply scans the original
     * target table.  We leave such RTEs with their original lockmode so as to
     * avoid getting an additional, lesser lock.
     *
     * perminfoindex is 1-based index of the RTEPermissionInfo belonging to
     * this RTE in the containing struct's list of same; 0 if permissions need
     * not be checked for this RTE.
     *
     * As a special case, relid, relkind, rellockmode, and perminfoindex can
     * also be set (nonzero) in an RTE_SUBQUERY RTE.  This occurs when we
     * convert an RTE_RELATION RTE naming a view into an RTE_SUBQUERY
     * containing the view's query.  We still need to perform run-time locking
     * and permission checks on the view, even though it's not directly used
     * in the query anymore, and the most expedient way to do that is to
     * retain these fields from the old state of the RTE.
     *
     * As a special case, RTE_NAMEDTUPLESTORE can also set relid to indicate
     * that the tuple format of the tuplestore is the same as the referenced
     * relation.  This allows plans referencing AFTER trigger transition
     * tables to be invalidated if the underlying table is altered.
     */
    Oid         relid;          /* OID of the relation */
    char        relkind;        /* relation kind (see pg_class.relkind) */
    int         rellockmode;    /* lock level that query requires on the rel */
    struct TableSampleClause *tablesample;  /* sampling info, or NULL */
    Index       perminfoindex;

In SELECT a FROM x, x should be a plain relation RTE (to use the terminology there). So we can add a guard that validates that. But we don't get a Relation. (You might remember from my previous post that Relation is where we can finally see the table name.)

We get an Oid for the Relation. So we need to find a way to lookup a Relation from an Oid. And by grepping around in Postgres (or via judicious use of ChatGPT, I confess), we can notice RelationIdGetRelation that takes an Oid and returns a Relation. Notice also that the comment says we should close the relation when we're done with RelationClose.

So putting it altogether (and again, reusing some code from that previous post), we can print out the table name.

static void print_plan(QueryDesc* queryDesc) {
  SeqScan* scan = NULL;
  RangeTblEntry* rte = NULL;
  Relation relation = {};
  char* tablename = NULL;
  Plan* plan = queryDesc->plannedstmt->planTree;
  if (plan->type != T_SeqScan) {
    elog(LOG, "[pgexec] Unsupported plan type.");
    return;
  }

  scan = (SeqScan*)plan;
  rte = list_nth(queryDesc->plannedstmt->rtable, scan->scan.scanrelid-1);
  if (rte->rtekind != RTE_RELATION) {
    elog(LOG, "[pgexec] Unsupported FROM type: %d.", rte->rtekind);
    return;
  }

  relation = RelationIdGetRelation(rte->relid);
  tablename = NameStr(relation->rd_rel->relname);

  elog(LOG, "[pgexec] SELECT [todo] FROM %s", tablename);

  RelationClose(relation);
}

You'll also need to add a new #include for utils/rel.h.

Rebuild and reinstall the extension, and restart Postgres. Re-run the test.sql script. Check the Postgres server logs and you should see:

2023-11-19 18:36:03.986 GMT [3246777] LOG:  [pgexec] SELECT [todo] FROM x
2023-11-19 18:36:03.986 GMT [3246777] STATEMENT:  SELECT a FROM x WHERE a > 1;

Fantastic! Before we get into walking the SELECT columns and the (optional) WHERE clause, let's do some quick refactoring.

A string builder

Let's add a little string builder library so we can emit a single string we build up to a single elog() call.

I wrote this ahead of time and won't explain it here since the details aren't relevant.

Just copy this and paste near the top of pgexec.c:

typedef struct {
  char* mem;
  size_t len;
  size_t offset;
} PGExec_Buffer;

static void buffer_init(PGExec_Buffer* buf) {
  buf->offset = 0;
  buf->len = 8;
  buf->mem = (char*)malloc(sizeof(char) * buf->len);
}

static void buffer_resize_to_fit_additional(PGExec_Buffer* buf, size_t additional) {
  char* new = {};
  size_t newsize = 0;

  Assert(additional >= 0);

  if (buf->offset + additional < buf->len) {
    return;
  }

  newsize = (buf->offset + additional) * 2;
  new = (char*)malloc(sizeof(char) * newsize);
  Assert(new != NULL);
  memcpy(new, buf->mem, buf->len * sizeof(char));
  free(buf->mem);
  buf->len = newsize;
  buf->mem = new;
}

static void buffer_append(PGExec_Buffer*, char*, size_t);

static void buffer_appendz(PGExec_Buffer* buf, char* c) {
  buffer_append(buf, c, strlen(c));
}

static void buffer_append(PGExec_Buffer* buf, char* c, size_t chars) {
  buffer_resize_to_fit_additional(buf, chars);
  memcpy(buf->mem + buf->offset, c, chars);
  buf->offset += chars;
}

static void buffer_appendf(
  PGExec_Buffer *,
  const char* restrict,
  ...
) __attribute__ ((format (gnu_printf, 2, 3)));

static void buffer_appendf(PGExec_Buffer *buf, const char* restrict fmt, ...) {
  // First figure out how long the result will be.
  size_t chars = 0;
  va_list arglist;
  va_start(arglist, fmt);
  chars = vsnprintf(0, 0, fmt, arglist);
  Assert(chars >= 0); // TODO: error handling.

  // Resize to fit result.
  buffer_resize_to_fit_additional(buf, chars);

  // Actually do the printf into buf.
  va_end(arglist);
  va_start(arglist, fmt);
  chars = vsprintf(buf->mem + buf->offset, fmt, arglist);
  Assert(chars >= 0); // TODO: error handling.
  buf->offset += chars;
  va_end(arglist);
}

static char* buffer_cstring(PGExec_Buffer* buf) {
  char zero = 0;
  const size_t prev_offset = buf->offset;

  if (buf->offset == buf->len) {
    buffer_append(buf, &zero, 1);
    buf->offset--;
  } else {
    buf->mem[buf->offset] = 0;
  }

  // Offset should stay the same. This is a fake NULL.
  Assert(buf->offset == prev_offset);

  return buf->mem;
}

static void buffer_free(PGExec_Buffer* buf) {
  free(buf->mem);
}

Next we'll modify print_plan() in pgexec.c to use it, and add stubs for printing the SELECT columns and WHERE clauses.

static void buffer_print_where(PGExec_Buffer* buf, QueryDesc* queryDesc, Plan* plan) {
  buffer_appendz(buf, " [where todo]");
}

static void buffer_print_select_columns(PGExec_Buffer* buf, QueryDesc* queryDesc, Plan* plan) {
  buffer_appendz(buf, "[columns todo]");
}

static void print_plan(QueryDesc* queryDesc) {
  SeqScan* scan = NULL;
  RangeTblEntry* rte = NULL;
  Relation relation = {};
  char* tablename = NULL;
  Plan* plan = queryDesc->plannedstmt->planTree;
  PGExec_Buffer buf = {};

  if (plan->type != T_SeqScan) {
    elog(LOG, "[pgexec] Unsupported plan type.");
    return;
  }

  scan = (SeqScan*)plan;
  rte = list_nth(queryDesc->plannedstmt->rtable, scan->scan.scanrelid-1);
  if (rte->rtekind != RTE_RELATION) {
    elog(LOG, "[pgexec] Unsupported FROM type: %d.", rte->rtekind);
    return;
  }

  buffer_init(&buf);

  relation = RelationIdGetRelation(rte->relid);
  tablename = NameStr(relation->rd_rel->relname);

  buffer_appendz(&buf, "SELECT ");
  buffer_print_select_columns(&buf, queryDesc, plan);
  buffer_appendf(&buf, " FROM %s", tablename);
  buffer_print_where(&buf, queryDesc, plan);

  elog(LOG, "[pgexec] %s", buffer_cstring(&buf));

  RelationClose(relation);
  buffer_free(&buf);
}

Now we just need to implement the buffer_print_where and buffer_print_select_columns functions and our walking infrastructure will be done! For now. :)

Walking the WHERE clause

If you remember back to the SeqScan and Scan nodes, they were both basically empty. They had a Plan and a scanrelid. So the rest of the SELECT info must be in the Plan since it wasn't in the Scan.

Let's look at Plan again. One part that stands out is:

    /*
     * Common structural data for all Plan types.
     */
    int         plan_node_id;   /* unique across entire final plan tree */
    List       *targetlist;     /* target list to be computed at this node */
    List       *qual;           /* implicitly-ANDed qual conditions */
    struct Plan *lefttree;      /* input plan tree(s) */
    struct Plan *righttree;
    List       *initPlan;       /* Init Plan nodes (un-correlated expr
                                 * subselects) */

qual kinda looks like a WHERE clause. (And targetlist kinda looks like the columns the SELECT pulls).

Lists just contain void pointers, so we can't tell what the type of qual or targetlist children are. But I'm going to make a wild guess they are Nodes.

There's even a nice helper that casts void pointers to Node* and pulls out the type, nodeTag().

And reading around pg_list.h shows some interesting helper utilities like foreach that we can use to iterate the list.

Let's try printing out the type of qual's members.

static void buffer_print_where(PGExec_Buffer* buf, QueryDesc* queryDesc, Plan* plan) {
  ListCell* cell = NULL;
  bool first = true;
  if (plan->qual == NULL) {
    return;
  }

  buffer_appendz(buf, " WHERE ");
  foreach(cell, plan->qual) {
    if (!first) {
      buffer_appendz(buf, " AND ");
    }

    first = false;
    buffer_appendf(buf, "[node: %d]", nodeTag(lfirst(cell)));
  }
}

Notice any vestiges of LISP?

Rebuild and reinstall the extension, and restart Postgres. Re-run the test.sql script. Check the Postgres server logs and you should see:

2023-11-19 19:17:00.879 GMT [3250850] LOG:  [pgexec] SELECT [columns todo] FROM x WHERE [node: 15]
2023-11-19 19:17:00.879 GMT [3250850] STATEMENT:  SELECT a FROM x WHERE a > 1;

Well, our code didn't crash! So the guess about qual List entries being Nodes seems right. Let's look up that node type in the Postgres repo:

$ grep ' = 15,' src/include/nodes/nodetags.h
        T_OpExpr = 15,

Woot! That is exactly what I'd expect the WHERE clause here to be.

Now that we know qual is a List of Nodes, let's do a bit of refactoring since targetlist will probably also be a List of Nodes. Back in pgexec.c:

static void buffer_print_expr(PGExec_Buffer*, Node*);
static void buffer_print_list(PGExec_Buffer*, List*, char*);

static void buffer_print_opexpr(PGExec_Buffer* buf, OpExpr* op) {
  buffer_appendf(buf, "[opexpr: todo]");
}

static void buffer_print_expr(PGExec_Buffer* buf, Node* expr) {
  switch (nodeTag(expr)) {
  case T_OpExpr:
    buffer_print_opexpr(buf, (OpExpr*)expr);
    break;

  default:
    buffer_appendf(buf, "[Unknown: %d]", nodeTag(expr));
  }
}

static void buffer_print_list(PGExec_Buffer* buf, List* list, char* sep) {
  ListCell* cell = NULL;
  bool first = true;

  foreach(cell, list) {
    if (!first) {
      buffer_appendz(buf, sep);
    }

    first = false;