a curated list of database news from authoritative sources

August 05, 2018

Vitess Weekly Digest - Aug 24 2018

This week, we continue the digest from the Slack discussions.

Update stream

Jian [Jul 25th at 1:27 PM]: hi there, I'm new to Vitess. Now I'm following the user guide from vitess.io to explore Vitess. In the update stream section, I notice they have change logs. Where could I see these change logs so I can have a better understanding of the update stream?

sougou: That's the only documentation we have about the update stream, but we'll be fixing docs for all of Vitess very soon.

Vitess Weekly Digest - Aug 5 2018

This week, we kick off our new weekly blog updates — bringing you the best of Vitess questions and topics on our Slack discussions. The goal is to show the most interesting topics and requests so those of you just getting started can see highlights of what has been covered. Since this is our first ever digest, we’re going to go back in time and publish a little more than what happened last week.

August 04, 2018

btest: a language agnostic test runner

btest is a minimal, language-agnostic test runner originally written for testing compilers. Brian, a former co-worker from Linode, wrote the first implementation in Crystal (a compiled clone of Ruby) for testing bshift, a compiler project. The tool accomplished exactly what I needed for my own language project, BSDScheme, and had very few dependencies. After some issues with Crystal support in containerized CI environments, and despite some incredible assistance from the Crystal community, we rewrote btest in D to simplify downstream use.

How it works

btest registers a command (or commands) to run and verifies the command output and status for different inputs. btest iterates over files in a directory to discover test groups and individual tests within. It supports a limited template language for easily adjusting a more-or-less similar set of tests. And it supports running test groups and individual tests themselves in parallel. All of this is managed via a simple YAML config.

btest.yaml

btest requires a project-level configuration file to declare the test directory, the command(s) to run per test, etc. Let's say we want to run tests against a python program. We create a btest.yaml file with the following:

test_path: tests

runners:
  - name: Run tests with cpython
    run: python test.py

test_path is the directory in which tests are located. runners is an array of commands to run per test. We hard-code test.py as a project-level standard: for each test case, a test.py file will get written to disk in an appropriate path and run.

On multiple runners

Using multiple runners is helpful when we want to run all tests with different test commands or test command settings. For instance, we could run tests against cpython and pypy by adding another runner to the runners section.

test_path: tests

runners:
  - name: Run tests with cpython
    run: python test.py
  - name: Run tests with pypy
    run: pypy test.py

An example test config

Let's create a divide-by-zero.yaml file in the tests directory and add the following:

cases:
  - name: Should exit on divide by zero
    status: 1
    stdout: |
      Traceback (most recent call last):
        File "test.py", line 1, in <module>
          4 / 0
      ZeroDivisionError: division by zero
    denominator: 0
templates:
  - test.py: |
      4 / {{ denominator }}

In this example, name will be printed out when the test is run. status is the expected integer exit status returned by running the program. stdout is the entire output the program is expected to write during execution. None of these three fields is required. If status or stdout is not provided, btest will skip checking it.

Any additional key-value pairs are treated as template variables and will be substituted where they are referenced in the templates section when the case is run. denominator is the only such variable we use in this example. When this first (and only) case is run, test.py will be written to disk containing 4 / 0.
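The substitution mechanics aren't shown in this post, but conceptually each {{ name }} placeholder is replaced with the case's value before the file is written to disk. A rough sketch in Python (not btest's actual implementation, which is written in D):

```python
import re

def render(template, variables):
    # Replace each {{ name }} placeholder with the case's value for that
    # variable. A simplified stand-in for btest's template handling.
    def sub(match):
        return str(variables[match.group(1)])
    return re.sub(r'\{\{\s*(\w+)\s*\}\}', sub, template)

# The divide-by-zero case above would produce the contents of test.py:
print(render('4 / {{ denominator }}', {'denominator': 0}))  # 4 / 0
```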

templates section

The templates section is a dictionary allowing us to specify files to be created with variable substitution. All files are created in the same directory per test case, so if we want to import code we can do so with relative paths.

Here is a simple example of a BSDScheme test that uses this feature.

Running btest

Run btest from the root directory (the directory above tests) and we'll see all the grouped test cases that btest registers and the result of each test:

$ btest
tests/divide-by-zero.yaml
[PASS] Should exit on divide by zero

1 of 1 tests passed for runner: Run tests with cpython

Use in CI environments

In the future we may provide pre-built release binaries. But in the meantime, the CI step involves downloading git and ldc and building/installing btest before calling it.

Circle CI

This is the config file I use for testing BSDScheme:

version: 2
jobs:
  build:
    docker:
      - image: dlanguage/ldc
    steps:
      - checkout
      - run:
          name: Install debian-packaged dependencies
          command: |
            apt update
            apt install -y git build-essential
            ln -s $(which ldc2) /usr/local/bin/ldc
      - run:
          name: Install btest
          command: |
            git clone https://github.com/briansteffens/btest
            cd btest
            make
            make install
      - run:
          name: Install bsdscheme
          command: |
            make
            make install
      - run:
          name: Run bsdscheme tests
          command: btest

Travis CI

This is the config Brian uses for testing BShift:

sudo: required

language: d

d:
    - ldc

script:
    # ldc gets installed as other names sometimes
    - sudo ln -s `which $DC` /usr/local/bin/ldc

    # bshift
    - make
    - sudo ln -s $PWD/bin/bshift /usr/local/bin/bshift
    - sudo ln -s $PWD/lib /usr/local/lib/bshift

    # nasm
    - sudo apt-get install -y nasm

    # basm
    - git clone https://github.com/briansteffens/basm
    - cd basm && cabal build && cd ..
    - sudo ln -s $PWD/basm/dist/build/basm/basm /usr/local/bin/basm

    # btest
    - git clone https://github.com/briansteffens/btest
    - cd btest && make && sudo make install && cd ..

    # run the tests
    - btest

July 22, 2018

When MySQL Goes Away

Handling MySQL errors in Go is not easy. There are a lot of MySQL server error codes, the Go MySQL driver has its own errors, Go's database/sql package has its own errors, and errors can bubble up from other packages, like net.OpError. Consequently, Go programs tend not to handle errors. Instead, they simply report them:

err := db.QueryRow(...).Scan(&v)
if err != nil {
    return err
}

And then the error is logged or reported somewhere. This is as poor as it is common, and it is extremely common. A robust program handles the error: retry the query if possible, report a more specific error if one can be identified, or else report the unhandled error. But robust MySQL error handling in Go requires very specific knowledge and experience that is beyond the reasonable purview of app developers.

June 08, 2018

Propagation of Mistakes in Papers

While reading papers on cardinality estimation I noticed something odd: The seminal paper by Flajolet and Martin on probabilistic counting gives a bias correction constant as 0.77351, while a more recent (and very useful) paper by Scheuermann and Mauve gives the constant as 0.775351. Was this a mistake? Or did they correct a mistake in the original paper?

I started searching, and there is a large number of papers that use the value 0.775351, but there is also a number of papers that use the value 0.77351. Judging by the number of Google hits for "Flajolet 0.77351" vs. "Flajolet 0.775351", the 0.77351 group seems to be somewhat larger, but both camps have a significant number of publications. Interestingly, not a single paper mentions both constants, and thus no paper explains what the correct constant should be.

In the end I repeated the constant computation as explained by Flajolet, and the correct value is 0.77351. We can even derive one digit more when using double arithmetic (i.e., 0.773516), but that makes no difference in practice. Thus, the original paper was correct.

But why do so many papers use the incorrect value 0.775351 then? My guess is that at some point somebody made a typo while writing a paper, introducing the superfluous digit 5, and that other authors copied the constant from that paper without re-checking its value. I am not 100% sure what the origin of the mistake is. The incorrect value seems to appear first in 2007, showing up in multiple publications from that year. Judging by publication date, the source seems to be this paper (which, as far as I know, did not cite any other paper containing the incorrect value). And everybody else just copied the constant from somewhere else, propagating it from paper to paper.

If you find this web page because you are searching for the correct Flajolet/Martin bias correction constant, I can assure you that the original paper was correct, and that the value is 0.77351. But you do not have to trust me on this, you can just repeat the computation yourself.
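Short of redoing Flajolet's analytic computation, you can at least convince yourself of the ballpark by simulating the estimator. The sketch below is my own illustration, not code from either paper; the simulation is far too noisy to distinguish 0.77351 from 0.775351, but it does show the bias constant sits near 0.77:

```python
import random

# Simulate the Flajolet-Martin sketch over n distinct items. The paper
# shows E[R] is approximately log2(phi * n), so 2**mean(R) / n should
# land near phi ~= 0.77351 when averaged over many trials.

def fm_R(n, bits=32):
    bitmap = 0
    for _ in range(n):
        h = random.getrandbits(bits)
        # rho = index of the least significant 1-bit of the hash
        rho = (h & -h).bit_length() - 1 if h else bits
        bitmap |= 1 << rho
    # R = index of the least significant 0-bit of the bitmap
    R = 0
    while (bitmap >> R) & 1:
        R += 1
    return R

def estimate_phi(n=1 << 12, trials=150):
    mean_R = sum(fm_R(n) for _ in range(trials)) / trials
    return 2 ** mean_R / n

print(estimate_phi())  # noisy, but typically in the neighborhood of 0.77
```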

May 18, 2018

Writing to be read

There is a common struggle in the writing and maintenance of documentation, checklists, emails, guides, etc. Each provides immense value; a document may be the key to an important process. The goal is to remove barriers -- to encourage understanding and correct application of what has been noted -- without requiring a change in the character of the reader. That is, expect reading to be difficult and people to be lazy. Don't make things harder for your reader than need be.

Ignoring imperfections in the ideas transcribed into writing, there are a few particular aesthetic approaches I take to (hopefully) make my notes more effective. These ideas have been influenced by readings on writing, psychology, and user experience. In particular, I recommend On Writing Well, Thinking Fast and Slow, and Nielsen Norman research.

Language correctness

Spelling and grammatical correctness are low-hanging fruit. They are easy to achieve. Use full sentences, use punctuation, and capitalize appropriately. But don't be an unreasonable grammar stickler; language is flexible and always changing. Don't give anyone the opportunity to take your work less seriously by screwing up the basics.

Structuring sentences and paragraphs

Keep your sentences short. And avoid run-on sentences; they are always difficult to parse. If you use more than two commas in a sentence (aside from in lists), the sentence is terrible. Split it up. Commas are often used superfluously. Don't do that.

Remember that if a comma separates two sentences, you can separate them into two sentences with a period instead. And if you ever have a list containing another list, separate the outer list with semicolons instead of commas to provide better differentiation.

Keep your paragraphs short too. In primary school you may have learned to use 5-8 sentences per paragraph. Don't do so needlessly. 3-5 sentences can be perfectly appropriate. As both sentences and paragraphs get longer, they appear more intimidating and can discourage readers from continuing.

Visually speaking

Make your line height 120-145% the height of the font. Increase the spacing between lines in a paragraph to make the paragraph less dense and more friendly.

Keep contrast high. Don't put very gray (or colored) text on a white background.

Additionally, a number of studies suggest that limiting the width of text increases readability. For best results, limit the width such that 50-75 characters appear per line of text.

Don't put checklists in paragraphs

If a document describes concrete steps that should be followed exactly and can be reasonably summarized, don't hide the steps within paragraphs of text. Instead use an ordered or unordered list to clearly enumerate the expectations. You can't expect a checklist to be followed when it is hidden within the sentences of a paragraph.

Structuring sections

Any document (regardless of type) longer than 3-5 paragraphs should be broken into sub-sections with summarizing headers to aid scanning. Use the HTML id attribute to allow a direct link to a particular section in a long page. If the page has more than two sections or vertically flows beyond a single screen, consider adding a table of contents at the top of the page to allow the reader to find the exact section she needs.

Visually speaking

Don't put large headers immediately next to each other. It is disruptive to have multiple lines of large text.

I almost completely avoid Github Markdown's h1/# tag because it is just too large and jarring relative to the rest of the text. It is often best for the flow of a Github Markdown document to stick to only h3-h4/###-#### tags for headers, using the h2/## tag for the document title.

In summary

The aesthetic flow of a document can help or hurt the experience of a reader consuming it. Good aesthetic "sense" in this regard can be boiled down to a few methods that primarily revolve around simplifying structure and facilitating the rewarding feeling of progress as a reader reads.

Writing is difficult, and it takes time for a document to evolve into something helpful. The dividends are paid when process is better followed and questions are readily clarified in writing without further human intervention. It is incumbent on those writing and maintaining documents to organize them effectively and to see the reader's confusion as a fault of the document, not of the reader. It is easier to change something yourself than to expect others to change to accommodate you.

May 06, 2018

Writing a simple JSON parser

Writing a JSON parser is one of the easiest ways to get familiar with parsing techniques. The format is extremely simple. It's defined recursively so you get a slight challenge compared to, say, parsing Brainfuck; and you probably already use JSON. Aside from that last point, parsing S-expressions for Scheme might be an even simpler task.

If you'd just like to see the code for the library, pj, check it out on Github.

What parsing is and (typically) is not

Parsing is often broken up into two stages: lexical analysis and syntactic analysis. Lexical analysis breaks source input into the simplest decomposable elements of a language called "tokens". Syntactic analysis (often itself called "parsing") receives the list of tokens and tries to find patterns in them that match the grammar of the language being parsed.

Parsing does not determine semantic viability of an input source. Semantic viability of an input source might include whether or not a variable is defined before being used, whether a function is called with the correct arguments, or whether a variable can be declared a second time in some scope.

There are, of course, always variations in how people choose to parse and apply semantic rules, but I am assuming a "traditional" approach to explain the core concepts.

The JSON library's interface

Ultimately, there should be a from_string method that accepts a JSON-encoded string and returns the equivalent Python dictionary.

For example:

assert_equal(from_string('{"foo": 1}'),
             {"foo": 1})

Lexical analysis

Lexical analysis breaks down an input string into tokens. Comments and whitespace are often discarded during lexical analysis so you are left with a simpler input you can search for grammatical matches during the syntactic analysis.

Assuming a simple lexical analyzer, you might iterate over all the characters in an input string (or stream) and break them apart into fundamental, non-recursively defined language constructs such as integers, strings, and boolean literals. In particular, strings must be part of the lexical analysis because you cannot throw away whitespace without knowing that it is not part of a string.

A helpful lexer keeps track of the whitespace and comments it has skipped, along with the current line number and file, so that errors produced at any stage of analysis can refer back to the original source. The V8 JavaScript engine recently became able to reproduce the exact source code of a function. This, at the very least, would need the help of a lexer to make possible.

Implementing a JSON lexer

The gist of the JSON lexer will be to iterate over the input source and try to find patterns of strings, numbers, booleans, nulls, or JSON syntax like left brackets and left braces, ultimately returning each of these elements as a list.

Here is what the lexer should return for an example input:

assert_equal(lex('{"foo": [1, 2, {"bar": 2}]}'),
             ['{', 'foo', ':', '[', 1, ',', 2, ',', '{', 'bar', ':', 2, '}', ']', '}'])

Here is what this logic might begin to look like:

def lex(string):
    tokens = []

    while len(string):
        json_string, string = lex_string(string)
        if json_string is not None:
            tokens.append(json_string)
            continue

        # TODO: lex booleans, nulls, numbers

        if string[0] in JSON_WHITESPACE:
            string = string[1:]
        elif string[0] in JSON_SYNTAX:
            tokens.append(string[0])
            string = string[1:]
        else:
            raise Exception('Unexpected character: {}'.format(string[0]))

    return tokens

The goal here is to try to match strings, numbers, booleans, and nulls and add them to the list of tokens. If none of these match, check if the character is whitespace and throw it away if so. Otherwise store it as a token if it is part of JSON syntax (like left brackets). Finally throw an exception if the character/string didn't match any of these patterns.
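The snippets above and below reference several constants that aren't defined in this excerpt. Plausible definitions (pj's actual source may differ slightly) are:

```python
# Constants referenced by the lexer and parser; these names are a sketch
# of what pj defines, not a copy of its source.
JSON_QUOTE = '"'
JSON_WHITESPACE = [' ', '\t', '\b', '\n', '\r']
JSON_SYNTAX = ['{', '}', '[', ']', ',', ':']

# Individual syntax tokens the parser matches against.
JSON_COMMA = ','
JSON_COLON = ':'
JSON_LEFTBRACKET = '['
JSON_RIGHTBRACKET = ']'
JSON_LEFTBRACE = '{'
JSON_RIGHTBRACE = '}'

# Lengths used by lex_bool and lex_null.
TRUE_LEN = len('true')    # 4
FALSE_LEN = len('false')  # 5
NULL_LEN = len('null')    # 4
```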

Let's extend the core logic here a little bit to support all the types and add the function stubs.

def lex_string(string):
    return None, string


def lex_number(string):
    return None, string


def lex_bool(string):
    return None, string


def lex_null(string):
    return None, string


def lex(string):
    tokens = []

    while len(string):
        json_string, string = lex_string(string)
        if json_string is not None:
            tokens.append(json_string)
            continue

        json_number, string = lex_number(string)
        if json_number is not None:
            tokens.append(json_number)
            continue

        json_bool, string = lex_bool(string)
        if json_bool is not None:
            tokens.append(json_bool)
            continue

        json_null, string = lex_null(string)
        if json_null is not None:
            tokens.append(None)
            continue

        if string[0] in JSON_WHITESPACE:
            string = string[1:]
        elif string[0] in JSON_SYNTAX:
            tokens.append(string[0])
            string = string[1:]
        else:
            raise Exception('Unexpected character: {}'.format(string[0]))

    return tokens

Lexing strings

For the lex_string function, the gist will be to check if the first character is a quote. If it is, iterate over the input string until you find an ending quote. If you don't find an initial quote, return None and the original list. If you find an initial quote and an ending quote, return the string within the quotes and the rest of the unchecked input string.

def lex_string(string):
    json_string = ''

    if string[0] == JSON_QUOTE:
        string = string[1:]
    else:
        return None, string

    for c in string:
        if c == JSON_QUOTE:
            return json_string, string[len(json_string)+1:]
        else:
            json_string += c

    raise Exception('Expected end-of-string quote')

Lexing numbers

For the lex_number function, the gist will be to iterate over the input until you find a character that cannot be part of a number. (This is, of course, a gross simplification, but being more accurate will be left as an exercise to the reader.) After finding a character that cannot be part of a number, either return a float or int if the characters you've accumulated number more than 0. Otherwise return None and the original string input.

def lex_number(string):
    json_number = ''

    number_characters = [str(d) for d in range(0, 10)] + ['-', 'e', '.']

    for c in string:
        if c in number_characters:
            json_number += c
        else:
            break

    rest = string[len(json_number):]

    if not len(json_number):
        return None, string

    # A decimal point or an exponent (like 1e5) means the number must be
    # parsed as a float; int() would reject the 'e'.
    if '.' in json_number or 'e' in json_number:
        return float(json_number), rest

    return int(json_number), rest

Lexing booleans and nulls

Finding boolean and null values is a very simple string match.

def lex_bool(string):
    string_len = len(string)

    if string_len >= TRUE_LEN and \
       string[:TRUE_LEN] == 'true':
        return True, string[TRUE_LEN:]
    elif string_len >= FALSE_LEN and \
         string[:FALSE_LEN] == 'false':
        return False, string[FALSE_LEN:]

    return None, string


def lex_null(string):
    string_len = len(string)

    if string_len >= NULL_LEN and \
       string[:NULL_LEN] == 'null':
        return True, string[NULL_LEN:]

    return None, string

And now the lexer code is done! See the pj/lexer.py for the code as a whole.

Syntactic analysis

The syntax analyzer's (basic) job is to iterate over a one-dimensional list of tokens and match groups of tokens up to pieces of the language according to the definition of the language. If, at any point during syntactic analysis, the parser cannot match the current set of tokens up to a valid grammar of the language, the parser will fail and possibly give you useful information as to what you gave, where, and what it expected from you.

Implementing a JSON parser

The gist of the JSON parser will be to iterate over the tokens received after a call to lex and try to match the tokens to objects, lists, or plain values.

Here is what the parser should return for an example input:

tokens = lex('{"foo": [1, 2, {"bar": 2}]}')
assert_equal(tokens,
             ['{', 'foo', ':', '[', 1, ',', 2, ',', '{', 'bar', ':', 2, '}', ']', '}'])
assert_equal(parse(tokens),
             {'foo': [1, 2, {'bar': 2}]})

Here is what this logic might begin to look like:

def parse_array(tokens):
    return [], tokens

def parse_object(tokens):
    return {}, tokens

def parse(tokens):
    t = tokens[0]

    if t == JSON_LEFTBRACKET:
        return parse_array(tokens[1:])
    elif t == JSON_LEFTBRACE:
        return parse_object(tokens[1:])
    else:
        return t, tokens[1:]

A key structural difference between this lexer and parser is that the lexer returns a one-dimensional list of tokens, while parsers are often defined recursively and return a recursive, tree-like object. Since JSON is a data serialization format rather than a language, the parser should produce objects in Python rather than a syntax tree on which you could perform more analysis (or code generation, in the case of a compiler).

And, again, the benefit of having the lexical analysis happen independent from the parser is that both pieces of code are simpler and concerned with only specific elements.

Parsing arrays

Parsing arrays is a matter of parsing array members and expecting a comma token between them or a right bracket indicating the end of the array.

def parse_array(tokens):
    json_array = []

    t = tokens[0]
    if t == JSON_RIGHTBRACKET:
        return json_array, tokens[1:]

    while True:
        json, tokens = parse(tokens)
        json_array.append(json)

        t = tokens[0]
        if t == JSON_RIGHTBRACKET:
            return json_array, tokens[1:]
        elif t != JSON_COMMA:
            raise Exception('Expected comma after object in array')
        else:
            tokens = tokens[1:]

    raise Exception('Expected end-of-array bracket')

Parsing objects

Parsing objects is a matter of parsing a key-value pair internally separated by a colon and externally separated by a comma until you reach the end of the object.

def parse_object(tokens):
    json_object = {}

    t = tokens[0]
    if t == JSON_RIGHTBRACE:
        return json_object, tokens[1:]

    while True:
        json_key = tokens[0]
        if type(json_key) is str:
            tokens = tokens[1:]
        else:
            raise Exception('Expected string key, got: {}'.format(json_key))

        if tokens[0] != JSON_COLON:
            raise Exception('Expected colon after key in object, got: {}'.format(tokens[0]))

        json_value, tokens = parse(tokens[1:])

        json_object[json_key] = json_value

        t = tokens[0]
        if t == JSON_RIGHTBRACE:
            return json_object, tokens[1:]
        elif t != JSON_COMMA:
            raise Exception('Expected comma after pair in object, got: {}'.format(t))

        tokens = tokens[1:]

    raise Exception('Expected end-of-object brace')

And now the parser code is done! See the pj/parser.py for the code as a whole.

Unifying the library

To provide the ideal interface, create the from_string function wrapping the lex and parse functions.

def from_string(string):
    tokens = lex(string)
    return parse(tokens)[0]

And the library is complete! (ish). Check out the project on Github for the full implementation including basic testing setup.

Appendix A: Single-step parsing

Some parsers choose to implement lexical and syntactic analysis in one stage. For some languages this can simplify the parsing stage entirely. Or, in more powerful languages like Common Lisp, it can allow you to dynamically extend the lexer and parser in one step with reader macros.

I wrote this library in Python to make it more accessible to a larger audience. However, many of the techniques used are more amenable to languages with pattern matching and support for monadic operations -- like Standard ML. If you are curious what this same code would look like in Standard ML, check out the JSON code in Ponyo.

April 28, 2018

Finishing up a FreeBSD experiment

I've been using FreeBSD as my daily driver at work since December. I've successfully done my job and I've learned a hell of a lot forcing myself on CURRENT... But there's been a number of issues with it that have made it difficult to keep using, so I replaced it with Arch Linux yesterday and I no longer have those issues. This is not the first time I've forced myself to run FreeBSD and it won't be the last.

The FreeBSD setup

I have a Dell Developer Edition. It employs full-disk encryption with ZFS. Not being a "disk-jockey" I cannot comment on how exhilarating an experience running ZFS is. It didn't cause me any trouble.

It has an Intel graphics card and the display server is X. I use the StumpWM window manager and the SLiM login manager. xscreensaver handles locking the screen, feh gives me background images, scrot gives me screenshots, and recordMyDesktop gives me video screen capture. This list should feel familiar to users of Arch Linux or other X-supported, bring-your-own-software operating systems/Linux distributions.

Software development

I primarily work on a web application with Node/PostgreSQL and React/SASS. I do all of this development locally on FreeBSD. I run other components of our system in a Vagrant-managed VirtualBox virtual machine.

Upgrading the system

Since I'm running CURRENT, I fetch the latest commit from Subversion and rebuild the FreeBSD system (kernel + userland) each weekend to get the new hotness. This takes somewhere between 1-4 hours. I start the process Sunday morning and come back to it after lunch. After the system is compiled and installed, I update all the packages through the package manager and deal with fallout from incompatible kernel modules that send me into a crash/reboot loop on boot.

This is actually the part about running FreeBSD (CURRENT) I love the most. I've gotten more familiar with the development and distribution of kernel modules like the WiFi, Graphics, and VirtualBox drivers. I've learned a lot about the organization of the FreeBSD source code. And I've gotten some improvements merged into the FreeBSD Handbook on how to debug a core dump.

Issues with FreeBSD on my hardware

I installed CURRENT in December to get support for new Intel graphics drivers (which have since been backported to STABLE). The built-in Intel WiFi card is also new enough that support hadn't been backported to STABLE. My WiFi ultimately never got more than 2-4Mbps down on the same networks where my Macbook Pro would get 120-250Mbps down. I even bought an older Realtek USB WiFi adapter, and it fared no differently. My understanding is that this is because CURRENT turns on enough debug flags that the entire system is not really meant to be used except by FreeBSD developers.

It would often end up taking 10-30 seconds for a git push to happen. It would take minutes to pull new Docker images, etc. This (like everything else here) does not mean you cannot do work on FreeBSD CURRENT, but it makes it really annoying.

Appendix A - Headphones

I couldn't figure out the headphone jack at all. Configuring outputs via sysctl and device.hints is either really complicated or just documented really confusingly. I posted a few times in #freebsd on Freenode and got eager assistance, but ultimately couldn't get the headphone jack to produce anything without incredible distortion.

Of course Spotify has no FreeBSD client and I didn't want to try the Linux compatibility layer (which may have worked). I tried spoofing user agents for the Spotify web app in Chrome but couldn't find one that worked. (I still cannot get a working one on Linux either.) So I'd end up listening to Spotify on my phone, which would have been acceptable except that the studio headphones I decided I needed were immensely under-powered by my phone.

Appendix B - Yubikey

I couldn't figure out how to give myself non-root access to my Yubikey which I believe is the reason I ultimately wasn't able to make any use of it. Though admittedly I don't understand a whit of GPG/PGP or Yubikey itself.

Appendix C - bhyve

I really wanted to use bhyve as the hypervisor for my CentOS virtual machines instead of VirtualBox. So I spent 2-3 weekends trying to get it working as a backend for Vagrant. Unfortunately the best "supported" way of doing this is to manually mutate VirtualBox-based Vagrant boxes and that just repeatedly didn't work for me.

When I tried using bhyve directly I couldn't get networking right. Presumably this is because NAT doesn't work well with wireless interfaces... And I hadn't put in enough weekends to understand setting up proxy rules correctly.

Appendix D - Synaptics

It is my understanding that FreeBSD has its own custom Synaptics drivers and configuration interfaces. Whether that is the case or not, the documentation is a nightmare, and while I would have loved to punt to a graphical interface to keep from fat-palming the touchpad every 30 seconds, none of the graphical configuration tools seemed to work.

A few weeks ago I think I finally got the synaptics support on but I couldn't scroll or select text anymore. I also had to disable synaptics, restart X, enable synaptics, and restart X on each boot for it to successfully register the mouse. I meant to post in #freebsd on Freenode where I probably would have found a solution but :shrugs:.

Appendix E - Sleep

Well sleep doesn't really work on any modern operating system.

FreeBSD is awesome

I enjoy picking on my setup, but it should be impressive that you can do real-world work on FreeBSD. If I had a 3-4 year old laptop instead of a 1-2 year old laptop, most of my issues would be solved.

Here are some reasons to like FreeBSD.

Less competition

This is kind of stupid. But it's easier to find work to do (e.g. docs to fix, bugs to report, ports to add/update, drivers to test) on FreeBSD. I'm really disappointed to be back on Linux because I like being closer to the community and knowing there are ways I can contribute and learn. It's difficult to find the right combination of fending/learning for yourself and achieving a certain level of productivity.

Package management (culture)

Rolling packages are really important to me as a developer. When I ran Ubuntu and Debian desktops in the past, I typically built 5-15 components that were major to my workflow from source myself. This is annoying. Rolling package systems are both easier to use and easier to contribute to... though the latter point may be a coincidence.

In FreeBSD, packages are rolling and the base system (kernel + userland) is released every year or two if you run the recommended/supported "flavors" of FreeBSD (i.e. not CURRENT). If you're running CURRENT then everything is rolling.

Packages are binary, but you can build them from source if needed.
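Concretely, the two paths look something like this (a sketch assuming FreeBSD's pkg tool and a checked-out ports tree; tmux is just an example package):

```shell
# Install the prebuilt binary from the rolling package repository:
pkg install tmux

# Or compile the same software from the ports tree when you need
# non-default build options:
cd /usr/ports/sysutils/tmux
make install clean
```

The binary route is what you use day to day; the ports route is the escape hatch, and it's also the on-ramp for contributing updates back.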

Source

FreeBSD has an older code base than Linux but still manages to be much better organized. OpenBSD and Minix are even better organized, but I don't consider them mainstream general-purpose operating systems in the way FreeBSD and Linux are. Linux is an awful mess and very intimidating, though I hope to get over that.

Old-school interfaces

There's no systemd, so starting X is as simple as startx (though you can enable the login manager service to have it launch on boot). You configure your network interfaces via ifconfig, wpa_supplicant, and dhclient.
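As a sketch of how little is involved (the device names are assumptions: iwm0 is an Intel WiFi NIC, wlan0 the wlan device cloned from it):

```
# /etc/rc.conf
wlans_iwm0="wlan0"              # clone a wlan device from the iwm0 NIC
ifconfig_wlan0="WPA SYNCDHCP"   # associate via wpa_supplicant, then DHCP

# /etc/wpa_supplicant.conf
network={
        ssid="example-network"          # hypothetical network name
        psk="example-passphrase"
}
```

With those two files in place, the network comes up at boot with no further ceremony.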

Alternatives

PC-BSD (since renamed TrueOS) may be a good option for desktop users, but something about the project turns me off (maybe it's the scroll-jacking website).

Picking Arch Linux

In any case, I decided it was time to stop waiting for git push to finish. I had run Gentoo at work for 3-4 months before I installed FreeBSD, and I still had nightmares about resolving dependencies during upgrades. I needed a binary package manager (not hard to find) and a rolling release system.

Installing Arch stinks

Many of my old coworkers at Linode run Arch Linux at home, so I've looked into it a few times. It absolutely meets my rolling-release and binary-packaging needs. But I've been through the installation once before (and through Gentoo's), and I loathed the minutes-long effort required to set up full-disk encryption. Also, systemd? :(
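For the curious, that minutes-long effort is roughly this sequence (a sketch assuming a simple LUKS-on-ext4 layout; /dev/sda2 is a hypothetical root partition, and these commands are destructive):

```shell
cryptsetup luksFormat /dev/sda2       # encrypt the partition (prompts for a passphrase)
cryptsetup open /dev/sda2 cryptroot   # unlock it as /dev/mapper/cryptroot
mkfs.ext4 /dev/mapper/cryptroot       # create a filesystem inside the container
mount /dev/mapper/cryptroot /mnt      # mount it for the rest of the install
```

On top of this you still have to wire up the initramfs encrypt hook and the kernel command line, which is exactly the fiddly part an installer automates.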

How about Void Linux?

Void Linux looked promising and avoids systemd (which legitimately adds complexity and new tools to learn for desktop users with graphics and WiFi/DHCP networking). It has a rolling release system and binary packages, but overall it didn't seem popular enough. I worried I'd be in the same boat as on Debian/Ubuntu, building lots of packages myself.

What about Arch-based distros?

Eventually I realized that Antergos and Manjaro are two popular (per DistroWatch) Arch-based distributions that would provide the installer I really wanted. I read more about Manjaro and found it diverges substantially from Arch. That didn't sound appealing; divergent distributions like Manjaro and Mint exist to cause trouble. Antergos, on the other hand, seemed to be a thin layer around Arch: a graphical installer plus a few package repositories of its own. It seemed easy enough to remove after the installation finished.

Antergos Linux

I ran the Antergos installer, and the first time around my touchpad didn't work at all. I tried a USB mouse (which, to be honest, may have been broken anyway), but it didn't seem to be recognized. I rebooted and my touchpad worked.

I tried to configure WiFi using the provided graphical NetworkManager, but it was super buggy. Menus kept expanding and contracting as I moused over items, and it never prompted me for a password to the locked networks around me (it showed lock icons beside them).

I spent half an hour configuring the WiFi manually. After I got it working (and "learned" all the fun new modern tools like ip, iw, dhcpcd, iwconfig, and systemd networking), the Antergos installer crashed at the last step with an error about being unable to update itself.
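The manual dance, for the record, is roughly this (a sketch assuming wpa_supplicant and dhcpcd are installed; wlp2s0 is a hypothetical interface name and needs a root shell):

```shell
ip link set wlp2s0 up                     # bring the wireless interface up
wpa_passphrase "my-ssid" "my-passphrase" \
    > /etc/wpa_supplicant/wlp2s0.conf     # generate a config containing the PSK
wpa_supplicant -B -i wlp2s0 \
    -c /etc/wpa_supplicant/wlp2s0.conf    # associate, daemonized in the background
dhcpcd wlp2s0                             # lease an address over DHCP
```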

At this point I gave up. The Antergos installer was half-baked and buggy, and it was getting me nowhere.

Anarchy Linux

Still loath to spend even a few minutes configuring disk encryption manually, I interneted until I found Anarchy Linux (formerly Arch Anywhere).

This installer seemed even more promising: it's a TUI installer, so there's no need for a mouse, and it offers more desktop environments to pick from (including i3 and Sway) or to skip entirely.

It was a little concerning that Anarchy Linux also intends to become its own divergent Arch-based distribution, but in the meantime it still supports installing vanilla Arch.

It worked.

Life on Arch

I copied over all my configs from my FreeBSD setup and they all worked: StumpWM, SLiM, scrot, xscreensaver, feh, Emacs, Tmux, ssh, kubectl, font settings, keyboard bindings, etc. That's pretty nice (and speaks to the general compatibility of software between Linux and FreeBSD).

Getting Powerline working was a little weird. The powerline and powerline-fonts packages don't seem to install patched fonts (e.g. Noto Sans for Powerline). I prefer patched fonts to the alternative of specifying multiple fonts as fallbacks, because I have font settings in multiple places (.Xresources, .emacs, etc.) and the syntax varies in each config. So ultimately I cloned the github.com/powerline/fonts repo and ran the install.sh script there to get the patched fonts.
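The workaround boils down to this (install.sh is the script shipped in that repo; fc-cache is an extra step I'm assuming is needed for X clients to notice the new fonts):

```shell
git clone https://github.com/powerline/fonts.git
cd fonts
./install.sh   # copies the patched fonts into the user's font directory
fc-cache -f    # force a font-cache rebuild so applications pick them up
```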

But hey, there's a Spotify client! It works! The headphone jack just works after installing alsa-utils and running alsamixer. And my WiFi gets 120-250 Mbps down on all the right networks!

I can live with this.

Random background

Each time I join a new company, I try to use the change as an excuse to force myself to try different workflows and learn something new tangential to the work I actually do. I'd been a Vim and Ubuntu desktop user since high school. In 2015, I took a break from work on the East Coast to live in a school bus in Silver City, New Mexico. I swapped out my Ubuntu and Vim dev setup for FreeBSD and Emacs. I kept GNOME 3 because I liked the aesthetic. I spent six months forcing myself to use this setup as my daily driver while doing full-stack, contract web development gigs.

In 2016, I joined Linode and took up the company MacBook Pro. I wasn't as comfortable at the time running Linux on my MacBook, but a determined coworker put Arch on his. I was still the only one running Emacs (everyone else used Vim or VS Code) for Python and React development.

I joined Capsule8 in late 2017 and put Gentoo on my Dell Developer Edition. Most people ran Ubuntu on the Dell, or macOS. I'd never used Gentoo on a desktop before, but I liked the systemd-optional design and the similarities to FreeBSD. I ran Gentoo for 3-4 months but constantly broke it during upgrades, and the monthly full-system upgrades themselves took 1-2 days. I didn't have the chops or the patience to deal with it.

So I used FreeBSD for 5 months and now I'm back on Linux.