
B U I L D O M A T

a software build labour-saving device


Buildomat manages the provisioning of ephemeral UNIX systems (e.g., instances
in AWS EC2) on which to run software builds. It logs job output, collects
build artefacts, and reports status. The system integrates with GitHub through
the Checks API, to allow build jobs to be triggered by pushes and pull
requests.

## Components

Buildomat is made up of a variety of crates, loosely grouped into areas
of related functionality:

```
$ cargo xtask crates
buildomat /bin
buildomat-agent /agent
buildomat-bunyan /bunyan
buildomat-client /client
buildomat-common /common
buildomat-database /database
buildomat-server /server
buildomat-types /types

buildomat-factory-aws /factory/aws
buildomat-factory-lab /factory/lab

buildomat-github-common /github/common
buildomat-github-database /github/database
buildomat-github-dbtool /github/dbtool
buildomat-github-ghtool /github/ghtool
buildomat-github-server /github/server

xtask /xtask
```

### Buildomat Core

The buildomat core is responsible for authenticating users and remote services,
for managing build systems, and for running jobs and collecting output.

#### Server (`buildomat-server`, in `server/`)

The core buildomat API server. Coordinates the creation, tracking, and
destruction of workers in which to execute jobs. This component sits at the
centre of the system and is used by the GitHub integration server, the client
command, the agent running within each worker for control of the job, and any
factories.

#### Client Command (`buildomat`, in `bin/`)

A client tool that uses the client library to interface with and manipulate the
core server. The tool has both administrative and user-level functions,
expressed in a relatively regular hierarchy of commands; e.g., `buildomat job
run` or `buildomat user ls`.

```
$ ./target/release/buildomat
Usage: buildomat [OPTS] COMMAND [ARGS...]

Commands:
info get information about server and user account
control server control functions
job job management
user user management

Options:
--help usage information
-p, --profile PROFILE
authentication and server profile

ERROR: choose a command
```

#### Client Library (`buildomat-client`, in `client/`)

An HTTP client library for accessing the core buildomat server. This client is
generated at build time by
[`progenitor`](https://github.com/oxidecomputer/progenitor), an OpenAPI client
generator.

The client is generated from an OpenAPI document that is checked in to the
repository. Dropshot produces that document from the implementation of the
server. If you make changes to the API exposed by the core server, you will
need to regenerate the document, `client/openapi.json`, using:

```
$ cargo xtask openapi
```

#### Agent (`buildomat-agent`, in `agent/`)

A process that is injected into an ephemeral AWS EC2 instance to allow the
buildomat core server to take control and run jobs. This process receives
single-use credentials at provisioning time from the core server, and connects
out to receive instructions. The agent does not require a public IP, or any
direct inbound connectivity, to allow agents to run inside remote NAT
environments.

### Factories

Buildomat jobs are specified to execute within a particular target environment.
Concrete instances of those target environments (known as _workers_) are
created, managed, and destroyed by _factories_. Factories are long-lived
server processes that connect to the core API and offer to construct workers as
needed. When a worker has finished executing the job, or when requested by an
operator, the factory is also responsible for freeing any resources that were
in use by the worker.

#### AWS Factory (`buildomat-factory-aws`, in `factory/aws/`)

The AWS factory creates ephemeral AWS instances that are used to run one job
and are then destroyed. The factory arranges for the agent to be installed and
start automatically in each instance that is created. The factory is
responsible for ensuring no stale resources are left behind, and for enforcing
a cap on the concurrent use of resources at AWS. Each target provided by an
AWS factory can support a different instance type (i.e., CPU and RAM capacity),
a different image (AMI), and a different root disk size.

#### Lab Factory (`buildomat-factory-lab`, in `factory/lab/`)

The lab factory uses IPMI to exert control over a set of physical lab systems.
When a worker is required, a lab system is booted from a ramdisk and the agent
is started, just as it would be for an AWS instance. From that point on,
operation is quite similar to AWS instances: the agent communicates directly
with the core API. When tearing down a lab worker, the machine is rebooted
(again via IPMI) to clear out the prior ramdisk state. Each target provided by
a lab factory can boot from a different ramdisk image stored on a local server.

### GitHub Integration (formerly known as Wollongong)

The GitHub-specific portion of the buildomat suite sits in front of the core
buildomat service. It is responsible for receiving and processing
notifications of new commits and pull requests on GitHub, starting any
configured build jobs, and reporting the results so that they are visible
through the GitHub user interface.

#### Server (`buildomat-github-server`, in `github/server/`)

This server acts as a [GitHub App](https://docs.github.com/en/developers/apps).
It is responsible for processing incoming [GitHub
webhooks](https://docs.github.com/en/developers/webhooks-and-events/webhooks/about-webhooks)
that notify the system about commits and pull requests in authorised
repositories. In addition to relaying jobs between GitHub and the buildomat
core, this service provides an additional HTML presentation of job state (e.g.,
detailed logs) and access to any artefacts that jobs produce. This server
keeps state required to manage the interaction with GitHub, but does not store
job data; requests for logs or artefacts are proxied back to the core server.

#### Database Tool (`buildomat-github-dbtool`, in `github/dbtool/`)

This tool can be used to inspect the database state kept by the GitHub
integration as it tracks GitHub pull requests and commits. Unlike the core
client tool, this program directly interacts with a local SQLite database.

```
$ buildomat-github-dbtool
Usage: buildomat-github-dbtool COMMAND [ARGS...]

Commands:
delivery (del) webhook deliveries
repository (repo) GitHub repositories
check GitHub checks

Options:
--help usage information

ERROR: choose a command
```

Of particular note, the tool is useful for inspecting and replaying received
webhook events; e.g.,

```
$ buildomat-github-dbtool del ls
SEQ ACK RECVTIME EVENT ACTION
0 1 2021-10-05T01:58:32Z ping -
1 1 2021-10-05T02:25:33Z installation created
2 1 2021-10-05T02:26:53Z push
3 1 2021-10-05T02:26:53Z check_suite requested
4 1 2021-10-05T02:26:56Z check_suite completed
5 1 2021-10-05T02:26:56Z check_run completed
6 1 2021-10-05T02:26:56Z check_run created
7 1 2021-10-05T02:26:56Z check_run created
8 1 2021-10-05T02:26:57Z check_run created
...
```

The `buildomat-github-dbtool del unack SEQ` command can be used to trigger the
reprocessing of an individual webhook message.

## Per-repository Configuration

Buildomat works as a [GitHub App](https://docs.github.com/en/developers/apps),
which is generally "installed" at the level of an
[Organisation](https://docs.github.com/en/organizations). Installing the App
allows buildomat to receive notifications about events, such as git pushes and
pull requests, from all repositories (public and private) within the
organisation. In order to avoid accidents, buildomat requires that the service
be explicitly configured for a repository before it will take any actions.

Per-repository configuration is achieved by creating a file **in the default
branch** (e.g., `main`) of the repository in question, named
`.github/buildomat/config.toml`. This file is written in
[TOML](https://toml.io/), with a handful of simple values. Supported
properties in this file include:

- `enable` **(boolean)**

Must be present and have the value `true` in order for buildomat to consider
the repository for jobs; e.g.,

```toml
enable = true
```

- `org_only` **(boolean, defaults to `true` if missing)**

If set to `true`, or missing from the file, buildomat will not automatically
run jobs in response to pull requests opened by users that are not a member
of the GitHub Organisation which owns the repository. If set to `false`, any
GitHub user can cause a job to be executed.

This property is important for security if your repository is able to create
any jobs that have access to secrets, or to restricted networks.

- `allow_users` **(array of strings, each a GitHub login name)**

If specified, jobs will be started automatically for users in this list,
regardless of whether they are a member of the Organisation that owns the
repository or not, and regardless of the value of the `org_only` property.

This is often useful for pre-authorising jobs driven by Pull Requests
made by various automated systems; e.g.,

```toml
allow_users = [
"dependabot[bot]",
"renovate[bot]",
]
```

Note that buildomat will only ever read this configuration file from the most
recent commit in the default branch of the repository, not from the contents of
another branch or pull request. This is of particular importance for
security-sensitive properties like `org_only`, where the policy set by users
with full write access to the repository must not be overridden by changes from
potentially untrusted users. If a pull request with a malicious policy change
is merged, it will then be in the default branch and active for subsequent pull
requests; maintainers must carefully review pull requests that change this
file.
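The way `enable`, `org_only`, and `allow_users` combine can be sketched as a small shell function. This is only an illustration of the policy described above, not buildomat's implementation; the function name `auto_run` and its argument order are hypothetical.

```bash
# auto_run ENABLE ORG_ONLY IS_MEMBER LOGIN [ALLOWED_LOGIN...]
#
# Succeeds (exit status 0) if a pull request opened by LOGIN would start jobs
# automatically under the policy described above. Pass an empty string for
# ORG_ONLY to model the property being absent from the file.
auto_run() {
	local enable="$1" org_only="${2:-true}" is_member="$3" login="$4" allowed
	shift 4

	# Nothing runs unless the repository has opted in.
	[ "$enable" = true ] || return 1

	# Users named in allow_users bypass the membership check entirely.
	for allowed in "$@"; do
		[ "$allowed" = "$login" ] && return 0
	done

	# Otherwise the user must be an organisation member, unless the
	# repository has explicitly set org_only = false.
	[ "$org_only" = false ] || [ "$is_member" = true ]
}

if auto_run true "" false "dependabot[bot]" "dependabot[bot]" "renovate[bot]"; then
	echo "jobs start automatically"
else
	echo "jobs do not start automatically"
fi
```

With `org_only` missing, a non-member who is absent from `allow_users` is rejected, which matches the default described for `org_only`.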

## Specifying Jobs

Once you have configured buildomat at the repository level, you can specify
some number of jobs to execute automatically in response to pushes and pull
requests. While per-repository configuration is read from the default branch,
jobs are read from the commit under test.

Jobs are specified as `bash` programs with some configuration directives
embedded in comments. These job files must be named
`.github/buildomat/jobs/*.sh`. Unexpected additional files in
`.github/buildomat/jobs` will result in an error.

Job files should begin with an interpreter line, followed by TOML-formatted
configuration prefixed with `#:` so that they will be identified as
configuration by buildomat, but ignored by the shell. For example, a minimal
job that would just execute `uname -a`:

```bash
#!/bin/bash
#:
#: name = "build"
#: variety = "basic"
#:
uname -a
```
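To see which lines of a job file would be treated as configuration, the `#:` prefix can be stripped with `sed`. This is a rough sketch rather than buildomat's own parser; the helper name `frontmatter` is hypothetical, and the sketch assumes every line beginning with `#:` belongs to the frontmatter regardless of where it appears in the file.

```bash
# Print the TOML frontmatter of a job file: each line that begins with "#:",
# with that prefix and at most one following space removed.
frontmatter() {
	sed -n 's/^#: \{0,1\}//p' "$1"
}

# Try it on a throwaway copy of the minimal job shown above.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/bash
#:
#: name = "build"
#: variety = "basic"
#:
uname -a
EOF
frontmatter "$job"
rm -f "$job"
```

The interpreter line and the `uname -a` command are not printed, because neither begins with `#:`.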

The minimum set of properties that must always appear in the TOML frontmatter
is:

- `name` **(string)**

Must be present in all jobs. This name is used for at least two things: as
the name of the Check Run in the GitHub user interface, and when specifying
that some other job depends on this job. The job name must be unique amongst
all jobs within the commit under test.

In general, it is probably best to keep these short, lower-case, and without
spaces. It is conventional to use the same name for the job file and the
job, e.g., `name = "build"` in file `.github/buildomat/jobs/build.sh`.

- `variety` **(string)**

To allow the system to evolve over time, a job must specify a _variety_,
which defines some things about the way a job executes and what additional
configuration options are required or available.

These properties are optional, and apply to every variety:

- `enable` **(boolean)**

To exclude a particular job file from processing, set this to `false`. If
not specified, this property defaults to `true`. This allows a job to be
temporarily disabled without needing to be removed from the repository.

The rest of the configuration is variety-specific.

### Variety: Basic

Each **basic** variety job (selected by specifying `variety = "basic"` in the
frontmatter) takes a single `bash` program and runs it in an ephemeral
environment. The composition of that environment, such as compute and memory
capacity or the availability of specific toolchains and other software, depends
on the `target` option.

Basic variety jobs can produce output files (see the configuration options
`output_rules` and `publish`). They can also depend on the successful
completion of other jobs, gaining access to any output files from the upstream
job (see the `dependencies` option). Jobs are generally executed in parallel,
unless they are waiting for a dependency or for capacity to become available.

#### Execution Environment

By default, an ephemeral system (generally a virtual machine) will be
provisioned for each job. The system will be discarded at the end of the job,
so no detritus is left behind. Once the environment is provisioned, the `bash`
program in the job file is executed as-is.

Jobs are executed as an unprivileged user, `build`, with home directory
`/home/build`. If required, this user is able to escalate to `root` privileges
through the use of [pfexec(1)](https://illumos.org/man/1/pfexec). Systems that
do not have a native `pfexec` will be furnished with a compatible wrapper
around a native escalation facility, to ease the construction of cross-platform
jobs.

By default, the working directory for the job is based on the name of the
repository; e.g., for https://github.com/oxidecomputer/buildomat, the working
directory would be `/work/oxidecomputer/buildomat`. The system will arrange
for the repository to be cloned at that location with the commit under test
checked out. A simple job could directly invoke some build tool like `gmake`
or `cargo build`, and the build would occur at the root of the clone. The
`skip_clone` configuration option can disable this behaviour.

Most targets provide toolchains from common metapackages like
`build-essential`; e.g., `gmake` and `gcc`. If a Rust toolchain is required,
one can be requested through the `rust_toolchain` configuration option. This
will be installed using [rustup](https://rustup.rs/).

##### Environment Variables

While the complete set of environment variables is generally target-specific,
the common minimum for all targets includes:

- `BUILDOMAT_JOB_ID` will be set to the unique ID of this job
- `CI` will be set to `true`
- `GITHUB_REPOSITORY` set to `owner/repository`; e.g., `oxidecomputer/buildomat`
- `GITHUB_SHA` set to the commit ID of the commit under test
- If the commit under test is part of a branch, then `GITHUB_BRANCH` will be
set to the branch name (e.g., `main`) and `GITHUB_REF` will be set to the ref
name; e.g., `refs/heads/main`.
- `HOME`, set to the home directory of the build user
- `USER` and `LOGNAME`, set to the username of the build user
- `PATH` set to include relevant directories for toolchains and other
useful software
- `TZ` will be set to `UTC`
- `LANG` and `LC_ALL` will be set to `en_US.UTF-8`
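Because `GITHUB_BRANCH` and `GITHUB_REF` are only set when the commit under test is part of a branch, a job that reads them should tolerate their absence. A minimal sketch, with a hypothetical helper name `describe_commit`; the `unknown` fallbacks are there only so the snippet also runs outside a buildomat environment.

```bash
# Print a one-line summary of the commit under test, mentioning the branch
# only when buildomat has provided one.
describe_commit() {
	printf 'job %s: %s at %s' \
	    "${BUILDOMAT_JOB_ID:-unknown}" \
	    "${GITHUB_REPOSITORY:-unknown}" \
	    "${GITHUB_SHA:-unknown}"
	if [ -n "${GITHUB_BRANCH:-}" ]; then
		printf ' (branch %s, ref %s)' "$GITHUB_BRANCH" "${GITHUB_REF:-}"
	fi
	printf '\n'
}

describe_commit
```

Using the `${VAR:-}` form keeps the check safe even if the job enables `set -o nounset`.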

##### Available Commands

Cross-platform shell programming can be challenging due to differences between
operating systems. To make this a little easier, we ensure that each
buildomat target can provide a basic suite of tools that are helpful in
constructing succinct jobs:

- [pfexec(1)](https://illumos.org/man/1/pfexec) allows escalation from the
unprivileged build user to `root`; e.g., `pfexec id -a`.
- [ptime(1)](https://illumos.org/man/1/ptime) runs a program and provides
(with `-m`, detailed) timing information; e.g., `ptime -m cargo test`.
- [banner(1)](https://illumos.org/man/1/banner) prints its arguments in
large letters on standard output, and is useful for producing headings
in job log output; e.g., `banner testing`.

#### Configuration

Configuration properties supported for basic jobs include:

- `access_repos` **(array of strings)**

Jobs can be created in both public and private repositories. Public
repositories are available to everybody, but private repositories require
appropriate credentials. By default, an ephemeral, read-only token is
injected into the execution environment (in the `$HOME/.netrc` file) that is
able to access only the repository directly under test.

If a job requires access to additional private repositories beyond the
direct repository, they may be specified in this list, in the form
`owner/repository`; e.g.,

```bash
#: access_repos = [
#: "oxidecomputer/clandestine-project",
#: "oxidecomputer/secret-plans",
#: ]
```

Note that this option only works for repositories within the same
organisation as the direct repository. Using the option will trigger a
requirement for job-level authorisation by a member of the organisation.

- `dependencies` **(table)**

A job may depend on the successful completion of one or more other jobs from
the same commit under test. If the dependency is cancelled or fails to
complete successfully for some other reason, that failure will be propagated
forward as a failure of this job.

Each entry in the dependencies table is itself a table with a name for the
dependency, and the following per-dependency properties:

* `job` **(string)**

Specifies the job that this job should wait on for execution. The `job`
value must exactly match the `name` property of some other `basic` variety
job available in the same commit.

Any artefacts output by the job named in the dependency will be made
available automatically under `/input/$dependency` using the dependency
name. For example, consider this dependency directive:

```bash
#: [dependencies.otherjob]
#: job = "the-other-job!"
```

If the job with the name `the-other-job!` produces an output file,
`/tmp/output.zip`, then it will be made available within this job as the file
`/input/otherjob/tmp/output.zip`.

Using this facility, one can easily split a job into a "build" phase that
runs under a target with access to toolchains, and one or more "test" phases
that can take the build output and run it under another target that might
not have a toolchain, or that may have access to other resources with limited
availability, like test hardware.

Jobs can also depend on more than one other job, allowing a job to aggregate
artefacts from several other jobs together in one place. This might be
useful when building binaries for more than one OS, with a final
step that publishes multi-OS packages if all the other builds were
successful.

Cycles in the dependency graph are not allowed.
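The mapping from an upstream artefact to its location in the downstream job can be written as a one-line helper. This is a sketch of the path rule described above; the function name `input_path` is hypothetical. Note that the first argument is the dependency name (the table key, `otherjob`), not the `name` of the upstream job.

```bash
# input_path DEPENDENCY OUTPUT_PATH
#
# Print where an artefact that the upstream job produced at OUTPUT_PATH will
# appear in a job that declares [dependencies.DEPENDENCY].
input_path() {
	local dependency="$1" output="$2"
	printf '/input/%s%s\n' "$dependency" "$output"
}

input_path otherjob /tmp/output.zip
```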

- `output_rules` **(array of strings)**

Jobs may produce artefacts that we wish to survive beyond the lifetime of the
ephemeral build environment. A job may specify one or more files for
preservation by the system; e.g., a build job may produce binaries or
packages that can then be downloaded and installed, or a test job may produce
JUnit XML files or other diagnostic logs that can be inspected by engineers.

The `output_rules` property is a list of `/`-anchored glob patterns that
match files in the ephemeral machine; e.g., `/tmp/*.txt` would match
`/tmp/something.txt` but not `/tmp/nothing.png`. Like the shell, a single
asterisk (`*`) will not descend into a hierarchy of directories. If you want
to match recursively, a double asterisk (`**`) pattern will match the current
directory or any directory under that directory, but not files. You can
combine these to get a recursive match; e.g., `/tmp/**/*.txt` would match
`/tmp/a.txt`, `/tmp/dir/a.txt`, and `/tmp/dir/dir/a.txt`.

By default, it is not an error to specify a pattern that does not match any
files. Provided the job is not cancelled, matching files are uploaded
whether the job program exits with a zero status (denoting success) or a
non-zero status (denoting failure). These behaviours can be used to upload
diagnostic logs left behind by unexpected test failures that are cleaned up
on success; e.g.,

```bash
#: output_rules = [
#: "/tmp/test_output/*",
#: ]
```

If the success of a job _requires_ that a particular artefact is produced,
the `=` prefix may be used to signify "this rule must match at least one
file". If the rule does not match at least one output file, the job is
marked as failed even if the job program otherwise succeeded. This can be
used to make sure that, say, a release binary build job produces an archive
with the expected name; e.g.,

```bash
#: output_rules = [
#: "=/work/pkg/important.tar.gz",
#: "=/work/pkg/important.sha256.txt",
#: ]
```

By default, the system attempts to ensure that a job has not accidentally
left background processes running that continue to modify the output
artefacts. If the size or modified time of a file changes while it is being
uploaded, the job will fail. To relax this restriction, the `%` prefix may
be used to signify that "this file is allowed to change while it is being
uploaded". The `%` prefix will also ignore a file that is completely removed
by a background process before it is able to be uploaded. This is used to
make best effort uploads of diagnostic log files for background processes
which may continue running even though the job is nominally complete; e.g.,

```bash
#: output_rules = [
#: "%/var/svc/log/*.log",
#: ]
```

To exclude specific files from upload, the `!` prefix can be used to signify
that "any file that matches this pattern should be ignored, even if it was
nominally included by another pattern". Order in the array is not important;
a match of any exclusion rule will prevent that file from being uploaded.
For example, to upload anything left in `/tmp` except for pictures:

```bash
#: output_rules = [
#: "/tmp/*",
#: "!/tmp/*.jpg",
#: ]
```

The must-match (`=`) and allow-change (`%`) prefixes may be combined in a
single output rule. The exclusion prefix (`!`) may not be combined with any
other prefix. For example, to require at least one log file (which may still
be growing) that is not `big-and-useless.log`:

```bash
#: output_rules = [
#: "=%/tmp/*.log",
#: "!/tmp/big-and-useless.log",
#: ]
```
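The prefix rules can be summarised by a small classifier that splits a rule into its flags and its glob pattern. This is an illustrative sketch, not buildomat's parser; the function name `parse_rule` is hypothetical, and accepting `=` and `%` in either order is an assumption (the examples above only show `=%`).

```bash
# parse_rule RULE
#
# Split an output rule into its glob pattern and the flags implied by its
# prefix characters. Fails if what remains is not a /-anchored pattern, which
# also rejects "!" combined with another prefix.
parse_rule() {
	local rule="$1" must_match=no allow_change=no exclude=no

	case "$rule" in
	'!'*)
		# Exclusion rules take no other prefix.
		exclude=yes
		rule="${rule#?}"
		;;
	*)
		# Consume any run of "=" and "%" prefix characters.
		while :; do
			case "$rule" in
			'='*) must_match=yes ;;
			'%'*) allow_change=yes ;;
			*) break ;;
			esac
			rule="${rule#?}"
		done
		;;
	esac

	case "$rule" in
	/*) ;;
	*)
		echo "pattern must begin with /: $rule" >&2
		return 1
		;;
	esac

	printf 'pattern=%s must_match=%s allow_change=%s exclude=%s\n' \
	    "$rule" "$must_match" "$allow_change" "$exclude"
}

parse_rule '=%/tmp/*.log'
parse_rule '!/tmp/big-and-useless.log'
```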

- `publish` **(array of tables)**

Some jobs may wish to publish a specific subset of their output artefacts at
a predictable URL based on the commit ID of the commit under test, for
reference by other jobs from other repositories, or end user tools.

Each table in the `publish` array of tables must contain these properties:

* `from_output` **(string)**

Specify the full path of the output artefact to be published without using
any wildcard syntax. The output rule that provides this artefact should be
specified using a must-match (`=`) prefix so that the job fails if it is
not produced. Each publish entry can specify exactly one output artefact.

* `series` **(string)**

Specify a series name to group a set of uploads together. This is useful
to group related files together in the URL space, even if they are produced
by several different jobs. This value should be short and URL-safe.

* `name` **(string)**

Specify the publicly visible name of this file, which must be unique
within the series for this commit for this repository. This value should
be short and URL-safe.

Each file published this way will be available at a predictable URL of the
form:

```
https://buildomat.eng.oxide.computer/public/file/OWNER/REPO/SERIES/VERSION/NAME
```

The `VERSION` value is the commit ID (full SHA) of the commit under test, and
the `SERIES` and `NAME` come from the `publish` entry.

For example, if commit `e65aace9237833ec775253cfde97f59a0af5bc3d` from
repository `oxidecomputer/software` included this publish directive:

```bash
#: [[publish]]
#: from_output = "/work/important-packaged-files.tar.gz"
#: series = "packages"
#: name = "files.tar.gz"
```

A published file would be available at the URL:

```
https://buildomat.eng.oxide.computer/public/file/oxidecomputer/software/packages/e65aace9237833ec775253cfde97f59a0af5bc3d/files.tar.gz
```

Note that files published this way from private repositories will be
available without authentication.
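A downstream script can assemble the same URL from its parts. A minimal sketch, assuming the URL form shown above; the function name `publish_url` is hypothetical.

```bash
# publish_url OWNER/REPO SERIES VERSION NAME
#
# Print the public URL of a published file. VERSION is the full commit ID of
# the commit under test; SERIES and NAME come from the publish entry.
publish_url() {
	printf 'https://buildomat.eng.oxide.computer/public/file/%s/%s/%s/%s\n' \
	    "$1" "$2" "$3" "$4"
}

publish_url oxidecomputer/software packages \
    e65aace9237833ec775253cfde97f59a0af5bc3d files.tar.gz
```

Inside a job, the first and third arguments could be taken from `$GITHUB_REPOSITORY` and `$GITHUB_SHA`.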

- `rust_toolchain` **(string or boolean)**

If specified, `rustup` will be installed in the environment and the nominated
toolchain will be available as the default toolchain. Any toolchain
specification that `rustup` accepts should work here; e.g., something general
like `stable` or `nightly`, or a specific nightly date, like
`nightly-2022-04-27`.

```bash
#: rust_toolchain = "stable"
```

It is also possible to use the boolean value `true` here, at which point
the system will interpret the contents of the `rust-toolchain.toml` file
in the root of the repository to decide what to install. The file must
contain a valid `channel` value, and may also contain a valid `profile`
value. Neither the legacy (pre-TOML) file format, nor TOML files which
contain the `path` directive, are supported.

```bash
#: rust_toolchain = true
```

- `skip_clone` **(boolean)**

By default, a basic job will clone the repository and check out the commit
under test. The working directory for the job will be named for
the GitHub repository; e.g., for https://github.com/oxidecomputer/buildomat,
the directory would be `/work/oxidecomputer/buildomat`.

If this option is specified with the value `true`, no clone will be performed.
The working directory for the job will be `/work` without subdirectories.
This is useful in targets that do not provide toolchains or `git`, or where
no source files from the repository (beyond the job program itself) are
required for correct execution.

```bash
#: skip_clone = true
```
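The effect of `skip_clone` on the working directory can be sketched as follows. The helper name `job_workdir` is hypothetical; it only restates the rule described above.

```bash
# job_workdir OWNER/REPO [SKIP_CLONE]
#
# Print the directory in which the job program starts. SKIP_CLONE defaults to
# false, matching the behaviour when the option is not specified.
job_workdir() {
	if [ "${2:-false}" = true ]; then
		printf '/work\n'
	else
		printf '/work/%s\n' "$1"
	fi
}

job_workdir oxidecomputer/buildomat
job_workdir oxidecomputer/buildomat true
```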

- `target` **(string)**

The target for a job, which specifies the composition of the execution
environment, can be specified by name. Some targets (e.g., `lab`) are
privileged, and not available to all repositories.

The list of unrestricted targets available for all jobs includes:

- `helios-latest`; an illumos execution environment (Oxide Helios
distribution) running in an ephemeral virtual machine, with a reasonable
set of build tools. 32GB of RAM and 200GB of disk should be available.
- `omnios-r151038`; an illumos execution environment (OmniOS r151038 LTS)
running in an ephemeral virtual machine, with a reasonable set of build
tools. 32GB of RAM and 200GB of disk should be available.
- `ubuntu-18.04`, `ubuntu-20.04`, and `ubuntu-22.04`; an Ubuntu execution
environment running in an ephemeral virtual machine, with a reasonable set
of build tools. 32GB of RAM and 200GB of disk should be available.

## Licence

Unless otherwise noted, all components are licenced under the [Mozilla Public
License Version 2.0](./LICENSE).