https://github.com/not-that-guy-again/bioaf

A self-hosted platform for orchestrating bioinformatics pipelines, managing experimental metadata, and running reproducible compute workloads.

bioinformatics computational-biology data-provenance gcp genomics kubernetes lims ngs pipeline-management platform-engineering reproducible-research research-data-management scientific-workflows terraform workflow-orchestration

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/not-that-guy-again/bioaf
Owner: not-that-guy-again
License: apache-2.0
Created: 2026-03-05T23:55:09.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-05-02T23:25:47.000Z (about 1 month ago)
Last Synced: 2026-05-03T00:37:52.370Z (about 1 month ago)
Topics: bioinformatics, computational-biology, data-provenance, gcp, genomics, kubernetes, lims, ngs, pipeline-management, platform-engineering, reproducible-research, research-data-management, scientific-workflows, terraform, workflow-orchestration
Language: Python
Homepage:
Size: 6.81 MB
Stars: 4
Watchers: 2
Forks: 1
Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS

# bioAF

Computational Biology Automation Framework


A turnkey computational biology platform for small biotech companies (5-50 researchers), deployed on Google Cloud Platform. bioAF provides a web-based control plane for managing HPC clusters, notebook environments, pipeline engines, and data visualization tools -- all provisioned through UI-driven Terraform.

## Features

- **Experiment Tracking** - MINSEQE-compliant metadata, sample management, batch processing, project organization

- **Compute Orchestration** - Kubernetes (GKE) compute via the BioAF Adapter Layer, JupyterHub/RStudio notebooks, versioned compute environments, auto-scaling, Cloud Build image pipeline

- **Pipeline Engine** - Nextflow integration, custom pipelines, pipeline catalog, run monitoring, parameter management

- **Data Management** - File upload/download, dataset browser, GCS storage integration, GEO export, SuperSeries cross-experiment packaging

- **Results & Visualization** - QC dashboards, cellxgene single-cell viewer, plot archive, search

- **SSH Access** - One-click kubectl exec into running pipeline jobs and notebook sessions

- **Notifications** - Event-driven alerts via in-app, email (SMTP), and Slack (OAuth integration)

- **Cost Center** - GCP billing integration, budget alerts, component cost breakdown, projections

- **Backup & Recovery** - 4-tier GCS backups (pg_dump, GCS versioning, platform config, terraform state), restore with review period

- **Session Credentials** - Per-user RStudio credentials with PAM authentication, auto-generated usernames

- **Role-Based Access** - Permission-based RBAC with four built-in roles, custom role creation, and per-resource/action grants

- **Upgrade System** - GitHub-based version checking, managed upgrade flow with rollback

- **Audit Log** - Immutable audit trail with filtering, pagination, and human-readable descriptions

- **GitOps** - Version-controlled platform configuration with diff and rollback
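The permission-based RBAC item in the list above could be modeled roughly as follows. This is a minimal sketch, not bioAF's actual schema: the README states only that there are four built-in roles, custom roles, and per-resource/action grants, so the role names, resource names, and the `is_allowed` helper here are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """One permission: a single action on a single resource type."""

    resource: str  # hypothetical resource name, e.g. "experiment"
    action: str    # hypothetical action name, e.g. "read"


@dataclass
class Role:
    """A named bundle of grants; custom roles are just new bundles."""

    name: str
    grants: set[Grant] = field(default_factory=set)


def is_allowed(roles: list[Role], resource: str, action: str) -> bool:
    """Return True if any of the user's roles carries the requested grant."""
    wanted = Grant(resource, action)
    return any(wanted in role.grants for role in roles)


# Hypothetical roles, for illustration only.
viewer = Role("viewer", {Grant("experiment", "read")})
analyst = Role(
    "analyst",
    {Grant("experiment", "read"), Grant("pipeline", "launch")},
)

print(is_allowed([viewer], "pipeline", "launch"))   # False
print(is_allowed([analyst], "pipeline", "launch"))  # True
```

Keeping grants as hashable `(resource, action)` pairs makes the check a set lookup and lets a custom role be defined purely as data.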

## Architecture



  



### How it works

A computational biologist registers an experiment, links FASTQ files (uploaded or auto-ingested from a sequencer drop), selects a pipeline from the catalog (nf-core/scrnaseq, rnaseq, or custom), and launches a run. The **BioAF Adapter Layer** handles everything below that: staging inputs from GCS, submitting Kubernetes Jobs to GKE Autopilot, monitoring execution via Nextflow trace parsing, collecting outputs back to GCS, and transitioning the experiment through its status lifecycle (`registered` -> `library_prep` -> `sequencing` -> `fastq_uploaded` -> `processing` -> `pipeline_complete` -> [`reviewed` ->] `analysis` -> `complete`).

Pipeline completion triggers event-driven notifications (in-app, email, Slack), and results are browsable through the plot archive, cellxgene viewer, and GEO export tools.

Jupyter and RStudio sessions run as Kubernetes Pods with GCS-backed home directories and SSH access. RStudio sessions use per-user PAM authentication ([ADR-030](decisions/ADR-030-session-credentials-pam-auth.md)), and notebook container images are managed as versioned environments ([ADR-033](decisions/ADR-033-versioned-compute-environments.md)), built automatically via Cloud Build ([ADR-031](decisions/ADR-031-notebook-image-build-pipeline.md)).
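The status lifecycle listed above can be read as a small state machine. The sketch below encodes only the forward transitions named in that list, with `reviewed` treated as an optional step. The `TRANSITIONS` table and `advance` helper are hypothetical names, and the real platform may well allow other transitions (failures, retries) that the README does not describe.

```python
from __future__ import annotations

# Forward transitions taken from the lifecycle above. "reviewed" is
# optional, so pipeline_complete may move to reviewed or to analysis.
TRANSITIONS: dict[str, set[str]] = {
    "registered": {"library_prep"},
    "library_prep": {"sequencing"},
    "sequencing": {"fastq_uploaded"},
    "fastq_uploaded": {"processing"},
    "processing": {"pipeline_complete"},
    "pipeline_complete": {"reviewed", "analysis"},
    "reviewed": {"analysis"},
    "analysis": {"complete"},
    "complete": set(),
}


def advance(current: str, target: str) -> str:
    """Return target if current -> target is a listed transition, else raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target


status = "processing"
status = advance(status, "pipeline_complete")
status = advance(status, "analysis")  # skips the optional review step
print(status)  # analysis
```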

The adapter layer ([ADR-020](decisions/ADR-020-bioaf-adapter-layer.md)) abstracts compute, storage, and notebook providers behind clean interfaces, so all application logic is decoupled from infrastructure specifics. Today that means GKE + GCS ([ADR-021](decisions/ADR-021-kubernetes-compute-backend.md), [ADR-022](decisions/ADR-022-gcs-storage-backend.md)).
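To illustrate what putting providers behind interfaces buys, here is a minimal sketch using `typing.Protocol`. It is not the interface defined in ADR-020: the method names, the in-memory stand-ins, the `nextflow-runner` image name, and the `launch_run` function are assumptions made for the example.

```python
from __future__ import annotations

from typing import Protocol


class StorageProvider(Protocol):
    """Hypothetical storage interface; a GCS backend would implement it."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class ComputeProvider(Protocol):
    """Hypothetical compute interface; a GKE backend would implement it."""

    def submit(self, image: str, command: list[str]) -> str: ...


class InMemoryStorage:
    """Test stand-in that keeps objects in a dict instead of a bucket."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class RecordingCompute:
    """Test stand-in that records submissions instead of creating Jobs."""

    def __init__(self) -> None:
        self.jobs: list[tuple[str, list[str]]] = []

    def submit(self, image: str, command: list[str]) -> str:
        self.jobs.append((image, command))
        return f"job-{len(self.jobs)}"


def launch_run(
    storage: StorageProvider, compute: ComputeProvider, samplesheet: bytes
) -> str:
    """Application logic that depends only on the two interfaces."""
    storage.put("inputs/samplesheet.csv", samplesheet)
    # "nextflow-runner" is a placeholder image name, not a real bioAF image.
    return compute.submit("nextflow-runner", ["nextflow", "run", "nf-core/rnaseq"])


storage = InMemoryStorage()
compute = RecordingCompute()
print(launch_run(storage, compute, b"sample,fastq_1\n"))  # job-1
```

Because `launch_run` never names GKE or GCS, the same function can be exercised in unit tests with the in-memory classes and in production with real backends.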

Infrastructure is provisioned through UI-driven Terraform ([ADR-007](decisions/ADR-007-ui-driven-terraform.md)) -- researchers never touch HCL. All secrets live in Secret Manager ([ADR-008](decisions/ADR-008-secret-manager.md)), all actions are recorded in an immutable audit log ([ADR-009](decisions/ADR-009-immutable-audit-log.md)), and data portability is guaranteed ([ADR-012](decisions/ADR-012-data-portability.md)).

See all architecture decision records in [decisions/README.md](decisions/README.md).

## Quick Start

### Prerequisites

- Docker and Docker Compose

- Git

- openssl (for secret generation)

### Deploy on GCP (one command)

Run this on your local machine to provision a GCP VM and get started:

```bash
curl -fsSL https://raw.githubusercontent.com/not-that-guy-again/bioAF/main/install-gcp.sh | bash
```

The script sets up gcloud, creates a VM with Docker, and walks you through the process. Once the VM is ready, SSH in and run:

```bash
git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup
```

### Deploy on an existing server

If you already have a Linux server with Docker installed:

```bash
git clone https://github.com/not-that-guy-again/bioAF.git
cd bioAF
./bioaf setup
```

The `setup` command handles everything: checks prerequisites, generates secrets and TLS certs, pulls pre-built images, runs migrations, and prints a one-time setup code. Open the URL it shows in your browser and enter the code to create your admin account and configure the platform.

### Management Commands

| Command | Description |
| ------- | ----------- |
| `./bioaf setup` | First-run setup (pulls images, generates secrets, prints setup code) |
| `./bioaf start` | Start all services in dependency order |
| `./bioaf stop` | Stop all services |
| `./bioaf restart` | Restart all services |
| `./bioaf status` | Show service status |
| `./bioaf logs [service]` | Tail logs (all or one service) |
| `./bioaf build [service]` | Build container images locally (development only) |
| `./bioaf migrate` | Run database migrations |
| `./bioaf migrate-down ` | Downgrade database to a specific revision |
| `./bioaf seed ` | Run a seed/data script in the backend container |
| `./bioaf backup` | Create a database backup |
| `./bioaf update [version]` | Update to latest (or specific) version |
| `./bioaf reset-db` | Destroy and recreate the database (with confirmation) |
| `./bioaf shell [service]` | Open a shell in a container (default: backend) |
| `./bioaf dbshell` | Open a psql session to the database |
| `./bioaf register-outputs` | Register pipeline output files from GCS |
| `./bioaf help` | Show all commands |

See the full [Deployment Guide](docs/deployment-guide.md) for detailed instructions.

## Documentation

- [Quickstart](docs/README.md) - Documentation hub

- [Deployment Guide](docs/deployment-guide.md) - Full deployment walkthrough

- [Bench Scientist Guide](docs/user-guide-bench.md) - Experiments, samples, results

- [Computational Biologist Guide](docs/user-guide-compbio.md) - Pipelines, notebooks, environments

- [Admin Guide](docs/user-guide-admin.md) - User management, costs, backups, notifications

- [Life After bioAF](docs/life-after-bioaf.md) - Data portability after teardown

- [ADR Index](decisions/README.md) - Architecture Decision Records

- [SSH Access Guide](docs/guides/ssh-access.md) - Connecting to running workloads

- [GEO Export Guide](docs/guides/geo-export.md) - Exporting to NCBI GEO

- [Reference Data Guide](docs/guides/reference-data.md) - Managing reference genomes and annotations

- [Compute Stack Setup](docs/guides/compute-stack-setup.md) - Kubernetes configuration

## Development Setup

### Using Docker Compose (recommended)

```bash
# Start backend, frontend, and PostgreSQL
docker compose -f docker/docker-compose.dev.yml up

# Backend:  http://localhost:8000
# Frontend: http://localhost:3000
# Postgres: localhost:5432
```

### Manual Setup

```bash
# Run each section from the repository root. uvicorn and npm run dev
# stay in the foreground, so give each its own terminal.

# Backend
cd backend
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt -r requirements-dev.txt
uvicorn app.main:app --reload

# Frontend
cd frontend
npm install
npm run dev

# Database (requires PostgreSQL 16)
cd backend
alembic upgrade head
```

### Running Tests

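```bash
# Run from the repository root. The subshells keep the working
# directory unchanged between the two test suites.

# Backend tests (requires PostgreSQL)
docker compose -f docker/docker-compose.dev.yml up -d db
(cd backend && python -m pytest tests/ -v)

# Frontend tests
(cd frontend && npm test)
```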

## Component Catalog

bioAF manages these infrastructure components through its UI:

| Component | Category | Compute Stack | Dependencies |
| --------- | -------- | ------------- | ------------ |
| GKE Cluster | Compute | Kubernetes | None |
| GCS Buckets | Storage | Kubernetes | GKE |
| JupyterHub | Notebooks | Kubernetes | Compute, Storage |
| RStudio Server | Notebooks | Kubernetes | Compute, Storage |
| Nextflow | Pipelines | Kubernetes | Compute |
| cellxgene | Visualization | Any | None |
| QC Dashboard | Visualization | Any | None |
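The Dependencies column implies a provisioning order. One way to derive such an order is a topological sort, sketched here with the standard-library `graphlib` module (Python 3.9+). This illustrates the idea and is not bioAF's provisioning code. It assumes that "Compute" and "GKE" in the table both refer to the GKE Cluster row and that "Storage" refers to GCS Buckets. `DEPENDENCIES` and `provisioning_order` are hypothetical names.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Component -> components it depends on, transcribed from the table above
# under the stated assumption about what "Compute" and "Storage" mean.
DEPENDENCIES: dict[str, set[str]] = {
    "GKE Cluster": set(),
    "GCS Buckets": {"GKE Cluster"},
    "JupyterHub": {"GKE Cluster", "GCS Buckets"},
    "RStudio Server": {"GKE Cluster", "GCS Buckets"},
    "Nextflow": {"GKE Cluster"},
    "cellxgene": set(),
    "QC Dashboard": set(),
}


def provisioning_order(deps: dict[str, set[str]]) -> list[str]:
    """Return the components ordered so every dependency comes first.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


for name in provisioning_order(DEPENDENCIES):
    print(name)
```

The exact order among independent components (for example cellxgene and QC Dashboard) is not fixed. Only the constraint that dependencies come before their dependents is guaranteed.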

## Project Structure

```text
bioAF/
  backend/           FastAPI application
  frontend/          Next.js 14 application
  docker/            Dockerfiles, compose, and nginx config
  terraform/         GCP infrastructure as code
  helm/              Kubernetes deployment chart
  decisions/         Architecture Decision Records
  documentation/     Product and architecture specs
  docs/              User-facing documentation
  scripts/           Utility scripts (seed data, update agent)
  tests/shell/       BATS tests for install.sh and bioaf scripts
  bioaf              Management script (entry point)
  install.sh         First-time installer (prereq checks + env generation)
  install-gcp.sh     One-command GCP provisioning script
```

## Contributing

See the ADRs in [decisions/](decisions/) for architectural context before making changes. All infrastructure changes must go through the UI-driven Terraform workflow (ADR-007). The audit log is immutable by design (ADR-009).
