https://github.com/4gt-104/matyan-backend

REST API server for Matyan experiment tracking — reads and writes experiment data, metrics, and artifacts to FoundationDB; handles control operations.

# Matyan Backend

REST API and workers for the Matyan experiment-tracking stack (fork of Aim). Serves reads and control operations from **FoundationDB**; consumes ingestion and control events from **Kafka**; uses **S3/GCS/Azure** for artifact blobs. The UI talks to this API; training clients send data through the frontier, which publishes to Kafka topics that these workers consume.

## Layout

- **`src/matyan_backend/`** — Python package: FastAPI app (`app.py`), API routes under `api/` (runs, experiments, tags, projects, dashboards, reports, streaming), `storage/` (FDB + S3/GCS/Azure), `workers/` (ingestion + control Kafka consumers), `jobs/` (FDB lock, used by CLI cleanup commands), `backup/` (export/restore), CLI in `cli.py`.
- **Entrypoints**: `matyan-backend start` (API server, default port 53800), `matyan-backend ingest-worker`, `matyan-backend control-worker`; plus one-off CLI commands (reindex, backup, restore, finish-stale, cleanup-orphan-blobs, cleanup-tombstones, convert tensorboard).

## Prerequisites

- Python 3.12+. The repo uses `uv`: run commands with `uv run matyan-backend`, or install the package and call the `matyan-backend` CLI directly.
- **Runtime dependencies**: FoundationDB (cluster file), Kafka (for workers), blob store. For local dev, typically run FDB + Kafka + S3 (RustFS) via docker-compose.

## Run

- **API server**: `uv run matyan-backend start` (or `matyan-backend start`). Options: `--host`, `--port` (defaults: `0.0.0.0`, 53800). API is under `/api/v1`; health at `/health/ready/`, `/health/live/`, metrics at `/metrics/` when enabled.
- **Workers**: `uv run matyan-backend ingest-worker` and `uv run matyan-backend control-worker`. Both require Kafka and FDB; the ingestion worker also writes to FDB and reads the blob storage configuration for blob references.
- **CLI (one-off)**: `reindex` (rebuild indexes), `backup` / `restore`, `finish-stale`, `cleanup-orphan-blobs`, `cleanup-tombstones`. See the backend CLI help (`matyan-backend cleanup-orphan-blobs --help`, `matyan-backend cleanup-tombstones --help`) and [References — CLI](../../docs/refs/cli.md) for all options. Cleanup commands are intended for CronJobs or cron; use `--dry-run` to preview and `--lock-ttl-seconds` for FDB-based single-run locking. Optional: `convert tensorboard` to convert TensorBoard logs to backup format.
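
The health endpoints listed above can be polled from a script, for example before starting a smoke test. The following is a minimal sketch using only the standard library; the helper names `health_urls` and `probe` are hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response format of these endpoints is not documented here.

```python
import urllib.error
import urllib.request


def health_urls(host: str = "127.0.0.1", port: int = 53800) -> dict[str, str]:
    """Build the liveness and readiness URLs from the documented paths."""
    base = f"http://{host}:{port}"
    return {"live": f"{base}/health/live/", "ready": f"{base}/health/ready/"}


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as down.
        return False


if __name__ == "__main__":
    for name, url in health_urls().items():
        print(f"{name}: {'ok' if probe(url) else 'unavailable'}")
```

Run it against a server started with `matyan-backend start`; pass a different `host` or `port` to `health_urls` if the defaults were overridden.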

## Configuration (environment variables)

| Variable | Default | Purpose |
|----------|---------|---------|
| `MATYAN_ENVIRONMENT` / `ENVIRONMENT` | `development` | When `production`, strict checks apply (see Production configuration). |
| `LOG_LEVEL` | `INFO` | Log level (loguru + uvicorn). |
| `FDB_CLUSTER_FILE` | `fdb.cluster` | Path to FoundationDB cluster file. |
| `BLOB_BACKEND_TYPE` | `s3` | Storage backend: `s3`, `gcs`, or `azure`. |
| `S3_ENDPOINT` | `http://localhost:9000` | S3-compatible API URL. |
| `S3_ACCESS_KEY` / `S3_SECRET_KEY` | (dev defaults) | S3 credentials. |
| `S3_BUCKET` | `matyan-artifacts` | Bucket for artifacts (when using `s3`). |
| `S3_REGION` | `us-east-1` | S3 region. |
| `GCS_BUCKET` | `matyan-artifacts` | Bucket for artifacts (when using `gcs`). |
| `AZURE_CONTAINER` | `matyan-artifacts` | Container for artifacts (when using `azure`). |
| `AZURE_CONN_STR` | `""` | Azure connection string. |
| `AZURE_ACCOUNT_URL` | `""` | Azure account URL (for `DefaultAzureCredential`). |
| `BLOB_URI_SECRET` | (dev default) | Fernet key for blob URIs; must be set in production. |
| `KAFKA_BOOTSTRAP_SERVERS` | `localhost:9092` | Kafka broker list. |
| `KAFKA_DATA_INGESTION_TOPIC` | `data-ingestion` | Topic for ingestion messages. |
| `KAFKA_CONTROL_EVENTS_TOPIC` | `control-events` | Topic for control events. |
| `KAFKA_SECURITY_PROTOCOL` / `KAFKA_SASL_*` | (empty) | Optional Kafka SASL. |
| `METRICS_ENABLED` | `true` | Expose Prometheus metrics. |
| `METRICS_PORT` | `9090` | Port for metrics HTTP server (workers). |
| `INGEST_MAX_MESSAGES_PER_TXN` | `100` | Max messages per FDB transaction (ingestion worker). |
| `INGEST_MAX_TXN_BYTES` | `8388608` (8 MB) | Target max transaction size; FDB limit is 10 MB. |
| `CORS_ORIGINS` | (localhost list) | Allowed origins for CORS. |

Source of truth: [config.py](src/matyan_backend/config.py).
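
To illustrate how `INGEST_MAX_MESSAGES_PER_TXN` and `INGEST_MAX_TXN_BYTES` interact, here is a rough sketch of batching under both caps. It is not the worker's actual code: `batch_messages` is a hypothetical helper, and it approximates transaction size by the raw payload length, which may differ from how the ingestion worker measures it.

```python
import os
from collections.abc import Iterable, Iterator

# Defaults mirror the table above; the real values are defined in config.py.
MAX_MESSAGES = int(os.environ.get("INGEST_MAX_MESSAGES_PER_TXN", "100"))
MAX_TXN_BYTES = int(os.environ.get("INGEST_MAX_TXN_BYTES", str(8 * 1024 * 1024)))


def batch_messages(
    messages: Iterable[bytes],
    max_messages: int = MAX_MESSAGES,
    max_bytes: int = MAX_TXN_BYTES,
) -> Iterator[list[bytes]]:
    """Group messages so each batch respects both the count and byte caps.

    Hypothetical helper: payload length stands in for transaction size.
    A single message larger than max_bytes is emitted in a batch of its own.
    """
    batch: list[bytes] = []
    size = 0
    for msg in messages:
        # Close the current batch if adding this message would break a cap.
        if batch and (len(batch) >= max_messages or size + len(msg) > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(msg)
        size += len(msg)
    if batch:
        yield batch
```

Whichever cap is reached first closes the batch, so lowering either variable produces smaller transactions. Keeping the byte target below the FDB limit leaves headroom for keys and other overhead that the payload length does not capture.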

## Production configuration

See **[docs/PRODUCTION_CONFIG.md](docs/PRODUCTION_CONFIG.md)** for enabling production mode (`MATYAN_ENVIRONMENT=production`), required overrides, and supplying secrets via env or a secrets backend.
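
Since `BLOB_URI_SECRET` is described above as a Fernet key that must be set in production, a key can be generated ahead of time. A Fernet key is 32 random bytes encoded as URL-safe base64, so the sketch below needs only the standard library. The function names are hypothetical, and how the backend itself validates the value is not shown in this README.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a fresh Fernet-compatible key: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that a key decodes to exactly 32 bytes, as Fernet requires."""
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        # Bad padding or non-ASCII input means the key is malformed.
        return False


if __name__ == "__main__":
    # Print a key suitable for exporting as BLOB_URI_SECRET.
    print(generate_fernet_key())
```

Store the generated value in your secrets backend rather than in source control, and reuse the same key across the API and workers so they agree on the blob URI encoding (the latter is an assumption about how the components share the secret).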

## Deployment

- **Docker**: Build the backend image (context from repo root); run API and workers as separate processes or containers.
- **Kubernetes/Helm**: The chart in `deploy/helm/matyan` deploys the backend API, ingestion worker, and control worker as separate Deployments; optional CronJobs for `cleanup-orphan-blobs` and `cleanup-tombstones`. Configure FDB, blob storage (S3, GCS, Azure), and Kafka via chart values; see the chart README. Set `MATYAN_ENVIRONMENT=production` and required env for production.

## Related

- **UI**: matyan-ui calls this backend REST API.
- **Frontier**: matyan-frontier publishes to Kafka; backend workers consume.
- **API models**: matyan-api-models shared types (Kafka messages, run creation, etc.).
- **Monorepo**: This package lives under `extra/matyan-backend` in the matyan-core repo.