{"id":50298004,"url":"https://github.com/tamohannes/clausius","last_synced_at":"2026-05-28T10:30:23.496Z","repository":{"id":344304570,"uuid":"1180391027","full_name":"tamohannes/clausius","owner":"tamohannes","description":"Multi-cluster Slurm dashboard with AI agent integration via MCP","archived":false,"fork":false,"pushed_at":"2026-05-14T22:40:16.000Z","size":7688,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-15T00:38:02.942Z","etag":null,"topics":["cluster-management","dashboard","flask","gpu","hpc","mcp","slurm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tamohannes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-13T01:55:22.000Z","updated_at":"2026-05-14T22:40:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/tamohannes/clausius","commit_stats":null,"previous_names":["tamohannes/job-monitor","tamohannes/ncluster","tamohannes/clausius"],"tags_count":6,"template":false,"template_full_name":null,"purl":"pkg:github/tamohannes/clausius","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamohannes%2Fclausius","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamohannes%2Fclausius/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamohannes%2Fclausius/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamohannes%2Fclausius/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tamohannes","download_url":"https://codeload.github.com/tamohannes/clausius/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tamohannes%2Fclausius/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33605377,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-28T02:00:06.440Z","response_time":99,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cluster-management","dashboard","flask","gpu","hpc","mcp","slurm"],"created_at":"2026-05-28T10:30:22.886Z","updated_at":"2026-05-28T10:30:23.486Z","avatar_url":"https://github.com/tamohannes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/icon.png\" width=\"80\" alt=\"clausius logo\"\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eclausius\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cem\u003eResearch clusters are chaotic. We are here to reverse the entropy.\u003c/em\u003e\u003cbr\u003e\u003cbr\u003e\n  Multi-cluster Slurm dashboard with AI agent integration via MCP.\u003cbr\u003e\n  Monitor, explore, and manage GPU jobs across HPC clusters from a single browser tab —\u003cbr\u003e\n  or let your AI coding agent do it through the built-in MCP server.\n\u003c/p\u003e\n\n## Quick Start\n\n```bash\ngit clone https://github.com/tamohannes/clausius.git\ncd clausius\npip install flask paramiko\n\n# Initialise the database (creates data/ and the schema)\npython -m server.cli setup --non-interactive\n\n# Add your first cluster\npython -m server.cli add-cluster my-cluster \\\n    --host login-node.example.com \\\n    --gpu-type H100 --gpus-per-node 8 \\\n    --account my_ppp_account \\\n    --mount-path /shared/storage/$USER\n\n# Start the server\npython app.py\n```\n\nOpen [http://localhost:7272](http://localhost:7272)\n\n## Architecture\n\n![clausius architecture](docs/architecture.png)\n\nThree-lane SSH connection pool: **primary** (Slurm control), **background** (metadata), **data** (file I/O routed to data-copier nodes with automatic login-node fallback). AI Hub OpenSearch integration for formal GPU allocations and fairshare data.\n\nAll runtime configuration lives in the SQLite database (`data/history.db`). The only file-based config is a tiny bootstrap TOML for the four values needed before the DB is reachable (data directory, port, SSH defaults). Everything else is managed through three equivalent interfaces:\n\n| Interface | Example |\n|-----------|---------|\n| **Settings UI** | Clusters tab, Profile \u003e PPPs, Advanced |\n| **CLI** | `python -m server.cli add-cluster eos --host ...` |\n| **MCP tools** | `add_cluster_config(name=\"eos\", host=\"...\")` |\n\n## Features\n\n### Live Board\n- Multi-cluster job board grouped by run name (active, idle, unreachable, local)\n- Slurm dependency chain detection with topological sorting\n- Persistent run grouping — completed jobs retain their dependency structure\n- Live progress tracking, crash detection (OOM, segfault, traceback)\n- Cluster availability tooltip with wait-time estimates, pending reason translation, and team fair-share priority\n- Board-pinned terminal jobs persist until dismissed\n- Background job dimming for long-running server processes (configurable suffixes)\n- Per-GPU utilization and memory charts, CPU utilization, RSS memory tracking\n- Configurable GPU stats snapshot interval\n\n### Log Explorer\n- Mount-first reads with SSH fallback to data-copier nodes\n- Nested directory browsing with lazy-loaded tree\n- Syntax-aware rendering for `.json`, `.jsonl`, `.jsonl-async`, `.md`\n- Full log pagination, JSONL record viewer, clipboard copy\n\n### History\n- SQLite-backed job history with dependency-aware grouping\n- Text search, state filters (completed/failed/cancelled/timeout/running/pending)\n- Paginated view with configurable page size\n\n### Projects\n- Auto-detected projects from job name prefixes\n- Per-project detail pages with live jobs, stats, and search\n- Customizable project colors and emojis\n\n### Logbook\n- Per-project structured entries with BM25 full-text search (FTS5 with porter stemming)\n- Two entry types: **note** (experiments, debugging, findings) and **plan** (implementation/research plans)\n- Full markdown support: tables, code blocks, blockquotes, links\n- `@run-name` references to link to job logs\n- `#id` cross-references between entries (rendered as clickable links with title resolution)\n- Drag/drop and paste image uploads\n- HTML file embeds for interactive figures (plotly, bokeh, matplotlib exports)\n- `@` autocomplete for run names in the editor\n- Entry IDs displayed in sidebar and detail view for agent communication\n\n### Logbook Map\n- Visual map of entry relationships built from `#id` cross-references\n- **Tree view**: hierarchical layout with connector lines, sorted by edit time\n- **Graph view**: static DAG layout with D3.js, curved directed edges, zoom/pan/drag\n- Entry-centric graph: open from any entry's detail page with configurable neighbor depth (1-5 hops or all)\n- Edge direction filter: show outgoing, incoming, or both connections\n- Focus controls shared between tree and graph views\n- Color-coded nodes: neutral for notes, red for plans (matching sidebar)\n\n### Compute (GPU Allocations \u0026 Cluster Intelligence)\n- **GPU Allocations Dashboard**: Per-cluster cards showing formal PPP allocations, consumption, and fairshare from AI Hub OpenSearch\n- **Stacked usage bars**: Side-by-side segments — your running/pending (accent, striped), team running/pending (orange, striped), PPP non-team (gray) — with toggle controls\n- **\"Where to Submit\" strip**: Ranked cluster chips scored by team-aware headroom (considers informal team quota, PPP fairshare, and current usage)\n- **Hover popup**: Per-account breakdown with your usage, team, PPP non-team, other PPPs, cluster total, and team alloc\n- **Click-through modal**: Full per-user GPU breakdown with running/pending/CPU counts, sorted by usage\n- **GPU Usage History**: Chart.js time-series of allocation vs consumption per account with 7d/14d/30d range selector\n- **Pending job tooltips**: Fairshare-based wait estimates using AI Hub `level_fs`, plus cross-cluster recommendations filtered by job size and GPU type\n- **Mounts**: SSHFS mount/unmount/remount per cluster; mount-all in parallel with progress; stale mount detection via `/proc/mounts` (never blocks on dead FUSE)\n- **Storage Quotas**: Lustre filesystem and PPP project quotas\n\n### Settings\n- **Profile**: Avatar, username, team name, PPP quota list\n- **General**: Theme (system/light/dark), auto-refresh toggle, refresh interval\n- **Shortcuts**: Configurable keyboard bindings (toggle sidebar, spotlight, close/next/prev tab, refresh live data)\n- **Clusters**: Add/edit/remove clusters, mount controls with restart button\n- **Projects**: Prefix, color, and emoji customization\n- **Advanced**: SSH timeout, cache freshness, GPU stats interval, database backup interval and retention, history page size, JSONL record limits, background run suffixes, local process include/exclude filters\n\n### UI\n- Multi-tab interface with persistent tab state across sessions\n- Collapsible sidebar with draggable width\n- Spotlight search (Cmd+P): search across projects, logbook entries, and job history\n- Loading toasts with animated progress bar for all async actions\n- Theme-aware color system with CSS custom properties\n- Keyboard shortcuts: Cmd+Shift+R (refresh live), Cmd+S (toggle sidebar), Cmd+P (spotlight), Cmd+W (close tab), Cmd+]/[ (cycle tabs)\n- Charts: per-GPU utilization/memory line charts, CPU utilization, RSS memory (Chart.js)\n- D3.js for interactive logbook graph visualization\n\n### Database Backups\n- Automatic daily backups using SQLite online backup API (safe during writes)\n- Configurable backup interval (default: 24 hours) and retention (default: 7 backups)\n- Stored in `data/backups/history-YYYY-MM-DD.db`\n- Old backups automatically cleaned up\n\n### MCP Server (AI Agent API)\n- Standalone local Streamable HTTP MCP server (recommended for Cursor and other MCP-compatible agents)\n- 49 tools covering every aspect of the dashboard:\n\n| Category | Tools |\n|----------|-------|\n| GPU Allocations | `where_to_submit`, `get_ppp_allocations`, `get_gpu_usage_history` |\n| Jobs | `list_jobs`, `get_job_log`, `get_job_stats`, `list_log_files` |\n| History | `get_history`, `list_projects`, `get_project_jobs` |\n| Actions | `cancel_job`, `cancel_jobs` |\n| Runs | `get_run_info`, `run_script`, `cleanup_history` |\n| Clusters (config) | `list_cluster_configs`, `get_cluster_config`, `add_cluster_config`, `update_cluster_config`, `remove_cluster_config` |\n| Cluster (status) | `get_cluster_status`, `get_team_gpu_status`, `get_cluster_availability`, `get_partitions`, `get_partition_summary`, `recommend_submission`, `get_storage_quota` |\n| Team | `list_team_members`, `add_team_member`, `remove_team_member` |\n| PPP Accounts | `list_ppp_accounts`, `add_ppp_account`, `update_ppp_account`, `remove_ppp_account` |\n| Paths | `list_path_bases`, `add_path_base`, `remove_path_base` |\n| Process Filters | `list_process_filters`, `add_process_filter`, `remove_process_filter` |\n| App Settings | `get_app_setting`, `set_app_setting`, `list_app_settings` |\n| Mounts | `get_mounts`, `mount_cluster`, `clear_failed`, `clear_completed` |\n| Logbook | `list_logbook_entries`, `read_logbook_entry`, `bulk_read_logbooks`, `create_logbook_entry`, `update_logbook_entry`, `delete_logbook_entry`, `search_logbook`, `upload_logbook_image` |\n\n- `where_to_submit(nodes, gpu_type)` — **primary tool** for \"where should I submit this job?\" — ranks clusters by team headroom, fairshare, and GPU type match\n- `run_script()` — execute Python/bash on a cluster and return stdout/stderr\n- Resource: `jobs://summary` — quick text overview of running/pending/failed per cluster\n- **Standalone local service, no HTTP hop back into the UI**: `clausius-mcp.service` runs `mcp_server.py` as its own user service and exposes FastMCP over Streamable HTTP at `http://127.0.0.1:7273/mcp`. Inside that process, MCP still boots the same Flask `app` as gunicorn and dispatches each tool through `app.test_client()`. Both processes share SQLite (WAL) and `server.ssh`; gunicorn crashes don't take MCP down.\n- **Follower poller**: MCP probes the gunicorn `/api/health` endpoint every 10 s and starts the cluster poller in its own process after ~30 s of silence, then steps back as soon as gunicorn answers again. Single-writer work (backups, mount remounts, WDS snapshots, the progress scraper) stays gunicorn-only.\n\n### SDK Experiment Tracking (v3)\n- NeMo-Skills SDK integration: add `CLAUSIUS_URL=http://\u003chost\u003e:7272` to any `ns` command to enable tracking\n- Runs appear on the board in `SUBMITTING` state immediately, before any Slurm job exists\n- Lifecycle: `SUBMITTING` -\u003e `PENDING` (Slurm accepts) -\u003e `RUNNING`/`COMPLETED`/`FAILED`\n- Submit command, git commit, hostname, and working directory captured automatically\n- Aim-style manual tracking: import `Run`, attach to an existing SDK run, store static metadata, and track metric time series\n- The run metrics explorer is inspired by [Aim](https://github.com/aimhubio/aim)'s Metrics Explorer: metric selection, Python-like search, modifiers, and context tables are adapted for Clausius' single-run workflow.\n- Ingest endpoint: `POST /api/sdk/events` with optional bearer-token auth (`sdk_ingest_token` setting)\n- If submission fails, the run is auto-marked `FAILED` with \"submission interrupted\"\n- Run popup shows provenance, SDK metadata, latest metric values, metric charts, and recent metric points\n\nManual metric tracking from code:\n\n```python\nfrom clausius_sdk import Run\n\nrun = Run(\n    run_name=\"hle_debug_manual\",\n    cluster=\"eos\",\n    metadata={\n        \"model\": \"nvidia/Nemotron-Super-49B-v1\",\n        \"benchmark\": \"gpqa_diamond\",\n        \"prompt_config\": \"generic/math\",\n    },\n)\n\nrun.track(\"loss\", 0.42, step=1)\nrun.track(\"accuracy\", 0.84, step=1, context={\"split\": \"eval\"})\nrun.set_metadata({\"tokens_to_generate\": 32768, \"num_chunks\": 16})\nrun.close()\n```\n\nAttach to an existing run created by the NeMo-Skills hooks when you already know its `run_uuid`:\n\n```python\nfrom clausius_sdk import Run\n\nrun = Run(run_uuid=\"abc123...\", connect=True)\nrun.set_metadata({\"judge_model\": \"gpt-oss-120b\"})\nrun.track(\"judge_accuracy\", 0.71, step=10)\nrun.close()\n```\n\nInstruction for agents modifying NeMo-Skills:\n\n```text\nInstrument this NeMo-Skills code with Clausius SDK tracking.\n\nUse `from clausius_sdk import Run`. If an active Clausius SDK\nrun UUID is available, attach with `Run(run_uuid=\u003cuuid\u003e, connect=True)`;\notherwise create a new `Run(run_name=\u003cexpname\u003e, cluster=\u003ccluster\u003e)`.\n\nStore static experiment metadata with `run.set_metadata({...})`: model,\nbenchmark/dataset, split, prompt config/template, server type, GPUs/nodes,\nnum_samples, num_chunks, tokens_to_generate, judge model, sandbox settings,\ngit/config identifiers, and any run flags that should be searchable later.\n\nTrack scalar metrics with `run.track(name, value, step=step, context={...})`.\nUse stable metric names such as `accuracy`, `loss`, `pass_at_1`,\n`num_generated_tokens`, `empty_generation_rate`, `judge_accuracy`, and\n`samples_completed`. Put dimensions like benchmark, split, seed, chunk,\ntask, or judge in `context`, not in the metric name. Do not log secrets,\ntokens, API keys, raw prompts, or huge payloads. Call `run.close()` when done.\n```\n\n### Performance\n- On-demand architecture: clusters are only contacted when a user or agent requests data\n- Three-lane SSH connection pool with data-copier node routing\n- Per-cluster caching with configurable TTL\n- Prefetch warming for running jobs (log index, content, stats)\n- Mount status detection via `/proc/mounts` (no filesystem stat, never blocks on stale FUSE)\n- No background polling — login nodes are not contacted when nobody is looking\n\n## Setup\n\n### Adding a Cluster\n\nThree equivalent ways to register a cluster:\n\n**CLI** (recommended for first setup):\n```bash\npython -m server.cli add-cluster my-cluster \\\n    --host login-node.example.com \\\n    --gpu-type H100 --gpus-per-node 8 \\\n    --account my_ppp_account \\\n    --mount-path /shared/storage/$USER\n```\n\n**MCP tool** (from your AI agent):\n```\nadd_cluster_config(\n    name=\"my-cluster\",\n    host=\"login-node.example.com\",\n    gpu_type=\"H100\",\n    gpus_per_node=8,\n    account=\"my_ppp_account\",\n    mount_paths=[\"/shared/storage/$USER\"],\n)\n```\n\n**Settings UI**: Open Settings \u003e Clusters \u003e Add Cluster, fill in the fields.\n\n### Bootstrap Configuration\n\nThe only file-based config is `conf/clausius.toml` (optional — clausius boots with sensible defaults if this file is missing). Copy the example to get started:\n\n```bash\ncp conf/clausius.toml.example conf/clausius.toml\n```\n\n```toml\n[bootstrap]\ndata_dir = \"./data\"     # SQLite DB, backups, logbook images\nport     = 7272         # UI listen port\n\n[ssh]\nuser = \"$USER\"          # default SSH user for all clusters\nkey  = \"~/.ssh/id_ed25519\"\n```\n\nEvery field can also be set via environment variable (`CLAUSIUS_DATA_DIR`, `CLAUSIUS_PORT`, `CLAUSIUS_SSH_USER`, `CLAUSIUS_SSH_KEY`). Env vars always win.\n\nEverything else (clusters, team members, PPP accounts, search paths, process filters, runtime tunables) lives in the SQLite database and is managed through the Settings UI, CLI, or MCP tools.\n\n### Database Schema\n\nThe canonical schema is in [`server/schema.py`](server/schema.py). Key v4 tables:\n\n| Table | Purpose |\n|-------|---------|\n| `clusters` | Cluster registry (host, GPU type, mount paths, team quota) |\n| `team_members` | Team roster for usage overlays |\n| `ppp_accounts` | PPP accounts tracked across clusters |\n| `path_bases` | Log search paths, NeMo-Run output dirs, Lustre mount prefixes |\n| `process_filters` | Local process scanner include/exclude patterns |\n| `app_settings` | Runtime tunables (SSH timeout, cache TTL, backup interval, ...) |\n| `projects` | Project registry with prefixes and colors |\n| `job_history` | Every Slurm job ever observed |\n| `runs` | Logical experiment runs (groups multiple Slurm jobs) |\n| `logbook_entries` | Per-project structured notes with FTS5 search |\n\nRun `python -m server.cli setup` to create all tables from scratch.\n\n### CLI Reference\n\n```bash\npython -m server.cli setup [--non-interactive]\npython -m server.cli add-cluster \u003cname\u003e --host \u003chost\u003e [--gpu-type ...] [--mount-path ...]\npython -m server.cli list-clusters\npython -m server.cli remove-cluster \u003cname\u003e\npython -m server.cli add-team-member \u003cusername\u003e [--display-name ...]\npython -m server.cli list-team\npython -m server.cli add-ppp \u003cname\u003e [--id 12345]\npython -m server.cli list-ppp\npython -m server.cli add-path --kind log_search \u003cpath\u003e\npython -m server.cli list-paths [--kind log_search]\npython -m server.cli add-filter --mode include \u003cpattern\u003e\npython -m server.cli list-filters\npython -m server.cli set \u003ckey\u003e \u003cvalue\u003e\npython -m server.cli get \u003ckey\u003e\npython -m server.cli settings\npython -m server.cli import-json \u003cpath/to/config.json\u003e   # v3-\u003ev4 migration\n```\n\n### MCP Server\n\n```bash\npip install mcp\n```\n\nInstall and start the standalone MCP service:\n\n```bash\ncp systemd/clausius-mcp.service ~/.config/systemd/user/clausius-mcp.service\nsystemctl --user daemon-reload\nsystemctl --user enable --now clausius-mcp.service\nsystemctl --user status clausius-mcp.service\n```\n\nThen add to `~/.cursor/mcp.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"clausius\": {\n      \"url\": \"http://127.0.0.1:7273/mcp\"\n    }\n  }\n}\n```\n\nReload Cursor (or restart MCP servers) to activate. The web UI service on `:7272` can restart independently; the MCP service stays up on `:7273`.\n\n### Cursor Agent Skill\n\nInstall the clausius skill so Cursor's agent knows how to use the MCP tools across all your projects:\n\n```bash\nmkdir -p ~/.cursor/skills/clausius\ncp skills/SKILL.md ~/.cursor/skills/clausius/SKILL.md\n```\n\n### Migrating from v3\n\nIf you have an existing `conf/config.json` from clausius v3:\n\n```bash\npython tools/import_legacy_config.py\n```\n\nThis imports all clusters, team members, PPP accounts, paths, process filters, and settings into the database and renames `config.json` to `config.json.bak`. Safe to re-run — skips entries that already exist.\n\n### Environment Variables\n\n| Variable | Default | Purpose |\n|----------|---------|---------|\n| `CLAUSIUS_DATA_DIR` | `./data` | Override data directory |\n| `CLAUSIUS_PORT` | `7272` | Override listen port |\n| `CLAUSIUS_SSH_USER` | `$USER` | Default SSH user for all clusters |\n| `CLAUSIUS_SSH_KEY` | `~/.ssh/id_ed25519` | Default SSH key for all clusters |\n| `CLAUSIUS_BOOTSTRAP_FILE` | `conf/clausius.toml` | Override bootstrap config path |\n| `CLAUSIUS_MOUNT_MAP` | (auto) | JSON map of cluster -\u003e mount roots |\n\n## Job Name Prefix Protocol\n\nJobs are grouped by project using a name prefix convention:\n\n```\n\u003cproject\u003e_\u003ccampaign\u003e_\u003crun-details\u003e\n```\n\n| Component | Rules | Example |\n|-----------|-------|---------|\n| `\u003cproject\u003e` | Lowercase letters, digits, hyphens. Starts with a letter. | `my-project`, `eval-suite`, `training` |\n| `_` | Required underscore separator | |\n| `\u003ccampaign\u003e` | Groups related runs visually (distinct shade of project color) | `mpsf`, `eval`, `train` |\n| `_` | Second underscore separator | |\n| `\u003crun-details\u003e` | The experiment/eval name | `nem120b-r9`, `kimi-k25-no-tool-r22` |\n\nThe monitor auto-detects projects on first encounter, assigning a color and emoji. Customize in Settings \u003e Projects.\n\nDependency chain auto-detection from run name suffixes:\n- `*-judge-rs\u003cN\u003e` — linked as child of the base eval\n- `*-summarize-results` — linked as child of the judge run\n\n## Systemd (User Service)\n\n```ini\n[Unit]\nDescription=clausius — Research clusters are chaotic. We are here to reverse the entropy.\nAfter=network.target\n\n[Service]\nType=simple\nWorkingDirectory=%h/clausius\nExecStart=%h/miniconda3/bin/python %h/clausius/app.py\nRestart=always\nRestartSec=5\nTimeoutStopSec=10\nKillMode=mixed\n\n[Install]\nWantedBy=default.target\n```\n\n```bash\nsystemctl --user daemon-reload\nsystemctl --user enable --now clausius.service\n```\n\n## Testing\n\n898 tests across unit, integration, MCP, and CLI layers.\n\n```bash\npip install pytest pytest-cov\npytest -m \"not live\"         # all deterministic tests (no SSH, no cluster)\npytest -m unit               # unit tests only\npytest -m integration        # Flask test client with mock cluster\npytest -m mcp                # MCP tool contracts\npytest -m live               # real cluster tests (requires running app)\n```\n\n| Layer | Directory | What it covers |\n|-------|-----------|----------------|\n| Unit | `tests/unit/` | Bootstrap, schema, CRUD (clusters, team, paths, settings), parsers, DB ops, cache, mount resolution, config proxies, entry refs |\n| Integration | `tests/integration/` | All Flask routes via test client (including new per-namespace endpoints), logbook links, storage quota, CLI |\n| MCP | `tests/mcp/` | Tool contracts, bulk read, config management, transport errors, edge cases |\n| Live | `tests/live/` | Real SSH/Slurm reads + job cancel |\n\nCI runs without any config files — falls back to bootstrap defaults with a mock cluster injected via `tests/conftest.py`.\n\n## Built With\n\n- **Backend**: Python, Flask, Paramiko, SQLite (FTS5)\n- **Frontend**: Vanilla JS, CSS custom properties, Chart.js, D3.js (no build step)\n- **Agent API**: MCP (Model Context Protocol)\n- **Infrastructure**: SSH connection pooling, SSHFS mounts, systemd\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftamohannes%2Fclausius","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftamohannes%2Fclausius","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftamohannes%2Fclausius/lists"}