# archmax

A semantic layer for your data: archmax describes it, you sharpen it, AI agents query it.


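[Documentation](https://docs.archmax.ai) · [Issues](https://github.com/archmaxai/archmax/issues) · [GitHub](https://github.com/archmaxai/archmax)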

> **Heads up: archmax is experimental.** The core ideas are stable, but APIs, file formats, and configuration may change between releases. We try to avoid breaking changes, but can't guarantee stability yet. Pin your version and check the changelog before upgrading.

*Screenshots: Project Home, Graph View, Model Builder, MCP Access, Test Agents, Test Cases, Test Runs, Settings.*

## The Problem

Connecting AI agents to databases today is a gamble. You either hand over raw SQL access and hope the LLM doesn't hallucinate column names, run destructive queries, or leak sensitive data, or you spend weeks writing bespoke tool integrations that break the moment your schema changes.

Even when agents *can* query your database, they have no idea what the data actually means. A column called `amt_01` could be revenue, tax, or a refund. A table called `dim_cust` is meaningless without business context. Without that context, agents guess, and guessing on real data has real consequences.

The gap between "AI can write SQL" and "AI understands our data" is where most agent-database projects stall.

## How archmax Solves This

archmax puts a **semantic layer** between your databases and AI agents. archmax describes your data; you sharpen it with the context that matters; agents query it through the [Model Context Protocol (MCP)](https://modelcontextprotocol.io/).

Instead of raw database access, agents get:

- **Business context**: field descriptions, synonyms, examples, and enum values so the agent knows `amt_01` is "gross revenue in EUR"
- **Guardrails**: read-only queries scoped to sandboxed VIEWs, not raw tables; token-based access with model-level permissions
- **Federation**: a single query interface across Postgres, MySQL, MSSQL, SQLite, DuckDB, and Iceberg REST Catalogs, powered by DuckDB's in-process engine
- **Structure**: typed datasets, explicit relationships, and reusable metric definitions stored as [OSI](https://github.com/open-semantic-interchange/OSI) YAML
- **Token efficiency**: OSI models are converted to compressed markdown digests before being sent to agents, reducing token usage by 3–5× compared to raw YAML

The result: AI agents that query your data reliably, safely, and with understanding, not guesswork. The approach is conceptually similar to [Snowflake Semantic Views](https://docs.snowflake.com/en/user-guide/views-semantic/overview): both layer business meaning (metrics, dimensions, relationships) over physical tables so consumers get consistent definitions instead of raw column names. The key difference is that archmax is database-agnostic: it federates across Postgres, MySQL, MSSQL, SQLite, DuckDB, and Iceberg REST Catalogs.

archmax is built on the **[Open Semantic Interchange (OSI)](https://github.com/open-semantic-interchange/OSI)** spec, an open standard for describing datasets, relationships, and metrics in a vendor-neutral way. It uses OSI YAML as its internal storage format for semantic model definitions: every dataset, field, relationship, and metric is persisted as spec-compliant YAML files on disk.

Because the OSI YAML format is verbose and token-intensive, archmax **does not serve raw YAML to AI agents**. Instead, when an agent requests model information through MCP tools, the OSI model is converted on-the-fly into a **compressed markdown digest** that preserves all semantically relevant information (field types, descriptions, enums, relationships, examples) while using **3–5× fewer tokens** than the equivalent YAML. This makes agent interactions significantly cheaper and faster without sacrificing context quality.
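To make the idea concrete, here is a minimal sketch of such a conversion. The dictionary layout (`name`, `description`, `fields` with `type`, `description`, `enum`), the output format, and the `to_digest` function are all hypothetical illustrations; they are not archmax's actual OSI schema or converter.

```python
def to_digest(dataset: dict) -> str:
    """Render one dataset as a compact markdown digest (hypothetical format)."""
    header = f"## {dataset['name']}"
    if dataset.get("description"):
        header += f": {dataset['description']}"
    lines = [header]
    for field in dataset.get("fields", []):
        # One line per field: name, type, then optional description and enum values.
        line = f"- {field['name']} ({field['type']})"
        if field.get("description"):
            line += f": {field['description']}"
        if field.get("enum"):
            line += " [" + "|".join(field["enum"]) + "]"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical dataset, already parsed from YAML into a plain dict.
orders = {
    "name": "orders",
    "description": "Customer orders",
    "fields": [
        {"name": "amt_01", "type": "decimal", "description": "gross revenue in EUR"},
        {"name": "status", "type": "string", "enum": ["open", "paid", "refunded"]},
    ],
}

digest = to_digest(orders)
print(digest)
# ## orders: Customer orders
# - amt_01 (decimal): gross revenue in EUR
# - status (string) [open|paid|refunded]
```

The sketch drops YAML's repeated keys and nesting while keeping each field's type, description, and enum values on a single line, which is the general way a digest of this kind can save tokens.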

## Features

- **Semantic Models**: describe tables as datasets with typed fields, relationships, and metrics in YAML
- **MCP Server**: expose your semantic layer to any MCP-compatible AI agent (Claude, Cursor, custom agents)
- **Data Federation**: query across Postgres, MySQL, MSSQL, SQLite, DuckDB, and Iceberg REST Catalogs from a single project
- **AI-Assisted Model Builder**: a chat-based agent discovers schemas, maps fields, detects enums, and infers relationships
- **Scoped Query Execution**: agents run read-only SQL against sandboxed VIEWs, never raw tables
- **Token-Based Access Control**: MCP tokens with configurable model scopes and expiry
- **Testing Suite**: test cases that validate whether agents can use your semantic models correctly
- **Version Control**: built-in Git tracks every publish; optional GitHub sync pushes models to a remote repository
- **Self-Hosted**: deploy with Docker in minutes, keep your data on your infrastructure
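
The **Scoped Query Execution** idea can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module rather than archmax's DuckDB-based engine, and the table name, VIEW name, and column aliases are made up for the example. It shows the general pattern only (a VIEW with business-friendly names plus a read-only connection), not how archmax implements it.

```python
import sqlite3

# Raw table with a cryptic column name, like `amt_01` in the problem statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_cust_orders (id INTEGER, amt_01 REAL)")
conn.executemany("INSERT INTO dim_cust_orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])

# Hypothetical VIEW that exposes business-friendly column names.
conn.execute(
    "CREATE VIEW orders AS "
    "SELECT id AS order_id, amt_01 AS gross_revenue_eur FROM dim_cust_orders"
)
conn.commit()

# SQLite-specific guard: from here on, statements that write are rejected.
conn.execute("PRAGMA query_only = ON")

# Reads through the VIEW still work.
total = conn.execute("SELECT SUM(gross_revenue_eur) FROM orders").fetchone()[0]
print(total)  # 42.5

# A destructive statement fails instead of modifying data.
try:
    conn.execute("DELETE FROM dim_cust_orders")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
print(write_blocked)  # True
```

This sketch only blocks writes; it does not stop a query from reading the raw table directly. Restricting queries to the VIEWs, as archmax describes, needs an additional scoping mechanism that is not shown here.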

## Quick Start

### Docker (Standalone)

Run archmax as a single container with embedded MongoDB and Redis:

```bash
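# Generate the secret once and store it somewhere safe; reuse it on every restart.
BETTER_AUTH_SECRET="$(openssl rand -base64 32)"
echo "BETTER_AUTH_SECRET=$BETTER_AUTH_SECRET"

docker run -d \
  --name archmax \
  -p 8080:8080 \
  -e BETTER_AUTH_SECRET="$BETTER_AUTH_SECRET" \
  -e UI_USERNAME=admin \
  -e UI_PASSWORD=changeme \
  -e AGENT_API_KEY=your-openrouter-api-key \
  -v ~/.archmax:/data \
  ghcr.io/archmaxai/archmax:latest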
```

> **Volume mount:** The `-v ~/.archmax:/data` bind mount persists all application data on the host — semantic model YAML files (`projects/`), embedded MongoDB data (`mongodb/`), and the DuckDB extension cache (`.duckdb/`). Without this mount, all data is lost when the container is removed.

> **Save your `BETTER_AUTH_SECRET`.** If you lose this value or change it on a restart, all sessions and authentication data become invalid. Generate it beforehand and store it in a safe place.

> **`AGENT_API_KEY` is required for AI features.** The Semantic Model Builder, Testing Playground, and automatic title generation all need an API key for an OpenAI-compatible provider. The default endpoint is [OpenRouter](https://openrouter.ai). Without this key, agent features will be unavailable.

Open `http://localhost:8080` and log in with username `admin` (or your `UI_USERNAME`) and the password you set in `UI_PASSWORD`.

### Docker Compose (Recommended for Production)

Run archmax with dedicated MongoDB and Redis containers instead of the embedded services:

```bash
# 1. Clone the repo (or copy docker-compose.yml + .env.example)
git clone https://github.com/archmaxai/archmax.git
cd archmax

# 2. Create your .env from the example and fill in the required values
cp .env.example .env

# 3. Start the stack
docker compose up -d
```

The `.env` file needs at minimum:

```bash
BETTER_AUTH_SECRET= # openssl rand -base64 32
UI_PASSWORD=
AGENT_API_KEY= # required for AI features
```

Optional overrides (defaults shown):

| Variable | Default |
|----------|---------|
| `UI_USERNAME` | `admin` |
| `AGENT_API_BASE_URL` | `https://openrouter.ai/api/v1` |
| `AGENT_MODEL` | `anthropic/claude-sonnet-4.6` |
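
Before starting the stack, it can help to confirm that the required values are filled in. The following is a minimal sketch of such a check, not part of archmax: `parse_env` and `check_env` are hypothetical helpers, and the simplified parser handles only plain `KEY=value` lines (it treats ` #` as the start of an inline comment, so values containing that sequence would be truncated). Docker Compose's own `.env` parsing may differ in edge cases.

```python
REQUIRED = ("BETTER_AUTH_SECRET", "UI_PASSWORD", "AGENT_API_KEY")
DEFAULTS = {
    "UI_USERNAME": "admin",
    "AGENT_API_BASE_URL": "https://openrouter.ai/api/v1",
    "AGENT_MODEL": "anthropic/claude-sonnet-4.6",
}


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "KEY= # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def check_env(text: str):
    """Return (effective settings with defaults applied, missing required keys)."""
    values = parse_env(text)
    missing = [key for key in REQUIRED if not values.get(key)]
    effective = {**DEFAULTS, **{k: v for k, v in values.items() if v}}
    return effective, missing


sample = """\
BETTER_AUTH_SECRET= # openssl rand -base64 32
UI_PASSWORD=s3cret
AGENT_API_KEY=
"""
effective, missing = check_env(sample)
print(missing)                   # ['BETTER_AUTH_SECRET', 'AGENT_API_KEY']
print(effective["UI_USERNAME"])  # admin
```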

The stack exposes port **8080** and persists data in two Docker volumes:

| Volume | Container Path | Contents |
|--------|---------------|----------|
| `archmax-data` | `/data` | Semantic model YAML files, DuckDB extension cache |
| `mongo-data` | `/data/db` (mongo container) | MongoDB database files |

These volumes are created automatically by Docker Compose. To use host bind mounts instead, edit `docker-compose.yml` and replace the named volumes with paths (e.g. `./data/archmax:/data`).

### Local Development

```bash
git clone https://github.com/archmaxai/archmax.git
cd archmax
cp .env.example .env.local # Edit with your settings
pnpm install
pnpm dev
```

| Service | URL |
|---------|-----|
| Frontend | http://localhost:5173 |
| API | http://localhost:3000 |
| Docs | http://localhost:4321 |
| MCP | `POST http://localhost:3000/mcp/your-project/mcp` |

## Architecture

```
archmax/
├── apps/
│   ├── api/          # Hono API server
│   ├── e2e/          # Playwright end-to-end tests
│   ├── frontend/     # Vite + React SPA (TanStack Router)
│   ├── worker/       # BullMQ worker for agent jobs
│   └── docs/         # Documentation site (Astro Starlight)
├── packages/
│   ├── core/         # Shared models, services, config (@archmax/core)
│   └── ui/           # React UI components (@archmax/ui)
└── openspec/         # Specifications and change proposals
```

**Tech stack:** TypeScript, Hono, React 19, Vite 6, MongoDB, DuckDB, Tailwind CSS 4, Turborepo, Playwright

## MCP Tools

| Tool | Description |
|------|-------------|
| `list_semantic_models` | List semantic models the token has access to |
| `get_semantic_model` | Overview of a model with datasets, relationships, and metrics |
| `get_datasets` | Fields for one or more datasets with types, examples, enums, and instructions |
| `execute_query` | Run a read-only SQL query scoped to a semantic model's VIEWs |
| `request_improvement` | Submit an improvement request for a semantic model |

### Connecting an AI Agent

Configure your MCP client with:

- **Endpoint:** `https://your-server/mcp/your-project/mcp`
- **Auth:** `Bearer sk-your-token`

```json
{
  "mcpServers": {
    "archmax": {
      "url": "https://your-server/mcp/your-project/mcp",
      "headers": {
        "Authorization": "Bearer sk-your-token"
      }
    }
  }
}
```
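
For custom agents that do not read a config file, the same endpoint and header can be used directly. The sketch below builds, but does not send, a JSON-RPC 2.0 `tools/list` request using only the Python standard library. It assumes archmax speaks MCP's Streamable HTTP transport at this endpoint, which this README does not state explicitly; `build_tools_list_request` is a hypothetical helper, and the URL and token are the placeholders from the config above.

```python
import json
import urllib.request


def build_tools_list_request(endpoint: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 `tools/list` request."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Assumption: Streamable HTTP, where clients accept JSON and SSE.
            "Accept": "application/json, text/event-stream",
        },
    )


req = build_tools_list_request(
    "https://your-server/mcp/your-project/mcp", "sk-your-token"
)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
```

A real MCP session starts with an `initialize` handshake before `tools/list`, so in practice an MCP client SDK is usually a better choice than hand-built requests.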

## Version Control & GitHub

Every project has a built-in Git repository that automatically tracks changes. Each time you **publish**, archmax commits the current state of your semantic models (source YAML and assembled build output) with your publish message.

Optionally connect a GitHub remote in **Settings → GitHub** to keep an external copy in sync. The publish flow becomes: pull → build → commit → push. You need a [Personal Access Token](https://github.com/settings/tokens) with the `repo` scope (classic) or `Contents: Read and write` (fine-grained).

Key capabilities:
- **Automatic change detection** — the Publish button appears when any file in the project changes
- **Publish history** — paginated list of publishes with messages and timestamps in Settings
- **Revert** — restore your project to any previous publish with one click (creates a new commit, rebuilds, and pushes)
- **Sync Now** — pull remote changes without publishing
- **Conflict detection** — blocks publishing when merge conflicts exist and tells you which files need attention
- **Re-init** — reset local Git history if needed (working files are preserved)

See the [Version Control guide](https://docs.archmax.ai/guides/version-control/) for details.

## Configuration

Key environment variables (see `.env.example` for the full list):

| Variable | Description |
|----------|-------------|
| `BETTER_AUTH_SECRET` | Session encryption secret (min 32 chars). Save and reuse across restarts. |
| `UI_USERNAME` / `UI_PASSWORD` | Initial admin credentials (default username: `admin`) |
| `APP_BASE_URL` | Public URL of this instance (e.g. `https://archmax.example.com`). Set when behind a reverse proxy to auto-configure CORS and auth. |
| `ENCRYPTION_KEY` | Optional. Encrypts database connection passwords and API keys at rest (AES-256-GCM). Generate with `openssl rand -base64 32`. |
| `MONGODB_URI` | MongoDB connection string (optional in Docker; embedded when omitted) |
| `AGENT_API_BASE_URL` | OpenAI-compatible API endpoint (default: OpenRouter) |
| `AGENT_API_KEY` | API key for the AI agent (required for agent features) |
| `AGENT_MODEL` | LLM model identifier (default: `anthropic/claude-sonnet-4.6`) |
| `REDIS_URL` | Optional. Enables BullMQ worker queue (embedded in Docker when omitted) |

## Contributing

archmax uses [OpenSpec](https://github.com/Fission-AI/OpenSpec) for spec-driven development. **Every feature PR must include a corresponding spec change.**

### Setup

```bash
npm install -g @fission-ai/openspec@latest
```

### Workflow

OpenSpec integrates with your AI coding assistant. Use these prompts in Cursor (or any OpenSpec-aware agent):

| Prompt | What it does |
|--------|-------------|
| `/openspec-proposal` | Scaffolds a new change proposal with `proposal.md`, `tasks.md`, and spec deltas |
| `/openspec-apply` | Implements an approved proposal by following the task checklist |
| `/openspec-archive` | Archives a completed change and updates specs |

The typical flow:

1. **Propose** — Run `/openspec-proposal` and describe the change. The agent creates the proposal directory, writes spec deltas, and validates with `openspec validate --strict`.
2. **Review** — Get the proposal approved before any code is written.
3. **Implement** — Run `/openspec-apply` to work through the task list.
4. **Archive** — After merging, run `/openspec-archive` to move the change to the archive and update the canonical specs.

You can also drive the workflow manually with the CLI (`openspec list`, `openspec show`, `openspec validate`, `openspec archive`). See `openspec/AGENTS.md` for the full reference.

See the [Contributing guide](apps/docs/src/content/docs/contributing/openspec.mdx) for details.

## License

[AGPL-3.0](LICENSE)