https://github.com/pasunboneleve/cheap-rag
Simple RAG with hard guardrails. Answers only when local content is relevant — otherwise refuses. Cheap to run, cheap to understand, cheap to replace.
embeddings golang guardrails llm rag semantic-search sqlite unix-socket
- Host: GitHub
- URL: https://github.com/pasunboneleve/cheap-rag
- Owner: pasunboneleve
- License: mit
- Created: 2026-04-21T01:10:14.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-05-07T02:06:19.000Z (about 2 months ago)
- Last Synced: 2026-05-07T04:12:36.873Z (about 2 months ago)
- Topics: embeddings, golang, guardrails, llm, rag, semantic-search, sqlite, unix-socket
- Language: Go
- Homepage:
- Size: 138 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Security: docs/security.md
- Agents: AGENTS.md
# cheap-rag
[Linux CI](https://github.com/pasunboneleve/cheap-rag/actions/workflows/linux-ci.yml)
[macOS CI](https://github.com/pasunboneleve/cheap-rag/actions/workflows/macos-ci.yml)
**cheap-rag** is a deliberately simple, low-cost RAG implementation. It
answers only when local content is sufficiently similar to the
question. Otherwise, it refuses.
Refusing when local evidence is weak is the intended behavior of this project, not a limitation.
Not designed for scale. When scale becomes a bottleneck, replace it.
## Why this exists
This project keeps deterministic guardrails in front of a stochastic model.
Retrieval decides whether the model is allowed to speak.
If local evidence is out of scope, cheap-rag refuses.
If local evidence is in scope, cheap-rag answers from that evidence.
## Architecture
Pipeline:
`embed -> retrieve -> gate -> generate -> validate`
- embed: turn the question into a vector
- retrieve: fetch top-k local chunks by similarity
- gate: refuse if similarity/evidence is insufficient
- generate: answer only from retrieved chunks
- validate: lightweight checks for evidence coverage/support
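The retrieve and gate steps above can be sketched in a few lines of Go. This is a minimal illustration, not the project's implementation: the `Chunk`, `Scored`, `Retrieve`, `Gate`, and `Ask` names, the use of cosine similarity, and the rule "best hit must clear a threshold" are all assumptions. The embed step is replaced by hand-written vectors, generation by a stub function, and the validate step is omitted.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a hypothetical unit of local content with a precomputed embedding.
type Chunk struct {
	ID     string
	Vector []float64
}

// Scored pairs a chunk with its similarity to the query.
type Scored struct {
	Chunk      Chunk
	Similarity float64
}

// Cosine returns the cosine similarity of a and b, or 0 when the
// lengths differ or either vector has zero magnitude.
func Cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Retrieve returns up to k chunks, most similar to the query first.
func Retrieve(query []float64, chunks []Chunk, k int) []Scored {
	scored := make([]Scored, 0, len(chunks))
	for _, c := range chunks {
		scored = append(scored, Scored{Chunk: c, Similarity: Cosine(query, c.Vector)})
	}
	sort.SliceStable(scored, func(i, j int) bool {
		return scored[i].Similarity > scored[j].Similarity
	})
	if k >= 0 && k < len(scored) {
		scored = scored[:k]
	}
	return scored
}

// Gate reports whether the best hit clears the threshold (assumed rule).
func Gate(hits []Scored, threshold float64) bool {
	return len(hits) > 0 && hits[0].Similarity >= threshold
}

// Ask calls generate only when the gate passes; otherwise it refuses.
func Ask(query []float64, chunks []Chunk, k int, threshold float64,
	generate func([]Scored) string) (outcome, content string) {
	hits := Retrieve(query, chunks, k)
	if !Gate(hits, threshold) {
		return "refusal", "Sorry, I don't know how to answer this."
	}
	return "answer", generate(hits)
}

func main() {
	chunks := []Chunk{
		{ID: "chunk_1", Vector: []float64{1, 0}},
		{ID: "chunk_2", Vector: []float64{0, 1}},
	}
	// Stub generator: a real one would prompt a model with the retrieved chunks.
	gen := func(hits []Scored) string { return "from " + hits[0].Chunk.ID }

	fmt.Println(Ask([]float64{0.9, 0.1}, chunks, 1, 0.5, gen)) // in scope
	fmt.Println(Ask([]float64{-1, -1}, chunks, 1, 0.5, gen))   // out of scope
}
```

The point of this shape is that the stochastic step (`generate`) is unreachable unless the deterministic step (`Gate`) returns true.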
## Quick start
```bash
go run ./cmd/cheaprag index --config ./cheaprag.example.yaml
go run ./cmd/cheaprag shell --config ./cheaprag.example.yaml
go run ./cmd/cheaprag serve --config ./cheaprag.example.yaml
go run ./cmd/cheaprag ask --config ./cheaprag.example.yaml "what is cheap to change?"
go run ./cmd/cheaprag inspect query --config ./cheaprag.example.yaml "ci cd"
go run ./cmd/cheaprag version
```
## API example
When running `serve`, cheap-rag listens on the path configured in `runtime.socket_path` and speaks HTTP over that Unix domain socket only.
Success:
```json
{
  "outcome": "answer",
  "content": "Use short feedback loops and explicit boundaries...",
  "reason": null,
  "query_similarity": 0.73,
  "provider_statuses": {"embedding": 200, "generation": 200},
  "retrieval": [
    {"chunk_id": "chunk_1", "similarity": 0.73, "path": "post.md", "citation": "my-post-slug"}
  ]
}
```
Refusal:
```json
{
  "outcome": "refusal",
  "content": "Sorry, I don't know how to answer this.",
  "reason": "out-of-scope",
  "query_similarity": 0.18,
  "retrieval": []
}
```
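A Go caller might model the two responses above with one struct and reach the server through an `http.Client` that dials the socket. This is a sketch under assumptions: the `Response`, `Hit`, `NewUnixClient`, and `Decode` names are made up for illustration, and the socket path in `main` is hypothetical. The request method and path are not shown in this README (see docs/api.md), so the sketch builds the client but does not send a request.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

// Hit mirrors one entry of the "retrieval" array in the examples above.
type Hit struct {
	ChunkID    string  `json:"chunk_id"`
	Similarity float64 `json:"similarity"`
	Path       string  `json:"path"`
	Citation   string  `json:"citation"`
}

// Response mirrors the fields shown in the success and refusal examples.
type Response struct {
	Outcome          string         `json:"outcome"`
	Content          string         `json:"content"`
	Reason           *string        `json:"reason"` // nil when the JSON value is null
	QuerySimilarity  float64        `json:"query_similarity"`
	ProviderStatuses map[string]int `json:"provider_statuses"`
	Retrieval        []Hit          `json:"retrieval"`
}

// NewUnixClient returns an http.Client whose connections always go to
// the given Unix socket, regardless of the host in the request URL.
func NewUnixClient(socketPath string) *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// Decode parses a response body into a Response.
func Decode(body []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Hypothetical path; the real one comes from runtime.socket_path.
	_ = NewUnixClient("/tmp/cheaprag.sock")

	r, err := Decode([]byte(`{
		"outcome": "refusal",
		"content": "Sorry, I don't know how to answer this.",
		"reason": "out-of-scope",
		"query_similarity": 0.18,
		"retrieval": []
	}`))
	if err != nil {
		panic(err)
	}
	if r.Outcome == "refusal" && r.Reason != nil {
		fmt.Println("refused:", *r.Reason)
	}
}
```

Using a pointer for `Reason` keeps the difference between `null` (success) and a string (refusal) visible to the caller.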
## Key ideas
- retrieval is the gatekeeper: generation happens only after retrieval passes scope checks
- validation is intentionally limited: useful heuristics, not a formal truth system
- local + inspectable beats scalable-by-default for this project
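To make "useful heuristics, not a formal truth system" concrete, here is one hypothetical heuristic of that kind: the fraction of distinct answer tokens that also occur in the retrieved evidence. This README does not say which checks cheap-rag actually runs (see docs/validation.md), so the `Coverage` function and its tokenization are purely illustrative.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokens lowercases s and splits it on anything that is not a letter or digit.
func tokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Coverage returns the fraction of distinct answer tokens that also
// appear somewhere in the evidence. An empty answer scores 0.
func Coverage(answer string, evidence []string) float64 {
	inEvidence := map[string]bool{}
	for _, e := range evidence {
		for _, t := range tokens(e) {
			inEvidence[t] = true
		}
	}
	distinct := map[string]bool{}
	for _, t := range tokens(answer) {
		distinct[t] = true
	}
	if len(distinct) == 0 {
		return 0
	}
	hits := 0
	for t := range distinct {
		if inEvidence[t] {
			hits++
		}
	}
	return float64(hits) / float64(len(distinct))
}

func main() {
	evidence := []string{"Use short feedback loops and explicit boundaries."}
	fmt.Println(Coverage("Short loops, explicit boundaries", evidence))
	fmt.Println(Coverage("Quantum physics", evidence))
}
```

A check like this is cheap and inspectable, but it measures word overlap only: a high score does not prove the answer is supported by the evidence.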
## Documentation
- [Architecture](docs/architecture.md)
- [Configuration](docs/config.md)
- [API](docs/api.md)
- [Storage](docs/storage.md)
- [Security](docs/security.md)
- [Validation](docs/validation.md)
- [Release schedule](docs/release-schedule.md)
## Security model
- no TCP listener is opened
- server binds only to the configured Unix socket path
- socket file is recreated on startup and chmod'd to `0660`
- if `internal_token` is set, requests must include `Authorization: Bearer <token>`
See [docs/security.md](docs/security.md) for full details.
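The points above map onto a small amount of standard-library Go. The following is a sketch under assumptions, not the project's code: `ListenUnix`, `Authorized`, and `RequireBearer` are hypothetical names, and the constant-time comparison is a choice made here rather than something the README states.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"io/fs"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// ListenUnix removes any stale socket file, binds a fresh Unix socket
// at path, and restricts the socket file to mode 0660.
func ListenUnix(path string) (net.Listener, error) {
	if err := os.Remove(path); err != nil && !errors.Is(err, fs.ErrNotExist) {
		return nil, err
	}
	ln, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	if err := os.Chmod(path, 0o660); err != nil {
		ln.Close()
		return nil, err
	}
	return ln, nil
}

// Authorized reports whether header carries the expected bearer token.
// An empty token means no token is configured, so every request passes.
func Authorized(header, token string) bool {
	if token == "" {
		return true
	}
	want := []byte("Bearer " + token)
	return subtle.ConstantTimeCompare([]byte(header), want) == 1
}

// RequireBearer wraps next and answers 401 to unauthorized requests.
func RequireBearer(token string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !Authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	dir, err := os.MkdirTemp("", "cheaprag")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "s.sock") // stand-in for runtime.socket_path
	ln, err := ListenUnix(sock)
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	info, err := os.Stat(sock)
	if err != nil {
		panic(err)
	}
	fmt.Printf("socket mode: %o\n", info.Mode().Perm())
	// http.Serve(ln, RequireBearer(token, handler)) would start serving here.
}
```

Because the listener is created with `net.Listen("unix", ...)` and nothing else, no TCP port is ever opened; access control then reduces to file permissions on the socket plus the optional token.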