https://github.com/obnoxiousmods/obbyai

Self-hosted local AI chat with multi-GPU routing, RAG, web search, and file intelligence. Powered by Ollama.
https://github.com/obnoxiousmods/obbyai

ai-chat amd-gpu arangodb local-llm ollama python rag self-hosted starlette vulkan

Last synced: about 2 months ago
JSON representation

Self-hosted local AI chat with multi-GPU routing, RAG, web search, and file intelligence. Powered by Ollama.

Host: GitHub
URL: https://github.com/obnoxiousmods/obbyai
Owner: obnoxiousmods
License: mit
Created: 2026-04-07T05:27:07.000Z (3 months ago)
Default Branch: master
Last Pushed: 2026-04-07T07:09:31.000Z (3 months ago)
Last Synced: 2026-04-07T07:24:42.903Z (3 months ago)
Topics: ai-chat, amd-gpu, arangodb, local-llm, ollama, python, rag, self-hosted, starlette, vulkan
Language: Python
Homepage: https://ai.obby.ca
Size: 133 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
- Security: .github/SECURITY.md

Awesome Lists containing this project

README

          


# ObbyAI

**A fully local, self-hosted AI chat platform with multi-GPU routing, RAG, web search, and comprehensive file intelligence.**

[![CI](https://github.com/obnoxiousmods/obbyai/actions/workflows/ci.yml/badge.svg)](https://github.com/obnoxiousmods/obbyai/actions/workflows/ci.yml)

[![License: MIT](https://img.shields.io/badge/License-MIT-7c5cfc.svg)](LICENSE)

[![Python](https://img.shields.io/badge/Python-3.12%2B-blue)](https://python.org)

[![Ollama](https://img.shields.io/badge/Ollama-Vulkan%20%7C%20CUDA-orange)](https://ollama.com)

[![ArangoDB](https://img.shields.io/badge/ArangoDB-3.12-green)](https://arangodb.com)

[Features](#features) · [Quick Start](#quick-start) · [Architecture](#architecture) · [Models](#models) · [Configuration](#configuration) · [Contributing](#contributing)



---

## What is ObbyAI?

ObbyAI is a **production-quality, self-hosted AI chat interface** built on top of [Ollama](https://ollama.com). It runs entirely on your local hardware with no external API calls — except when you explicitly use the web search tool.

Key design goals:

- **Privacy-first**: all inference runs on your hardware, nothing leaves your network

- **Multi-GPU**: route requests to different machines based on the task (e.g. vision to RTX 2080 Super, fast chat to RX 580)

- **Tool-augmented**: real web search, persistent vector memory, exact math, and file intelligence built in

- **Single-file frontend**: the entire UI is one `index.html` — zero build steps, zero Node.js in production

---

## Features

### 🖥️ Multi-GPU Server Routing

Switch between GPU backends from the header:

- **Local**: AMD RX 580 8GB (Vulkan) — fast 7–8B models

- **Remote**: NVIDIA RTX 2080 Super 8GB (CUDA) — Gemma4 vision, larger models

Each server has its own model list. Requests are proxied through the Python backend — no CORS, no direct browser-to-Ollama.

### 🤖 10+ Curated Models

| Model | Size | Tag | Best For |

|---|---|---|---|

| llama3.1:8b | 4.9 GB | General | Default — reasoning, coding, Q&A |

| qwen2.5:7b | 4.7 GB | General | Multilingual, strong reasoning |

| qwen2.5-coder:7b | 4.7 GB | Coding | Code generation and debugging |

| qwen3:8b | 5.2 GB | Thinking | Hybrid thinking/reasoning mode |

| deepseek-r1:8b | 5.2 GB | Reasoning | Chain-of-thought problems |

| mistral:7b-instruct | 4.4 GB | General | Structured tasks, fast responses |

| mistral-nemo:12b | 7.1 GB | General | More capable, ~7GB VRAM |

| gemma3:4b | 3.3 GB | Fast | Quick responses, efficient |

| phi4-mini:3.8b | 2.5 GB | Fast | Microsoft Phi-4, punches above weight |

| gemma4:latest | 9.6 GB | Vision | Multimodal image+text (RTX 2080S) |

### 🛠️ Tool Calling

Automatic tool use for capable models (llama3.1, qwen2.5, qwen3, mistral, phi4-mini):

| Tool | Trigger | Capability |

|---|---|---|

| 🔍 `web_search` | Current events, lookups, docs | DuckDuckGo, no API key |

| 📚 `rag_search` | "in my documents", context recall | ArangoDB vector search |

| 🧮 `calculator` | Any arithmetic | Safe AST eval, trig, log |

| 🕐 `get_datetime` | Time/date queries | Timezone-aware |

Tool calls are streamed live — you see exactly what the model is searching for and what results it gets.

### 📚 RAG — Retrieval-Augmented Generation

Persistent knowledge base powered by [ArangoDB](https://arangodb.com) + [nomic-embed-text](https://ollama.com/library/nomic-embed-text) embeddings:

- Automatic context injection: relevant chunks from your documents are silently prepended to every query

- Conversation memory: past messages are embedded and retrievable across sessions

- Document library: upload files and they stay in the knowledge base permanently

### 📁 File Intelligence — 40+ Formats

**Images** → Gemma4 vision model performs deep analysis:

- Full OCR — extracts all visible text

- Image description and context

- Table/chart/diagram interpretation

- Code and URL extraction

**Documents:**

| Format | Extraction Method |

|---|---|

| PDF | [pymupdf4llm](https://github.com/pymupdf/RAG) — layout-aware markdown |

| DOCX | python-docx — headings, paragraphs, tables |

| PPTX | python-pptx — per-slide text |

| XLSX/ODS | openpyxl — all sheets as structured text |

**Text & Code** (30+ extensions): direct UTF-8 extraction with encoding detection. Python, JavaScript, TypeScript, Rust, Go, Java, C/C++, SQL, YAML, TOML, JSON, Markdown, and more.

All extracted content is chunked, embedded, and stored in ArangoDB for future retrieval.

### 🎭 Persona Library

7 curated system prompts selectable from the toolbar:

| Persona | Best For |

|---|---|

| ✨ Default | General knowledge, balanced |

| ⚙️ Senior Developer | Architecture, code reviews, production-quality code |

| 💻 Coding Assistant | Write/debug/review code |

| 🔬 Research Assistant | Analysis, citations, web-sourced facts |

| 📊 Data Analyst | Numbers, insights, business intelligence |

| 🎨 Creative Partner | Writing, brainstorming, storytelling |

| ⚡ Concise Mode | Shortest accurate answer, zero fluff |

### 💬 Chat Features

- **Multi-conversation sidebar** with search, rename, delete

- **Persistent history** in localStorage with per-conversation session IDs

- **Markdown rendering** with syntax-highlighted code (highlight.js, 100+ languages)

- **Streaming tokens** with real-time render

- **Regenerate / edit** user messages

- **Export conversation** as Markdown

- **Token stats** per message (tokens generated, tok/s)

- **Drag & drop** files directly into chat

- **Image attachment** for vision-capable models

- **System prompt bar** — per-session custom instructions

- **Parameter controls** — temperature, top-p, max tokens, context length

- **Keyboard shortcuts**: `Ctrl+Enter` send, `Ctrl+K` new chat, `Esc` close modals

---

## Quick Start

### Prerequisites

| Requirement | Notes |

|---|---|

| Python 3.12+ | |

| [uv](https://docs.astral.sh/uv/) | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |

| [Ollama](https://ollama.com) | AMD: use `ollama-vulkan`; NVIDIA: standard install |

| [ArangoDB](https://arangodb.com) 3.12 | For RAG features |

### 1. Clone and install

```bash

git clone https://github.com/obnoxiousmods/obbyai.git

cd obbyai

uv sync

```

### 2. Configure

```bash

cp .env.example .env

# Edit .env: set ARANGO_PASS and OLLAMA_REMOTE_URL

```

### 3. Pull models

```bash

# Primary model

ollama pull llama3.1:8b

# Required for RAG

ollama pull nomic-embed-text

# Optional extras

ollama pull qwen2.5:7b qwen2.5-coder:7b deepseek-r1:8b

```

### 4. Run

```bash

uv run uvicorn main:app --host 0.0.0.0 --port 8091

```

Open **http://localhost:8091**

### Production (systemd)

```ini

[Unit]

Description=ObbyAI Chat Web UI

After=network.target ollama.service arangodb3.service

[Service]

WorkingDirectory=/opt/ai-chat

EnvironmentFile=/opt/ai-chat/.env

ExecStart=/home/user/.local/bin/uv run uvicorn main:app --host 127.0.0.1 --port 8091

Restart=on-failure

User=youruser

[Install]

WantedBy=multi-user.target

```

### HTTPS with nginx

```nginx

server {

    listen 443 ssl;

    server_name ai.yourdomain.com;

    ssl_certificate     /path/to/cert.pem;

    ssl_certificate_key /path/to/key.pem;

    location / {

        proxy_pass http://127.0.0.1:8091;

        proxy_set_header Host $host;

        proxy_buffering off;

    }

    location ~ ^/(api|v1)/ {

        proxy_pass http://127.0.0.1:11434;

    }

}

```

---

## Architecture

```

Browser

  │  HTTPS (nginx SSL termination)

  ▼

Starlette (port 8091)

  ├── GET  /              → index.html (self-contained SPA)

  ├── GET  /servers       → [{id, name, gpu, default_model}]

  ├── GET  /models        → Ollama /api/tags (proxied per server)

  ├── GET  /prompts       → persona list

  ├── POST /upload        → file_processor → RAG ingest

  ├── POST /ingest        → direct text → RAG ingest

  ├── GET  /rag/stats     → document + session message counts

  └── POST /chat          → tool loop → Ollama SSE → frontend SSE

        │

        ├── tools/web_search.py     DuckDuckGo

        ├── tools/rag.py            ArangoDB + nomic-embed-text

        ├── tools/calculator.py     AST eval

        ├── tools/datetime_tool.py  zoneinfo

        └── tools/file_processor.py

              ├── Images   → Gemma4 @ remote:11434

              ├── PDF      → pymupdf4llm

              ├── DOCX     → python-docx

              ├── PPTX     → python-pptx

              ├── XLSX     → openpyxl

              └── Text/Code → chardet + direct decode

ArangoDB (port 8529)

  ├── rag_documents   (chunked text + nomic embeddings)

  └── rag_sessions    (per-conversation message embeddings)

Ollama instances

  ├── Local  127.0.0.1:11434   (RX 580 / Vulkan)

  └── Remote 192.168.1.x:11434 (RTX 2080S / CUDA)

```

### Tool Call Flow

```

User message

    │

    ├─ RAG search (if use_rag=true) → inject top-K chunks into context

    │

    └─ Ollama /api/chat (with tools=[web_search, rag_search, calculator, get_datetime])

            │

            ├─ Model streams tokens → forwarded to browser via SSE

            │

            └─ Model calls tool?

                    │

                    ├─ Execute tool

                    ├─ Stream tool_call + tool_result events to browser

                    └─ Re-submit with tool results → continue streaming

```

---

## Configuration

All configuration is in `.env` (copy from `.env.example`):

```env

ARANGO_URL=http://localhost:8529

ARANGO_USER=root

ARANGO_PASS=your_password

ARANGO_DB=ai_chat_rag

OLLAMA_LOCAL_URL=http://127.0.0.1:11434

OLLAMA_REMOTE_URL=http://192.168.1.x:11434

EMBED_MODEL=nomic-embed-text

VISION_MODEL=gemma4:latest

```

### AMD GPU Notes (RX 580 / Polaris)

The RX 580 2048SP (PCI ID `0x6fdf`) is a Chinese market variant not in ROCm's whitelist. Use Vulkan:

```bash

# Arch Linux

pacman -S ollama-vulkan   # NOT ollama (ROCm-only)

# Verify GPU layers

ollama run llama3.1:8b "hi" 2>&1 | grep -i gpu

# Expected: "offloaded 33/33 layers to GPU"

```

---

## Development

```bash

uv sync

uv run uvicorn main:app --reload --port 8091

# Lint

uv run ruff check .

uv run ruff format .

# Check JS

python3 -c "

import re

html = open('index.html').read()

scripts = re.findall(r'(.*?)', html, re.DOTALL)

open('/tmp/check.js','w').write('\n'.join(scripts))

"

node --check /tmp/check.js

```

---

## Roadmap

- [ ] Authentication (nginx basic auth / OAuth)

- [ ] Docker Compose deployment

- [ ] Code execution sandbox (Python REPL tool)

- [ ] Voice input (Whisper via Ollama)

- [ ] Image generation (Stable Diffusion / ComfyUI)

- [ ] Conversation export to HTML

- [ ] Mobile layout improvements

- [ ] Scheduled automation tools

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). PRs welcome.

---

## License

[MIT](LICENSE) © 2026 obnoxiousmods

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/obnoxiousmods/obbyai

Awesome Lists containing this project

README