https://github.com/gommezen/rag_writing_assistant
A transparent, fully local RAG document assistant that grounds every claim in your documents with interactive citations, coverage tracking, and confidence indicators.
https://github.com/gommezen/rag_writing_assistant
claude document-analysis document-analysis-tool faiss fastapi local-llm ollama rag react typescript
Last synced: about 22 hours ago
JSON representation
A transparent, fully local RAG document assistant that grounds every claim in your documents with interactive citations, coverage tracking, and confidence indicators.
- Host: GitHub
- URL: https://github.com/gommezen/rag_writing_assistant
- Owner: gommezen
- Created: 2026-01-22T15:58:53.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-02-10T02:44:10.000Z (18 days ago)
- Last Synced: 2026-02-10T05:54:04.418Z (18 days ago)
- Topics: claude, document-analysis, document-analysis-tool, faiss, fastapi, local-llm, ollama, rag, react, typescript
- Language: Python
- Homepage:
- Size: 925 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
Awesome Lists containing this project
README
# RAG Document Intelligence





-black)
A transparent AI writing assistant that grounds every claim in your documents. Hover any citation to preview the source, click to verify the full context — so you always know exactly where the AI got its information.

## Features
- **Document upload** - PDF, DOCX, and TXT with drag-and-drop (non-blocking processing)
- **Grounded generation** - All AI content derived from your uploaded documents
- **Interactive citations** - Hover to preview source, click to navigate to full context
- **Confidence indicators** - Visual cues for high/medium/low confidence content
- **Section-level editing** - Regenerate or manually edit individual sections
- **Coverage transparency** - See what % of your documents were analyzed
- **Intent detection** - Auto-selects retrieval strategy per query type (analysis/QA/writing)
- **Chat mode** - Multi-turn conversations with persistent history
- **Dark mode** - Toggle between light and dark themes
- **Keyboard shortcuts** - Ctrl+Enter to generate, Escape to close modals
- **Fully local** - Documents never leave your machine (runs on Ollama)
## Requirements
- Python 3.11+
- Node.js 18+
- [Ollama](https://ollama.ai/) running locally
## Installation
### 1. Install Ollama and pull models
```bash
ollama pull qwen2.5:7b-instruct-q4_0
ollama pull mxbai-embed-large
```
### 2. Set up the backend
```bash
cd backend
pip install -r requirements.txt
```
### 3. Set up the frontend
```bash
cd frontend
npm install
```
## Running the App
Start both services in separate terminals:
```bash
# Terminal 1 - Backend
cd backend
python -m uvicorn app.main:app --port 8001 --reload
# Terminal 2 - Frontend
cd frontend
npm run dev
```
Open http://localhost:5173 in your browser.
## Configuration
Create `backend/.env` to customize (all optional):
```env
# LLM Models (must be available in Ollama)
GENERATION_MODEL=qwen2.5:7b-instruct-q4_0
EMBEDDING_MODEL=mxbai-embed-large
# Intent-specific models (falls back to GENERATION_MODEL)
ANALYSIS_MODEL=llama3.1:8b-instruct-q8_0
WRITING_MODEL=qwen2.5:7b-instruct-q4_0
QA_MODEL=gemma3:4b
# Retrieval settings
SIMILARITY_THRESHOLD=0.35
TOP_K=10
# Coverage settings (for analysis/summary mode)
DEFAULT_COVERAGE_PCT=35
MAX_COVERAGE_PCT=60
# Ollama connection
OLLAMA_BASE_URL=http://localhost:11434
```
## How It Works
### Intent Detection
The system classifies queries and adjusts retrieval strategy automatically:
| Query Example | Intent | Retrieval | Coverage |
|---|---|---|---|
| "Summarize this document" | Analysis | Diverse (regions) | ~35% |
| "What is data feminism?" | Q&A | Similarity (top-k) | ~8-10% |
| "Write a report on X" | Writing | Similarity (top-k) | ~8-10% |
### Coverage Tracking
The UI shows how much of your documents the system actually read:
- Coverage percentage with color indicator (green/yellow/red)
- Chunks analyzed vs total
- "Expand to ~50%" button for deeper analysis
A summary based on 8% is different from 35% — the system makes this visible.
### Confidence Levels
| Level | Criteria |
|---|---|
| High | 3+ citations |
| Medium | 1-2 citations |
| Low | Hedging language detected |
| Unknown | 0 citations |
### Blind Spot Detection
The system reports what it didn't see — documents with no coverage, regions (intro/middle/conclusion) that weren't sampled.
### No Learning From Your Data
RAG does **not** train on your documents. Files are chunked and indexed for retrieval only — the AI model is never modified. Deleting a document removes it completely.
## Chat Mode
Switch to Chat mode for multi-turn conversations about your documents:
- Follow-up questions that build on prior context
- Conversation history in the sidebar (browse, resume, delete)
- Per-message sources showing which chunks were used
- Cumulative coverage tracking across a conversation
- Stored locally as JSON files, never sent externally
## Project Structure
```
rag_writing_assistant/
├── backend/
│ ├── app/
│ │ ├── api/routes/ # FastAPI endpoints
│ │ ├── core/ # Exceptions, logging
│ │ ├── models/ # Pydantic models
│ │ ├── rag/ # Chunking, embeddings, vector store, prompts
│ │ ├── services/ # Business logic (generation, retrieval, intent)
│ │ ├── config.py # Settings
│ │ └── main.py # FastAPI app entry point
│ ├── tests/ # Pytest test suite
│ ├── data/ # Document storage, vector indices, conversations
│ ├── requirements.txt
│ └── pyproject.toml
├── frontend/
│ ├── src/
│ │ ├── api/ # API client
│ │ ├── components/ # React components
│ │ ├── hooks/ # React Query hooks
│ │ ├── types/ # TypeScript interfaces
│ │ └── test/ # Test utilities
│ ├── package.json
│ ├── tsconfig.json
│ └── vite.config.ts
├── docs/ # Screenshots
├── CLAUDE.md # AI agent instructions
├── CHANGELOG.md
└── README.md
```
## API Reference
| Endpoint | Method | Description |
|---|---|---|
| `/api/documents` | POST | Upload document |
| `/api/documents` | GET | List documents |
| `/api/documents/{id}` | GET | Get document status |
| `/api/documents/{id}` | DELETE | Delete document |
| `/api/generate` | POST | Generate draft |
| `/api/generate/section` | POST | Regenerate section |
| `/api/generate/suggestions` | POST | Generate suggested questions |
| `/api/chat` | POST | Send chat message |
| `/api/chat` | GET | List conversations |
| `/api/chat/{id}` | GET | Get conversation |
| `/api/chat/{id}` | DELETE | Delete conversation |
| `/api/chat/{id}` | PATCH | Update conversation title |
| `/api/health` | GET | Health check |
Document uploads are non-blocking — the endpoint returns immediately with `status: "pending"`, then progresses through `processing` to `ready` (or `failed`).
## Development
### Running Tests
```bash
# Backend (pytest)
cd backend && pytest tests/ -v
# Frontend (vitest)
cd frontend && npm run test
# Frontend type check + build
cd frontend && npm run build
```
### Tech Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI, Pydantic |
| Frontend | React 18, TypeScript (strict), React Query |
| Vector DB | FAISS (local, file-based) |
| LLM | Ollama (local inference) |
| Embeddings | mxbai-embed-large |
## License
MIT