https://github.com/taha-kms/classmate-rag
A local, multilingual (EN/IT) study assistant that indexes course materials and answers questions with citations, using multilingual-e5-base for retrieval and Llama 3.1-8B for generation. CLI-only.
- Host: GitHub
- URL: https://github.com/taha-kms/classmate-rag
- Owner: taha-kms
- Created: 2025-08-26T17:28:01.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-09-27T21:07:47.000Z (4 months ago)
- Last Synced: 2025-09-27T21:08:03.999Z (4 months ago)
- Topics: bm25, chromadb, cli, docker, e5, information-retrieval, llama3, llm, rag, retrieval-augmented-generation
- Language: Python
- Size: 154 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# CLASSMATE-RAG
A **Retrieval-Augmented Generation (RAG)** system for course materials.
It ingests documents (PDF, DOCX, PPTX, EPUB, HTML, CSV, TXT, MD), indexes them in both a **BM25** keyword index and a **Chroma** vector database, and answers questions with grounded citations using local LLaMA/Mistral GGUF models.
---
## ✨ Features
* **CLI-first workflow** (`rag` command)
* Ingestion with metadata (course, unit, tags, language, semester, author)
* **Hybrid retrieval** (BM25 keyword + vector embeddings, fused with RRF)
* **Cited answers** generated with local LLMs
* **Admin tools**: stats, preview, backup/restore, vacuum, rebuild embeddings, reingest
* **Document loaders**: PDF, DOCX, PPTX, EPUB, HTML, CSV, TXT, Markdown
* **Multilingual support** with E5 embeddings (`intfloat/multilingual-e5-base`)
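The hybrid retrieval feature above fuses the BM25 and vector rankings with Reciprocal Rank Fusion (RRF). As a minimal sketch of that technique, and not the project's actual code, the function below scores each document as the sum of `1 / (k + rank)` over the rankings it appears in. The name `rrf_fuse`, the chunk ids, and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant; 60 is a common default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids from a keyword ranking and a vector ranking.
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

Because RRF uses only ranks, it needs no normalization between BM25 scores and embedding similarities, which live on different scales.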
---
## 📦 Installation
See [docs/installation.md](docs/installation.md) for details.
Quick setup (Linux/macOS):
```bash
./quicksetup.sh
source .venv/bin/activate
rag --help
```
Windows (PowerShell):
```powershell
.\quicksetup.ps1
.\.venv\Scripts\Activate.ps1
rag --help
```
---
## 🚀 Usage
Ingest a document:
```bash
rag add path/to/file.pdf --course "Math101" --unit "1" --language "en" --tags exam,week1
```
Ask a question:
```bash
rag ask "What is the chain rule?" --course "Math101"
```
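The `--course` flag above restricts the question to one course. One way such a restriction could be passed to Chroma is as a metadata `where` filter. The helper below is a hypothetical sketch (the name `build_where` and the metadata keys are assumptions, not taken from the project); it only builds the filter dictionary, using Chroma's `$eq` and `$and` operators.

```python
def build_where(**filters):
    """Build a Chroma-style metadata filter from keyword arguments.

    Keys whose value is None are skipped. Returns None when no
    filter applies, a single clause for one key, or an "$and"
    of clauses for several keys.
    """
    clauses = [
        {key: {"$eq": value}}
        for key, value in sorted(filters.items())
        if value is not None
    ]
    if not clauses:
        return None
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


# Metadata keys here are assumed for illustration.
where = build_where(course="Math101", language="en", unit=None)
```

The resulting dictionary can be passed as the `where` argument of a Chroma collection's `query` method. A single clause is returned bare because `$and` expects at least two sub-clauses.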
Preview retrieval (no generation):
```bash
rag preview "Explain entropy"
```
See [docs/usage.md](docs/usage.md) for more.
---
## 🛠️ Maintenance
* Show stats: `rag stats`
* Backup: `rag dump --path dumps/corpus.jsonl`
* Restore: `rag restore --path dumps/corpus.jsonl`
* Vacuum: `rag vacuum`
* Rebuild embeddings: `rag rebuild --model intfloat/multilingual-e5-large`
* Manage entries: `rag list`, `rag show`, `rag delete`, `rag reingest`
Details in [docs/configuration.md](docs/configuration.md).
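
The backup and restore commands above operate on a `.jsonl` file. JSON Lines stores one JSON object per line, which makes a corpus easy to stream and diff. The round trip below is a minimal sketch with hypothetical record fields (`id`, `text`, `course`); the project's real dump schema may differ.

```python
import json
import os
import tempfile


def dump_jsonl(records, path):
    """Write one JSON object per line (UTF-8, non-ASCII kept as-is)."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read records back, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical records; the field names are assumptions.
records = [
    {"id": "c1", "text": "The chain rule handles composite functions.", "course": "Math101"},
    {"id": "c2", "text": "La regola della catena è utile.", "course": "Math101"},
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.jsonl")
    dump_jsonl(records, path)
    restored = load_jsonl(path)
```

Writing with `ensure_ascii=False` keeps Italian text readable in the dump instead of escaping accented characters.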
---
## 📖 Documentation
* [Installation](docs/installation.md)
* [Usage](docs/usage.md)
* [Configuration](docs/configuration.md)
* [Architecture](docs/architecture.md)
---
## 🧩 Project Structure
```
cli/            # CLI entrypoint
rag/            # Core RAG system
  admin/        # Backup, restore, manage, inspect
  chunking/     # Text splitting into chunks
  embeddings/   # Embedding models & cache
  generation/   # LLM runner, prompting, postprocessing
  loaders/      # File loaders
  retrieval/    # BM25, Chroma, hybrid fusion
  pipeline/     # Ingestion, query orchestration
docs/           # Documentation
tools/          # Benchmark scripts
```
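
The `chunking/` package listed above splits text into chunks before indexing. A common approach, shown here purely as an illustration since this README does not describe the project's actual strategy, is a fixed-size sliding window with overlap. The function name and the default sizes are assumptions.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into word windows of `size` sharing `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Ten placeholder words, windows of 4 with 1 word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.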
---