https://github.com/posit-dev/raghilda
- Host: GitHub
- URL: https://github.com/posit-dev/raghilda
- Owner: posit-dev
- Created: 2025-09-03T00:53:58.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2026-04-24T12:33:38.000Z (19 days ago)
- Last Synced: 2026-04-24T14:09:55.681Z (18 days ago)
- Language: Python
- Homepage: https://posit-dev.github.io/raghilda/
- Size: 6.6 MB
- Stars: 11
- Watchers: 1
- Forks: 0
- Open Issues: 11
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Agents: AGENTS.md
# raghilda 
RAG made simple.
raghilda is a Python package for implementing Retrieval-Augmented Generation (RAG) workflows. It provides a complete solution with sensible defaults while remaining transparent rather than acting as a black box.
## Installation
```bash
pip install raghilda
```
Or install from GitHub:
```bash
pip install git+https://github.com/posit-dev/raghilda.git
```
## Key Steps
raghilda handles the complete RAG pipeline:
1. **Document Processing** — Convert documents to Markdown using MarkItDown
2. **Text Chunking** — Split text at semantic boundaries (headings, paragraphs, sentences)
3. **Embedding** — Generate vector representations via OpenAI or other providers
4. **Storage** — Store chunks and embeddings in DuckDB, ChromaDB, or OpenAI Vector Stores
5. **Retrieval** — Find relevant chunks using similarity search or BM25
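To make the chunking step more concrete, here is a minimal sketch of splitting Markdown at heading boundaries. It is a standalone illustration in plain Python, not raghilda's chunker: the function name `split_at_headings` is hypothetical, and it handles only headings, not the paragraph or sentence boundaries mentioned above.

```python
import re


def split_at_headings(markdown: str) -> list[str]:
    """Split Markdown text into chunks, starting a new chunk at each heading.

    Hypothetical helper for illustration only; not part of raghilda.
    """
    chunks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        # An ATX heading is 1-6 '#' characters followed by a space.
        if re.match(r"#{1,6} ", line) and current:
            # Close the chunk collected so far before starting a new one.
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contain only whitespace.
    return [chunk for chunk in chunks if chunk]


doc = "# Intro\nHello.\n\n## Usage\nCall it.\n"
chunks = split_at_headings(doc)
for chunk in chunks:
    print(repr(chunk))
```

Keeping each heading together with the text beneath it means a retrieved chunk carries its own context, which is the usual motivation for splitting at semantic boundaries rather than at fixed character counts.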
## Usage
```python
from raghilda.store import DuckDBStore
from raghilda.embedding import EmbeddingOpenAI
from raghilda.scrape import find_links
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

# Create a store with embeddings
store = DuckDBStore.create(
    location="chatlas.db",
    embed=EmbeddingOpenAI(),
)

# Find and index pages from the chatlas documentation
links = find_links("https://posit-dev.github.io/chatlas/")
chunker = MarkdownChunker()

for link in links:
    document = read_as_markdown(link)
    chunked_document = chunker.chunk(document)
    store.upsert(chunked_document)

# Build indexes before retrieval
store.build_index()

# Retrieve relevant chunks
chunks = store.retrieve("How do I stream a response?", top_k=5)
for chunk in chunks:
    print(chunk.text)
```
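For intuition about what a similarity-based retrieval step does, the sketch below ranks a few chunks by cosine similarity to a query vector. It is a toy, assuming hand-written three-dimensional vectors in place of real embeddings; the names `cosine_similarity`, `top_k`, and `toy_index` are hypothetical and do not reflect how raghilda or DuckDB implement retrieval.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index.items()]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings": made-up vectors, not produced by any real model.
toy_index = {
    "streaming responses": [0.9, 0.1, 0.0],
    "installing the package": [0.0, 0.2, 0.9],
    "async streaming": [0.7, 0.6, 0.1],
}

results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(results)
```

A real store would typically use an index to avoid scoring every chunk, which is presumably why the usage example calls `store.build_index()` before retrieving.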
## Links
- [Documentation](https://posit-dev.github.io/raghilda/)
- [Source Code](https://github.com/posit-dev/raghilda)
- [PyPI](https://pypi.org/project/raghilda/)
- [Report Issues](https://github.com/posit-dev/raghilda/issues)