https://github.com/posit-dev/raghilda


Host: GitHub
URL: https://github.com/posit-dev/raghilda
Owner: posit-dev
Created: 2025-09-03T00:53:58.000Z (9 months ago)
Default Branch: main
Last Pushed: 2026-04-24T12:33:38.000Z (about 1 month ago)
Last Synced: 2026-04-24T14:09:55.681Z (about 1 month ago)
Language: Python
Homepage: https://posit-dev.github.io/raghilda/
Size: 6.6 MB
Stars: 11
Watchers: 1
Forks: 0
Open Issues: 11
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Agents: AGENTS.md

# raghilda

RAG made simple.

raghilda is a Python package for implementing Retrieval-Augmented Generation (RAG) workflows. It covers the complete pipeline with sensible defaults while keeping each step transparent rather than hiding it in a black box.

## Installation

```bash
pip install raghilda
```

Or install from GitHub:

```bash
pip install git+https://github.com/posit-dev/raghilda.git
```

## Key Steps

raghilda handles the complete RAG pipeline:

1. **Document Processing** — Convert documents to Markdown using MarkItDown

2. **Text Chunking** — Split text at semantic boundaries (headings, paragraphs, sentences)

3. **Embedding** — Generate vector representations via OpenAI or other providers

4. **Storage** — Store chunks and embeddings in DuckDB, ChromaDB, or OpenAI Vector Stores

5. **Retrieval** — Find relevant chunks using similarity search or BM25
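
To make the chunking, embedding, and retrieval steps above concrete, here is a minimal standard-library sketch. It is not raghilda's implementation and uses none of its API: `chunk_markdown`, `embed`, `cosine`, and `retrieve` are hypothetical helpers written for illustration, the bag-of-words "embedding" stands in for a real embedding model, and the chunker splits only at headings without handling fenced code blocks.

```python
import math
import re
from collections import Counter


def chunk_markdown(text: str) -> list[str]:
    """Split Markdown at heading lines so each chunk is a heading plus its body."""
    chunks, current = [], []
    for line in text.splitlines():
        # A new heading closes the chunk collected so far.
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Rank chunks by similarity to the query and return the best top_k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


doc = """# Streaming
Call stream to receive the response token by token.

# Tools
Register a function so the model can call it.
"""

chunks = chunk_markdown(doc)
best = retrieve("how do I stream a response", chunks, top_k=1)
print(best[0].splitlines()[0])
```

In a real pipeline the count vectors would be replaced by dense embeddings from a provider, and the linear scan by an index in the chosen store.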

## Usage

```python
from raghilda.store import DuckDBStore
from raghilda.embedding import EmbeddingOpenAI
from raghilda.scrape import find_links
from raghilda.read import read_as_markdown
from raghilda.chunker import MarkdownChunker

# Create a store with embeddings
store = DuckDBStore.create(
    location="chatlas.db",
    embed=EmbeddingOpenAI(),
)

# Find and index pages from the chatlas documentation
links = find_links("https://posit-dev.github.io/chatlas/")
chunker = MarkdownChunker()

for link in links:
    document = read_as_markdown(link)
    chunked_document = chunker.chunk(document)
    store.upsert(chunked_document)

# Build indexes before retrieval
store.build_index()

# Retrieve relevant chunks
chunks = store.retrieve("How do I stream a response?", top_k=5)

for chunk in chunks:
    print(chunk.text)
```
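
Once chunks are retrieved, a typical next step in a RAG workflow is to place their text into a prompt for a language model. The sketch below shows one way this might look. It assumes only that each chunk exposes a `.text` attribute, as in the usage example. The `Chunk` dataclass, `build_prompt`, and the `max_chars` budget are hypothetical and not part of raghilda.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved chunk; only `.text` is assumed."""
    text: str


def build_prompt(question: str, chunks: list[Chunk], max_chars: int = 2000) -> str:
    """Join chunk texts into a numbered context block, stopping at a character budget."""
    parts, used = [], 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.text.strip()}"
        # Stop before the context grows past the budget.
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "How do I stream a response?",
    [Chunk("Use the streaming method."), Chunk("Unrelated text.")],
)
print(prompt)
```

Numbering the entries makes it possible to ask the model to cite which chunk supports its answer.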

## Links

- [Documentation](https://posit-dev.github.io/raghilda/)

- [Source Code](https://github.com/posit-dev/raghilda)

- [PyPI](https://pypi.org/project/raghilda/)

- [Report Issues](https://github.com/posit-dev/raghilda/issues)
