https://tidyverse.github.io/ragnar/
- Host: GitHub
- URL: https://tidyverse.github.io/ragnar/
- Owner: tidyverse
- License: other
- Created: 2025-01-20T19:30:06.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2025-03-27T15:43:14.000Z (20 days ago)
- Last Synced: 2025-04-02T02:37:37.111Z (14 days ago)
- Language: R
- Homepage: https://tidyverse.github.io/ragnar/
- Size: 14.4 MB
- Stars: 39
- Watchers: 4
- Forks: 3
- Open Issues: 3
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- awesome-generative-ai-data-scientist - Ragnar - Augmented Generation (RAG) workflows. | [Website](https://tidyverse.github.io/ragnar/) | (RAG in R)
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# ragnar

[R-CMD-check](https://github.com/tidyverse/ragnar/actions/workflows/R-CMD-check.yaml)
`ragnar` is an R package that helps implement Retrieval-Augmented
Generation (RAG) workflows. It focuses on providing a complete solution
with sensible defaults, while still giving the knowledgeable user
precise control over each step. We don't believe that you can fully
automate the creation of a good RAG system, so it's important that
`ragnar` is not a black box. `ragnar` is designed to be transparent: you
can easily inspect outputs at intermediate steps to understand what's
happening.

## Installation
``` r
pak::pak("tidyverse/ragnar")
```

## Key Steps
### 1. Document Processing
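Before any conversion happens, an ingestion pipeline usually has to discover which pages to read, which is the job this step assigns to `ragnar_find_links()`. The base-R sketch below only illustrates that idea on a small HTML string. `extract_links()` is a hypothetical helper, not part of `ragnar`, and its regular expression is a deliberate simplification; real pages call for a proper HTML parser.

```r
# Hypothetical helper (not part of ragnar): collect href targets from an
# HTML string with a regular expression. Real pages need an HTML parser.
extract_links <- function(html, base_url) {
  # Find every href="..." attribute
  hrefs <- regmatches(html, gregexpr('href="[^"]*"', html))[[1]]
  # Strip the leading 'href="' and the trailing quote
  hrefs <- sub('"$', "", sub('^href="', "", hrefs))
  # Resolve root-relative links ("/page.html") against the site root only;
  # other relative forms are left untouched in this simplified version
  root <- sub("^(https?://[^/]+).*$", "\\1", base_url)
  rel <- startsWith(hrefs, "/")
  hrefs[rel] <- paste0(root, hrefs[rel])
  unique(hrefs)
}

html <- paste(
  '<a href="/intro.html">Intro</a>',
  '<a href="https://r4ds.hadley.nz/base-R.html">Base R</a>',
  '<a href="/intro.html">Intro again</a>'
)
links <- extract_links(html, "https://r4ds.hadley.nz/index.html")
links
```

Duplicates are dropped with `unique()` so that each page would be ingested only once.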
`ragnar` works with a wide variety of document types, using
[MarkItDown](https://github.com/microsoft/markitdown) to convert content
to Markdown.

Key functions:

- `ragnar_find_links()`: Find all links in a webpage.
- `ragnar_read()`: Convert a file or URL to Markdown.

### 2. Text Chunking
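The two levels of chunking in this step can be pictured with a toy base-R chunker: first segment the text at semantic boundaries (here, only blank-line paragraph breaks), then pack consecutive segments into chunks of bounded size. `segment_paragraphs()` and `pack_segments()` are hypothetical names that loosely mirror the roles of `ragnar_segment()` and `ragnar_chunk_segments()`; `ragnar`'s actual boundary rules and size defaults are not reproduced here.

```r
# Toy two-stage chunker (illustrative only, not ragnar's algorithm).
# Stage 1, segment: split text at paragraph breaks (blank lines).
segment_paragraphs <- function(text) {
  segs <- trimws(strsplit(text, "\n[[:space:]]*\n")[[1]])
  segs[nzchar(segs)]
}

# Stage 2, chunk: greedily pack consecutive segments into chunks of at most
# `max_chars` characters. Segments are never split, so a single segment
# longer than `max_chars` becomes its own oversized chunk.
pack_segments <- function(segments, max_chars = 200) {
  chunks <- character()
  current <- ""
  for (seg in segments) {
    candidate <- if (nzchar(current)) paste(current, seg, sep = "\n\n") else seg
    if (nchar(candidate) > max_chars && nzchar(current)) {
      chunks <- c(chunks, current)  # current chunk is full: emit it
      current <- seg                # start a new chunk with this segment
    } else {
      current <- candidate
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

doc <- "First paragraph.\n\nSecond paragraph is here.\n\nThird one."
chunks <- pack_segments(segment_paragraphs(doc), max_chars = 45)
chunks
```

Packing whole segments keeps each chunk aligned with the document's own structure, at the cost of uneven chunk sizes.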
Next, we divide each document into multiple chunks. `ragnar` defaults to a
strategy that preserves some of the semantics of the document, but
provides plenty of options to tweak the approach.

Key functions:

- `ragnar_chunk()`: Higher-level function that both identifies
  semantic boundaries and chunks text.
- `ragnar_segment()`: Lower-level function that identifies semantic
  boundaries.
- `ragnar_chunk_segments()`: Lower-level function that chunks
  pre-segmented text.

### 3. Context Augmentation (Optional)
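Template-based augmentation amounts to pasting each chunk's position in the document into its text before it is embedded and stored. In the sketch below, the `h1`/`h2`/`text` columns and the template string are assumptions chosen for illustration; they are not necessarily the columns `ragnar_read()` returns.

```r
# Toy chunks with the headings they were found under (hypothetical columns)
chunks <- data.frame(
  h1   = c("Data transformation", "Data transformation"),
  h2   = c("Rows", "Columns"),
  text = c("filter() keeps rows based on column values.",
           "select() picks columns by name."),
  stringsAsFactors = FALSE
)

# Prefix every chunk with its heading context using a sprintf() template;
# sprintf() is vectorised, so all rows are filled in one call
augment_chunks <- function(chunks,
                           template = "chapter: %s\nsection: %s\ncontent: %s") {
  chunks$text <- sprintf(template, chunks$h1, chunks$h2, chunks$text)
  chunks
}

augmented <- augment_chunks(chunks)
cat(augmented$text[1], "\n")
#> chapter: Data transformation
#> section: Rows
#> content: filter() keeps rows based on column values.
```

Because the headings now travel inside each chunk's text, they become part of what is later embedded and searched.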
RAG applications benefit from augmenting text chunks with additional
context, such as document headings and subheadings. While `ragnar`
doesn't export dedicated functions for this, it supports template-based
augmentation through `ragnar_read(frame_by_tags, split_by_tags)`. Future
versions will support generating context summaries via LLM calls.

Key functions:

- `ragnar_read()`: Use the `frame_by_tags` and/or `split_by_tags`
  arguments to associate text chunks with their document position.
- `markdown_segment()`: Segment markdown text into a character vector
  using semantic tags (e.g., headings, paragraphs, or code chunks).
- `markdown_frame()`: Convert markdown text into a data frame.

### 4. Embedding
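The design in this step, where the store owns the embedding function and calls it both on insert and on retrieval, can be sketched with a toy in-memory store. `toy_embed()` is a deliberately crude stand-in for a real provider such as `embed_openai()` (it only counts letters), and `new_toy_store()`, `toy_insert()` and `toy_retrieve()` are hypothetical names, not `ragnar` functions.

```r
# Stand-in embedding: normalised letter counts (26 dimensions).
# Real embeddings come from a model; this only mimics the shape.
toy_embed <- function(x) {
  rows <- vapply(tolower(x), function(s) {
    idx <- match(strsplit(s, "")[[1]], letters)
    counts <- tabulate(idx[!is.na(idx)], nbins = 26)
    counts / max(sqrt(sum(counts^2)), 1e-12)  # scale to unit length
  }, numeric(26), USE.NAMES = FALSE)
  t(rows)  # one row per input string
}

# The store remembers its embed function when it is created ...
new_toy_store <- function(embed) {
  store <- new.env()
  store$embed <- embed
  store$text <- character()
  store$embedding <- NULL  # filled on first insert
  store
}

# ... computes embeddings itself on insert ...
toy_insert <- function(store, text) {
  store$embedding <- rbind(store$embedding, store$embed(text))
  store$text <- c(store$text, text)
  invisible(store)
}

# ... and embeds the query itself on retrieval, ranking by cosine similarity
toy_retrieve <- function(store, query, top_k = 1) {
  q <- store$embed(query)[1, ]
  sims <- as.vector(store$embedding %*% q)  # rows are unit length
  store$text[head(order(sims, decreasing = TRUE), top_k)]
}

store <- new_toy_store(embed = toy_embed)
toy_insert(store, c("filter rows with a logical vector",
                    "draw a scatterplot with ggplot2"))
hit <- toy_retrieve(store, "subset rows using filter")
hit
```

Callers never touch `toy_embed()` directly after the store is created, which is the point of keeping the embedding function with the store.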
`ragnar` can help compute embeddings for each chunk. The goal is for
`ragnar` to provide access to embeddings from popular LLM providers.
Currently, only the `ollama` and `openai` providers are supported.

Key functions:

- `embed_ollama()`
- `embed_openai()`

Note that calling the embedding function directly is typically not
necessary. Instead, the embedding function is specified when a store is
first created, and then automatically called when needed by
`ragnar_retrieve()` and `ragnar_store_insert()`.

### 5. Storage
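One common way to make storage pluggable in R is a set of S3 generics that each backend implements. The sketch below is hypothetical: `store_add()`, `store_find()` and the `memory_store` class are illustrative names, and this is not a description of how `ragnar`'s own store classes are built.

```r
# Generic verbs that every storage backend must implement
store_add  <- function(store, chunks, ...) UseMethod("store_add")
store_find <- function(store, query, ...) UseMethod("store_find")

# One backend: a trivial in-memory store (an environment, so it is mutable)
memory_store <- function() {
  env <- new.env()
  env$chunks <- character()
  structure(list(env = env), class = "memory_store")
}

store_add.memory_store <- function(store, chunks, ...) {
  env <- store$env
  env$chunks <- c(env$chunks, chunks)
  invisible(store)
}

store_find.memory_store <- function(store, query, ...) {
  # A fixed-string match stands in for a real search index
  hits <- grepl(query, store$env$chunks, fixed = TRUE)
  store$env$chunks[hits]
}

s <- memory_store()
store_add(s, c("filter() keeps rows", "select() keeps columns"))
found <- store_find(s, "rows")
found
```

Under this pattern, another package could add a backend (for example a hypothetical `duckdb_store` class) purely by writing its own `store_add()` and `store_find()` methods, without changing the code that calls them.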
Processed data is stored in a format optimized for efficient searching,
using `duckdb` by default. The API is designed to be extensible,
allowing additional packages to implement support for different storage
providers.

Key functions:

- `ragnar_store_create()`
- `ragnar_store_connect()`
- `ragnar_store_insert()`

### 6. Retrieval
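BM25 ranks documents by how often they contain the query terms, discounting terms that appear in many documents and matches in long documents. Below is a minimal base-R version with the usual `k1` and `b` parameters and a non-negative IDF variant. It is for intuition only: `bm25_scores()` is a hypothetical helper, and DuckDB's full-text search extension uses its own tokenization, stemming and parameters.

```r
# Lowercase and split on anything that is not a letter or digit
tokenize <- function(x) {
  lapply(strsplit(tolower(x), "[^a-z0-9]+"), function(t) t[nzchar(t)])
}

bm25_scores <- function(docs, query, k1 = 1.2, b = 0.75) {
  doc_tokens <- tokenize(docs)
  q_terms <- unique(tokenize(query)[[1]])
  doc_len <- lengths(doc_tokens)
  avgdl <- mean(doc_len)
  n_docs <- length(docs)
  scores <- numeric(n_docs)
  for (term in q_terms) {
    # term frequency of `term` in each document
    tf <- vapply(doc_tokens, function(t) sum(t == term), numeric(1))
    n_q <- sum(tf > 0)                                  # docs containing term
    idf <- log((n_docs - n_q + 0.5) / (n_q + 0.5) + 1)  # always >= 0
    # saturating tf, normalised by document length relative to the average
    scores <- scores + idf * tf * (k1 + 1) /
      (tf + k1 * (1 - b + b * doc_len / avgdl))
  }
  scores
}

docs <- c("filter keeps rows that match a condition",
          "select picks columns",
          "arrange reorders rows")
scores <- bm25_scores(docs, "filter rows")
ranking <- order(scores, decreasing = TRUE)
ranking
```

The first document ranks highest because it matches both query terms, including the rarer term "filter", which carries a larger IDF weight than "rows".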
Given a prompt, retrieve related chunks based on embedding distance or
BM25 text search.

Key functions:

- `ragnar_retrieve()`
- `ragnar_retrieve_vss()`: Retrieve using the [`vss` DuckDB
  extension](https://duckdb.org/docs/extensions/vss.html).
- `ragnar_retrieve_bm25()`: Retrieve using the
  [full-text search DuckDB extension](https://duckdb.org/docs/extensions/full_text_search.html).

### 7. Re-ranking (Optional)
Re-ranking of retrieved chunks is planned for future releases.
### 8. Prompt Generation
`ragnar` can equip an `ellmer::Chat` object with a retrieve tool that
enables an LLM to retrieve content from a store on demand.

- `ragnar_register_tool_retrieve(chat, store)`
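Conceptually, a retrieve tool is just a function the model can call with a query and that returns matching text for its context. The sketch below builds such a function over a plain character vector, ranking chunks by how many query words they contain. `make_retrieve_tool()` and the `<excerpt>` wrapping are hypothetical; registering such a function on an `ellmer::Chat`, which `ragnar_register_tool_retrieve()` does for a real store, is not shown.

```r
# Hypothetical helper: returns a closure that an LLM could call as a tool
make_retrieve_tool <- function(chunks, top_k = 3) {
  function(query) {
    words <- unique(strsplit(tolower(query), "[^a-z0-9]+")[[1]])
    words <- words[nzchar(words)]
    # Score each chunk by how many distinct query words it contains
    scores <- vapply(tolower(chunks), function(ch) {
      sum(vapply(words, function(w) grepl(w, ch, fixed = TRUE), logical(1)))
    }, numeric(1), USE.NAMES = FALSE)
    ranked <- order(scores, decreasing = TRUE)
    keep <- head(ranked[scores[ranked] > 0], top_k)
    # Wrap results so the model can tell the excerpts apart
    paste0("<excerpt>", chunks[keep], "</excerpt>", collapse = "\n")
  }
}

retrieve <- make_retrieve_tool(
  c("filter() keeps rows", "select() keeps columns", "ggplot() draws plots"),
  top_k = 2
)
out <- retrieve("keep rows")
cat(out, "\n")
```

Because the store (here, just `chunks`) is captured in the closure, the model only ever has to supply the query string.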
## Usage
Here's an example of using `ragnar` to create a knowledge store from the
*R for Data Science (2e)* book:

```{r, code = readLines("examples/example-create-store.R")}
```

Once the store is set up, you can then retrieve the most relevant text
chunks:

```{r, code = readLines("examples/example-retrieve.R")}
```