https://github.com/takeshape/unstructured-rag-example

Host: GitHub
URL: https://github.com/takeshape/unstructured-rag-example
Owner: takeshape
Created: 2024-11-14T17:46:52.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-11-18T17:49:24.000Z (2 months ago)
Last Synced: 2024-11-18T18:51:42.552Z (2 months ago)
Size: 2.93 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Unstructured RAG Example

## Instructions

### Create a new TakeShape Project
1. Click [Deploy to TakeShape](https://app.takeshape.io/add-pattern?repo=https://github.com/takeshape/unstructured-rag-example).
1. Select "Create new project" from the dropdown
1. Enter a name for the new project or leave the default
1. Click "Add to TakeShape"

### Add your API keys

#### Unstructured

Follow the directions in the [TakeShape Unstructured documentation](https://app.takeshape.io/docs/services/providers/unstructured) to connect the Unstructured API.

#### OpenAI

1. Create an OpenAI API key at https://platform.openai.com/api-keys with the "Models" and "Model capabilities" permissions (this example uses `/v1/embeddings` and `/v1/chat/completions`).
2. Copy/paste your API key into the service configuration dialog
3. Click "Save"

### Try it out!
Once your services are connected, try out the API in the API Explorer:

```graphql
mutation {
  chat(input: "what shoes should I buy?")
}
```

```graphql
query {
  getRelatedDocumentList(text: "what shoes should I buy?") {
    items {
      filename
      chunks {
        text
      }
    }
    total
  }
}
```
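
Outside the API Explorer, the same query can be sent as an ordinary GraphQL-over-HTTP POST. The following is a minimal sketch, not TakeShape's official client: `ENDPOINT` and `API_KEY` are placeholders to replace with values from your own project, the `Authorization: Bearer` header is an assumption about how the project API is authenticated, and the `$text: String!` variable type is assumed from the query above.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real endpoint and API key from your project.
ENDPOINT = "https://example.invalid/graphql"
API_KEY = "YOUR_API_KEY"

# Same query as above, with the search text passed as a GraphQL variable.
RELATED_DOCUMENTS_QUERY = """
query ($text: String!) {
  getRelatedDocumentList(text: $text) {
    items {
      filename
      chunks {
        text
      }
    }
    total
  }
}
"""


def build_request(query: str, variables: dict,
                  endpoint: str = ENDPOINT,
                  api_key: str = API_KEY) -> urllib.request.Request:
    """Build a standard GraphQL-over-HTTP POST request (nothing is sent yet)."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; adjust if your project uses a different one.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def run(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With real values in place, `run(build_request(RELATED_DOCUMENTS_QUERY, {"text": "what shoes should I buy?"}))` would return the decoded GraphQL response as a dictionary.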

## How it works

This example demonstrates how to combine TakeShape's vector capabilities with indexing to enable the retrieval-augmented generation (RAG) use case. A prerequisite for RAG is a populated vector database; this example uses TakeShape's built-in index. The first step in preparing the data is to create the `Document` shape, which extends TakeShape's built-in Asset shape and adds `chunks`, an array of text chunks and their corresponding vector embeddings. Uploaded assets are then processed as follows:
```mermaid
sequenceDiagram
    participant TakeShape
    participant Unstructured
    participant OpenAI
    participant Index

    TakeShape->>Unstructured: 1. Assets uploaded to TakeShape are sent to Unstructured
    Unstructured->>TakeShape: 2. Unstructured parses and chunks the documents
    TakeShape->>OpenAI: 3. Create embeddings for the text chunks
    OpenAI->>TakeShape: 4. OpenAI returns embeddings
    TakeShape->>Index: 5. Store text chunks and embeddings in TakeShape Index
```
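
The indexing flow above can be sketched locally in Python. This is an illustration only, not how TakeShape implements it: `chunk_text` is a naive word-window splitter standing in for Unstructured, `toy_embedding` is a hashed bag-of-words vector standing in for OpenAI's `/v1/embeddings`, and the `Document` and `Chunk` classes are hypothetical mirrors of the shape described above.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIMENSIONS = 16  # toy size; real embedding models return much larger vectors


@dataclass
class Chunk:
    text: str
    embedding: list[float]


@dataclass
class Document:
    filename: str
    chunks: list[Chunk] = field(default_factory=list)


def chunk_text(text: str, max_words: int = 8) -> list[str]:
    """Stand-in for step 2: split text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embedding(text: str) -> list[float]:
    """Stand-in for steps 3-4: hash each word into a bucket, then L2-normalize."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % DIMENSIONS] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def index_document(filename: str, text: str, index: dict[str, Document]) -> Document:
    """Step 5: store the chunks and their embeddings under the filename."""
    chunks = [Chunk(piece, toy_embedding(piece)) for piece in chunk_text(text)]
    document = Document(filename, chunks)
    index[filename] = document
    return document
```

Keeping each chunk's text next to its embedding is what lets a later vector search return readable passages rather than bare vectors.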

Now that our `Document` data is stored in the built-in index we can perform RAG:
```mermaid
sequenceDiagram
    actor User
    participant TakeShape
    participant Index
    participant OpenAI

    User->>TakeShape: 1. GraphQL request containing user prompt
    TakeShape->>OpenAI: 2. Create embedding of user prompt
    OpenAI->>TakeShape: 3. OpenAI returns vector
    TakeShape->>Index: 4. Use vector to search TakeShape Index
    Index->>TakeShape: 5. Return related documents
    TakeShape->>OpenAI: 6. Combine related results with original prompt and send to GPT-4o
    OpenAI->>TakeShape: 7. OpenAI returns generated text
    TakeShape->>User: 8. GraphQL response
```
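
Steps 4 to 6 of the query flow can be sketched as a cosine-similarity search followed by prompt assembly. This is a minimal illustration under stated assumptions: the `INDEX` entries, their three-dimensional vectors, and the prompt wording are all hypothetical, and the query vector is passed in directly where a real system would call an embeddings API. TakeShape's actual ranking and prompt template may differ.

```python
import math

# Hypothetical index contents: (filename, chunk text, embedding).
INDEX = [
    ("trail.pdf", "Trail shoes have deep lugs for grip on mud.", [0.9, 0.1, 0.0]),
    ("road.pdf", "Road shoes are light and cushioned for pavement.", [0.7, 0.6, 0.1]),
    ("returns.pdf", "Items can be returned within 30 days.", [0.0, 0.1, 0.9]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def search(query_vector, index, top_k=2):
    """Steps 4-5: rank indexed chunks by similarity and keep the best top_k."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vector, entry[2]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, matches):
    """Step 6: combine the retrieved chunks with the original question."""
    context = "\n".join(f"[{filename}] {text}" for filename, text, _ in matches)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string is what would be sent to the chat completion model in step 6; limiting the context to the top matches keeps unrelated chunks out of the prompt.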