https://github.com/pinecone-io/genqa-rag-demo

Using Pinecone, LangChain + OpenAI for Generative Q&A with Retrieval Augmented Generation (RAG).
https://github.com/pinecone-io/genqa-rag-demo

Last synced: 6 months ago
JSON representation

Using Pinecone, LangChain + OpenAI for Generative Q&A with Retrieval Augmented Generation (RAG).

Host: GitHub
URL: https://github.com/pinecone-io/genqa-rag-demo
Owner: pinecone-io
Created: 2023-06-27T15:08:59.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-08-09T20:50:44.000Z (almost 2 years ago)
Last Synced: 2024-11-12T07:34:10.789Z (8 months ago)
Language: Jupyter Notebook
Size: 25.8 MB
Stars: 14
Watchers: 7
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Generative Q&A
Using [Pinecone](https://docs.pinecone.io/docs/overview), [LangChain](https://langchain-langchain.vercel.app/) + [OpenAI](https://platform.openai.com/overview) for Generative Q&A with [Retrieval Augmented Generation (RAG)](https://ai.facebook.com/blog/retrieval-augmented-generation-streamlining-the-creation-of-intelligent-natural-language-processing-models/).

## Overview

1. Setup the knowledge base (in [Pinecone](https://www.pinecone.io/))
- Chunk the content
- Create vector embeddings from the chunks
- Load embeddings into a Pinecone index

2. Ask a question
- Create vector embedding of the question
- Find relevant context in Pinecone, looking for embeddings similar to the question
- Ask a question of OpenAI, using the relevant context from Pinecone

[TODO - add diagram]

## Setup

### Install dependencies
```console
pip install -r ./setup/requirements.txt
```

### Provide Pinecone & OpenAI API Keys
```console
cp dotenv .env
vi .env
```

### Use the notebooks to load the data into the Pinecone index (and run samnple queries)

[TODO - non-splade example is working, update the splade example]

[TODO - show sample output]

## Q&A App (using Streamlit)

### Install Dependencies
[TODO - clean up requirements.txt or pipenv]

### Run
```console
streamlit run streamlit-app.py
```
[TODO - show example screenshot]

## Next Steps

[TODO - Update to support multiple PDFs]

[TODO - add customization of look & feel]

## Background

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., … Kiela, D. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), **Advances in Neural Information Processing Systems** (Vol. 33, pp. 9459–9474). Retrieved from https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/pinecone-io/genqa-rag-demo

Awesome Lists containing this project

README