https://github.com/pinecone-io/contextual-webinar-rag
Contextual RAG over webinar videos using Pinecone, Claude and AWS.
- Host: GitHub
- URL: https://github.com/pinecone-io/contextual-webinar-rag
- Owner: pinecone-io
- Created: 2024-10-24T15:41:04.000Z (12 months ago)
- Default Branch: main
- Last Pushed: 2025-04-10T19:02:22.000Z (6 months ago)
- Last Synced: 2025-04-10T20:36:30.572Z (6 months ago)
- Topics: claude, contextual-retrieval, pinecone, rag
- Language: Python
- Homepage:
- Size: 23.4 MB
- Stars: 12
- Watchers: 2
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# pc-yt-rag
A simplified Contextual Video RAG implementation using Pinecone, AWS, and Claude.

Ever wanted to ask questions over your video data, such as YouTube videos, Zoom webinars, or recorded meetings? This application creates a RAG chatbot over that content using contextual retrieval with Pinecone, AWS, and Claude.
Wanna try it live? Click here for the [deployed application](https://pinecone-contextual-rag-demo.streamlit.app/).
This branch contains the **Streamlit Web App** version of the implementation. This allows you to run a local web app to interact with the RAG chatbot, and uses a makefile to make the data preprocessing smoother. Please read the following section to ensure you have the appropriate prerequisites before proceeding.
If you'd rather work in a SageMaker notebook, use the `webinar-notebook` branch instead!
## Before you Begin

This repo presents the RAG solution in two ways: one using scripts and a Makefile to create a Streamlit application, and another using a notebook intended for use on SageMaker.
You'll also need access to AWS Bedrock, Pinecone (via an API Key), and Claude specifically via Bedrock.
Finally, add the videos you'd like to process under a folder called `data`, with a subfolder called `videos` (i.e. `data/videos`). Leave them in .mp4 format. If you have access to your own YouTube channel, downloading videos from its console works perfectly!
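Before running anything, it can help to sanity-check that the prerequisites above are actually reachable. The snippet below is a minimal sketch (not part of this repo) that assumes your Pinecone key lives in the `PINECONE_API_KEY` environment variable and that Bedrock runs in `us-east-1`; adjust both to your setup:

```python
import os

import boto3
from pinecone import Pinecone

# Check Pinecone: listing indexes fails fast if the API key is wrong.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
print("Pinecone indexes:", pc.list_indexes().names())

# Check Bedrock: listing foundation models fails fast if the region or
# credentials are wrong. Claude access is granted per-model in the Bedrock
# console, so also confirm an Anthropic model shows up in the list.
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
print("Anthropic models:", [m["modelId"] for m in models if "anthropic" in m["modelId"]])
```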
### Running the Scripts Locally
Before beginning, authenticate your session with AWS using your preferred method. You can save the access key, default region, and secret access key as environment variables, or use `aws sso login` if you have that set up.

**You'll still need access to AWS Bedrock and Claude via Bedrock, as well as a Pinecone API key.**
To run the scripts locally, you can use the provided Makefile. Below are the available commands:
1. **Create the .env file**:
```sh
make create-env
```
This command will create the .env file for new users and prompt you to add your API keys.

2. **Clean the data folder**:
```sh
make clean
```
This command will clean the data folder, removing everything except the videos. Useful for resetting the environment.

3. **Create the Conda environment**:
```sh
make create-conda-env
```
This command will create the Conda environment specified in the Makefile.

4. **Install dependencies**:
```sh
make install-deps
```
This command will install the required dependencies within the Conda environment.

5. **Preprocess the videos**:
```sh
make preprocess
```
This command will preprocess the videos using the provided script (see the chunking sketch after this list).

6. **Run the vector enrichment**:
```sh
make enrich
```
This command will run the Claude contextual enrichment step (see the enrichment sketch after this list).

7. **Run the upsertion process**:
```sh
make upsert
```
This command will upsert the enriched vectors into Pinecone (see the upsert sketch after this list).

8. **Data setup process**:
```sh
make setup
```
This command will clean the data folder, create the Conda environment, install dependencies, preprocess the videos, run the Claude contextual enrichment step, and upsert the data into Pinecone.
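The three sketches below illustrate roughly what the preprocess, enrich, and upsert stages do. None of this code is from the repo; function names, prompts, model IDs, and index names are illustrative assumptions. The preprocessing stage has to turn each .mp4 into text chunks; assuming you already have one transcript per video (e.g. from Amazon Transcribe or Whisper; transcription itself is omitted here), a simple overlapping chunker might look like:

```python
def chunk_transcript(transcript: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping character chunks.

    The overlap keeps sentences that would otherwise be cut at a chunk
    boundary partially intact in the neighboring chunk.
    """
    chunks = []
    start = 0
    while start < len(transcript):
        chunks.append(transcript[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks
```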
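The enrichment stage is the contextual-retrieval idea: before embedding, each chunk gets a short, Claude-generated blurb situating it within the whole transcript, so the embedded text is more self-contained. A minimal sketch using the Bedrock `converse` API (the model ID and prompt wording are assumptions, not the repo's actual values):

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed; any Bedrock Claude model works

def contextualize(full_transcript: str, chunk: str) -> str:
    """Ask Claude to situate a chunk within its transcript, then prepend that context."""
    prompt = (
        f"<document>\n{full_transcript}\n</document>\n\n"
        f"Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n\n"
        "Write a short context (1-2 sentences) situating this chunk within the "
        "overall document, to improve search retrieval of the chunk. "
        "Answer with only the context."
    )
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    context = response["output"]["message"]["content"][0]["text"]
    return f"{context}\n\n{chunk}"
```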
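The upsert stage then embeds each contextualized chunk and writes it to a Pinecone index. A sketch assuming Titan embeddings on Bedrock and an existing index named `webinars` (again, the repo may use a different embedding model and index name):

```python
import json
import os

import boto3
from pinecone import Pinecone

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("webinars")

def embed(text: str) -> list[float]:
    """Embed text with Amazon Titan via Bedrock."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def upsert_chunks(chunks: list[str], video_id: str) -> None:
    """Embed each contextualized chunk and upsert it with metadata for retrieval."""
    vectors = [
        {
            "id": f"{video_id}-{i}",
            "values": embed(chunk),
            "metadata": {"video_id": video_id, "text": chunk},
        }
        for i, chunk in enumerate(chunks)
    ]
    index.upsert(vectors=vectors)
```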
## Launching the Streamlit App

To launch the Streamlit app, use the following command:
```sh
make run-app
```

This command will run the Streamlit app defined in `app.py`.
For more information on available commands, you can use:
```sh
make help
```

It's easiest to run the whole pipeline (`make setup`) and then run the Streamlit app.
From there, the Streamlit app should pop up locally and you can start querying!
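Under the hood, each query follows the classic RAG loop: embed the question, fetch the nearest chunks from Pinecone, and have Claude answer from them. A self-contained sketch, reusing the same assumed model IDs and index name as the snippets above (none of which are guaranteed to match the repo's code):

```python
import json
import os

import boto3
from pinecone import Pinecone

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed model ID
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("webinars")  # assumed index name

def embed(text: str) -> list[float]:
    """Embed text with Amazon Titan via Bedrock."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def answer(question: str, top_k: int = 5) -> str:
    """Retrieve the most relevant chunks from Pinecone and answer with Claude."""
    results = index.query(vector=embed(question), top_k=top_k, include_metadata=True)
    context = "\n\n".join(match.metadata["text"] for match in results.matches)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    response = bedrock_runtime.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]

print(answer("What did the webinar say about contextual retrieval?"))
```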