https://github.com/wittyicon29/rag-over-audio-data
Python-based system designed to transcribe audio files, split the transcripts into manageable chunks, create text embeddings using HuggingFace models, and employ advanced question-answering models for retrieval-based QA.
https://github.com/wittyicon29/rag-over-audio-data
assemblyai embeddings langchain llm vector-database
Last synced: 7 months ago
JSON representation
Python-based system designed to transcribe audio files, split the transcripts into manageable chunks, create text embeddings using HuggingFace models, and employ advanced question-answering models for retrieval-based QA.
- Host: GitHub
- URL: https://github.com/wittyicon29/rag-over-audio-data
- Owner: wittyicon29
- License: mit
- Created: 2023-12-17T17:31:16.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-17T17:57:55.000Z (almost 2 years ago)
- Last Synced: 2023-12-17T18:35:51.806Z (almost 2 years ago)
- Topics: assemblyai, embeddings, langchain, llm, vector-database
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# RAG-over-Audio-Data - Audio Transcript Processing and QA System
## Introduction
This repository orchestrates a sophisticated pipeline for text processing, leveraging various libraries and modules. It begins by transcribing remote audio files, segmenting the text into manageable chunks, and embedding these chunks into a vector database for efficient retrieval. The transcription is facilitated using the AssemblyAI service, allowing for easy access to the content of the audio files.The **HuggingFaceEmbeddings** module is employed to generate text embeddings using the HuggingFace models, enabling the conversion of text into numerical representations. These embeddings are then used to construct a vector database via Chroma, optimizing the storage and retrieval of text chunks.
The **ChatOpenAI** model, a heightened iteration of GPT-3, is utilized in a question-answering setup (RetrievalQA). This allows users to input questions, which are processed by the model to retrieve relevant answers from the text database created earlier. The retrieved answers are then presented along with the source documents containing relevant content, aiding transparency and context.
Overall, this code amalgamates audio transcription, text segmentation, text embedding, vector database creation, and advanced question-answering capabilities, providing a robust framework for handling text-based queries and interactions in an automated setup.

## Components
1. **Audio Transcription**
The code utilizes the AssemblyAIAudioTranscriptLoader to transcribe audio files from remote URLs into text.2. **Text Splitting**
The transcribed texts are split into smaller chunks using RecursiveCharacterTextSplitter to facilitate processing and analysis.3. **Text Embedding**
HuggingFace embeddings (HuggingFaceEmbeddings) are used to convert text chunks into embeddings. This allows for better semantic understanding and similarity calculations.4. **Vector Database Creation**
The embedded text chunks are stored in a vector database (Chroma) for efficient retrieval and comparison.5. **Question-Answering (QA) System**
The code implements a QA system using an advanced version of the GPT-3 model (ChatOpenAI). It retrieves relevant information from the vector database based on user queries.## Installation
**Clone the repo**
```cd
git clone https://github.com/wittyicon29/RAG-over-Audio-Data.git
```**Switch to the directory**
```cd
cd RAG-over-Audio-Data
```**Create a virtual environment**
```cd
python -m venv venv
```**Install the dependencies**
```cd
pip install -r requirements.txt
```## Usage
Set up your environment variables in a **.env** file.
**Transcribing Files**: Provide remote audio file URLs in the URLs list. The provided URLs of the audio files must be publicily accessible so that AssemmbyAI API can access those audio files.
**Run the script**
```python
python main.py
```## Configuration
Adjust the chunk size and overlap in RecursiveCharacterTextSplitter for text splitting customization.
Modify model names and parameters in the make_embedder() and make_qa_chain() functions to experiment with different language models and settings.
## References
[Retrieval Augumented Generation for Audio using Langchain](https://www.assemblyai.com/blog/retrieval-augmented-generation-audio-langchain)
## LICENSE
This project is licensed under the MIT License - see the LICENSE file for details