A Production-Ready, Scalable RAG-powered LLM-based Context-Aware QA App
- Host: GitHub
- URL: https://github.com/fork123aniket/llm-rag-powered-qa-app
- Owner: fork123aniket
- License: MIT
- Created: 2024-01-02T02:34:01.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-27T22:04:00.000Z (9 months ago)
- Last Synced: 2025-04-07T04:33:50.963Z (6 months ago)
- Topics: context-aware-system, eleutherai, fine-tuning, large-language-models, llm-inference, llm-serving, llm-training, llmops, parameter-efficient-fine-tuning, question-answering, ray, ray-serve, retrieval-augmented-generation
- Language: Python
- Homepage:
- Size: 22.5 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# End-to-End LLM-based Scalable RAG-powered QA App
This repo implements a _production-ready, scalable_ ***Retrieval Augmented Generation (RAG)-powered LLM-based Open Generative (or Extractive) context-aware Question-Answering (QA)*** App that:
- Takes as input a new `query` (or `question`);
- Performs ***vector similarity search*** within the ***embedding space***, retrieving the ***relevant contexts*** for the incoming `query` from the ***vector database***;
- Passes the ***relevant contexts*** together with the `input query` to the LLM;
- The LLM then produces the `answer` to the input `query` while being aware of the ***relevant contexts*** related to the requested `query`.

This project also includes `fine-tuning` a `20B`-parameter **_Large Language Model (LLM)_** in a multi-GPU cluster environment by leveraging the _distributed training_ paradigm. Moreover, this repo scales the major _ML workloads_ for the `contexts` (`load`, `embed`, and `index` them in the _vector database_) across multiple workers with different compute resources, and serves the _LLM App_ in a highly robust and scalable manner.
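As an illustration of that embed-and-index workload, here is a minimal `Ray` Data sketch. The `sentence-transformers` embedder and the in-memory result are stand-ins of my choosing, not confirmed details of this repo, whose pipeline writes embeddings to the Deta vector database:

```python
import ray
from sentence_transformers import SentenceTransformer  # stand-in embedder, not confirmed in the repo

def embed_batch(batch: dict) -> dict:
    # Loading the model once per worker (e.g. via a stateful class) would be more
    # efficient; a plain function keeps this sketch short.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    batch["embedding"] = model.encode(list(batch["text"]))
    return batch

contexts = ["France is a country in Europe.", "Its capital is Paris."]  # toy contexts
ds = ray.data.from_items([{"text": c} for c in contexts])
ds = ds.map_batches(embed_batch, batch_size=64)  # scales across the Ray cluster's workers
index = ds.take_all()  # pulled into memory here; the repo indexes into the Deta vector DB instead
```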
The diagram below shows the _architectural design_ of this ***RAG-powered LLM App***:

## Requirements
- `Python`
- `Streamlit`
- `PEFT` (for Parameter-Efficient Fine-Tuning)
- `Accelerate`
- `Ray` (for distributed LLM Fine-Tuning)
- `Datasets`
- `Transformers`
- `PyTorch`
- `Numpy`
- `Scikit-Learn`
- `Deta` (to access the Deta vector database)
- `LangChain`
- `FastAPI` (to serve the production-ready LLM App)
## Data
The [***SQuAD***](https://huggingface.co/datasets/squad/viewer/plain_text/train?row=0) dataset is used to fine-tune the [***EleutherAI GPT-NeoX-20B***](https://huggingface.co/EleutherAI/gpt-neox-20b) LLM; each of its `98.2k` examples provides a `Title`, `Question`, `Answer`, and `Context`.
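For reference, the dataset can be pulled with the `Datasets` library; the `98.2k` figure is the train (~`87.6k`) and validation (~`10.6k`) splits combined:

```python
from datasets import load_dataset

squad = load_dataset("squad")  # splits: train (~87.6k rows) and validation (~10.6k rows)
sample = squad["train"][0]     # fields: id, title, context, question, answers
print(sample["title"], sample["question"], sample["answers"]["text"], sep="\n")
```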
## LLM Training and Serving
- The `fine-tuning` pipeline for the `GPT-NeoX` LLM can be found in `finetune.py` (a minimal sketch of the PEFT setup follows this list).
- The code that creates the ***RAG-powered LLM Agent*** for the `QA` task can be seen in `qa_agent.py`.
- To see how the ***agent*** is exposed as a ***production-ready API*** for the `QA` task, delve into `serve.py` (a serving sketch also follows this list).
- To deploy the LLM App with a `Streamlit` front end, head to `streamlit.py`.
- All hyperparameters controlling the model's `fine-tuning` are provided in `config.py`.
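As a rough sketch (not the repo's exact `finetune.py`), parameter-efficient fine-tuning of the `20B` model with `PEFT`'s LoRA might be set up like this; the LoRA values below are placeholders for whatever `config.py` actually specifies:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"  # swap in a smaller checkpoint to test locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"  # device_map needs Accelerate
)

# LoRA: train small low-rank adapters instead of all 20B weights.
# r / alpha / dropout here are placeholder values, not the ones in config.py.
lora_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable

# The repo distributes the training loop itself across the multi-GPU cluster with Ray;
# that wiring (and the SQuAD preprocessing) is omitted from this sketch.
```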
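Likewise, a minimal sketch of serving the agent as a scalable HTTP API with `Ray Serve` + `FastAPI`; the `build_agent` helper is hypothetical, standing in for whatever `qa_agent.py` actually exposes:

```python
from fastapi import FastAPI
from ray import serve

api = FastAPI()

@serve.deployment(num_replicas=2)  # Ray Serve scales the deployment across replicas
@serve.ingress(api)
class QAService:
    def __init__(self):
        # Hypothetical helper standing in for the agent built in qa_agent.py.
        from qa_agent import build_agent
        self.agent = build_agent()

    @api.get("/answer")
    def answer(self, query: str) -> dict:
        return {"query": query, "answer": self.agent(query)}

serve.run(QAService.bind(), route_prefix="/")
```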
## App Usage
To learn more about how to use this ***LLM RAG-powered QA App***, watch the following video:

[RAG-powered LLM App.webm](https://github.com/fork123aniket/LLM-RAG-powered-QA-App/assets/92912434/c003342c-c337-44d1-a29d-b6c554eaabf9)