# 🚀 End-to-End LLM-based Scalable RAG-powered QA App

This repo implements a _production-ready, scalable_ ***Retrieval-Augmented Generation (RAG)-powered, LLM-based, context-aware Open Generative (or Extractive) Question-Answering (QA)*** App that:

- Takes a new `query` (or `question`) as input;
- Performs ***vector similarity search*** in the ***embedding space***, retrieving from the ***vector database*** the ***contexts*** most relevant to the incoming `query`;
- Passes the ***relevant contexts***, together with the `input query`, to the LLM;
- Lets the LLM generate the `answer` to the input `query` while staying aware of the ***relevant contexts*** retrieved for it (see the sketch below).
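
As a minimal sketch of this query flow (the contexts, embeddings, index, and `llm` callable below are toy stand-ins, not the repo's actual components, which live in `qa_agent.py`):

```python
# Toy sketch of the RAG query flow above; all data here is illustrative.
import numpy as np
from sklearn.neighbors import NearestNeighbors

contexts = ["Paris is the capital of France.", "The Nile flows through Africa."]
rng = np.random.default_rng(0)
context_embeddings = rng.normal(size=(len(contexts), 8))  # pretend embeddings

# Index the context embeddings for vector similarity search.
index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(context_embeddings)

def answer(query: str, query_embedding: np.ndarray, llm) -> str:
    """Retrieve the most similar context(s), then ask the LLM with them inlined."""
    _, idx = index.kneighbors(query_embedding.reshape(1, -1))
    retrieved = "\n\n".join(contexts[i] for i in idx[0])
    prompt = f"Context:\n{retrieved}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)  # `llm` is any prompt -> completion callable
```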

This project also includes `fine-tuning` a `20B`-parameter **_Large Language Model (LLM)_** on a multi-GPU cluster by leveraging the _distributed training_ paradigm. Moreover, the repo scales the major _ML workloads_ over the `contexts` (`loading`, `embedding`, and `indexing` them in the _vector database_) across multiple workers with different compute resources, and serves the _LLM App_ in a highly robust and scalable manner. A minimal fine-tuning sketch follows.
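
One way such distributed, parameter-efficient fine-tuning can be wired up is with `Ray Train` and `PEFT`; the LoRA settings, worker count, and training loop below are assumptions for illustration, not the repo's actual configuration (see `config.py` and `finetune.py` for that):

```python
# Illustrative sketch of distributed LoRA fine-tuning with Ray Train + PEFT.
# LoRA hyperparameters and worker counts are assumptions, not the repo's
# actual configuration (see config.py / finetune.py for the real settings).
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

def train_loop_per_worker(config):
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
    # Attach LoRA adapters so only a tiny fraction of weights is trained.
    model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))
    model = ray.train.torch.prepare_model(model)  # wraps the model for DDP on this worker
    # ... standard PyTorch training loop over tokenized SQuAD batches goes here ...

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),  # multi-GPU cluster
)
trainer.fit()
```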

The below diagram shows the _architectural design_ of this ***RAG-powered LLM App***:

![App Architecture](https://github.com/fork123aniket/LLM-RAG-powered-QA-App/assets/92912434/0387ac34-c876-4987-9400-9c0b9acc2934)

## Requirements
- `Python`
- `Streamlit`
- `PEFT` (for Parameter-Efficient Fine-Tuning)
- `Accelerate`
- `Ray` (for distributed LLM Fine-Tuning)
- `Datasets`
- `Transformers`
- `PyTorch`
- `Numpy`
- `Scikit-Learn`
- `Deta` (to access the Deta vector database)
- `LangChain`
- `FastAPI` (to serve the production-ready LLM App)

## Data
The [***SQuAD***](https://huggingface.co/datasets/squad/viewer/plain_text/train?row=0) dataset is used to fine-tune [***EleutherAI's GPT-NeoX-20B***](https://huggingface.co/EleutherAI/gpt-neox-20b) LLM; each of its `98.2k` examples comprises a `Title`, `Question`, `Answer`, and `Context`.
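
For reference, the dataset can be loaded and inspected with the `Datasets` library (already in the requirements):

```python
from datasets import load_dataset

squad = load_dataset("squad")   # train (~87.6k) + validation (~10.6k) splits
print(squad["train"][0])        # keys: 'id', 'title', 'context', 'question', 'answers'
```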

## LLM Training and Serving
- The `fine-tuning` code for the `GPT-NeoX` LLM can be found in the `finetune.py` file.
- The code that creates the ***RAG-powered LLM Agent*** for the `QA` task lives in the `qa_agent.py` file.
- To see how the ***agent*** is built into a ***production-ready API*** for the `QA` task, delve into the `serve.py` file (a minimal sketch follows after this list).
- To explore deploying the LLM App with `Streamlit`, head to the `streamlit.py` file.
- All hyperparameters controlling the model's `fine-tuning` are provided in the `config.py` file.
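
As a rough sketch of the serving pattern, here is how a QA deployment can be exposed through `Ray Serve` with a `FastAPI` ingress; the deployment name, route, and response schema are illustrative, not the repo's actual `serve.py`:

```python
# Illustrative Ray Serve + FastAPI deployment; not the repo's actual serve.py.
from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)
class QAService:
    def __init__(self):
        # Load the fine-tuned LLM and the vector index once per replica.
        self.agent = None  # stand-in for the RAG agent built in qa_agent.py

    @app.get("/query")
    def query(self, question: str) -> dict:
        # Real app: retrieve relevant contexts, then generate the answer.
        return {"question": question, "answer": "stub"}

serve.run(QAService.bind())
```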

## App Usage
To learn more about how to use this ***RAG-powered LLM QA App***, consider watching the following video:

[RAG-powered LLM App.webm](https://github.com/fork123aniket/LLM-RAG-powered-QA-App/assets/92912434/c003342c-c337-44d1-a29d-b6c554eaabf9)