{"id":18028473,"url":"https://github.com/interactivetech/pdk-llm-rag-app","last_synced_at":"2025-04-04T20:45:13.372Z","repository":{"id":221765060,"uuid":"755242588","full_name":"interactivetech/pdk-llm-rag-app","owner":"interactivetech","description":"Deployment of LLM RAG Application using PDK ","archived":false,"fork":false,"pushed_at":"2024-04-24T05:46:41.000Z","size":1784,"stargazers_count":1,"open_issues_count":1,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-10T05:24:57.480Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/interactivetech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-09T18:11:45.000Z","updated_at":"2024-02-28T17:38:44.000Z","dependencies_parsed_at":"2024-10-30T09:02:37.712Z","dependency_job_id":null,"html_url":"https://github.com/interactivetech/pdk-llm-rag-app","commit_stats":null,"previous_names":["interactivetech/pdk-llm-rag-demo-test-","interactivetech/pdk-llm-rag-app"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactivetech%2Fpdk-llm-rag-app","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactivetech%2Fpdk-llm-rag-app/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactivetech%2Fpdk-llm-rag-app/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/interactivetech%2Fpdk-llm-rag-ap
p/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/interactivetech","download_url":"https://codeload.github.com/interactivetech/pdk-llm-rag-app/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247249601,"owners_count":20908211,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-30T08:42:17.433Z","updated_at":"2025-04-04T20:45:13.344Z","avatar_url":"https://github.com/interactivetech.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Continuous Retrieval-Augmented Generation (RAG) with the HPE MLOps Platform\n \nAuthor: andrew.mendez@hpe.com\n\nThis is a proof of concept showing how developers can create a Retrieval-Augmented Generation (RAG) system using Pachyderm and Determined AI.\nThis RAG system sits on top of the HPE MLOps platform, which combines Pachyderm and Determined AI. 
A RAG system built on top of an MLOps platform allows developers to continuously update and deploy a RAG application as more data is ingested.\nWe also provide an example of how developers can automatically trigger finetuning an LLM on an instruction-tuning dataset.\n\nWe use the following technologies to implement the RAG System:\n* ChromaDB for the vector database\n* Chainlit for the User Interface\n* Mistral-7B-Instruct for the large language model (LLM)\n* DeterminedAI for finetuning the LLM\n* Pachyderm to manage dataset versioning and pipeline orchestration.\n\n# Prerequisites\n* This demo requires a GPU. We support running on an A100 80GB GPU or a Tesla T4 16GB GPU.\n* This demo currently only supports deployment and running on the Houston cluster\n* This demo assumes you have Pachyderm and DeterminedAI installed on top of Kubernetes. A guide will be provided soon to show how to install Pachyderm and Kubernetes.\n* If you have a machine with GPUs, you can install PDK using this guide: https://github.com/interactivetech/pdk-install\n* [WIP, coming soon] We will modify the installation steps to also support installation on a PDK GCP cluster\n\n# Overview\n- [Quickstart Installation](#quickstart-installation)\n- [Location of pachyderm pipelines](#location-of-pachyderm-pipelines)\n- [Notebooks included in this demo](#notebooks-included-in-this-demo)\n- [Detailed Installation Steps](#detailed-installation-steps)\n- [Bring your own documents](#bring-your-own-documents)\n- [Build your own containers](#bulid-your-own-containers)\n- [Bring your own Huggingface model](#bring-your-own-huggingface-model)\n- [Bring your own Sentence Transformer model](#bring-your-own-sentence-transformer-model)\n\n# Quickstart Installation\n\n* Create a new notebook on the Houston cluster using the `pdk-llm-rag-demo` template; you can select one GPU or no GPU.\n* In your `shared_nb/01 - Users/\u003cUSER_NAME\u003e` create a terminal and run `git clone 
https://github.com/interactivetech/pdk-llm-rag-demo-test-.git`\n\n* Open the `Deploy RAG with PDK.ipynb` notebook, and it should run out-of-the-box.\n* Note: By default, the TitanML pod is deployed on an A100 (using the taint `A100-MLDM`). If you want to deploy to a T4 instead, do the following:\n    * go to `src/scripts/deploy_app.sh` and find the line `GPU_DEVICE=A100-MLDM`\n    * update it to `GPU_DEVICE=Tesla-T4`\n\n\n# Location of pachyderm pipelines:\n* [deploy-rag](http://mldm-pachyderm.us.rdlabs.hpecorp.net/lineage/deploy-rag)\n* [deploy-rag-finetune](http://mldm-pachyderm.us.rdlabs.hpecorp.net/lineage/deploy-rag-finetune)\n\n## Notebooks included in this demo\n* Run `Deploy RAG with PDK.ipynb` to deploy a RAG system using a pretrained LLM\n* Run `Finetune and Deploy RAG with PDK.ipynb` to both finetune an LLM and deploy a finetuned model.\n\n\n# Detailed Installation Steps\nWe will show in detail how to set up this repo and demo in a new environment. We assume you will create shared directories for storing model files:\n* `/nvmefs1/test_user` and `/nvmefs1/test_user/cache`\n\nPlease modify these paths to match your environment.\n\n## Setup Jupyter Notebook in DeterminedAI environment\n* Install the DeterminedAI Notebook template, which provides all the required Python libraries\n* Create a DeterminedAI notebook\n* Set up the directory of pretrained models and vectordb\n* Modify the `Deploy RAG with PDK.ipynb` notebook\n\nOpen a terminal, and make sure you create these folders within a shared file directory: `/nvmefs1/test_user` and `/nvmefs1/test_user/cache`\n\nExample command to create the directory: `mkdir -p /nvmefs1/test_user/cache`\n\n### Review Determined Notebook template: pdk-llm-nb-env-houston.yaml\n\n`env_files/pdk-llm-nb-env-houston.yaml` is a preconfigured template that sets up all the packages needed to run the notebook demos. This template mounts the host `/nvmefs1` directory into the Determined notebook and training job. 
You do not need to modify this file if you are running on Houston.\n\n### Install Determined Notebook template to use all the required python libraries to run\nNOTE: You do not need to run this step on Houston, as the notebook template `pdk-llm-rag-demo` is already accessible\n\nIf you are in a new environment, change to the directory of this project and run the command \n\n`det template create pdk-llm-rag-env env_files/pdk-llm-nb-env-houston.yaml`\n\nNext, create a notebook with no GPUs or one GPU\n\nCreate a notebook using the `pdk-llm-rag-demo` template.\n\n\n## Setup directory of pretrained models and vectordb\nMake sure you have a PDK deployment with a shared file directory\n\n`mkdir -p /nvmefs1/test_user/cache`\n\nCreate a directory that will store the persistent vector database. This will be used for the add_to_vector_db pipeline (defined in `Deploy RAG with PDK.ipynb`) \n\n`mkdir -p /nvmefs1/test_user/cache/rag_db/`\n\nWe need to create a folder for the ChromaDB cache:\n\n`mkdir -p /nvmefs1/test_user/cache/chromadb_cache`\n\nWe need to cache the embedding model used to vectorize our data into ChromaDB\n\n`mkdir -p /nvmefs1/test_user/cache/vector_model/all-MiniLM-L6-v2`\n\nWe need to cache our LLM Model and Tokenizer\n\n`mkdir -p /nvmefs1/test_user/cache/model/mistral_7b_model_tokenizer`\n\nTo prevent any interruption while downloading, we will create a separate cache folder when first downloading the model\n(We can delete this after successfully saving)\n\n`mkdir -p /nvmefs1/test_user/cache/model_cache/mistral_7b_model_tokenizer`\n\nFinally, create a directory for the TitanML cache:\n\n`mkdir -p /nvmefs1/test_user/cache/titanml_cache`\n\nRun the code in the notebook `env/Download_Vector_Embedding.ipynb` to download the embedding model to `/nvmefs1/test_user/cache/vector_model/all-MiniLM-L6-v2` \n\nRun the code in the notebook `env/Download_and_cache_Mistral_7B_model.ipynb` to download the Mistral 7B model to 
`/nvmefs1/test_user/cache/model/mistral_7b_model_tokenizer` \n\n\n## Get IPs to deploy RAG Application.\n\nWe need two IP addresses allocated on the Houston Kubernetes cluster. One IP will be used to deploy the TitanML API service, and the other will be used to deploy the user interface.\n\n\n### Get the first IP\nYou will need two dedicated IPs that can persist on the Houston cluster. Here are the steps I recommend running to make sure you can get IPs to use for the cluster. \n\nCreate a temporary pod on the Houston cluster:\n```bash\nkubectl apply -f - \u003c\u003cEOF\napiVersion: v1\nkind: Pod\nmetadata:\n  name: jupyter1\n  labels:\n    name: jupyter1\nspec:\n  containers:\n  - name: ubuntu\n    image: ubuntu:latest\n    command: [\"/bin/sh\", \"-c\"]\n    args:\n      - echo starting;\n        apt-get update;\n        apt-get install -y python3 python3-pip;\n        pip install jupyterlab;\n        jupyter lab --ip=0.0.0.0 --port=8080 --NotebookApp.token='' --NotebookApp.password='' --allow-root\n    ports:\n    - containerPort: 8080\n      hostPort: 8080\nEOF\n```\n\nOnce this pod is running, run this command to assign the next available IP on the Houston cluster\n\n```bash\nkubectl expose pod jupyter1 --port 8080 --target-port 8080 --type LoadBalancer\n```\n\nThen run this command to see what IP was allocated:\n```bash\nkubectl get svc jupyter1\n```\n\nAnd see the output:\n```bash\n[andrew@mlds-mgmt ~]$ kubectl get svc jupyter1\nNAME       TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE\njupyter1   LoadBalancer   10.43.186.18   10.182.1.51   8080:31685/TCP   5s\n```\n\nWe see that the IP address `10.182.1.51` is allocated, so save this IP address for the TitanML deployment. 
\n\n### Get the Second IP for the User Interface Pod\n\nCreate another temporary pod on the Houston cluster:\n\n```bash\nkubectl apply -f - \u003c\u003cEOF\napiVersion: v1\nkind: Pod\nmetadata:\n  name: jupyter2\n  labels:\n    name: jupyter2\nspec:\n  containers:\n  - name: ubuntu\n    image: ubuntu:latest\n    command: [\"/bin/sh\", \"-c\"]\n    args:\n      - echo starting;\n        apt-get update;\n        apt-get install -y python3 python3-pip;\n        pip install jupyterlab;\n        jupyter lab --ip=0.0.0.0 --port=8080 --NotebookApp.token='' --NotebookApp.password='' --allow-root\n    ports:\n    - containerPort: 8080\n      hostPort: 8080\nEOF\n```\n\nOnce this pod is running, run this command to assign the next available IP\n\n```bash\nkubectl expose pod jupyter2 --port 8080 --target-port 8080 --type LoadBalancer\n```\n\nThen run this command to see what IP was allocated:\n```bash\nkubectl get svc jupyter2\n```\n\nAnd see the output:\n```bash\n[andrew@mlds-mgmt ~]$ kubectl get svc jupyter2\nNAME       TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE\njupyter2   LoadBalancer   10.43.131.94   10.182.1.54   8080:31934/TCP   4s\n```\nWe see that the IP address `10.182.1.54` is allocated, so save this IP address for the user interface deployment. \n\nSo we will use `10.182.1.51` for the TitanML deployment, and `10.182.1.54` for the user interface deployment. \n\nClean up the pods and services by running the command:\n\n```bash\nkubectl delete pod/jupyter1 \u0026\u0026 kubectl delete pod/jupyter2 \u0026\u0026 kubectl delete svc/jupyter1 \u0026\u0026 kubectl delete svc/jupyter2\n```\n\n# Modify `Deploy RAG with PDK.ipynb` notebook\n\nThis notebook lets SEs demonstrate how to continuously update a vector database with new documents.\n\nYou will need to modify the notebook if you have a custom directory that is not `/nvmefs1/test_user/cache/`\n\nBackground: There are two data repos:\n\n* code: this is the repo that has all your code for preprocessing, training, and deployment. 
The code is in the `src/` folder\n* data: this includes all the raw XML files that contain HPE press releases\n\nBackground: Overview of pipelines we will deploy:\n\n* **process_xml**: This runs the `src/py/process_xmls.py` script to extract the text from the raw XML files and save it to `/pfs/out/hpe_press_releases.csv`\n* **add_to_vector_db**: This runs `src/py/seed.py`, which takes `hpe_press_releases.csv` as input and indexes it into the vector database. NOTE: We are persisting the vector db as a folder in the directory you created `/nvmefs1/test_user/cache/rag_db/`\n* **deploy**: This runs the runner script `src/scripts/generate_titanml_and_ui_pod_check.sh`. This script deploys the LLM located at `/nvmefs1/test_user/cache/model/mistral_7b_model_tokenizer` to TitanML. TitanML optimizes the model so that it uses only 8.4GB on a GPU. \n\n\n### Modify the **add_to_vector_db** pipeline\nYou can leave the **process_xml** pipeline as is; it needs no modification and supports any environment.\n\nWe will need to modify the **add_to_vector_db** pipeline yaml definition.\n\nIn the Jupyter notebook cell, make sure you modify the `--path_to_db` argument to the correct location:\n```yaml\ntransform:\n    image: mendeza/python38_process:0.2\n    cmd: \n        - '/bin/sh'\n    stdin: \n    - 'python /pfs/code/src/py/seed.py --path_to_db /nvmefs1/test_user/cache/rag_db/\n    --csv_path /pfs/process_xml/hpe_press_releases.csv\n    --emb_model_path /run/determined/workdir/shared_fs/cache/vector_model/all-MiniLM-L6-v2'\n    - 'echo \"$(openssl rand -base64 12)\" \u003e /pfs/out/random_file.txt'\n```\n\n### Modify the **src/scripts/generate_titanml_and_ui_pod_check.sh** script\n\nGo to `src/scripts/deploy_app.sh` and modify several variables:\n* `TITANML_POD_NAME`\n* `TITANML_CACHE_HOST`\n* `HOST_VOLUME`\n* `TAKEOFF_MODEL_NAME`\n* `DB_PATH`\n* `API_HOST`\n* `UI_IP`\n* `EMB_PATH`\n* `APP_PY_PATH`\n\nso that they align with the current location of your shared directory:\n\n\nHere are example 
values that work, assuming you created and downloaded all the necessary files in `/nvmefs1/test_user/cache/` \n```bash\n# Environment variables\nROOT_DIR=/pfs/code/src/scripts/ # ROOT_DIR is the directory where the scripts reside in /pfs\n\nTITANML_POD_NAME=titanml-pod # TITANML_POD_NAME is the name of the titanml pod we are deploying\n\nTITANML_CACHE_HOST=/nvmefs1/test_user/cache/titanml_cache # TITANML_CACHE_HOST is the directory of the cache titanml needs during deployment\n\nHOST_VOLUME=/nvmefs1/ # HOST_VOLUME is the path to the root mounted directory\n\nTAKEOFF_MODEL_NAME=/nvmefs1/test_user/cache/model/mistral_7b_model_tokenizer # TAKEOFF_MODEL_NAME is the local path of a huggingface model titanml will optimize and deploy\n\nTAKEOFF_DEVICE=cuda # TAKEOFF_DEVICE specifies to use GPU Acceleration for TitanML\n\nAPI_PORT=8080\nAPI_HOST=10.182.1.48 # Update this with the first IP you verified\nUI_POD_NAME=ui-pod\nUI_PORT=8080\nDB_PATH=/nvmefs1/test_user/cache/rag_db/ # DB_PATH is the path to the chromadb vector database\n\nUI_IP=10.182.1.50 # Update this with the second IP you verified\nCHROMA_CACHE_HOST=/nvmefs1/test_user/cache/chromadb_cache\n\nEMB_PATH=/nvmefs1/test_user/cache/vector_model/e5-base-v2 \n# APP_PY_PATH is the path to the python script that implements the UI\n# Use /nvmefs1/ if you want fast debugging\nAPP_PY_PATH=\"/nvmefs1/shared_nb/01 - Users/andrew.mendez/2024/pdk-llm-rag-demo-test-/src/py/app.py\"\n```\n\nYou can run this as is, but if you want to deploy the TitanML API and the UI App on different IPs, change the values above.\n\n# Bring your own documents\n\nFollow the [Bring_your_own_data.md](Bring_your_own_data.md)\n\n# Bulid your own containers\n\n* Docker container for the notebook environment: mendeza/mistral-rag-env:0.0.11-pachctl\n\nYou can build a similar container using:\n```bash\nFROM determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-mpi-0.27.1\nRUN pip install transformers==4.36.0\nRUN pip install peft 
accelerate bitsandbytes trl\nRUN pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117\nRUN pip install einops\nRUN curl -L https://github.com/pachyderm/pachyderm/releases/download/v2.8.2/pachctl_2.8.2_linux_amd64.tar.gz | sudo tar -xzv --strip-components=1 -C /usr/local/bin\n```\n* Docker container for the TitanML serving application: mendeza/takeoff-mistral:0.5\n    * To build a new `mendeza/takeoff-mistral` image, look at this [repo](https://github.com/interactivetech/takeoff-community) that includes:\n        * the [Dockerfile](https://github.com/interactivetech/takeoff-community/blob/main/Dockerfile) \n        * and [build_container.sh](https://github.com/interactivetech/takeoff-community/blob/main/build_container.sh)\n* Docker container for the User Interface Chainlit app: mendeza/mistral-llm-rag-ui:0.0.7\n```bash\nFROM determinedai/environments:cuda-11.3-pytorch-1.12-tf-2.11-gpu-mpi-0.27.1\nRUN pip install transformers==4.36.0\nRUN pip install peft accelerate bitsandbytes trl\nRUN pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117\nRUN pip install einops chainlit==0.7.700 sentence_transformers==2.2.2 sentencepiece==0.1.99\nRUN curl -L https://github.com/pachyderm/pachyderm/releases/download/v2.8.2/pachctl_2.8.2_linux_amd64.tar.gz | sudo tar -xzv --strip-components=1 -C /usr/local/bin\n```\n* Docker container for PyTorchTrial Mistral finetuning: mendeza/mistral-rag-env:0.0.11-pachctl\n    * You can build the same container using the instructions for mendeza/mistral-rag-env:0.0.11-pachctl\n\n\n# Bring your own Huggingface model\n\nCreate a new folder for the new LLM Model and Tokenizer\n\n`mkdir -p /nvmefs1/test_user/cache/model/mistral_7b_model_tokenizer2`\n\nTo prevent any interruption while downloading, we will create a separate cache folder when first downloading the model\n(We can delete this after successfully 
saving)\n` mkdir -p  /nvmefs1/test_user/cache/model_cache/mistral_7b_model_tokenizer2`\n\nFollow the notebook `env/Download_and_cache_Mistral_7B_model.ipynb` and modify the path to save model at `/nvmefs1/test_user/cache/model/mistral_7b_model_tokenizer2` and the cache_dir `  /nvmefs1/test_user/cache/model_cache/mistral_7b_model_tokenizer2`\n\nYou will need to modify the `TAKEOFF_MODEL_NAME` in `src/scripts/deploy_app.sh` that points to the new local HF model\n\n# Bring your own Sentence Transformer model\n\nCreate a new folder for the new Embedding Model\n\n` mkdir -p  /nvmefs1/test_user/cache/vector_model/all-MiniLM-L6-v22`\n\nFollow the notebook `env/Download_Vector_Embedding.ipynb` and modify the path to save model at `/nvmefs1/test_user/cache/vector_model/all-MiniLM-L6-v22`\n\nYou will need to modify the `EMB_PATH` in `src/scripts/deploy_app.sh` that points to the new local HF model","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractivetech%2Fpdk-llm-rag-app","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finteractivetech%2Fpdk-llm-rag-app","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finteractivetech%2Fpdk-llm-rag-app/lists"}