{"id":22466712,"url":"https://github.com/blib-la/ask-poddy","last_synced_at":"2025-08-02T07:31:05.053Z","repository":{"id":244268712,"uuid":"814693742","full_name":"blib-la/ask-poddy","owner":"blib-la","description":"Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)","archived":false,"fork":false,"pushed_at":"2024-07-19T13:47:35.000Z","size":7527,"stargazers_count":10,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-08T15:46:18.632Z","etag":null,"topics":["ai","embedding","endpoint","infinity","llm","nextjs","openai","rag","runpod","serverless","vllm","worker"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blib-la.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-13T14:06:17.000Z","updated_at":"2025-01-29T23:01:50.000Z","dependencies_parsed_at":"2024-06-13T19:15:13.375Z","dependency_job_id":"b3bb0d35-c676-4ff4-8cf5-b2b8ff353997","html_url":"https://github.com/blib-la/ask-poddy","commit_stats":null,"previous_names":["blib-la/ask-poddy"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/blib-la/ask-poddy","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blib-la%2Fask-poddy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blib-la%2Fask-poddy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blib-la%2Fask-poddy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blib-la%2Fask-poddy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blib-la","download_url":"https://codeload.github.com/blib-la/ask-poddy/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blib-la%2Fask-poddy/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":268348610,"owners_count":24236297,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-02T02:00:12.353Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","embedding","endpoint","infinity","llm","nextjs","openai","rag","runpod","serverless","vllm","worker"],"created_at":"2024-12-06T10:13:19.809Z","updated_at":"2025-08-02T07:31:04.385Z","avatar_url":"https://github.com/blib-la.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eAsk Poddy\u003c/h1\u003e\n\n![A screenshot of the Ask Poddy web app showing a chat between the user and the AI](./assets/20240610_screenshot_ask_poddy_what_is_a_network_volume.png)\n\n**Ask Poddy** _(named after [\"Poddy\"](./public/poddy.png), the [RunPod](https://runpod.io) bot on\n[Discord](https://discord.gg/cUpRmau42V))_ is a user-friendly RAG (Retrieval-Augmented Generation)\nweb application designed to showcase the ease of setting up OpenAI-compatible APIs using open-source\nmodels running serverless on [RunPod](https://runpod.io). Built with [Next.js](https://nextjs.org/),\n[React](https://reactjs.org/), [Tailwind](https://tailwindcss.com/),\n[Vercel AI SDK](https://sdk.vercel.ai/docs/introduction), and\n[LangChain](https://js.langchain.com/), it uses\n[Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as LLM and\n[multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct) for\ntext embeddings.\n\nThis tutorial will guide you through deploying **Ask Poddy** in your environment, enabling it to\nanswer questions related to [RunPod](https://runpod.io) effectively, by leveraging the open-source\nworkers [worker-vllm](https://github.com/runpod-workers/worker-vllm) and\n[worker-infinity-embedding](https://github.com/runpod-workers/worker-infinity-embedding).\n\n---\n\n\u003cbr /\u003e\n\n\u003c!-- toc --\u003e\n\n-   [Concept](#concept)\n-   [Tutorial: Setting Up \"Ask Poddy\" in Your Environment](#tutorial-setting-up-ask-poddy-in-your-environment)\n    -   [Prerequisites](#prerequisites)\n    -   [1. Clone the Repository](#1-clone-the-repository)\n    -   [2. Install Dependencies](#2-install-dependencies)\n    -   [3. Set Up RunPod Serverless Endpoints](#3-set-up-runpod-serverless-endpoints)\n        -   [3.1 Network Volumes](#31-network-volumes)\n        -   [3.2 Worker-vLLM Endpoint](#32-worker-vllm-endpoint)\n        -   [3.3 Worker-Infinity-Embedding Endpoint](#33-worker-infinity-embedding-endpoint)\n    -   [4. Configure Environment Variables](#4-configure-environment-variables)\n    -   [5. Populate the Vector Store](#5-populate-the-vector-store)\n    -   [6. Start the Local Web Server](#6-start-the-local-web-server)\n    -   [7. Ask Poddy](#7-ask-poddy)\n\n\u003c!-- tocstop --\u003e\n\n\u003cbr /\u003e\n\n---\n\n## Concept\n\n**Ask Poddy** is designed to demonstrate the integration of serverless OpenAI-compatible APIs with\nopen-source models. The application runs locally (but it could also be deployed into the cloud),\nwhile the computational heavy lifting is handled by serverless endpoints on\n[RunPod](https://runpod.io). This architecture allows seamless use of existing OpenAI-compatible\ntools and frameworks without needing to develop custom APIs.\n\nHere's how RAG works in **Ask Poddy**:\n\n![Diagram showing how the RAG process works](./assets/20240613_diagram_rag.png)\n\n1. **User**: Asks a question.\n2. **Vector Store**: The question is sent to LangChain, which uses the\n   [worker-infinity-embedding](https://github.com/runpod-workers/worker-infinity-embedding) endpoint\n   to convert the question into an embedding using the\n   [multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)\n   model.\n3. **Vector Store**: Performs a similarity search to find relevant documents based on the question.\n4. **AI SDK**: The retrieved documents and the user's question are sent to the\n   [worker-vllm](https://github.com/runpod-workers/worker-vllm) endpoint.\n5. **worker-vllm**: Generates an answer using the\n   [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) model.\n6. **User**: Receives the answer.\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!TIP] \n\u003e You can [choose any of the supported models](https://docs.vllm.ai/en/latest/models/supported_models.html) that come with [vLLM](https://github.com/vllm-project/vllm). \n\u003c!-- prettier-ignore-end --\u003e\n\n\u003cbr /\u003e\n\n---\n\n## Tutorial: Setting Up \"Ask Poddy\" in Your Environment\n\n### Prerequisites\n\n-   [git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) installed\n-   [Node.js and npm](https://nodejs.org/en) installed\n-   [RunPod](https://www.runpod.io/) account\n\n### 1. Clone the Repository\n\n1. Clone the **Ask Poddy** repository and go into the cloned directory:\n\n```bash\ngit clone https://github.com/blib-la/ask-poddy.git\ncd ask-poddy\n```\n\n2. Clone the [RunPod docs](https://github.com/runpod/docs) repository into\n   `ask-poddy/data/runpod-docs`.\n\n```bash\ngit clone https://github.com/runpod/docs.git ./data/runpod-docs\n```\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!NOTE] \n\u003e The [RunPod docs](https://github.com/runpod/docs) repository contains the [RunPod documentation](https://docs.runpod.io) that **Ask Poddy** will use to answer\n\u003e questions.\n\u003c!-- prettier-ignore-end --\u003e\n\n3. Copy the `img` folder from `./data/runpod-docs/static/img` to `./public`\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!NOTE] \n\u003e This makes it possible for **Ask Poddy** to include images from the [RunPod documentation](https://docs.runpod.io).\n\u003c!-- prettier-ignore-end --\u003e\n\n\u003cbr /\u003e\n\n### 2. Install Dependencies\n\nNavigate to the `ask-poddy` directory and install the dependencies:\n\n```bash\nnpm install\n```\n\n\u003cbr /\u003e\n\n### 3. Set Up RunPod Serverless Endpoints\n\n#### 3.1 Network Volumes\n\n1. Create two network volumes with 15GB storage each in the same data center as the serverless\n   endpoints.\n    - Volume for embeddings: `infinity_embeddings`\n    - Volume for LLM: `vllm_llama3`\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!NOTE] \n\u003e Using network volumes ensures that the models and embeddings are stored persistently, allowing for\n\u003e faster subsequent requests as the data does not need to be downloaded or recreated each time.\n\u003c!-- prettier-ignore-end --\u003e\n\n#### 3.2 Worker-vLLM Endpoint\n\n1. [Follow the guide for setting up the vLLM endpoint](https://docs.runpod.io/serverless/workers/vllm/get-started),\n   but make sure to use the `meta-llama/Meta-Llama-3-8B-Instruct` model instead of the one mentioned\n   in the guide. And also make sure to select the network volume `vllm_llama3` when creating the\n   endpoint.\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!TIP] \n\u003e The worker is using [worker-vllm](https://github.com/runpod-workers/worker-vllm).\n\u003c!-- prettier-ignore-end --\u003e\n\n#### 3.3 Worker-Infinity-Embedding Endpoint\n\n1. [Create a new template](https://docs.runpod.io/pods/templates/manage-templates#creating-a-template)\n2. Use the Docker image `runpod/worker-infinity-embedding:stable-cuda12.1.0` from\n   [worker-infinity-embedding](https://github.com/runpod-workers/worker-infinity-embedding) and set\n   the environment variable `MODEL_NAMES` to `intfloat/multilingual-e5-large-instruct`.\n3. [Create a serverless endpoint](https://docs.runpod.io/serverless/workers/get-started#deploy-a-serverless-endpoint)\n   and make sure to select the network volume `infinity_embeddings`.\n\n\u003cbr /\u003e\n\n### 4. Configure Environment Variables\n\n1. [Generate your RunPod API key](https://docs.runpod.io/get-started/api-keys)\n2. Find the endpoint IDs underneath the\n   [deployed serverless endpoints](https://www.runpod.io/console/serverless).\n\n\u003cimg src=\"./assets/20240612_screenshot_id_of_worker.png\" alt=\"Screenshot showing the ID of the worker underneath the title\" width=\"550\"\u003e\n\n3. Create your `.env.local` based on [.env.local.example](./.env.local.example) or by creating a\n   file with the following variables:\n\n```bash\nRUNPOD_API_KEY=your_runpod_api_key\nRUNPOD_ENDPOINT_ID_VLLM=your_vllm_endpoint_id\nRUNPOD_ENDPOINT_ID_EMBEDDING=your_embedding_endpoint_id\n```\n\n\u003cbr /\u003e\n\n### 5. Populate the Vector Store\n\nTo populate the vector store, run the following command:\n\n```bash\nnpm run populate\n```\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!NOTE] \n\u003e The first run will take some time as the worker downloads the embeddings model\n\u003e ([multilingual-e5-large-instruct](https://huggingface.co/intfloat/multilingual-e5-large-instruct)).\n\u003e Subsequent requests will use the downloaded model stored in the network volume.\n\u003c!-- prettier-ignore-end --\u003e\n\nThis command reads all markdown documents from the `ask-poddy/data/runpod-docs/` folder, creates\nembeddings using the embedding endpoint running on RunPod, and stores these embeddings in the local\nvector store:\n\n![Diagram showing how the vector store gets populated with documents](./assets/20240613_diagram_populate_vector_store.png)\n\n1. **Documents**: The markdown documents from the `ask-poddy/data/runpod-docs/` folder are read by\n   LangChain.\n2. **Chunks**: LangChain converts the documents into smaller chunks, which are then sent to the\n   `worker-infinity-embedding` endpoint.\n3. **worker-infinity-embedding**: Receives chunks, generates embeddings using the\n   `multilingual-e5-large-instruct` model, and sends them back.\n4. **Vector Store**: LangChain saves these embeddings in the local vector store (`HNSWlib`).\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!TIP] \n\u003e A vector store is a database that stores embeddings (vector representations of text) to\n\u003e enable efficient similarity search. This is crucial for the RAG process as it allows the system to\n\u003e quickly retrieve relevant documents based on the user's question.\n\u003c!-- prettier-ignore-end --\u003e\n\n\u003cbr /\u003e\n\n### 6. Start the Local Web Server\n\n1. Start the local web server:\n\n```bash\nnpm run dev\n```\n\n2. Open http://localhost:3000 to access the UI.\n\n\u003cbr /\u003e\n\n### 7. Ask Poddy\n\nNow that everything is running, you can ask your [RunPod](https://runpod.io)-related question, like:\n\n-   What is RunPod?\n-   How do I create a serverless endpoint?\n-   What are the benefits of using a network volume?\n-   How can I become a host for the community cloud?\n-   Can RunPod help my startup to get going?\n\n\u003c!-- prettier-ignore-start --\u003e\n\u003e [!NOTE]\n\u003e The first run will take some time as the worker downloads the LLM\n\u003e ([Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)).\n\u003e Subsequent requests will use the downloaded model stored in the network volume.\n\u003c!-- prettier-ignore-end --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblib-la%2Fask-poddy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblib-la%2Fask-poddy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblib-la%2Fask-poddy/lists"}