{"id":26493999,"url":"https://github.com/dwarvesf/llm-hosting","last_synced_at":"2026-05-17T03:40:53.719Z","repository":{"id":235308870,"uuid":"790476214","full_name":"dwarvesf/llm-hosting","owner":"dwarvesf","description":"This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Large Language Models with an OpenAI compatible vLLM server.","archived":false,"fork":false,"pushed_at":"2024-05-08T10:31:36.000Z","size":59,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-05-09T04:31:00.183Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dwarvesf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-23T00:19:53.000Z","updated_at":"2024-05-30T05:08:37.583Z","dependencies_parsed_at":"2024-05-30T05:08:36.713Z","dependency_job_id":"2f0f9df1-bf4e-452b-b24b-1bf0757b33d6","html_url":"https://github.com/dwarvesf/llm-hosting","commit_stats":null,"previous_names":["dwarvesf/llm-hosting"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwarvesf%2Fllm-hosting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwarvesf%2Fllm-hosting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwarvesf%2Fllm-hosting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/dwarvesf%2Fllm-hosting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dwarvesf","download_url":"https://codeload.github.com/dwarvesf/llm-hosting/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244591451,"owners_count":20477709,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-20T09:58:33.673Z","updated_at":"2025-10-25T00:43:44.817Z","avatar_url":"https://github.com/dwarvesf.png","language":"Python","funding_links":[],"categories":["LLMs/Multimodal Models"],"sub_categories":[],"readme":"## Overview\nThis repository is designed for deploying and managing server processes on Modal that either serve embeddings through the Infinity embedding server or serve Large Language Models through an OpenAI-compatible vLLM server.\n\n## Key Components\n1. **vllm_llama_70b.py, vllm_deepseek_coder_33b.py, vllm_llama3-8b.py, vllm_seallm_7b_v2_5.py, vllm_sqlcoder_7b_2.py, vllm_duckdb_nsql_7b.py, vllm_codeqwen_110b_v1_5.py**\n   - These scripts contain the function `openai_compatible_server()`, which starts an OpenAI-compatible vLLM server by running a command that instantiates an OpenAI-compatible FastAPI server.\n   - The `BASE_MODEL` variable appears to define the model path for the vLLM server; its value is not shown here but can be inferred from the context.\n\n2. 
**infinity_mxbai_embed_large_v1.py, infinity_mxbai_rerank_large_v1.py, infinity_snowflake_arctic_embed_l_335m.py**\n   - These scripts contain the function `infinity_embeddings_server()` which initiates the Infinity Embed server by running a command that utilizes the Infinity embedding tool with specified options (like CUDA device and Torch engine).\n   - The `BASE_MODEL` variable appears to define the model path for the embedding tool, which is not shown but can be inferred from the context.\n\n3. **devbox.json**\n   - This configuration file specifies the programming environment for the repository, including versions of Python, Pip, and Node.js.\n   - It also defines shell initialization hooks like activating a Python virtual environment and installing necessary Python packages, among other administration scripts.\n\n4. **.env.example**\n   - This file template shows environment variables that are likely necessary for the project to run (e.g., API keys for Infinity API and VLLM API).\n   \n## Prerequisites\nBefore diving into the project setup, make sure to:\n- [Have Devbox installed](https://www.jetify.com/devbox/docs/installing_devbox/), as it manages the development and operation environment for this project.\n- Set up necessary API keys by copying `.env.example` to `.env` and filling in the required values for `INFINITY_API_KEY` and `VLLM_API_KEY`.\n\n## Environment Setup\n1. **Initializing Development Environment with Devbox:**\n   - Enter the Devbox shell environment by running:\n     ```bash\n     devbox shell\n     ```\n   - This action will set up the environment according to the `init_hook` specified in `devbox.json`, which activates the Python virtual environment and installs the required packages.\n\n## Deployment\nThe scripts available in the repository can be deployed using the [Modal](https://modal.com/docs/examples/hello_world) tool. 
Deploy a script by running the corresponding command:\n```bash\nmodal deploy infinity_mxbai_embed_large_v1.py\nmodal deploy infinity_mxbai_rerank_large_v1.py\nmodal deploy infinity_snowflake_arctic_embed_l_335m.py\n\nmodal deploy vllm_llama3_70b.py\nmodal deploy vllm_deepseek_coder_33b.py\nmodal deploy vllm_llama3-8b.py\nmodal deploy vllm_seallm_7b_v2_5.py\nmodal deploy vllm_sqlcoder_7b_2.py\nmodal deploy vllm_duckdb_nsql_7b.py\nmodal deploy vllm_codeqwen_110b_v1_5.py\n```\nEach command deploys the respective script, launching the Infinity embeddings server or an OpenAI-compatible vLLM server as configured in that script.\n\n## Inference\n\nExpect cold starts of between 30 seconds and 1 minute with Modal. Both the vLLM and Infinity servers require an API key, which you specify in your `.env` file. Use that key to make inference requests against these models:\n\n**Querying LLMs**:\n```bash\ntime curl \u003curl\u003e \\\n-H \"Content-Type: application/json\" \\\n-H \"Authorization: Bearer \u003cVLLM_API_KEY\u003e\" \\\n-d '{\n  \"model\": \"TheBloke/deepseek-coder-33B-instruct-AWQ\",\n  \"messages\": [\n    {\n      \"role\": \"user\",\n      \"content\": \"Write me a python snake game.\"\n    }\n  ],\n  \"temperature\": 0,\n  \"max_tokens\": 1024\n}'\n```\n\n**Querying Embeddings**:\n```bash\ntime curl \u003curl\u003e \\\n-H \"Content-Type: application/json\" \\\n-H \"Authorization: Bearer \u003cINFINITY_API_KEY\u003e\" \\\n-d '{\n  \"model\": \"Snowflake/snowflake-arctic-embed-l\",\n  \"input\": [\"The quick brown fox jumps over the lazy dog.\"]\n}'\n```\n\n**Querying Rerankings**:\n```bash\ntime curl -X 'POST' \\\n  \u003curl\u003e \\\n  -H 'accept: application/json' \\\n  -H \"Authorization: Bearer \u003cINFINITY_API_KEY\u003e\" \\\n  -H 'Content-Type: application/json' \\\n  -d '{\n  \"model\": \"mixedbread-ai/mxbai-rerank-large-v1\",\n  \"query\": \"What is the python package infinity_emb?\",\n  \"documents\": [
\n    \"This is a document not related to the python package infinity_emb, hence...\",\n    \"Paris is in France!\",\n    \"infinity_emb is a package for sentence embeddings and rerankings using transformer models in Python!\"\n  ],\n  \"return_documents\": true\n}'\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdwarvesf%2Fllm-hosting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdwarvesf%2Fllm-hosting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdwarvesf%2Fllm-hosting/lists"}