{"id":19046344,"url":"https://github.com/saladtechnologies/llm-inference","last_synced_at":"2025-08-29T13:32:07.470Z","repository":{"id":200778726,"uuid":"706226615","full_name":"SaladTechnologies/llm-inference","owner":"SaladTechnologies","description":"A text generation inference server template for llms hosted by huggingface","archived":false,"fork":false,"pushed_at":"2023-10-18T19:42:50.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-29T04:51:41.627Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SaladTechnologies.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-17T14:39:03.000Z","updated_at":"2023-10-17T17:56:04.000Z","dependencies_parsed_at":null,"dependency_job_id":"9579d1a6-4646-4759-8f98-4e7c0f13e041","html_url":"https://github.com/SaladTechnologies/llm-inference","commit_stats":null,"previous_names":["saladtechnologies/llm-inference"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/SaladTechnologies/llm-inference","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaladTechnologies%2Fllm-inference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaladTechnologies%2Fllm-inference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaladTechnologies%2Fllm-inference/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/SaladTechnologies%2Fllm-inference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SaladTechnologies","download_url":"https://codeload.github.com/SaladTechnologies/llm-inference/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SaladTechnologies%2Fllm-inference/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272692431,"owners_count":24977356,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-29T02:00:10.610Z","response_time":87,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T22:54:22.525Z","updated_at":"2025-08-29T13:32:07.432Z","avatar_url":"https://github.com/SaladTechnologies.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llm-inference\nA text generation inference server template for LLMs hosted by huggingface\n\n## Discover compatible models\n\n```bash\n# Syntax: models [--task \u003ctask\u003e] [--limit \u003climit\u003e]\n#   --task: The task to search for. Defaults to \"text-generation\"\n#   --limit: The number of models to show. 
Defaults to 100\n\n./models --limit 500\n```\n\nThis script lets you search for compatible models on the Hugging Face Hub and copies your selection to the clipboard.\n\n## Build\n\nYou can bake a model into the image by supplying a model ID to the build script. The model is downloaded and cached in the image at build time, then loaded into memory when the container starts.\n\n```bash\n./build HuggingFaceH4/zephyr-7b-alpha\n```\n\nAlternatively, you can build and use the base image and specify your model ID at runtime. In that case, the model is downloaded when the container starts.\n\n```bash\n./build\n```\n\n## Run\n\nModify the docker-compose.yml file to select your model, port, and host.\n\n```bash\ndocker compose up\n```\n\n## Use\n\n```bash\ncurl -X POST \\\n  'http://localhost:1234/chat' \\\n  --header 'Accept: */*' \\\n  --header 'Content-Type: application/json' \\\n  --data-raw '{\n  \"messages\": [\n    {\n      \"role\": \"system\",\n      \"content\": \"You are a helpful AI assistant. You strive for accuracy and usefulness. 
You only answer in rhyming iambic pentameter couplets.\"\n    },\n    {\n      \"role\": \"user\",\n      \"content\": \"How was pizza invented?\"\n    }\n  ],\n  \"options\": {\n    \"max_new_tokens\": 1024\n  }\n}' | jq -r '.outputs[0].generated_text'\n```\n\nOutput:\n```text\nIn Naples, a master did bake\nA flatbread with tomato and bake\nThe sauce was a simple mix\nOf canned tomatoes and olive's slick\n\nThe cheese, mozzarella, was added\nTo give it a flavor that was jaded\nThe result was a dish so divine\nThat people couldn't resist the sign\n\nFrom Naples, it traveled the world\nAnd pizza became the world's swirl\nA dish for all to enjoy\nThat's how pizza's history was sowed.\n\nSo, the next time you eat a slice\nOf pizza, take a moment to reminisce\nOn how a simple flatbread transformed\nInto a dish that we all adore.\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaladtechnologies%2Fllm-inference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsaladtechnologies%2Fllm-inference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsaladtechnologies%2Fllm-inference/lists"}