{"id":29125699,"url":"https://github.com/GeeeekExplorer/nano-vllm","last_synced_at":"2025-06-29T22:02:59.237Z","repository":{"id":298648471,"uuid":"999030842","full_name":"GeeeekExplorer/nano-vllm","owner":"GeeeekExplorer","description":"Nano vLLM","archived":false,"fork":false,"pushed_at":"2025-06-27T10:51:00.000Z","size":35,"stargazers_count":4266,"open_issues_count":18,"forks_count":473,"subscribers_count":43,"default_branch":"main","last_synced_at":"2025-06-27T11:44:38.920Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GeeeekExplorer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-09T16:22:14.000Z","updated_at":"2025-06-27T11:37:02.000Z","dependencies_parsed_at":"2025-06-19T16:27:21.620Z","dependency_job_id":null,"html_url":"https://github.com/GeeeekExplorer/nano-vllm","commit_stats":null,"previous_names":["geeeekexplorer/nano-vllm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/GeeeekExplorer/nano-vllm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeeeekExplorer%2Fnano-vllm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeeeekExplorer%2Fnano-vllm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeeeekExplorer%2Fnano-vllm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeeeekExplorer%2Fnano-vllm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/GeeeekExplorer","download_url":"https://codeload.github.com/GeeeekExplorer/nano-vllm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GeeeekExplorer%2Fnano-vllm/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262674948,"owners_count":23346741,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-29T22:02:25.535Z","updated_at":"2025-06-29T22:02:59.206Z","avatar_url":"https://github.com/GeeeekExplorer.png","language":"Python","funding_links":[],"categories":["3. Inference Engines \u0026 Serving","Frameworks","A01_文本生成_文本对话","Python","Summary","Repos","Inference engines","Deployment and Serving","8. Inference Engines"],"sub_categories":["大语言对话模型及数据","Server / Production"],"readme":"# Nano-vLLM\n\nA lightweight vLLM implementation built from scratch.\n\n## Key Features\n\n* 🚀 **Fast offline inference** - Comparable inference speeds to vLLM\n* 📖 **Readable codebase** - Clean implementation in ~ 1,200 lines of Python code\n* ⚡ **Optimization Suite** - Prefix caching, Tensor Parallelism, Torch compilation, CUDA graph, etc.\n\n## Installation\n\n```bash\npip install git+https://github.com/GeeeekExplorer/nano-vllm.git\n```\n\n## Manual Download\n\nIf you prefer to download the model weights manually, use the following command:\n```bash\nhuggingface-cli download --resume-download Qwen/Qwen3-0.6B \\\n  --local-dir ~/huggingface/Qwen3-0.6B/ \\\n  --local-dir-use-symlinks False\n```\n\n## Quick Start\n\nSee `example.py` for usage. 
The API mirrors vLLM's interface with minor differences in the `LLM.generate` method:\n```python\nfrom nanovllm import LLM, SamplingParams\nllm = LLM(\"/YOUR/MODEL/PATH\", enforce_eager=True, tensor_parallel_size=1)\nsampling_params = SamplingParams(temperature=0.6, max_tokens=256)\nprompts = [\"Hello, Nano-vLLM.\"]\noutputs = llm.generate(prompts, sampling_params)\nprint(outputs[0][\"text\"])\n```\n\n## Benchmark\n\nSee `bench.py` for the benchmark script.\n\n**Test Configuration:**\n- Hardware: RTX 4070 Laptop (8 GB)\n- Model: Qwen3-0.6B\n- Total Requests: 256 sequences\n- Input Length: Randomly sampled between 100–1024 tokens\n- Output Length: Randomly sampled between 100–1024 tokens\n\n**Performance Results:**\n| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |\n|----------------|-------------|----------|-----------------------|\n| vLLM           | 133,966     | 98.37    | 1361.84               |\n| Nano-vLLM      | 133,966     | 93.41    | 1434.13               |\n\n\n## Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=GeeeekExplorer/nano-vllm\u0026type=Date)](https://www.star-history.com/#GeeeekExplorer/nano-vllm\u0026Date)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGeeeekExplorer%2Fnano-vllm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FGeeeekExplorer%2Fnano-vllm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FGeeeekExplorer%2Fnano-vllm/lists"}