https://github.com/llmsresearch/specstream
Fast LLM inference with 2.8x speedup using speculative decoding
- Host: GitHub
- URL: https://github.com/llmsresearch/specstream
- Owner: llmsresearch
- License: MIT
- Created: 2025-07-24T01:50:10.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-07-24T02:18:05.000Z (7 months ago)
- Last Synced: 2025-09-29T16:36:06.189Z (4 months ago)
- Topics: inference, largelanguagemodel, llms, speculative-decoding
- Language: Python
- Homepage: https://pypi.org/project/specstream/
- Size: 39.1 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
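
The description credits the speedup to speculative decoding: a small draft model proposes several tokens cheaply, and the large target model verifies them, so most tokens no longer cost a full target-model step each. The sketch below is a generic, illustrative greedy variant of that idea, not SpecStream's actual API; the `draft` and `target` toy functions are hypothetical stand-ins for real language models.

```python
# Minimal, generic sketch of greedy speculative decoding (draft-then-verify).
# This is NOT the SpecStream API; toy next-token functions stand in for the
# draft and target LLMs, which in practice share a tokenizer/vocabulary.

from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token-id prefix to the next token id


def speculative_decode(prefix: List[int], draft: NextToken, target: NextToken,
                       k: int = 4, max_new: int = 32) -> List[int]:
    """Draft proposes k tokens per step; target verifies and corrects them.

    Accepted tokens avoid one target-model step each, which is the general
    source of the speedup speculative decoding reports.
    """
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) Draft model proposes k tokens cheaply.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2) Target model verifies: keep the longest agreeing prefix,
        #    then emit its own token at the first mismatch.
        accepted, ctx = [], list(out)
        for t in proposal:
            expected = target(ctx)
            if expected == t:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(expected)  # target's token replaces the mismatch
                break
        else:
            accepted.append(target(ctx))   # all k accepted: one bonus token

        out.extend(accepted)
    return out[:len(prefix) + max_new]


# Toy stand-ins over a 10-token vocabulary: a weak "draft" and a "target"
# that disagrees with it whenever the context length is a multiple of 3.
draft = lambda ctx: (sum(ctx) + 1) % 10
target = lambda ctx: (sum(ctx) + 1) % 10 if len(ctx) % 3 else (sum(ctx) + 2) % 10

print(speculative_decode([1, 2, 3], draft, target, k=4, max_new=8))
```

Real implementations verify all drafted tokens in a single batched forward pass of the target model and, for sampled (non-greedy) decoding, use rejection sampling so the output distribution matches the target model exactly; the per-token verification loop above is sequential only for readability.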