Projects in Awesome Lists tagged with speculative-decoding
A curated list of projects in awesome lists tagged with speculative-decoding.
https://github.com/intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
4-bits autoround chatbot chatpdf gaudi3 habana intel-optimized-llamacpp large-language-model llm-cpu llm-inference neural-chat neural-chat-7b rag retrieval speculative-decoding streamingllm
Last synced: 24 Feb 2025
https://github.com/aphrodite-engine/aphrodite-engine
Large-scale LLM inference engine
api-rest cuda inference-engine inferentia intel lora machine-learning rocm speculative-decoding tpu
Last synced: 14 May 2025
https://github.com/SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
large-language-models llm-inference speculative-decoding
Last synced: 20 Mar 2025
https://github.com/Infini-AI-Lab/Sequoia
Scalable and robust tree-based speculative decoding algorithm
efficiency inference llm speculative-decoding
Last synced: 16 Oct 2025
https://github.com/facebookresearch/layerskip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
early-exit layer-drop llm optimization speculative-decoding transformers
Last synced: 12 Apr 2025
https://github.com/Infini-AI-Lab/TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
acceleration efficiency inference llm llm-inference long-context speculative-decoding
Last synced: 16 May 2025
https://github.com/fasterdecoding/rest
REST: Retrieval-Based Speculative Decoding, NAACL 2024
llm-inference retrieval speculative-decoding
Last synced: 16 May 2025
https://github.com/kssteven418/biglittledecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
decoding efficient-inference fast-inference llm speculative-decoding speculative-execution
Last synced: 31 Jul 2025
https://github.com/autonomicperfectionist/pipeinfer
PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
inference llamacpp llm speculative-decoding
Last synced: 05 Oct 2025
https://github.com/mscheong01/speculative_decoding.c
Minimal C implementation of speculative decoding based on llama2.c
artificial-intelligence c llama2 llm speculative-decoding
Last synced: 23 Jun 2025
https://github.com/hsj576/griffin
Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"
large-language-models llm-inference speculative-decoding
Last synced: 13 May 2025
https://github.com/llmsresearch/specstream
Fast LLM inference with a reported 2.8x speedup using speculative decoding
inference largelanguagemodel llms speculative-decoding
Last synced: 14 Jan 2026
https://github.com/geralt-targaryen/awesome-speculative-decoding
Reading notes on Speculative Decoding papers
awesome llm nlp papers speculative-decoding
Last synced: 14 Mar 2025
https://github.com/wtlow003/speculative-sampling
Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"
deepmind llm-inference speculative-decoding speculative-sampling
Last synced: 24 Sep 2025
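The speculative sampling paper referenced in the entry above verifies drafted tokens with a modified rejection step: writing p for the draft model's next-token distribution and q for the target model's, each drafted token x is accepted with probability min(1, q(x)/p(x)); on the first rejection a replacement token is drawn from the normalized residual max(0, q - p) and the rest of the draft is discarded, and if every draft is accepted one bonus token is drawn from the target. The sketch below is a minimal illustration of that acceptance step only, assuming both models' distributions are already available as plain probability lists over a small vocabulary. The function name and argument layout are hypothetical and are not taken from the wtlow003/speculative-sampling repository.

```python
import random
from typing import List, Sequence


def speculative_accept(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],   # p(. | prefix) at each drafted position (hypothetical layout)
    target_probs: Sequence[Sequence[float]],  # q(. | prefix) at each drafted position, plus one extra row
    rng: random.Random,
) -> List[int]:
    """Return the tokens kept after verifying a draft against the target model."""
    out: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p = draft_probs[i][tok]   # > 0, since the draft model sampled this token
        q = target_probs[i][tok]
        # Accept the drafted token with probability min(1, q / p).
        if rng.random() < min(1.0, q / p):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, q - p); choices() normalizes the weights.
        residual = [max(0.0, qt - pt) for qt, pt in zip(target_probs[i], draft_probs[i])]
        out.append(rng.choices(range(len(residual)), weights=residual)[0])
        return out  # the remaining drafted tokens are discarded
    # Every draft was accepted: sample one bonus token from the target's next distribution.
    bonus = target_probs[len(draft_tokens)]
    out.append(rng.choices(range(len(bonus)), weights=bonus)[0])
    return out
```

When the draft and target distributions agree, every drafted token is kept and the step yields one extra token from the target; when the target assigns zero probability to a drafted token, that token is always replaced by a draw from the residual.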
https://github.com/wtlow003/ngram-decoding
(Re-)implementation of "Prompt Lookup Decoding" by Apoorv Saxena, with extended ideas from LLMA Decoding.
llm-inference n-gram ngram-decoding prompt-lookup-decoding speculative-decoding
Last synced: 05 Mar 2025
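Prompt lookup decoding, referenced in the entry above, replaces the draft model with a string match: the trailing n-gram of the current token sequence is searched for earlier in the context, and the tokens that followed that earlier occurrence are proposed as the draft for the target model to verify. The sketch below shows only the candidate-lookup step, assuming token IDs are plain integers; the function and parameter names (`prompt_lookup_candidates`, `max_ngram`, `num_draft`) are hypothetical, and the sketch is not claimed to match the wtlow003/ngram-decoding code or the original implementation line for line.

```python
from typing import List, Sequence


def prompt_lookup_candidates(
    tokens: Sequence[int],
    max_ngram: int = 3,   # hypothetical: longest trailing n-gram to try
    num_draft: int = 10,  # hypothetical: maximum number of drafted tokens
) -> List[int]:
    """Propose draft tokens by matching the trailing n-gram against earlier context."""
    n_tokens = len(tokens)
    # Try longer n-grams first: a longer match is more likely to predict the continuation.
    for n in range(min(max_ngram, n_tokens - 1), 0, -1):
        suffix = list(tokens[n_tokens - n:])
        # Scan earlier windows only; the trailing n-gram itself is excluded.
        for start in range(0, n_tokens - n):
            if list(tokens[start:start + n]) == suffix:
                follow = start + n
                return list(tokens[follow:follow + num_draft])
    return []  # no match: fall back to ordinary one-token decoding
```

The returned tokens would then be scored by the target model in a single forward pass and accepted up to the first mismatch; that verification step is not shown here.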
https://github.com/eps-ai-solutions/claudecli
HYDRA 10.0 - Advanced AI System with Self-Correction, Few-Shot Learning, Speculative Decoding, Load Balancing & Semantic RAG
ai automation claude few-shot-learning llm mcp ollama powershell self-correction speculative-decoding
Last synced: 16 Jan 2026