An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with speculative-decoding

A curated list of projects in awesome lists tagged with speculative-decoding.

https://github.com/intel/intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

4-bits autoround chatbot chatpdf gaudi3 habana intel-optimized-llamacpp large-language-model llm-cpu llm-inference neural-chat neural-chat-7b rag retrieval speculative-decoding streamingllm

Last synced: 24 Feb 2025

https://github.com/SafeAILab/EAGLE

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

large-language-models llm-inference speculative-decoding

Last synced: 20 Mar 2025

https://github.com/Infini-AI-Lab/Sequoia

Scalable and robust tree-based speculative decoding algorithm

efficiency inference llm speculative-decoding

Last synced: 16 Oct 2025

https://github.com/facebookresearch/layerskip

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

early-exit layer-drop llm optimization speculative-decoding transformers

Last synced: 12 Apr 2025

https://github.com/Infini-AI-Lab/TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

acceleration efficiency inference llm llm-inference long-context speculative-decoding

Last synced: 16 May 2025

https://github.com/fasterdecoding/rest

REST: Retrieval-Based Speculative Decoding, NAACL 2024

llm-inference retrieval speculative-decoding

Last synced: 16 May 2025

https://github.com/kssteven418/biglittledecoder

[NeurIPS'23] Speculative Decoding with Big Little Decoder

decoding efficient-inference fast-inference llm speculative-decoding speculative-execution

Last synced: 31 Jul 2025

https://github.com/autonomicperfectionist/pipeinfer

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation

inference llamacpp llm speculative-decoding

Last synced: 05 Oct 2025

https://github.com/mscheong01/speculative_decoding.c

Minimal C implementation of speculative decoding based on llama2.c

artificial-intelligence c llama2 llm speculative-decoding

Last synced: 23 Jun 2025

https://github.com/hsj576/griffin

Official Implementation of "GRIFFIN: Effective Token Alignment for Faster Speculative Decoding"

large-language-models llm-inference speculative-decoding

Last synced: 13 May 2025

https://github.com/llmsresearch/specstream

Fast LLM inference with 2.8x speedup using speculative decoding

inference largelanguagemodel llms speculative-decoding

Last synced: 14 Jan 2026

https://github.com/geralt-targaryen/awesome-speculative-decoding

Reading notes on Speculative Decoding papers

awesome llm nlp papers speculative-decoding

Last synced: 14 Mar 2025

https://github.com/wtlow003/speculative-sampling

Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"

deepmind llm-inference speculative-decoding speculative-sampling

Last synced: 24 Sep 2025

https://github.com/wtlow003/ngram-decoding

(Re)-implementation of "Prompt Lookup Decoding" by Apoorv Saxena, with extended ideas from LLMA Decoding.

llm-inference n-gram ngram-decoding prompt-lookup-decoding speculative-decoding

Last synced: 05 Mar 2025

https://github.com/eps-ai-solutions/claudecli

HYDRA 10.0 - Advanced AI System with Self-Correction, Few-Shot Learning, Speculative Decoding, Load Balancing & Semantic RAG

ai automation claude few-shot-learning llm mcp ollama powershell self-correction speculative-decoding

Last synced: 16 Jan 2026