Projects in Awesome Lists tagged with model-quantization
A curated list of projects in awesome lists tagged with model-quantization.
https://github.com/inferflow/inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
baichuan2 bloom deepseek falcon gemma internlm llama2 llamacpp llm-inference m2m100 minicpm mistral mixtral mixture-of-experts model-quantization moe multi-gpu-inference phi-2 qwen
Last synced: 07 Apr 2025
https://github.com/sayakpaul/adventures-in-tensorflow-lite
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model
Last synced: 20 Sep 2025
https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model
Last synced: 09 Jul 2025
https://datawhalechina.github.io/awesome-compression/
A beginner's introductory tutorial on model compression.
compression kd knowledge-distillation model-compression model-pruning model-quantization neural-architecture-search prune quantization tinyml
Last synced: 24 Sep 2025
https://github.com/seonglae/llama2gptq
Chat with LLaMA 2 and get responses backed by reference documents retrieved from a vector database. Runs a locally available model using GPTQ 4-bit quantization.
chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers
Last synced: 22 Apr 2025
https://github.com/dcarpintero/ai-engineering
AI Engineering: Annotated notebooks to dive into Self-Attention, In-Context Learning, RAG, Knowledge-Graphs, Fine-Tuning, Model Optimization, and many more.
ai-engineering bert chunking embeddings fine-tuning generative-ai huggingface-transformers in-context-learning knowledge-graph langchain large-language-models llama3-1 model-quantization retrieval-augmented-generation self-attention transformer weights-and-biases
Last synced: 04 Apr 2025
https://github.com/dcarpintero/generative-ai-101
Annotated Notebooks to dive into Self-Attention, In-Context Learning, RAG, Knowledge-Graphs, Fine-Tuning, Model Optimization, and many more.
bert chunking embeddings fine-tuning generative-ai huggingface-transformers in-context-learning knowledge-graph langchain large-language-models llama3-1 model-quantization retrieval-augmented-generation self-attention transformer weights-and-biases
Last synced: 14 Mar 2025
https://github.com/keshavpatel2/local-llm-workbench
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup
Last synced: 01 Apr 2025
https://github.com/dwain-barnes/llm-gguf-auto-converter
Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.
auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization
Last synced: 17 Jun 2025
https://github.com/satyampurwar/large-language-models
Unlocking the Power of Generative AI: In-Context Learning, Instruction Fine-Tuning and Reinforcement Learning Fine-Tuning.
bert conda-environment encoder-decoder-model encoder-model few-shot-prompting flan-t5 generative-ai instruction-fine-tuning kl-divergence large-language-models low-rank-adaptation megacmd memory-management model-quantization peft-fine-tuning-llm prompt-engineering proximal-policy-optimization reinforcement-learning-from-ai-feedback reinforcement-learning-from-human-feedback storage-management
Last synced: 21 Feb 2025