An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with model-quantization

A curated list of projects in awesome lists tagged with model-quantization.
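For context, model quantization maps floating-point weights to low-bit integers to shrink models and speed up inference. A minimal symmetric int8 sketch of the core idea (illustrative only; the engines listed below add per-channel scales, calibration, and packed storage):

```python
def quantize_int8(weights):
    """Map floats to int8 values with a single symmetric scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    # Clamp to the signed 8-bit range [-128, 127]
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Real quantizers differ mainly in how they pick the scale (and zero point) per tensor, channel, or group, and in how aggressively they trade precision for size (e.g. 4-bit GPTQ or GGUF `Q4` formats).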

https://github.com/inferflow/inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

baichuan2 bloom deepseek falcon gemma internlm llama2 llamacpp llm-inference m2m100 minicpm mistral mixtral mixture-of-experts model-quantization moe multi-gpu-inference phi-2 qwen

Last synced: 07 Apr 2025

https://github.com/sayakpaul/adventures-in-tensorflow-lite

This repository contains notebooks demonstrating how to quantize deep neural networks with TensorFlow Lite.

inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model

Last synced: 20 Sep 2025


https://github.com/seonglae/llama2gptq

Chat with LLaMA 2, with responses grounded in reference documents retrieved from a vector database. The model runs locally using GPTQ 4-bit quantization.

chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers

Last synced: 22 Apr 2025

https://github.com/keshavpatel2/local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup

Last synced: 01 Apr 2025

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 17 Jun 2025
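The conversion flow such notebooks automate boils down to two llama.cpp steps; a hedged sketch, assuming a recent llama.cpp checkout (script and binary names vary across versions, and the model path is a placeholder):

```shell
# 1. Convert a local HuggingFace checkpoint to a 16-bit GGUF file
#    (convert_hf_to_gguf.py ships with recent llama.cpp checkouts).
python convert_hf_to_gguf.py ./my-hf-model --outfile model-f16.gguf

# 2. Quantize the GGUF file down to 4-bit, e.g. the Q4_K_M scheme.
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

Batch converters loop these two commands over a list of models and quantization types (Q4_K_M, Q5_K_M, Q8_0, and so on).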