An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with model-quantization

A curated list of projects in awesome lists tagged with model-quantization .

https://github.com/inferflow/inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

baichuan2 bloom deepseek falcon gemma internlm llama2 llamacpp llm-inference m2m100 minicpm mistral mixtral mixture-of-experts model-quantization moe multi-gpu-inference phi-2 qwen

Last synced: 07 Apr 2025

https://github.com/sayakpaul/adventures-in-tensorflow-lite

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model

Last synced: 20 Sep 2025

https://github.com/sayakpaul/Adventures-in-TensorFlow-Lite

This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.

inference model-optimization model-quantization on-device-ml post-training-quantization pruning quantization-aware-training tensorflow-2 tensorflow-lite tf-hub tf-lite-model

Last synced: 09 Jul 2025

https://github.com/seonglae/llama2gptq

Chat to LLaMa 2 that also provides responses with reference documents over vector database. Locally available model using GPTQ 4bit quantization.

chatai chatbot chatgpt cuda gpt langchain llama-2 llama2 model-quantization quantization question-answering rye streamlit-chat transformers

Last synced: 22 Apr 2025

https://github.com/first-coding/vit

This project distills a ViT model into a compact CNN, reducing its size to 1.24MB with minimal accuracy loss. ONNXRuntime with CUDA boosts inference speed, while FastAPI and Docker simplify deployment.

docker fastapi image-classification knowledge-distillation model-quantization onnx onnxruntime python vision-transformer

Last synced: 18 Apr 2026

https://github.com/dwain-barnes/llm-gguf-auto-converter

Automated Jupyter notebook solution for batch converting Large Language Models to GGUF format with multiple quantization options. Built on llama.cpp with HuggingFace integration.

auto-converter batch-processing cuda gguf huggingface jupyter-notebook llama-cpp llm model-quantization

Last synced: 17 Jun 2025