Projects in Awesome Lists tagged with inference-optimization
A curated list of projects from awesome lists tagged with inference-optimization.
https://github.com/google/xnnpack
High-efficiency floating-point neural network inference operators for mobile, server, and Web
convolutional-neural-network convolutional-neural-networks cpu inference inference-optimization matrix-multiplication mobile-inference multithreading neural-network neural-networks simd
Last synced: 19 Feb 2026
https://github.com/alibaba/bladedisc
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
compiler deep-learning inference-optimization machine-learning mlir neural-network pytorch tensorflow
Last synced: 08 Oct 2025
https://github.com/jiazhihao/taso
The Tensor Algebra SuperOptimizer for Deep Learning
deep-learning deep-neural-networks inference-optimization
Last synced: 04 Apr 2025
https://github.com/bentoml/llm-inference-handbook
Everything you need to know about LLM inference
inference-handbook inference-infrastructure inference-optimization llm llm-inference
Last synced: 14 Oct 2025
https://github.com/MIPT-Oulu/pytorch_bn_fusion
Batch normalization fusion for PyTorch (archived repository, no longer maintained).
batch-normalization deep-learning deep-neural-networks inference-optimization pytorch
Last synced: 17 Aug 2025
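The entry above refers to batch-norm fusion: at inference time a BatchNorm layer is a fixed per-channel affine transform, so it can be folded into the preceding convolution's weight and bias. A minimal sketch of the idea in PyTorch (the helper name `fuse_conv_bn` is illustrative, not the repository's API):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an eval-mode BatchNorm2d into the preceding Conv2d.

    BN computes gamma * (y - mean) / sqrt(var + eps) + beta, which for
    fixed running statistics is a per-channel scale and shift that can
    be absorbed into the conv weight and bias.
    """
    fused = nn.Conv2d(
        conv.in_channels, conv.out_channels, conv.kernel_size,
        stride=conv.stride, padding=conv.padding,
        dilation=conv.dilation, groups=conv.groups, bias=True,
    )
    # Per-output-channel scale factor from the BN statistics.
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = (conv.bias.data if conv.bias is not None
                 else torch.zeros(conv.out_channels))
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```

After fusion the BN layer disappears from the graph, saving one elementwise pass per channel at inference time.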
https://github.com/mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
acceleration cnn inference-optimization parallelism
Last synced: 02 Sep 2025
https://github.com/zfturbo/keras-inference-time-optimizer
Optimize the layer structure of a Keras model to reduce computation time
Last synced: 08 Apr 2025
https://github.com/keli-wen/agi-study
Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.
code-examples demo inference-optimization llm train
Last synced: 11 Apr 2025
https://github.com/ksm26/efficiently-serving-llms
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
batch-processing deep-learning-techniques inference-optimization large-scale-deployment machine-learning-operations model-acceleration model-inference-service model-serving optimization-techniques performance-enhancement scalability-strategies server-optimization serving-infrastructure text-generation
Last synced: 02 Aug 2025
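The course entry above highlights KV caching: during autoregressive decoding, each token's attention keys and values are appended to a cache rather than recomputed for the whole prefix at every step. A toy pure-Python sketch of the equivalence (all names here are illustrative, not from the course):

```python
import math

def attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries/keys/values are lists of d-dimensional vectors (lists of
    floats); each query attends over all keys/values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)                       # numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(d)])
    return outputs

class KVCache:
    """Incremental decoding: append each step's key/value pair instead
    of recomputing attention inputs for the whole prefix every step."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, q, k, v):
        self.keys.append(k)
        self.values.append(v)
        return attention([q], self.keys, self.values)[0]
```

Decoding T tokens with the cache reuses the T-1 previously stored key/value pairs at each step, which is where the serving-time savings come from.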
https://github.com/grazder/template.cpp
[WIP] A template for getting started writing code using GGML
cpp deep-learning ggml inference-optimization
Last synced: 02 Mar 2025
https://github.com/amazon-science/llm-rank-pruning
LLM-Rank: A graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.
graph-theory inference-optimization large-language-models llm llms pagerank pruning weighted-pagerank
Last synced: 16 Jan 2026
https://github.com/ez-optimium/optimium
Your AI Catalyst: an inference backend that maximizes your model's performance
ai-compiler amd arm deep-learning inference inference-engine inference-optimization intel mediapipe neural-network raspberry-pi runtime tensorflow-lite
Last synced: 24 Jul 2025
https://github.com/kiritigowda/mivisionx-inference-analyzer
MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results
amd amdgpu caffe docker-images inceptionv4 inference inference-engine inference-optimization mivisionx mivisionx-inference-analyzer nnef nnir onnx opencl openvx resnet resnet-50 rocm squeezenet vgg
Last synced: 11 Apr 2025
https://github.com/wb-az/yolov8-disease-detection-agriculture
YOLOV8 - Object detection
average-precision computer-vision deep-learning inference-optimization live-streaming object-detection openvino-inference-engine openvino-toolkit optimization-algorithms pandas pytorch ray-tune ultralytics yolov8
Last synced: 24 Aug 2025
https://github.com/piotrostr/infer-trt
An interface for running inference with TensorRT engines, with an example using a YOLOv4 engine.
deep-learning inference-optimization object-detection tensorrt
Last synced: 21 Feb 2025
https://github.com/amazon-science/mlp-rank-pruning
MLP-Rank: A graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the accompanying thesis.
centrality-measures graph-theory inference-optimization machine-learning multilayer-perceptron neural-network pagerank pruning structured-sparsity weighted-pagerank
Last synced: 03 May 2025
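Both Rank-pruning entries above score network units by weighted PageRank centrality and prune the lowest-scoring ones. A hedged toy sketch of that idea, treating neurons as graph nodes with absolute connection weights as edges; this is a generic power-iteration PageRank, not the exact formulation from the paper or thesis:

```python
def weighted_pagerank(adj, damping=0.85, iters=100):
    """Weighted PageRank by power iteration.

    adj[i][j] is a non-negative edge weight from node i to node j
    (e.g. the absolute weight connecting neuron i to neuron j).
    Returns one centrality score per node.
    """
    n = len(adj)
    rank = [1.0 / n] * n
    out_sum = [sum(row) for row in adj]
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for i in range(n):
            if out_sum[i] == 0:
                # Dangling node: spread its mass uniformly.
                for j in range(n):
                    new[j] += damping * rank[i] / n
            else:
                for j in range(n):
                    new[j] += damping * rank[i] * adj[i][j] / out_sum[i]
        rank = new
    return rank

def prune_lowest(rank, keep_ratio=0.5):
    """Indices of the neurons to keep, ranked by centrality score."""
    k = max(1, int(len(rank) * keep_ratio))
    return sorted(sorted(range(len(rank)), key=lambda i: -rank[i])[:k])
```

Structured pruning then removes entire low-centrality neurons (rows/columns of the weight matrices), so the smaller model needs no sparse kernels at inference time.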
https://github.com/wb-az/yolov8-image-detection
YOLOV8 - Object detection
average-precision computer-vision deep-learning inference-optimization live-streaming object-detection openvino-inference-engine openvino-toolkit optimization-algorithms pandas pytorch ray-tune ultralytics yolov8
Last synced: 05 Mar 2025
https://github.com/keshavpatel2/local-llm-workbench
🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.
context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup
Last synced: 01 Apr 2025
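Among the topics the toolkit above covers is model quantization, which shrinks weights to low-bit integers to cut memory traffic during local inference. A minimal sketch of symmetric per-tensor int8 quantization in pure Python (the function names are illustrative, not the toolkit's API):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Maps floats into [-127, 127] using a single scale derived from the
    largest magnitude; returns the quantized ints plus the scale
    needed to dequantize.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values."""
    return [qi * scale for qi in q]
```

Each weight is stored in 1 byte instead of 4, at the cost of a reconstruction error of at most half a quantization step.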
https://github.com/matteo-stat/transformers-nlp-ner-token-classification
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope they help others quickly fine-tune and use models in their projects! 😊
fine-tuning huggingface huggingface-pipelines huggingface-transformers inference-optimization named-entity-recognition ner nlp onnx onnxruntime token-classification transformers
Last synced: 29 Jul 2025
https://github.com/shreyansh26/accelerating-cross-encoder-inference
Leveraging torch.compile to accelerate cross-encoder inference
cross-encoder inference-optimization jina mlsys torch-compile
Last synced: 11 Mar 2025
https://github.com/matteo-stat/transformers-nlp-multi-label-classification
This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing multi-label classification models for inference. They are based on my experience developing a custom chatbot; I'm sharing them in the hope they help others quickly fine-tune and use models in their projects! 😊
fine-tuning huggingface huggingface-pipelines huggingface-transformers inference-optimization multi-label-classification nlp onnx onnxruntime text-classification transformers
Last synced: 09 Jul 2025
https://github.com/zaki0521/hands-on-large-language-models
Explore hands-on projects with large language models. Learn techniques and best practices to harness AI effectively. Join the journey! 🤖🌟
book deep-learning-techniques generative-ai hans-on-llms inference-optimization large-language-models large-scale-deployment machine-learning-operations model-acceleration model-inference-service model-serving ollama prompt-injection-llm-security serving-infrastructure spring-boot testcontainers vulnerable-application vulnerable-llm-application
Last synced: 15 Jul 2025