An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with inference-optimization

A curated list of projects in awesome lists tagged with inference-optimization.

https://github.com/alibaba/bladedisc

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

compiler deep-learning inference-optimization machine-learning mlir neural-network pytorch tensorflow

Last synced: 08 Oct 2025

https://github.com/jiazhihao/taso

The Tensor Algebra SuperOptimizer for Deep Learning

deep-learning deep-neural-networks inference-optimization

Last synced: 04 Apr 2025

https://github.com/MIPT-Oulu/pytorch_bn_fusion

Batch normalization fusion for PyTorch. This repository is archived and no longer maintained.

batch-normalization deep-learning deep-neural-networks inference-optimization pytorch

Last synced: 17 Aug 2025
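The fusion trick this repo implements folds a BatchNorm layer's scale and shift into the weights and bias of the preceding convolution, so inference runs one layer instead of two. A minimal numpy sketch of the folding arithmetic (function and variable names are illustrative, not the repo's API):

```python
import numpy as np

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into a conv's (weight, bias).

    weight: (out_channels, in_channels, kh, kw), bias: (out_channels,)
    Returns (fused_weight, fused_bias) such that
    conv_fused(x) == BN(conv(x)) for all x, up to float rounding.
    """
    scale = gamma / np.sqrt(var + eps)             # per-output-channel scale
    fused_w = weight * scale[:, None, None, None]  # rescale each filter
    fused_b = (bias - mean) * scale + beta         # fold shift into the bias
    return fused_w, fused_b
```

Because the fold is exact, the fused convolution reproduces the original conv+BN output; recent PyTorch versions ship comparable arithmetic as `torch.nn.utils.fuse_conv_bn_weights`.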

https://github.com/mit-han-lab/inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

acceleration cnn inference-optimization parallelism

Last synced: 02 Sep 2025
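IOS exploits parallelism *between* operators (e.g. independent CNN branches) rather than only within a single kernel. A toy sketch of the idea as a greedy list scheduler over an op dependency DAG (names and the cost model are illustrative; the actual project searches schedules with dynamic programming):

```python
import heapq

def parallel_makespan(durations, deps, workers=2):
    """Greedy list-schedule of a DAG onto `workers` parallel executors.

    durations: {op: time units}; deps: {op: set of prerequisite ops}.
    Returns the total completion time (makespan).
    """
    indeg = {op: len(deps.get(op, ())) for op in durations}
    children = {op: [] for op in durations}
    for op, pres in deps.items():
        for p in pres:
            children[p].append(op)
    ready = [op for op, d in indeg.items() if d == 0]
    busy = []   # heap of (finish_time, op) for ops currently running
    t = 0.0
    while ready or busy:
        # Launch ready ops onto free workers at the current time.
        while ready and len(busy) < workers:
            op = ready.pop()
            heapq.heappush(busy, (t + durations[op], op))
        # Advance time to the next op completion and release its children.
        t, op = heapq.heappop(busy)
        for c in children[op]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return t
```

With two independent branches feeding a join op, two workers overlap the branches and finish in 2 time units where one worker needs 3.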

https://github.com/zfturbo/keras-inference-time-optimizer

Optimizes the layer structure of a Keras model to reduce computation time

inference-optimization keras

Last synced: 08 Apr 2025

https://github.com/keli-wen/agi-study

Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.

code-examples demo inference-optimization llm train

Last synced: 11 Apr 2025

https://github.com/ksm26/efficiently-serving-llms

Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX inference server.

batch-processing deep-learning-techniques inference-optimization large-scale-deployment machine-learning-operations model-acceleration model-inference-service model-serving optimization-techniques performance-enhancement scalability-strategies server-optimization serving-infrastructure text-generation

Last synced: 02 Aug 2025
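The KV-caching technique this course covers stores the key/value projections of already-generated tokens, so each decode step attends over cached history instead of recomputing it for the whole sequence. A toy single-head numpy sketch (class and method names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only cache of key/value rows for one attention head."""
    def __init__(self, d):
        self.k = np.empty((0, d))
        self.v = np.empty((0, d))

    def step(self, q, k_new, v_new):
        # Append this token's key/value, then attend over the full history.
        self.k = np.vstack([self.k, k_new[None]])
        self.v = np.vstack([self.v, v_new[None]])
        attn = softmax(q @ self.k.T / np.sqrt(q.shape[-1]))
        return attn @ self.v
```

The final cached step reproduces full attention of the last query over all keys, which is what makes incremental decoding exact while skipping redundant work.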

https://github.com/grazder/template.cpp

[WIP] A template for getting started writing code using GGML

cpp deep-learning ggml inference-optimization

Last synced: 02 Mar 2025

https://github.com/amazon-science/llm-rank-pruning

LLM-Rank: A graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.

graph-theory inference-optimization large-language-models llm llms pagerank pruning weighted-pagerank

Last synced: 16 Jan 2026

https://github.com/ez-optimium/optimium

Your AI Catalyst: an inference backend that maximizes your model's inference performance

ai-compiler amd arm deep-learning inference inference-engine inference-optimization intel mediapipe neural-network raspberry-pi runtime tensorflow-lite

Last synced: 24 Jul 2025

https://github.com/kiritigowda/mivisionx-inference-analyzer

MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results

amd amdgpu caffe docker-images inceptionv4 inference inference-engine inference-optimization mivisionx mivisionx-inference-analyzer nnef nnir onnx opencl openvx resnet resnet-50 rocm squeezenet vgg

Last synced: 11 Apr 2025

https://github.com/piotrostr/infer-trt

An interface for running inference with TensorRT engines, along with an example using a YOLOv4 engine.

deep-learning inference-optimization object-detection tensorrt

Last synced: 21 Feb 2025

https://github.com/amazon-science/mlp-rank-pruning

MLP-Rank: A graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the accompanying thesis.

centrality-measures graph-theory inference-optimization machine-learning multilayer-perceptron neural-network pagerank pruning structured-sparsity weighted-pagerank

Last synced: 03 May 2025
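Both LLM-Rank and MLP-Rank score units by weighted PageRank centrality on the graph induced by the network's weight magnitudes, then remove the lowest-scoring units. A simplified numpy sketch for a single fully-connected layer (a loose illustration of the idea, not the papers' exact formulation):

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100):
    """Power iteration on a weighted adjacency matrix (adj[i, j]: edge j -> i)."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # dangling nodes spread no rank
    p = adj / col_sums                   # column-normalize outgoing weights
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (p @ r)
    return r

def prune_neurons(weight, keep):
    """Keep the `keep` output neurons with the highest centrality.

    weight: (n_out, n_in) layer matrix; edge weights are |weight| magnitudes.
    """
    n_out, n_in = weight.shape
    n = n_in + n_out
    adj = np.zeros((n, n))
    adj[n_in:, :n_in] = np.abs(weight)   # bipartite edges input_i -> output_o
    scores = pagerank(adj)[n_in:]        # centrality of the output neurons
    keep_idx = np.sort(np.argsort(scores)[-keep:])
    return weight[keep_idx], keep_idx
```

A neuron whose incoming weights are near zero collects almost no rank mass and is pruned first, which matches the intuition that it contributes little to the layer's output.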

https://github.com/keshavpatel2/local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup

Last synced: 01 Apr 2025

https://github.com/matteo-stat/transformers-nlp-ner-token-classification

This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I’m sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊

fine-tuning huggingface huggingface-pipelines huggingface-transformers inference-optimization named-entity-recognition ner nlp onnx onnxruntime token-classification transformers

Last synced: 29 Jul 2025

https://github.com/shreyansh26/accelerating-cross-encoder-inference

Leveraging torch.compile to accelerate cross-encoder inference

cross-encoder inference-optimization jina mlsys torch-compile

Last synced: 11 Mar 2025

https://github.com/matteo-stat/transformers-nlp-multi-label-classification

This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing multi-label classification models for inference. They are based on my experience developing a custom chatbot; I’m sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊

fine-tuning huggingface huggingface-pipelines huggingface-transformers inference-optimization multi-label-classification nlp onnx onnxruntime text-classification transformers

Last synced: 09 Jul 2025