An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with inference-optimization

A curated list of projects in awesome lists tagged with inference-optimization.

https://github.com/alibaba/bladedisc

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

compiler deep-learning inference-optimization machine-learning mlir neural-network pytorch tensorflow

Last synced: 08 Oct 2025

https://github.com/jiazhihao/taso

The Tensor Algebra SuperOptimizer for Deep Learning

deep-learning deep-neural-networks inference-optimization

Last synced: 04 Apr 2025

https://github.com/MIPT-Oulu/pytorch_bn_fusion

Batch normalization fusion for PyTorch. This repository is archived and no longer maintained.

batch-normalization deep-learning deep-neural-networks inference-optimization pytorch

Last synced: 17 Aug 2025
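The fusion trick this repo implements folds a BatchNorm layer's scale and shift into the weights and bias of the preceding convolution, so inference runs one layer instead of two. A minimal numpy sketch of the folding arithmetic (function and variable names are illustrative, not the repo's API):

```python
import numpy as np

def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into a conv's (weight, bias).

    weight: (out_channels, in_channels, kh, kw), bias: (out_channels,)
    Returns (fused_weight, fused_bias) such that
    conv_fused(x) == BN(conv(x)) for all x, up to float rounding.
    """
    scale = gamma / np.sqrt(var + eps)             # per-output-channel scale
    fused_w = weight * scale[:, None, None, None]  # rescale each filter
    fused_b = (bias - mean) * scale + beta         # fold shift into the bias
    return fused_w, fused_b
```

Because the fold is exact, the fused convolution reproduces the original conv+BN output; recent PyTorch versions ship comparable arithmetic as `torch.nn.utils.fuse_conv_bn_weights`.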

https://github.com/mit-han-lab/inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

acceleration cnn inference-optimization parallelism

Last synced: 02 Sep 2025
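IOS exploits parallelism *between* operators (e.g. independent CNN branches) rather than only within a single kernel. A toy sketch of the idea as a greedy list scheduler over an op dependency DAG (names and the cost model are illustrative; the actual project searches schedules with dynamic programming):

```python
import heapq

def parallel_makespan(durations, deps, workers=2):
    """Greedy list-schedule of a DAG onto `workers` parallel executors.

    durations: {op: time units}; deps: {op: set of prerequisite ops}.
    Returns the total completion time (makespan).
    """
    indeg = {op: len(deps.get(op, ())) for op in durations}
    children = {op: [] for op in durations}
    for op, pres in deps.items():
        for p in pres:
            children[p].append(op)
    ready = [op for op, d in indeg.items() if d == 0]
    busy = []   # heap of (finish_time, op) for ops currently running
    t = 0.0
    while ready or busy:
        # Launch ready ops onto free workers at the current time.
        while ready and len(busy) < workers:
            op = ready.pop()
            heapq.heappush(busy, (t + durations[op], op))
        # Advance time to the next op completion and release its children.
        t, op = heapq.heappop(busy)
        for c in children[op]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return t
```

With two independent branches feeding a join op, two workers overlap the branches and finish in 2 time units where one worker needs 3.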

https://github.com/zfturbo/keras-inference-time-optimizer

Optimizes the layer structure of a Keras model to reduce computation time

inference-optimization keras

Last synced: 08 Apr 2025

https://github.com/keli-wen/agi-study

Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.

code-examples demo inference-optimization llm train

Last synced: 11 Apr 2025

https://github.com/ksm26/efficiently-serving-llms

Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low-Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX inference server.

batch-processing deep-learning-techniques inference-optimization large-scale-deployment machine-learning-operations model-acceleration model-inference-service model-serving optimization-techniques performance-enhancement scalability-strategies server-optimization serving-infrastructure text-generation

Last synced: 02 Aug 2025
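The KV-caching technique this course covers stores the key/value projections of already-generated tokens, so each decode step attends over cached history instead of recomputing it for the whole sequence. A toy single-head numpy sketch (class and method names are illustrative):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only cache of key/value rows for one attention head."""
    def __init__(self, d):
        self.k = np.empty((0, d))
        self.v = np.empty((0, d))

    def step(self, q, k_new, v_new):
        # Append this token's key/value, then attend over the full history.
        self.k = np.vstack([self.k, k_new[None]])
        self.v = np.vstack([self.v, v_new[None]])
        attn = softmax(q @ self.k.T / np.sqrt(q.shape[-1]))
        return attn @ self.v
```

The final cached step reproduces full attention of the last query over all keys, which is what makes incremental decoding exact while skipping redundant work.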

https://github.com/grazder/template.cpp

[WIP] A template for getting started writing code using GGML

cpp deep-learning ggml inference-optimization

Last synced: 02 Mar 2025

https://github.com/amazon-science/llm-rank-pruning

LLM-Rank: A graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the accompanying paper.

graph-theory inference-optimization large-language-models llm llms pagerank pruning weighted-pagerank

Last synced: 16 Jan 2026

https://github.com/ez-optimium/optimium

Your AI Catalyst: an inference backend that maximizes your model's inference performance

ai-compiler amd arm deep-learning inference inference-engine inference-optimization intel mediapipe neural-network raspberry-pi runtime tensorflow-lite

Last synced: 24 Jul 2025

https://github.com/kiritigowda/mivisionx-inference-analyzer

MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results

amd amdgpu caffe docker-images inceptionv4 inference inference-engine inference-optimization mivisionx mivisionx-inference-analyzer nnef nnir onnx opencl openvx resnet resnet-50 rocm squeezenet vgg

Last synced: 11 Apr 2025

https://github.com/piotrostr/infer-trt

An interface for running inference with TensorRT engines, along with an example using a YOLOv4 engine.

deep-learning inference-optimization object-detection tensorrt

Last synced: 21 Feb 2025

https://github.com/amazon-science/mlp-rank-pruning

MLP-Rank: A graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the accompanying thesis.

centrality-measures graph-theory inference-optimization machine-learning multilayer-perceptron neural-network pagerank pruning structured-sparsity weighted-pagerank

Last synced: 03 May 2025
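Both LLM-Rank and MLP-Rank score units by weighted PageRank centrality on the graph induced by the network's weight magnitudes, then remove the lowest-scoring units. A simplified numpy sketch for a single fully-connected layer (a loose illustration of the idea, not the papers' exact formulation):

```python
import numpy as np

def pagerank(adj, damping=0.85, iters=100):
    """Power iteration on a weighted adjacency matrix (adj[i, j]: edge j -> i)."""
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # dangling nodes spread no rank
    p = adj / col_sums                   # column-normalize outgoing weights
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (p @ r)
    return r

def prune_neurons(weight, keep):
    """Keep the `keep` output neurons with the highest centrality.

    weight: (n_out, n_in) layer matrix; edge weights are |weight| magnitudes.
    """
    n_out, n_in = weight.shape
    n = n_in + n_out
    adj = np.zeros((n, n))
    adj[n_in:, :n_in] = np.abs(weight)   # bipartite edges input_i -> output_o
    scores = pagerank(adj)[n_in:]        # centrality of the output neurons
    keep_idx = np.sort(np.argsort(scores)[-keep:])
    return weight[keep_idx], keep_idx
```

A neuron whose incoming weights are near zero collects almost no rank mass and is pruned first, which matches the intuition that it contributes little to the layer's output.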

https://github.com/keshavpatel2/local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

context-window-scaling cpu-inference cuda gpu-acceleration hybrid-inference inference-optimization llama-cpp llm-benchmarking llm-deployment local-llm model-management model-quantization ollama-optimization wsl-ai-setup

Last synced: 01 Apr 2025

https://github.com/matteo-stat/transformers-nlp-ner-token-classification

This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing token classification models for inference. They are based on my experience developing a custom chatbot; I’m sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊

fine-tuning huggingface huggingface-pipelines huggingface-transformers inference-optimization named-entity-recognition ner nlp onnx onnxruntime token-classification transformers

Last synced: 29 Jul 2025

https://github.com/shreyansh26/accelerating-cross-encoder-inference

Leveraging torch.compile to accelerate cross-encoder inference

cross-encoder inference-optimization jina mlsys torch-compile

Last synced: 11 Mar 2025

https://github.com/matteo-stat/transformers-nlp-multi-label-classification

This repo provides scripts for fine-tuning HuggingFace Transformers, setting up pipelines, and optimizing multi-label classification models for inference. They are based on my experience developing a custom chatbot; I’m sharing them in the hope that they help others quickly fine-tune and use models in their projects! 😊

fine-tuning huggingface huggingface-pipelines huggingface-transformers inference-optimization multi-label-classification nlp onnx onnxruntime text-classification transformers

Last synced: 09 Jul 2025