Projects in Awesome Lists tagged with inference
A curated list of projects in awesome lists tagged with inference.
https://github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
amd cuda deepseek gpt hpu inference inferentia llama llm llm-serving llmops mlops model-serving pytorch qwen rocm tpu trainium transformer xpu
Last synced: 29 Jan 2026
https://github.com/ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
inference openai speech-recognition speech-to-text transformer whisper
Last synced: 16 Jan 2026
https://github.com/hpcaitech/colossalai
Making large AI models cheaper, faster and more accessible
ai big-model data-parallelism deep-learning distributed-computing foundation-models heterogeneous-training hpc inference large-scale model-parallelism pipeline-parallelism
Last synced: 09 Sep 2025
https://github.com/hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
ai big-model data-parallelism deep-learning distributed-computing foundation-models heterogeneous-training hpc inference large-scale model-parallelism pipeline-parallelism
Last synced: 19 Mar 2025
https://github.com/deepspeedai/deepspeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
billion-parameters compression data-parallelism deep-learning gpu inference machine-learning mixture-of-experts model-parallelism pipeline-parallelism pytorch trillion-parameters zero
Last synced: 15 Jan 2026
https://github.com/microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
billion-parameters compression data-parallelism deep-learning gpu inference machine-learning mixture-of-experts model-parallelism pipeline-parallelism pytorch trillion-parameters zero
Last synced: 02 Apr 2025
https://github.com/deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
billion-parameters compression data-parallelism deep-learning gpu inference machine-learning mixture-of-experts model-parallelism pipeline-parallelism pytorch trillion-parameters zero
Last synced: 19 Oct 2025
https://github.com/google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
android audio-processing c-plus-plus calculator computer-vision deep-learning framework graph-based graph-framework inference machine-learning mediapipe mobile-development perception pipeline-framework stream-processing video-processing
Last synced: 30 Jan 2026
https://google.github.io/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
android audio-processing c-plus-plus calculator computer-vision deep-learning framework graph-based graph-framework inference machine-learning mediapipe mobile-development perception pipeline-framework stream-processing video-processing
Last synced: 02 Apr 2025
https://github.com/sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
attention blackwell cuda deepseek diffusion glm gpt-oss inference llama llm minimax moe qwen qwen-image reinforcement-learning transformer vlm wan
Last synced: 16 May 2026
https://github.com/tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
android arm-neon artificial-intelligence caffe darknet deep-learning high-preformance inference ios keras mlir mxnet ncnn neural-network onnx pytorch riscv simd tensorflow vulkan
Last synced: 09 Sep 2025
https://github.com/Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
android arm-neon artificial-intelligence caffe darknet deep-learning high-preformance inference ios keras mlir mxnet ncnn neural-network onnx pytorch riscv simd tensorflow vulkan
Last synced: 14 Mar 2025
https://github.com/systran/faster-whisper
Faster Whisper transcription with CTranslate2
deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper
Last synced: 09 Sep 2025
https://github.com/SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
deep-learning inference openai quantization speech-recognition speech-to-text transformer whisper
Last synced: 24 Mar 2025
https://github.com/stas00/ml-engineering
Machine Learning Engineering Open Book
ai inference large-language-models llm machine-learning machine-learning-engineering mlops pytorch scalability slurm training transformers
Last synced: 14 May 2025
https://github.com/gvergnaud/ts-pattern
🎨 The exhaustive Pattern Matching library for TypeScript, with smart type inference.
branching conditions exhaustive inference javascript matching pattern pattern-matching ts type-inference typescript
Last synced: 14 May 2025
https://github.com/nvidia/tensorrt
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
deep-learning gpu-acceleration inference nvidia tensorrt
Last synced: 09 Sep 2025
https://github.com/aws/amazon-sagemaker-examples
Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
aws data-science deep-learning examples inference jupyter-notebook machine-learning mlops reinforcement-learning sagemaker training
Last synced: 13 May 2025
https://github.com/NVIDIA/TensorRT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
deep-learning gpu-acceleration inference nvidia tensorrt
Last synced: 20 Mar 2025
https://github.com/huggingface/text-generation-inference
Large Language Model Text Generation Inference
bloom deep-learning falcon gpt inference nlp pytorch starcoder transformer
Last synced: 13 May 2025
https://github.com/xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
artificial-intelligence chatglm deployment flan-t5 gemma ggml glm4 inference llama llama3 llamacpp llm machine-learning mistral openai-api pytorch qwen vllm whisper wizardlm
Last synced: 25 Apr 2026
https://github.com/triton-inference-server/server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
cloud datacenter deep-learning edge gpu inference machine-learning
Last synced: 24 Dec 2025
https://github.com/openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
ai computer-vision deep-learning deploy-ai diffusion-models generative-ai good-first-issue inference llm-inference natural-language-processing nlp openvino optimize-ai performance-boost recommendation-system speech-recognition stable-diffusion transformers yolo
Last synced: 12 May 2025
https://github.com/dusty-nv/jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
caffe computer-vision deep-learning digits embedded image-recognition inference jetson jetson-nano jetson-tx1 jetson-tx2 jetson-xavier jetson-xavier-nx machine-learning nvidia object-detection robotics segmentation tensorrt video-analytics
Last synced: 13 May 2025
https://github.com/oumi-ai/oumi
Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
dpo evaluation fine-tuning inference llama llms sft vlms
Last synced: 29 Jan 2026
https://github.com/linzaer/ultra-light-fast-generic-face-detector-1mb
💎1MB lightweight face detection model
arm face-detection inference mnn ncnn
Last synced: 14 May 2025
https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB
💎1MB lightweight face detection model
arm face-detection inference mnn ncnn
Last synced: 14 Mar 2025
https://github.com/gcanti/io-ts
Runtime type system for IO decoding/encoding
inference runtime types typescript validation
Last synced: 14 May 2025
https://gcanti.github.io/io-ts/
Runtime type system for IO decoding/encoding
inference runtime types typescript validation
Last synced: 08 Apr 2025
https://github.com/argmaxinc/argmax-oss-swift
On-device Speech AI for Apple Silicon
diarization inference ios macos pyannote qwen3-tts speech-recognition speech-to-text swift text-to-speech transformers visionos watchos whisper
Last synced: 19 Apr 2026
https://github.com/trusted-ai/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai
Last synced: 13 May 2025
https://github.com/Trusted-AI/adversarial-robustness-toolbox
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
adversarial-attacks adversarial-examples adversarial-machine-learning ai artificial-intelligence attack blue-team evasion extraction inference machine-learning poisoning privacy python red-team trusted-ai trustworthy-ai
Last synced: 23 Mar 2025
https://github.com/superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search
Last synced: 14 May 2025
https://github.com/gpustack/gpustack
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
ascend cuda deepseek distributed-inference genai high-performance-inference inference llama llm llm-inference llm-serving maas mindie openai qwen rocm sglang vllm
Last synced: 20 Apr 2026
https://github.com/autogptq/autogptq
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
deep-learning inference large-language-models llms nlp pytorch quantization transformer transformers
Last synced: 08 Apr 2025
https://github.com/AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
deep-learning inference large-language-models llms nlp pytorch quantization transformer transformers
Last synced: 14 Mar 2025
https://github.com/nvidia-ai-iot/torch2trt
An easy to use PyTorch to TensorRT converter
classification inference jetson-nano jetson-tx2 jetson-xavier pytorch tensorrt
Last synced: 14 May 2025
https://github.com/NVIDIA-AI-IOT/torch2trt
An easy to use PyTorch to TensorRT converter
classification inference jetson-nano jetson-tx2 jetson-xavier pytorch tensorrt
Last synced: 20 Mar 2025
https://github.com/argmaxinc/whisperkit
On-device Speech Recognition for Apple Silicon
inference ios macos speech-recognition swift transformers visionos watchos whisper
Last synced: 13 May 2025
https://github.com/tencent/tnn
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Apps from Tencent, such as Mobile QQ, Weishi, Pitu, etc. Contributions are welcome: collaborate with us to make TNN a better framework.
coreml deep-learning face-detection hairsegmentaion inference mnn ncnn ocr openvino pytorch tengine tensorflow tensorrt
Last synced: 13 May 2025
https://github.com/Tencent/TNN
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Apps from Tencent, such as Mobile QQ, Weishi, Pitu, etc. Contributions are welcome: collaborate with us to make TNN a better framework.
coreml deep-learning face-detection hairsegmentaion inference mnn ncnn ocr openvino pytorch tengine tensorflow tensorrt
Last synced: 20 Mar 2025
https://github.com/argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
inference ios macos speech-recognition swift transformers visionos watchos whisper
Last synced: 28 Mar 2025
https://github.com/openvinotoolkit/open_model_zoo
Pre-trained Deep Learning models and demos (high quality and extremely fast)
caffemodel cnn-model deep-learning-models demo inference model model-zoo models onnx-models openvino openvino-model-zoo openvino-models openvino-toolkit pytorch-models tensorflow-models
Last synced: 13 May 2025
https://github.com/typedb/typedb
TypeDB: Built for systems, not records
database inference knowledge-base knowledge-representation logic polymorphic polymorphism reasoning strongly-typed type-system typedb typeql
Last synced: 02 May 2026
https://github.com/opencsgs/csghub
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both open-source and on-premise/SaaS solutions, with features comparable to Hugging Face. Gain full control over the lifecycle of LLMs, datasets, and agents, with Python SDK compatibility with Hugging Face. Join us! ⭐️
ai asset-management dataset deepseek deploy finetune git huggingface inference llm management-system model platform prompt ray space
Last synced: 16 Jun 2026
https://github.com/opennmt/ctranslate2
Fast inference engine for Transformer models
avx avx2 cpp cuda deep-learning deep-neural-networks gemm inference intrinsics machine-translation mkl neon neural-machine-translation onednn openmp opennmt parallel-computing quantization thrust transformer-models
Last synced: 08 Oct 2025
https://github.com/tencentmusic/cube-studio
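Cube Studio is an open-source, cloud-native, one-stop machine learning / deep learning / large-model AI platform. It supports SSO login, multi-tenancy, big data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, VGPU, edge computing, serverless, an annotation platform, automated annotation, dataset management, large-model fine-tuning, vllm large-model inference, llmops, private knowledge bases, and an AI model app store. It also supports one-click model development/inference/fine-tuning, domestic (Chinese) cpu/gpu/npu chips, RDMA, and distributed pytorch/tf/mxnet/deepspeed/paddle/colossalai/horovod/spark/ray/volcano.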
ai aihub argo automl gpt inference kubeflow kubernetes llmops mlops notebook pipeline pytorch spark vgpu workflow
Last synced: 06 Feb 2026
https://github.com/OpenNMT/CTranslate2
Fast inference engine for Transformer models
avx avx2 cpp cuda deep-learning deep-neural-networks gemm inference intrinsics machine-translation mkl neon neural-machine-translation onednn openmp opennmt parallel-computing quantization thrust transformer-models
Last synced: 02 Apr 2025
https://github.com/bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
accelerate bart beam-search bert cuda diverse-decoding gpt inference multilingual-nmt sampling training transformer
Last synced: 14 May 2025
https://github.com/neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
computer-vision cpus deepsparse inference llm-inference machinelearning nlp object-detection onnx performance pretrained-models pruning quantization sparsification
Last synced: 14 May 2025
https://github.com/huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools
graphcore habana inference intel onnx onnxruntime optimization pytorch quantization tflite training transformers
Last synced: 14 May 2025
https://github.com/openvinotoolkit/openvino_notebooks
📚 Jupyter notebook tutorials for OpenVINO™
computer-vision deep-learning inference machine-learning openvino
Last synced: 01 Jul 2025
https://github.com/zjhellofss/kuiperinfer
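A good project for campus recruiting (autumn and spring rounds) and internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference on models such as llama2, Unet, Yolov5, and Resnet.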
caffe convolution deep-learning deep-neural-networks diy graph-algorithms inference inference-engine maxpooling ncnn pnnx pytorch relu resnet sigmoid yolo yolov5
Last synced: 14 May 2025
https://github.com/zjhellofss/KuiperInfer
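A good project for campus recruiting (autumn and spring rounds) and internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference on models such as llama2, Unet, Yolov5, and Resnet.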
caffe convolution deep-learning deep-neural-networks diy graph-algorithms inference inference-engine maxpooling ncnn pnnx pytorch relu resnet sigmoid yolo yolov5
Last synced: 19 Mar 2025
https://github.com/raullenchai/rapid-mlx
The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Drop-in OpenAI replacement. Works with Claude Code, Cursor, Aider.
apple-silicon claude-code cursor deepseek fastapi hacktoberfest inference llm local-llm m1 m2 m3 macos mlx ollama-alternative openai-api python qwen tool-calling
Last synced: 12 Jun 2026
https://github.com/Andyyyy64/whichllm
Find the local LLM that actually runs and performs best on your hardware. Ranked by real, recency-aware benchmarks, not parameter count. One command, run it instantly.
ai apple-silicon benchmarks cli command-line-tool gguf gpu huggingface inference llm local-llm ollama python vram
Last synced: 09 Jun 2026
https://github.com/huggingface/huggingface.js
Use Hugging Face with JavaScript
api-client hub huggingface inference machine-learning
Last synced: 10 Jun 2026
https://github.com/google/xnnpack
High-efficiency floating-point neural network inference operators for mobile, server, and Web
convolutional-neural-network convolutional-neural-networks cpu inference inference-optimization matrix-multiplication mobile-inference multithreading neural-network neural-networks simd
Last synced: 19 Feb 2026
https://github.com/roboflow/inference
Turn any computer or edge device into a command center for your computer vision projects.
agents classification computer-vision deployment docker inference inference-api inference-server instance-segmentation jetson machine-learning object-detection onnx python tensorrt vit yolo11 yolov12 yolov5 yolov8
Last synced: 10 Apr 2026
https://github.com/tairov/llama2.mojo
Inference Llama 2 in one file of pure 🔥
inference llama llama2 modular mojo parallelize performance simd tensor transformer-architecture vectorization
Last synced: 15 May 2025
https://github.com/pytorch/ao
PyTorch native quantization and sparsity for training and inference
brrr cuda dtypes float8 inference llama mx offloading optimizer pytorch quantization sparsity training transformer
Last synced: 12 May 2025
https://github.com/microsoft/aici
AICI: Prompts as (Wasm) Programs
ai inference language-model llm llm-framework llm-inference llm-serving llmops model-serving rust transformer wasm wasmtime
Last synced: 14 May 2025
https://github.com/deepspeedai/deepspeed-mii
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
deep-learning inference pytorch
Last synced: 29 Apr 2025
https://github.com/microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
deep-learning inference pytorch
Last synced: 05 Apr 2025
https://github.com/deepspeedai/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
deep-learning inference pytorch
Last synced: 14 Mar 2025
https://github.com/google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
convolutional-neural-network convolutional-neural-networks cpu inference inference-optimization matrix-multiplication mobile-inference multithreading neural-network neural-networks simd
Last synced: 13 Mar 2025
https://github.com/tobegit3hub/tensorflow_template_application
TensorFlow template application for deep learning
cnn csv deep-learning inference libsvm lstm machine-learning mlp serving tensorboard tensorflow tfrecords wide-and-deep
Last synced: 15 May 2025
https://github.com/dstackai/dstack
dstack is an open-source alternative to Kubernetes and Slurm, designed to simplify GPU allocation and AI workload orchestration for ML teams across top clouds, on-prem clusters, and accelerators.
amd aws azure cloud docker fine-tuning gcp gpu inference k8s kubernetes llms machine-learning nvidia orchestration python slurm training
Last synced: 21 Jan 2026
https://github.com/maratyszcza/nnpack
Acceleration package for neural networks on multi-core CPUs
convolutional-layers cpu fast-fourier-transform high-performance high-performance-computing inference matrix-multiplication multithreading neural-network neural-networks simd winograd-transform
Last synced: 15 May 2025
https://github.com/Maratyszcza/NNPACK
Acceleration package for neural networks on multi-core CPUs
convolutional-layers cpu fast-fourier-transform high-performance high-performance-computing inference matrix-multiplication multithreading neural-network neural-networks simd winograd-transform
Last synced: 18 Mar 2025
https://github.com/els-rd/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
deep-learning deployment inference machine-learning natural-language-processing server
Last synced: 14 May 2025
https://github.com/ELS-RD/transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
deep-learning deployment inference machine-learning natural-language-processing server
Last synced: 03 Apr 2025
https://github.com/trymirai/uzu
A high-performance inference engine for AI models
ai high-performance inference llm metal rust tts
Last synced: 08 Jun 2026
https://github.com/Delta-ML/delta
DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/
asr custom-ops deep-learning emotion-recognition front-end inference nlp nlu ops seq2seq sequence-to-sequence serving speaker-verification speech speech-recognition tensorflow tensorflow-lite tensorflow-serving text-classification text-generation
Last synced: 07 Apr 2025
https://github.com/tencent/turbotransformers
A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU.
albert bert decoder gpt2 gpu huggingface-transformers inference machine-translation nlp pytorch roberta transformer
Last synced: 15 May 2025
https://github.com/Tencent/TurboTransformers
A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU.
albert bert decoder gpt2 gpu huggingface-transformers inference machine-translation nlp pytorch roberta transformer
Last synced: 19 Mar 2025
https://github.com/ebhy/budgetml
Deploy an ML inference service on a budget in less than 10 lines of code.
api data-science deployment fastapi inference machine-learning mlops
Last synced: 15 May 2025
https://github.com/pykeio/ort
Fast ML inference & training for ONNX models in Rust
ai ai-training fine-tuning inference machine-learning onnx onnxruntime rust
Last synced: 19 Oct 2025
https://github.com/0xCrunchyy/10x
Optimized inference and fine-tuning framework for diffusion (image & video) models. Up to 3x faster & 80% less VRAM.
artificial-inteligence diffusion diffusion-models fine-tuning flux gpt inference lora pytorch sdxl
Last synced: 09 Jan 2026
https://github.com/kamalkraj/bert-ner
Pytorch-Named-Entity-Recognition-with-BERT
bert bert-ner conll-2003 cpp11 curl inference named-entity-recognition postman pretrained-models pytorch
Last synced: 16 May 2025
https://github.com/kamalkraj/BERT-NER
Pytorch-Named-Entity-Recognition-with-BERT
bert bert-ner conll-2003 cpp11 curl inference named-entity-recognition postman pretrained-models pytorch
Last synced: 07 May 2025
https://github.com/fentechsolutions/causaldiscoverytoolbox
Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
algorithm causal-discovery causal-inference causal-models causality graph graph-structure-recovery inference machine-learning python toolbox
Last synced: 15 May 2025
https://github.com/FenTechSolutions/CausalDiscoveryToolbox
Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.
algorithm causal-discovery causal-inference causal-models causality graph graph-structure-recovery inference machine-learning python toolbox
Last synced: 26 Mar 2025
https://github.com/RightNow-AI/picolm
Run a 1-billion parameter LLM on a $10 board with 256MB RAM
arm embedded inference llm openclaw picoclaw quantization raspberry-pi risc-v
Last synced: 19 Jun 2026
https://github.com/awslabs/multi-model-server
Multi Model Server is a tool for serving neural net models for inference
ai deep-learning inference mxnet neural-network onnx server
Last synced: 14 Jan 2026
https://github.com/huawei-noah/bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
android arm bolt caffe cnn cv deep-learning high-performance huawei inference ios mali mobile nlp noah onnx rnn tensorflow x86
Last synced: 16 May 2025
https://github.com/uber/neuropod
A uniform interface to run deep learning models from multiple frameworks
deep-learning deeplearning incubation inference keras machine-learning machinelearning pytorch tensorflow
Last synced: 11 Jun 2025
https://github.com/openintrostat/ims
📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference.
bootstrap-confidence-intervals inference modern-statistics openintro rstats simulation statistics
Last synced: 26 Jan 2026
https://github.com/alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
gpt inference llama llm llm-serving llmops model-serving
Last synced: 14 Oct 2025
https://github.com/OpenIntroStat/ims
📚 Introduction to Modern Statistics - A college-level open-source textbook with a modern approach highlighting multivariable relationships and simulation-based inference. For v1, see https://openintro-ims.netlify.app.
bootstrap-confidence-intervals inference modern-statistics openintro rstats simulation statistics
Last synced: 17 Apr 2025
https://github.com/Adlik/Adlik
Adlik: Toolkit for Accelerating Deep Learning Inference
compiler deep-learning docker-images inference inference-engine model-optimizer openvino tensorflow-serving tensorrt
Last synced: 20 Nov 2025
https://github.com/serizba/cppflow
Run TensorFlow models in C++ without installation and without Bazel
c cpp inference model neural-networks tensorflow tensorflow-cpp tensorflow-examples tensorflow-models
Last synced: 16 May 2025
https://github.com/triton-inference-server/pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
Last synced: 15 May 2025
https://github.com/vllm-project/vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
ascend inference llm llm-serving llmops mlops model-serving transformer vllm
Last synced: 27 Feb 2026
https://github.com/efeslab/nanoflow
A throughput-oriented high-performance serving framework for LLMs
cuda inference llama2 llm llm-serving model-serving
Last synced: 16 May 2025
https://github.com/efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
cuda inference llama2 llm llm-serving model-serving
Last synced: 21 Apr 2025
https://github.com/pipeless-ai/pipeless
An open-source computer vision framework to build and deploy apps in minutes
artificial-intelligence cloud computer-vision deep-learning ffmpeg gstreamer inference inference-server machine-learning multimedia multimedia-applications object-detection perception pipeline-framework python stream-processing video video-processing vision-framework yolo
Last synced: 09 Apr 2026
https://github.com/theodo-group/genossgpt
One API for all LLMs either Private or Public (Anthropic, Llama V2, GPT 3.5/4, Vertex, GPT4ALL, HuggingFace ...) 🌈🐂 Replace OpenAI GPT with any LLMs in your app with one line.
api gpt gpt4all huggingface inference llama llm openai private public
Last synced: 04 Apr 2025