An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with inference-engine

A curated list of projects in awesome lists tagged with inference-engine.

https://github.com/FedML-AI/FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training

Last synced: 04 Apr 2025

https://github.com/fedml-ai/fedml

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

ai-agent deep-learning distributed-training edge-ai federated-learning inference-engine machine-learning mlops model-deployment model-serving on-device-training

Last synced: 08 May 2025

https://github.com/zjhellofss/kuiperinfer

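A good project for campus recruitment (autumn and spring hiring seasons) and internships! Walks you step by step through implementing a high-performance deep learning inference library from scratch, with support for inference on models such as the large model llama2, Unet, Yolov5, and Resnet.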

caffe convolution deep-learning deep-neural-networks diy graph-algorithms inference inference-engine maxpooling ncnn pnnx pytorch relu resnet sigmoid yolo yolov5

Last synced: 14 May 2025

https://github.com/zjhellofss/KuiperInfer

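A good project for campus recruitment (autumn and spring hiring seasons) and internships! Walks you step by step through implementing a high-performance deep learning inference library from scratch, with support for inference on models such as the large model llama2, Unet, Yolov5, and Resnet.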

caffe convolution deep-learning deep-neural-networks diy graph-algorithms inference inference-engine maxpooling ncnn pnnx pytorch relu resnet sigmoid yolo yolov5

Last synced: 19 Mar 2025

https://github.com/tencent/feathercnn

FeatherCNN is a high performance inference engine for convolutional neural networks.

android arm-neon caffe convolutional-neural-networks inference-engine ios

Last synced: 12 Apr 2025

https://github.com/Tencent/FeatherCNN

FeatherCNN is a high performance inference engine for convolutional neural networks.

android arm-neon caffe convolutional-neural-networks inference-engine ios

Last synced: 18 Apr 2025

https://github.com/paddlepaddle/paddle.js

Paddle.js is a web project for Baidu PaddlePaddle: an open source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model or transform a model from paddle-hub using the model conversion tools it provides. It can run in any browser that supports WebGL, WebGPU, or WebAssembly, and also in Baidu Smartprogram and WX miniprogram.

deep-learning inference-engine model ocr paddlepaddle webassembly webgl webgpu

Last synced: 15 May 2025

https://github.com/PaddlePaddle/Paddle.js

Paddle.js is a web project for Baidu PaddlePaddle: an open source deep learning framework that runs in the browser. Paddle.js can either load a pre-trained model or transform a model from paddle-hub using the model conversion tools it provides. It can run in any browser that supports WebGL, WebGPU, or WebAssembly, and also in Baidu Smartprogram and WX miniprogram.

deep-learning inference-engine model ocr paddlepaddle webassembly webgl webgpu

Last synced: 20 Mar 2025

https://github.com/zhihu/zhilight

A highly optimized LLM inference acceleration engine for Llama and its variants.

cuda deepseek-r1 gpt inference-engine llama llm llm-inference llm-serving model-serving pytorch

Last synced: 15 May 2025

https://github.com/msnh2012/Msnhnet

🔥 (yolov3 yolov4 yolov5 unet ...) A mini pytorch inference framework inspired by darknet.

darknet inference-engine jetson-nx mobilenetv2 mobilenetyolo pytorch yolov3 yolov4 yolov5

Last synced: 20 Apr 2025

https://github.com/jjang-ai/mlxstudio

MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)

ai ai-agents anthropic anthropic-api apple-silicon inference inference-engine llm lmstudio macbook macstudio mlx mlxllm mlxstudio omlx omlx-alternative openai-api

Last synced: 22 May 2026

https://github.com/tencent/forward

A library for high performance deep learning inference on NVIDIA GPUs.

cuda deep-learning forward gpu inference inference-engine keras neural-network onnx pytorch tensorflow tensorrt

Last synced: 05 Apr 2025

https://github.com/Tencent/Forward

A library for high performance deep learning inference on NVIDIA GPUs.

cuda deep-learning forward gpu inference inference-engine keras neural-network onnx pytorch tensorflow tensorrt

Last synced: 18 Apr 2025

https://github.com/PyCQA/astroid

A common base representation of python source code for pylint and other projects

ast closember hacktoberfest inference-engine parser static-analysis static-code-analysis

Last synced: 24 Apr 2025

https://github.com/pylint-dev/astroid

A common base representation of python source code for pylint and other projects

ast closember hacktoberfest inference-engine parser static-analysis static-code-analysis

Last synced: 11 Dec 2025

https://github.com/PaddlePaddle/Anakin

A high-performance, cross-platform inference engine. Anakin runs on x86-cpu, arm, nv-gpu, amd-gpu, bitmain, and cambricon devices.

ai amd arm bitmain cambricon cross-platform high-performance inference-engine intel nvidia

Last synced: 20 Mar 2025

https://github.com/HoloClean/holoclean

A Machine Learning System for Data Enrichment.

data-enrichment data-science inference-engine machine-learning pytorch

Last synced: 02 May 2025

https://github.com/ulfurinn/wongi-engine

A rule engine written in Ruby.

inference-engine rete ruby rule-engine

Last synced: 23 Oct 2025

https://github.com/zjhellofss/KuiperLLama

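A good project for campus recruitment (autumn and spring hiring seasons) and internships: walks you through building, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.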

cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2

Last synced: 08 Sep 2025

https://github.com/zjhellofss/kuiperllama

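A good project for campus recruitment (autumn and spring hiring seasons) and internships: walks you through building, hands-on and from scratch, a large-model inference framework that supports LLama2/3 and Qwen2.5.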

cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2

Last synced: 16 May 2025

https://github.com/andrewkchan/yalm

Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O

cpp cuda inference-engine llama llamacpp llm llm-inference machine-learning mistral

Last synced: 12 Apr 2025

https://github.com/chengzeyi/paraattention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

attention diffusers flux hunyuan-video inference inference-engine parallel-computing transformers

Last synced: 09 Apr 2025

https://github.com/rocm/mivisionx

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

amd-opencl amd-opencv amd-openvx computer-vision inference inference-engine khronos-openvx machine-learning neural-network nnef onnx opencl openvx openvx-extensions openvx-neural-network rocm ryzen virtual-reality windows-machine-learning winml

Last synced: 16 May 2025

https://github.com/ROCm/MIVisionX

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.

amd-opencl amd-opencv amd-openvx computer-vision inference inference-engine khronos-openvx machine-learning neural-network nnef onnx opencl openvx openvx-extensions openvx-neural-network rocm ryzen virtual-reality windows-machine-learning winml

Last synced: 18 Jul 2025

https://github.com/lofcz/llmtornado

The .NET library to consume 100+ APIs: OpenAI, Anthropic, Google, DeepSeek, Cohere, Mistral, Azure, xAI, Perplexity, Groq, Ollama, vLLM, and many more!

ai ai-api ai-apis anthropic-api azure-api cohere-api deepseek-api gemini-api google-api groq-api hacktoberfest inference inference-engine localllm mistral-api netcore ollama-api openai-api perplexity-api xai-api

Last synced: 14 Feb 2026

https://github.com/openvinotoolkit/openvino_contrib

Repository for OpenVINO's extra modules

arm inference-engine java nvidia-gpu openvino pytorch

Last synced: 08 Oct 2025

https://github.com/nrl-ai/daisykit

DaisyKit is an easy AI toolkit with face mask detection, pose detection, background matting, barcode detection, face recognition and more. - with NCNN, OpenCV, Python wrappers

background-matting barcode-detection computer-vision cpp deep-learning deployment embedded face-detection face-mask-detection hand-pose inference-engine machine-learning mobile ncnn neural-network no-code object-detection python vulkan

Last synced: 09 Mar 2026

https://github.com/Torsion-Audio/nn-inference-template

Neural network inference template for real-time critical audio environments - presented at ADC23

audio audio-plugin deep-learning inference inference-engine juce libtorch machine-learning neural-network onnx onnxruntime pytorch tensorflow tensorflow-lite

Last synced: 08 May 2025

https://github.com/xuantie-rv/csi-nn2

An optimized neural network operator library for chips based on the Xuantie CPU.

deep-learning inference-engine neural-network risc-v riscv riscv-assembly

Last synced: 28 Feb 2026

https://github.com/infinitensor/refactorgraph

A layered, decoupled deep learning inference engine.

ai-compiler dataflow-graph inference-engine

Last synced: 27 Oct 2025

https://github.com/BMW-InnovationLab/BMW-IntelOpenVINO-Detection-Inference-API

This is a repository for a no-code object detection inference API using OpenVINO. It is supported on both Windows and Linux operating systems.

computer-vision cpu deeplearning detection-algorithm detection-api docker inference inference-engine neural-network nocode object-detection openvino openvino-model-zoo openvino-toolkit resnet rest-api

Last synced: 04 Apr 2025

https://github.com/image-py/planer

Powerful Light Artificial NEuRon inference framework for CNN

cnn deep-learning inference-engine

Last synced: 15 May 2025

https://github.com/opencog/ure

[NO LONGER MAINTAINED, SUPERSEDED BY https://github.com/trueagi-io/chaining]. Unified Rule Engine. Graph rewriting system for the AtomSpace. Used as reasoning engine for OpenCog.

backward-chaining backward-induction chainer forward-chaining graph-rewriting inference inference-engine inference-rules rule-engine rules-engine

Last synced: 05 Apr 2025

https://github.com/zouyee/dmlx

Big models. Small Macs. Zero excuses.

apple apple-silicon inference-engine llm llms mlx

Last synced: 30 May 2026

https://github.com/ryukinix/lisp-inference

An Inference Engine based on Propositional Calculus written in Common Lisp

common-lisp inference-engine inference-rules lisp-inference propositional-calculus propositional-logic truth-table

Last synced: 21 Jan 2026

https://github.com/yas-sim/handwritten-japanese-ocr

Handwritten Japanese OCR demo built with the Intel OpenVINO toolkit; the input text is drawn on a touch panel

deep-learning dl-models handwritten-text-recognition inference-engine intel japanese ocr ocr-demo openvino python text-detection text-recognition text-regions touch-panel

Last synced: 22 Apr 2025

https://github.com/yas-sim/dbface-on-openvino

Describes how to run DBFace, a real-time, single-shot face detection model on Intel OpenVINO

deep-learning face-detection inference-engine intel onnx openvino python pytorch sample

Last synced: 07 Aug 2025

https://github.com/EugenHotaj/zig_gpt2

GPT-2 inference engine written in Zig

gpt-2 inference-engine zig

Last synced: 20 Apr 2025

https://github.com/kyegomez/exa

Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and minimal learning curve.

inference-engine llama2 llama2-7b llamacpp llamas llm-inference llms opensource

Last synced: 09 Aug 2025

https://github.com/animator/titus2

Titus 2 : Portable Format for Analytics (PFA) implementation for Python 3.4+

analytics inference inference-engine ml-engine model-deployment model-evaluation model-serving pfa pfa-standard pmml python scoring scoring-engine titus

Last synced: 09 Mar 2026

https://github.com/zpye/SimpleInfer

A simple neural network inference framework

ai-framework cpp deep-learning inference-engine neural-network xmake

Last synced: 12 Mar 2025

https://github.com/milosgajdos/ncs

Movidius Neural Compute Stick V2.0 API Go bindings

deep-learning edge-computing go golang inference-engine movidius neural-networks usb

Last synced: 23 Apr 2025

https://github.com/m0dulo/InferSpore

🌱 A fully independent Large Language Model (LLM) inference engine, built leveraging cuBLAS and cub. 🧩

cuda inference-engine llama2 llm

Last synced: 25 Apr 2025

https://github.com/fritzo/pomagma

An inference engine for extensional untyped λ-calculus

inference-engine lambda-calculus theorem-proving

Last synced: 05 May 2025

https://github.com/aws-solutions-library-samples/guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks

Comprehensive, scalable ML inference architecture using Amazon EKS, leveraging Graviton processors for cost-effective CPU-based inference and GPU instances for accelerated inference. Guidance provides a complete end-to-end platform for deploying LLMs with agentic AI capabilities, including RAG and MCP

agentic-ai agentic-workflow huggingface inference inference-engine langfuse litellm-ai-gateway mcp-client mcp-server opensource-ai vllm

Last synced: 14 Oct 2025

https://github.com/denilgabani/people-counter-app

People counter app for monitoring people in a specific area.

edge edge-ai inference-engine nodejs opencv openvino pedestrian-detection people-counter people-detection

Last synced: 10 Mar 2025

https://github.com/dieharders/obrew-studio-server

Obrew Server: A self-hostable machine learning engine. Build agents and schedule workflows private to you.

agents ai desktop-app gguf-models headless-server inference-engine llamacpp local-ai private-ai rag self-hosted text-generation

Last synced: 23 Feb 2026

https://github.com/gabriele-mastrapasqua/qwen3-tts

Pure C inference engine for Qwen3-TTS text-to-speech. No Python, no PyTorch — just C and BLAS. Supports 0.6B and 1.7B models, 9 voices, 10 languages.

c cpu-inference inference inference-engine multilingual pure-c qwen qwen3-tts simd speech-synthesis text-to-speech tts voiceclone voicecloning

Last synced: 01 Apr 2026

https://github.com/radiantone/inferencegraph

A knowledge-graph-based forward-chaining inference engine in TypeScript/Node.

artificial-intelligence forward-chaining framework inference-engine inferences javascript knowledge-graph nodejs typescript

Last synced: 18 Aug 2025

https://github.com/ramborogers/cyber-inference

Cyber-Inference is a web GUI management tool for running OpenAI-compatible inference servers. Built on llama.cpp, it provides automatic model management, dynamic resource allocation, and a beautiful cyberpunk-themed interface designed for edge deployment.

ai ai-agents inference-api inference-engine llamacpp metal nvidia

Last synced: 24 May 2026

https://github.com/jerinphilip/slimt

Inference slice of marian for bergamot's tiny11 models. Faster to compile and wield. Fewer model-archs than bergamot-translator.

cpp20 inference-engine machine-translation pybind11 python

Last synced: 10 Apr 2025

https://github.com/yas-sim/openvino-multi-ncs2-throughput-mode

Describes how to use throughput mode to run inference effectively on multiple NCS2 devices with the Intel® OpenVINO toolkit

accelerator deep-learning inference inference-engine intel ncs ncs2 neural-compute-stick neural-compute-stick-2 openvino openvino-toolkit python throughput-performance

Last synced: 22 Apr 2025

https://github.com/kiritigowda/mivisionx-inference-tutorial

MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit.

amd caffe classification cnn detection inception inference inference-engine machine-learning mivisionx neural-network nnef onnx opencv openvx openvx-neural-net resnet segmentation tensorflow vggnet

Last synced: 11 Apr 2025

https://github.com/gmkung/cheemera

A Node.js backend that exposes a Typescript implementation of the deCheem inference engine.

ai expert-advisor expert-system gpt inference-api inference-engine

Last synced: 24 Jul 2025

https://github.com/catseye/cardboard-prolog

MIRROR of https://codeberg.org/catseye/Cardboard-Prolog : A bare-bones inference engine in 120 lines of purely functional Scheme

deductive-inference didactic inference-engine logic-programming prolog-interpreter pure-functional

Last synced: 27 Feb 2026

https://github.com/yas-sim/pyopenvino

Experimental Python implementation of the OpenVINO Inference Engine (very slow, limited functionality). All code is written in Python. Easy to read and modify.

ai convolution convolutional-neural-network deep-learning experimental inference inference-engine inference-library mnist mnist-classification object-detection openvino python reference-implementation ssd-mobilenet

Last synced: 22 Apr 2025

https://github.com/andygeiss/diego

DIEGO is a data driven, forward-chaining, rule-based expert system written from scratch.

concepts conditions facts forward-chaining go golang inference-engine rules

Last synced: 30 Apr 2025

https://github.com/prithivsakthiur/vision-inference

What Happens Next? Live Inference

css docker html inference-engine javascript live model

Last synced: 06 May 2025

https://github.com/yas-sim/openvino-onnx-importer-api

Demonstrates how to use the ONNX importer API in the Intel OpenVINO toolkit. This API allows users to load an ONNX model and run inference with the OpenVINO Inference Engine.

deep-learning inference-engine intel ngraph onnx onnx-format onnx-model openvino openvino-toolkit

Last synced: 22 Apr 2025

https://github.com/exalsius/rca-llm

An evaluation framework for root cause analysis in large-scale LLM inference systems

inference-engine large-language-models load-testing root-cause-analysis

Last synced: 16 Jun 2026

https://github.com/ez-optimium/optimium

Your AI Catalyst: inference backend to maximize your model's inference performance

ai-compiler amd arm deep-learning inference inference-engine inference-optimization intel mediapipe neural-network raspberry-pi runtime tensorflow-lite

Last synced: 24 Jul 2025

https://github.com/animator/orange3-scoring

🍊 🎯 Score PMML and PFA models in Orange3

inference inference-engine orange orange3 pfa pfa-standard pmml scoring scoring-engine

Last synced: 14 Sep 2025

https://github.com/patrickcurl/sanechain

Filling in the missing gaps with langchain, and creating OO wrappers to simplify some workloads.

agent ai artificial-intelligence cohereai gpt gpt3 gpt4 inference-engine langchain language-model llama llama-index llamacpp llm llmops llms openai openai-api

Last synced: 07 Nov 2025

https://github.com/stefanolusardi/tiny_inference_engine

Client/Server system to perform distributed inference on high load systems.

ai cmake conan cpp deep-neural-networks docker grpc inference-client inference-engine inference-server kserve onnxruntime

Last synced: 23 Apr 2025

https://github.com/acrion/zelph

A sophisticated semantic network system capable of encoding inference rules within the network itself. Built for efficient memory usage and powerful logical reasoning, zelph can process the entire Wikidata knowledge graph (1.7TB) to detect contradictions and make logical deductions.

contradiction-detection cpp-library inference-engine knowledge-graph knowledge-representation logical-reasoning memory-optimization semantic-network semantic-web wikidata

Last synced: 21 Jan 2026

https://github.com/xxxn3m3s1sxxx/atlas-tq1_0

TQ1.0 ternary inference engine for BitNet b1.58 on CPU. Pack + run Falcon3-1B/3B/7B/10B, no GPU needed.

1bit-llm avx2 bitnet bitnet-b158 bonsai cpp cpu-inference edge-ai falcon3 high-performance inference-engine llm-inference local-llm quantization simd ternary ternary-weights

Last synced: 31 May 2026

https://github.com/zhihu/ZhiLight

A highly optimized inference acceleration engine for Llama and its variants.

cuda gpt inference-engine llama llm llm-serving pytorch

Last synced: 12 Aug 2025

https://github.com/leynier/inferfuzzy

Inferfuzzy is a Python library for implementing fuzzy inference systems

cuba fuzzy-logic fuzzy-matching inference-engine inference-rules matcom matcom-uh python python-3 python3 typer

Last synced: 08 Oct 2025

https://github.com/kiritigowda/mivisionx-inference-analyzer

MIVisionX Python Inference Analyzer uses pre-trained ONNX/NNEF/Caffe models to analyze inference results and summarize individual image results

amd amdgpu caffe docker-images inceptionv4 inference inference-engine inference-optimization mivisionx mivisionx-inference-analyzer nnef nnir onnx opencl openvx resnet resnet-50 rocm squeezenet vgg

Last synced: 11 Apr 2025

https://github.com/yas-sim/openvino-workshop-en

Hands-on workshop content for learning the Intel Distribution of OpenVINO toolkit, a deep learning inference library

classification deep-learning hands-on inference inference-engine intel intel-distribution openvino-toolkit openvno tutorial workshop workshop-contents

Last synced: 18 May 2026

https://github.com/tvanfossen/entropic

Local-first agentic inference engine in C/C++. Multi-tier model routing, grammar-constrained output, MCP tool servers. Embeddable via C ABI.

agentic-ai agentic-framework cpp cpp20 cuda edge-ai embedded-ai gbnf gguf grammar-constrained-decoding inference-engine llama-cpp llm local-llm mcp on-device-ai privacy-first tool-calling

Last synced: 30 May 2026