Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with llm-inference
A curated list of projects in awesome lists tagged with llm-inference.
https://github.com/nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Last synced: 13 Jan 2025
https://github.com/microsoft/autogen
A programming framework for agentic AI 🤖
agent-based-framework agent-oriented-programming agentic agentic-agi chat chat-application chatbot chatgpt gpt gpt-35-turbo gpt-4 llm-agent llm-framework llm-inference llmops
Last synced: 13 Jan 2025
https://github.com/liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment).
llm llm-inference llm-serving llm-training llmops
Last synced: 04 Dec 2024
https://github.com/lightning-ai/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
ai artificial-intelligence deep-learning large-language-models llm llm-inference llms
Last synced: 13 Jan 2025
https://github.com/bentoml/openllm
Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
bentoml fine-tuning llama llama2 llama3-1 llama3-2 llama3-2-vision llm llm-inference llm-ops llm-serving llmops mistral mlops model-inference open-source-llm openllm vicuna
Last synced: 20 Jan 2025
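OpenLLM and similar servers expose the OpenAI chat-completions protocol, so any OpenAI-style client can talk to them. A minimal sketch of the request body such a server expects (stdlib only; the model name and port are illustrative assumptions, and nothing is actually sent):

```python
import json

def chat_completion_request(model: str, prompt: str) -> str:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }
    return json.dumps(payload)

# The body would be POSTed to e.g. http://localhost:3000/v1/chat/completions
# (port 3000 is an assumption, not a documented default).
body = chat_completion_request("meta-llama/Llama-3.1-8B-Instruct", "Hello!")
print(body)
```

Because the wire format is shared, swapping between such servers is usually just a base-URL change.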
https://github.com/mistralai/mistral-inference
Official inference library for Mistral models
Last synced: 13 Jan 2025
https://github.com/sjtu-ipads/powerinfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
bamboo-7b falcon large-language-models llama llm llm-inference local-inference
Last synced: 20 Jan 2025
https://github.com/openvinotoolkit/openvino
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
ai computer-vision deep-learning deploy-ai diffusion-models generative-ai good-first-issue inference llm-inference natural-language-processing nlp openvino optimize-ai performance-boost recommendation-system speech-recognition stable-diffusion transformers yolo
Last synced: 13 Jan 2025
https://github.com/bentoml/bentoml
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and much more!
ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python
Last synced: 13 Jan 2025
https://github.com/internlm/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
codellama cuda-kernels deepspeed fastertransformer internlm llama llama2 llama3 llm llm-inference turbomind
Last synced: 20 Jan 2025
https://github.com/superduper-io/superduper
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search
Last synced: 14 Jan 2025
https://github.com/kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
artificial-intelligence genai hacktoberfest istio k8s knative kserve kubeflow kubernetes llm-inference machine-learning mlops model-interpretability model-serving pytorch service-mesh sklearn tensorflow xgboost
Last synced: 13 Jan 2025
https://github.com/neuralmagic/deepsparse
Sparsity-aware deep learning inference runtime for CPUs
computer-vision cpus deepsparse inference llm-inference machinelearning nlp object-detection onnx performance pretrained-models pruning quantization sparsification
Last synced: 14 Jan 2025
https://github.com/databricks/dbrx
Code examples and resources for DBRX, a large language model developed by Databricks
databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai
Last synced: 17 Jan 2025
https://github.com/fasterdecoding/medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Last synced: 15 Jan 2025
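Medusa (like EAGLE and REST below) builds on speculative decoding: a cheap draft predicts several tokens ahead, and the target model verifies them, keeping the longest agreeing prefix. A toy greedy sketch of that propose/verify loop (the toy "models" are stand-ins, not these projects' actual decoding heads):

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One greedy speculative-decoding step (toy version).

    draft_next/target_next: fn(token_list) -> next token.
    Returns the tokens accepted this step.
    """
    # 1. Draft model proposes k tokens autoregressively (cheap).
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)
    # 2. Target model verifies: keep the longest prefix it agrees with.
    accepted, ctx = [], list(prefix)
    for t in proposal:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # 3. On a mismatch, emit one target token instead, so every step
    #    makes progress even when the draft is wrong.
    if len(accepted) < k:
        accepted.append(target_next(ctx))
    return accepted

# Toy models: the target counts upward; the draft is right for small tokens only.
target = lambda ctx: len(ctx)
draft = lambda ctx: len(ctx) if len(ctx) < 6 else 0
print(speculative_step(draft, target, [0, 1, 2, 3]))  # [4, 5, 6]
```

In a real system the verification pass scores all k proposals in one batched forward call, which is where the speedup comes from.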
https://github.com/NVIDIA/GenerativeAIExamples
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
gpu-acceleration large-language-models llm llm-inference microservice nemo rag retrieval-augmented-generation tensorrt triton-inference-server
Last synced: 31 Oct 2024
https://github.com/predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving pytorch transformers
Last synced: 20 Jan 2025
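LoRAX multiplexes many LoRA adapters over one shared base model. The core identity is that each adapter contributes only a low-rank update, W_eff = W + alpha * (B @ A); a pure-Python sketch with toy dimensions (not LoRAX's actual API):

```python
def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def apply_lora(W, A, B, alpha=1.0):
    """Merged weight for one adapter: W_eff = W + alpha * (B @ A).

    W: d_out x d_in base weight; B: d_out x r and A: r x d_in low-rank factors.
    Only A and B (tiny, rank r) differ per adapter; W is shared.
    """
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
B = [[1.0], [0.0]]            # rank-1 factors
A = [[0.0, 2.0]]
print(apply_lora(W, A, B, alpha=0.5))  # [[1.0, 1.0], [0.0, 1.0]]
```

Because the per-adapter state is just the small A and B matrices, thousands of fine-tunes can share one copy of the base weights.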
https://github.com/microsoft/aici
AICI: Prompts as (Wasm) Programs
ai inference language-model llm llm-framework llm-inference llm-serving llmops model-serving rust transformer wasm wasmtime
Last synced: 16 Jan 2025
https://github.com/liltom-eth/llama2-webui
Run any Llama 2 model locally with a Gradio UI on GPU or CPU from anywhere (Linux/Windows/Mac). Use `llama2-wrapper` as your local Llama 2 backend for generative agents/apps.
llama-2 llama2 llm llm-inference
Last synced: 16 Jan 2025
https://github.com/intel/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
4-bits chatbot chatpdf cpu gaudi2 gpu habana intel-optimized-llamacpp large-language-model llm-cpu llm-inference neural-chat neural-chat-7b neurips2023 rag retrieval speculative-decoding streamingllm xeon
Last synced: 15 Oct 2024
https://github.com/dstackai/dstack
dstack is a lightweight, open-source alternative to Kubernetes & Slurm, simplifying AI container orchestration with multi-cloud & on-prem support. It natively supports NVIDIA and AMD GPUs, and TPUs.
aws azure cloud fine-tuning gcp gpu k8s kubernetes llm-inference llm-training llmops llms machine-learning orchestration python training
Last synced: 15 Jan 2025
https://github.com/b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
distributed-computing distributed-llm llama2 llama3 llm llm-inference llms neural-network open-llm
Last synced: 16 Jan 2025
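distributed-llama's tensor parallelism splits each layer's weights across devices. A column-parallel linear layer illustrates the idea: every device holds a slice of W's columns, computes its shard independently, and the outputs concatenate. This sketch runs the shards sequentially for clarity; the point is that the shard results equal the full computation:

```python
def linear(x, W):
    """y = x @ W, with W stored column-major: W[j] is column j."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in W]

def column_parallel_linear(x, W, n_devices=2):
    """Split W's columns across devices; concatenate the per-device outputs."""
    shard = (len(W) + n_devices - 1) // n_devices  # ceil division
    out = []
    for d in range(n_devices):  # in practice these run in parallel
        out.extend(linear(x, W[d * shard:(d + 1) * shard]))
    return out

x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # 4 columns of length 2
print(column_parallel_linear(x, W))  # [1.0, 2.0, 3.0, 2.0]
```

Each device also needs only its slice of the weights in RAM, which is why clustering home devices divides memory usage as the description claims.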
https://github.com/flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
cuda flash-attention gpu jit large-large-models llm-inference pytorch
Last synced: 16 Jan 2025
https://github.com/codelion/optillm
Optimizing inference proxy for LLMs
agent agentic-ai agentic-framework agentic-workflow agents api-gateway chain-of-thought genai large-language-models llm llm-inference llmapi mixture-of-experts moa monte-carlo-tree-search openai openai-api optimization prompt-engineering proxy-server
Last synced: 14 Jan 2025
https://github.com/ray-project/ray-llm
RayLLM - LLMs on Ray
distributed-systems large-language-models llm llm-inference llm-serving llmops ray serving transformers
Last synced: 15 Oct 2024
https://github.com/PySpur-Dev/PySpur
Graph-Based Editor for LLM Workflows
agent agents ai javascript llm llm-inference openai python react workflow
Last synced: 08 Jan 2025
https://github.com/lean-dojo/LeanCopilot
LLMs as Copilots for Theorem Proving in Lean
formal-mathematics lean lean4 llm-inference machine-learning theorem-proving
Last synced: 20 Nov 2024
https://github.com/SafeAILab/EAGLE
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
large-language-models llm-inference speculative-decoding
Last synced: 28 Oct 2024
https://github.com/katanemo/archgw
Arch is an intelligent prompt gateway. Engineered with (fast) LLMs for the secure handling, robust observability, and seamless integration of prompts with your APIs - outside business logic. Built by the core contributors of Envoy proxy, on Envoy.
ai-gateway envoy envoyproxy gateway generative-ai llm-gateway llm-inference llm-routing llmops llms openai prompt proxy proxy-server routing
Last synced: 22 Nov 2024
https://github.com/stoyan-stoyanov/llmflows
LLMFlows - Simple, Explicit and Transparent LLM Apps
ai chatgpt gpt-4 llm llm-inference llmops llms machine-learning openai prompt-engineering python question-answering vector-database
Last synced: 06 Nov 2024
https://github.com/mukel/llama3.java
Practical Llama 3 inference in Java
chatgpt genai gguf huggingface java llama llama3 llamacpp llm llm-inference llms openai simd transformers
Last synced: 18 Jan 2025
https://github.com/beam-cloud/beta9
Run serverless GPU workloads with fast cold starts on bare-metal servers, anywhere in the world
autoscaler cloudrun cuda developer-productivity distributed-computing faas fine-tuning functions-as-a-service generative-ai gpu lambda large-language-models llm llm-inference ml-platform paas self-hosted serverless serverless-containers
Last synced: 19 Jan 2025
https://github.com/run-ai/genv
GPU environment and cluster management with LLM support
bash container-runtime containers data-science deep-learning docker gpu gpus jupyter-notebook jupyterlab-extension k8s kubernetes llm-inference llms nvidia-gpu ollama ray vscode vscode-extension zsh
Last synced: 19 Jan 2025
https://github.com/rohan-paul/llm-finetuning-large-language-models
LLM (Large Language Model) FineTuning
gpt-3 gpt3-turbo large-language-models llama2 llm llm-finetuning llm-inference llm-serving llm-training mistral-7b open-source-llm pytorch
Last synced: 18 Jan 2025
https://github.com/hpcaitech/swiftinfer
Efficient AI Inference & Serving
artificial-intelligence deep-learning gpt inference llama llama2 llm-inference llm-serving
Last synced: 19 Jan 2025
https://github.com/flagai-open/aquila2
The official repo of the Aquila2 series from BAAI, including pretrained and chat large language models.
llm llm-inference llm-training
Last synced: 18 Jan 2025
https://github.com/kenza-ai/sagify
LLMs and Machine Learning done easily
ai-gateway anthropic cohere generative-ai langchain langchain-python large-language-model large-language-models llm llm-inference llmops open-source-llm openai sagemaker
Last synced: 19 Jan 2025
https://github.com/feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
attention-is-all-you-need deepspeed-ulysses llm-inference llm-training pytorch ring-attention
Last synced: 18 Jan 2025
https://github.com/vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
cuda efficiency gpu inference llama llama3 llm llm-inference model performance production serving speculative transformer
Last synced: 16 Nov 2024
https://github.com/EulerSearch/embedding_studio
Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine.
embeddings embeddings-similarity fine-tuning llm-inference query-parser search-algorithm search-engine search-query-parser semantic-similarity unstructured-data unstructured-search vector-database
Last synced: 09 Dec 2024
https://github.com/preternatural-explore/mlx-swift-chat
A multi-platform SwiftUI frontend for running local LLMs with Apple's MLX framework.
ios llm-inference macos mlx mlx-swift swiftui
Last synced: 07 Nov 2024
https://github.com/rizerphe/local-llm-function-calling
A tool for generating function arguments and choosing what function to call with local LLMs
chatgpt-functions huggingface-transformers json-schema llm llm-inference openai-function-call openai-functions
Last synced: 18 Jan 2025
https://github.com/nvidia/star-attention
Efficient LLM Inference over Long Sequences
attention-mechanism large-language-models llm-inference
Last synced: 15 Jan 2025
https://github.com/eastriverlee/LLM.swift
LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.
gguf ios llm llm-inference macos swift tvos visionos watchos
Last synced: 23 Oct 2024
https://github.com/felladrin/minisearch
Minimalist web-searching platform with an AI assistant that runs directly from your browser. Uses WebLLM, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space
ai ai-search-engine artificial-intelligence generative-ai gpu-accelerated information-retrieval llm llm-inference metasearch metasearch-engine perplexity perplexity-ai question-answering rag retrieval-augmented-generation searxng web-llm web-search webapp wllama
Last synced: 15 Jan 2025
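MiniSearch couples retrieval with generation (RAG): fetch the documents most relevant to the query, then hand them to the model as context. A naive token-overlap retriever shows the retrieval half's contract; real systems use vector or web search instead of word overlap:

```python
def retrieve(query, docs, k=2):
    """Rank docs by word overlap with the query (a stand-in for vector search)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

docs = [
    "WebLLM runs models in the browser",
    "SearXNG is a metasearch engine",
    "bananas are yellow",
]
print(retrieve("browser models", docs, k=1))  # ['WebLLM runs models in the browser']
```

The top-k documents would then be concatenated into the LLM prompt so answers can cite retrieved text rather than rely on model memory alone.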
https://github.com/ray-project/ray-educational-materials
A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
deep-learning distributed-machine-learning generative-ai llm llm-inference llm-serving ray ray-data ray-distributed ray-serve ray-train ray-tune
Last synced: 15 Nov 2024
https://github.com/ugorsahin/talkingheads
A library to communicate with ChatGPT, Claude, Copilot, Gemini, HuggingChat, and Pi
browser-automation chatgpt chatgpt-api claude copilot free gemini huggingchat llm-inference python selenium undetected-chromedriver
Last synced: 18 Jan 2025
https://github.com/morpheuslord/hackbot
AI-powered cybersecurity chatbot designed to provide helpful and accurate answers to cybersecurity-related queries, and to perform code and scan analysis.
ai automation chatbot cli-chat-app cybersecurity cybersecurity-education cybersecurity-tools llama-api llama2 llama2-7b llamacpp llm-inference runpod
Last synced: 20 Jan 2025
https://github.com/harleyszhang/llm_note
LLM notes, including model inference, transformer model structure, and llm framework code analysis notes
cuda-programming kv-cache llm llm-inference transformer-models triton-kernels vllm
Last synced: 21 Dec 2024
https://github.com/zjhellofss/kuiperllama
A great project for campus recruiting and internship preparation: build, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.
cpp cuda inference-engine llama2 llama3 llm llm-inference qwen qwen2
Last synced: 13 Jan 2025
https://github.com/bytedance/abq-llm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
cuda llm-inference mlsys quantized-networks research
Last synced: 14 Jan 2025
https://github.com/inferflow/inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
baichuan2 bloom deepseek falcon gemma internlm llama2 llamacpp llm-inference m2m100 minicpm mistral mixtral mixture-of-experts model-quantization moe multi-gpu-inference phi-2 qwen
Last synced: 14 Jan 2025
https://github.com/promptslab/llmtuner
FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text)
fine-tuning fine-tuning-llm finetune finetune-gpt finetune-llama finetune-llm finetune-llms finetune-whisper finetunechatgpt finetuning finetuning-large-language-models finetuning-rl llm llm-framework llm-inference llm-training llmops llmtuner whisper whisper-finetune
Last synced: 16 Jan 2025
https://github.com/ai-hypercomputer/jetstream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
gemma gpt gpu inference jax large-language-models llama llama2 llm llm-inference llmops mlops model-serving pytorch tpu transformer
Last synced: 20 Jan 2025
https://github.com/Psycoy/MixEval
The official evaluation suite and dynamic data release for MixEval.
benchmark benchmark-mixture benchmarking-framework benchmarking-suite evaluation evaluation-framework foundation-models large-language-model large-language-models large-multimodal-models llm-evaluation llm-evaluation-framework llm-inference mixeval
Last synced: 16 Nov 2024
https://github.com/arc53/llm-price-compass
This project collects GPU benchmarks from various cloud providers and compares them against fixed per-token costs, converting GPU benchmark results into price-per-token figures for efficient LLM GPU selection and provider price comparison.
benchmark gpu hacktoberfest inference-comparison llm llm-comparison llm-inference llm-price
Last synced: 18 Jan 2025
https://github.com/expectedparrot/edsl
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
anthropic data-labeling deepinfra domain-specific-language experiments llama2 llm llm-agent llm-framework llm-inference market-research mixtral open-source openai python social-science surveys synthetic-data
Last synced: 18 Jan 2025
https://github.com/C0deMunk33/bespoke_automata
Bespoke Automata is a GUI and deployment pipeline for building complex AI agents locally and offline
agents ai automation chatbots developer-tools llm-inference
Last synced: 06 Jan 2025
https://github.com/andrewkchan/yalm
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
cpp cuda inference-engine llama llamacpp llm llm-inference machine-learning mistral
Last synced: 18 Jan 2025
https://github.com/picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
compression efficient-inference gemma generative-ai language-model language-models large-language-model llama llama2 llama3 llm llm-inference llms mistral mixtral model-compression natural-language-processing quantization self-hosted
Last synced: 19 Jan 2025
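picoLLM (like neural-speed and ABQ-LLM above) trades precision for memory via low-bit quantization. A minimal symmetric per-tensor int8 round-trip shows the basic mechanics (illustrative only, not these libraries' actual algorithms):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale == 0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [scale * v for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
# Rounding bounds the per-weight error by scale / 2.
print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

Each weight now needs 8 bits instead of 32; 4-bit and mixed "X-bit" schemes push the same idea further with per-group scales to control the error.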
https://github.com/uiuc-focal-lab/syncode
Efficient and general syntactical decoding for Large Language Models
large-language-models llm llm-inference parser
Last synced: 17 Nov 2024
https://github.com/fasterdecoding/rest
REST: Retrieval-Based Speculative Decoding, NAACL 2024
llm-inference retrieval speculative-decoding
Last synced: 18 Jan 2025
https://github.com/intel/neural-speed
An innovative library for efficient LLM inference via low-bit quantization
cpu fp4 fp8 gaudi2 gpu int4 int8 llamacpp llm-fine-tuning llm-inference low-bit mxformat nf4 sparsity
Last synced: 10 Oct 2024
https://github.com/efeslab/fiddler
Fast Inference of MoE Models with CPU-GPU Orchestration
llm llm-inference local-inference mixtral-8x7b mixture-of-experts
Last synced: 18 Jan 2025
https://github.com/MorpheusAIs/Morpheus
Morpheus - A Network For Powering Smart Agents - Compute + Code + Capital + Community
agents ai compute ethereum llm-inference llms smart-agents smart-contracts
Last synced: 13 Nov 2024
https://github.com/1b5d/llm-api
Run any Large Language Model behind a unified API
chatgpt gptq huggingface langchain llama llamacpp llm llm-inference machine-learning python
Last synced: 08 Jan 2025
https://github.com/cgbur/llama2.zig
Inference Llama 2 in one file of pure Zig
llama llama2 llm llm-inference simd zig ziglang
Last synced: 13 Jan 2025
https://github.com/Infini-AI-Lab/TriForce
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
acceleration efficiency inference llm llm-inference long-context speculative-decoding
Last synced: 19 Nov 2024
https://github.com/armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
apple-silicon dpo large-language-models llm llm-inference llm-training lora mlx
Last synced: 25 Nov 2024
https://github.com/bytedance/shadowkv
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
cpu-offload high-throughput llm-inference long-context low-rank research sparse-attention
Last synced: 14 Jan 2025
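ShadowKV targets the KV cache, which grows by one key/value pair per layer per decoded token and dominates memory at long context. A baseline sketch of what that cache does during decoding (toy dense attention, none of ShadowKV's offloading or sparsity):

```python
import math

class KVCache:
    """Per-layer KV cache: grows by one (key, value) pair per decoded token."""

    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def attend(self, q):
        """Toy attention: softmax over dot(q, k) scores, then average the values."""
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in self.keys]
        exps = [math.exp(s) for s in scores]
        z = sum(exps)
        dim = len(self.values[0])
        return [sum(e / z * v[i] for e, v in zip(exps, self.values)) for i in range(dim)]

cache = KVCache()
for t in range(3):  # three decode steps, each caching its key/value
    cache.append([float(t), 1.0], [float(t), float(t)])
print(len(cache.keys))  # 3
```

Since the cache is re-read on every step, long-context work like ShadowKV focuses on compressing or offloading it without changing this interface.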
https://github.com/adriankhl/godot-llm
LLM in Godot
game-development gamedev gdextension godot godot-engine godotengine llamacpp llm-inference
Last synced: 22 Dec 2024
https://github.com/modelscope/dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
cpu cuda guided-decoding llm llm-inference native-engine
Last synced: 12 Jan 2025
https://github.com/chenhunghan/ialacol
🪶 Lightweight OpenAI drop-in replacement for Kubernetes
ai cloudnative cuda ggml gptq gpu helm kubernetes langchain llamacpp llm llm-inference llm-serving openai python
Last synced: 20 Jan 2025
https://github.com/vemonet/libre-chat
🦙 Free and open-source Large Language Model (LLM) chatbot web UI and API. Self-hosted, offline-capable, and easy to set up. Powered by LangChain.
chatbot chatgpt langchain large-language-models llama2 llm llm-inference mixtral no-code open-source openapi question-answering self-hosted
Last synced: 17 Jan 2025
https://github.com/ictnlp/truthx
Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"
baichuan chatglm chatgpt explainable-ai gpt-4 hallucination hallucinations language-model llama llama2 llama3 llm llm-inference llms mistral representation safety truthfulness
Last synced: 22 Dec 2024
https://github.com/jieyuz2/ecoassistant
EcoAssistant: using LLM assistants more affordably and accurately
chatbot gpt large-language-models llm-inference nlp
Last synced: 29 Dec 2024
https://github.com/aniketmaurya/llm-inference
Large Language Model (LLM) Inference API and Chatbot
chatbot langchain llama llm llm-inference mistral
Last synced: 27 Sep 2024
https://github.com/sophgo/llm-tpu
Run generative AI models on the Sophgo BM1684X
bm1684x chatglm generative-ai large-language-models llama2 llama3 llm llm-inference qwen qwen-7b qwen1-5 qwen2
Last synced: 18 Jan 2025
https://github.com/genai-impact/ecologits
🌱 EcoLogits tracks the energy consumption and environmental footprint of using generative AI models through APIs.
genai generative-ai green-ai green-software llm llm-inference python sustainability sustainable-ai
Last synced: 15 Jan 2025
https://github.com/ooridata/toolio
GenAI & agent toolkit for Apple Silicon Mac, implementing JSON schema-steered structured output (3SO) and tool-calling in Python. For more on 3SO: https://huggingface.co/blog/ucheog/llm-power-steering
agentic ai apple-silicon client-server genai json-schema llm llm-inference mac mlx tool-calling tools
Last synced: 18 Jan 2025
https://github.com/KVignesh122/AssetNewsSentimentAnalyzer
A sentiment analyzer package for financial assets and securities utilizing GPT models.
commodity-trading financial-analysis forex-trading google-search-api investment-analysis llm-inference news-api sentiment-analysis
Last synced: 02 Nov 2024