Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with llm-inference

A curated list of projects in awesome lists tagged with llm-inference.

https://github.com/ooridata/toolio

GenAI & agent toolkit for Apple Silicon Mac, implementing JSON schema-steered structured output (3SO) and tool-calling in Python. For more on 3SO: https://huggingface.co/blog/ucheog/llm-power-steering

agentic ai apple-silicon client-server genai json-schema llm llm-inference mac mlx tool-calling tools

Last synced: 28 Dec 2024

https://github.com/webgptorg/promptbook

It's time for a paradigm shift! The future of software is in plain English ✨

autogpt llm-inference openai

Last synced: 29 Dec 2024

https://github.com/xtekky/gpt4local

OpenAI-style, fast, lightweight local language model inference with documents

ai chatbot chatbots chatgpt chatgpt-api documents gpt gpt-4 gpt4free language-model llm llm-inference local local-llm openai openai-api python

Last synced: 27 Oct 2024

https://github.com/felladrin/MiniSearch

Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space

ai artificial-intelligence generative-ai gpu-accelerated information-retrieval llm llm-inference machine-learning nlp question-answering ratchet-ml retrieval-augmented-generation search search-engine searxng typescript web-llm webapp wllama

Last synced: 29 Oct 2024

https://github.com/kddubey/cappr

Completion After Prompt Probability. Make your LLM make a choice

huggingface kv-cache llamacpp llm-inference probability prompt-engineering text-classification zero-shot

Last synced: 29 Dec 2024

https://github.com/picovoice/pico-cookbook

Recipes for on-device voice AI and local LLM

cookbook llm llm-inference local-llm on-device-ai recipes voice-ai voice-assistant

Last synced: 01 Jan 2025

https://github.com/mobile-artificial-intelligence/maid_llm

maid_llm is a Dart implementation of llama.cpp used by the Mobile Artificial Intelligence Distribution (Maid).

facebook flutter-ai gemma ggml gguf llama llama2 llamacpp llm llm-inference local-ai meta mistral mixtral mobile-ai

Last synced: 02 Jan 2025

https://github.com/hkust-nlp/dart-math

Official implementation for the paper *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*

deep-learning llm llm-evaluation llm-inference llm-training mathematics nlp

Last synced: 20 Nov 2024

https://github.com/hoshinonyaruko/gensokyo-llm

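An open-source agent project. Supports 6 chat platforms, one-to-many OneBot v11 connections, streaming messages, agents, and conversation keyboard-bubble generation. Supports 6 LLM APIs (more being added) and can convert multiple LLM APIs into a unified, context-aware format.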

ai-agents ai-agents-framework chatbot llm llm-api llm-inference onebot onebot-plugin onebot11 qqbot

Last synced: 12 Nov 2024

https://github.com/monk1337/auto-ollama

Run Ollama and GGUF models easily with a single command.

autogguf autoollama gguf inference llama llm llm-inference lora mergelora mistral ollama openai

Last synced: 24 Nov 2024

https://github.com/mani-kantap/llm-inference-solutions

A collection of available inference solutions for LLMs

llm-inference llm-serving llmops

Last synced: 17 Nov 2024

https://github.com/lofcz/llmtornado

One .NET library to consume OpenAI, Anthropic, Cohere, Google, Azure, Groq, and self-hosted APIs.

anthropic-ai chatbot cohere command-r-plus gemini gpt-4v gpt4o groq koboldcpp llm-inference o1 o1-mini o1-preview ollama openai sdk sonnet sonnet3-5

Last synced: 02 Jan 2025

https://github.com/harleyszhang/lite_llama

A lightweight Llama model inference framework built with Triton.

llama llama3 llm llm-inference python3 triton-kernels

Last synced: 03 Dec 2024

https://github.com/ai-hypercomputer/jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

attention batching gemma inference llama llama2 llm llm-inference model-serving pytorch tpu

Last synced: 03 Jan 2025

https://github.com/harleyszhang/llm_counts

LLM theoretical performance analysis tools supporting parameter, FLOPs, memory, and latency analysis.

gpu-performance llama llm llm-inference profiler python3 transformer

Last synced: 21 Dec 2024

https://github.com/ilyasmoutawwakil/py-txi

A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.

embeddings llm-inference

Last synced: 01 Jan 2025

https://github.com/mariochavez/aoororachain

Aoororachain is a Ruby chain tool for working with LLMs.

artificial-intelligence large-language-models llm llm-inference

Last synced: 14 Nov 2024

https://github.com/opencsgs/llm-inference

llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, RESTful API, auto-scaling, computing resource management, monitoring, and more.

deepspeed llama-cpp llm-inference ray transformer vllm

Last synced: 07 Nov 2024

https://github.com/Xyntopia/taskyon

Browser based Interface for Generative AI. Chat/Agent/Taskmanager Hybrid.

ai chatbot gpt gpt-4 gui llm llm-agent llm-inference

Last synced: 25 Oct 2024

https://github.com/praful932/llmsearch

Find better generation parameters for your LLM

llm llm-evaluation llm-inference nlp

Last synced: 27 Oct 2024

https://github.com/phospho-app/fastassert

Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper and without rate limits. Compare the quality and latency to your current LLM API provider.

docker llm llm-inference outlines vllm

Last synced: 09 Nov 2024

https://github.com/catallo/ht

ht - a shell command that answers your questions about shell commands

ai bash fish-shell gpt linux linux-shell llm llm-inference llms macos macos-shell macosx openai openai-api shell shellcode zsh

Last synced: 02 Nov 2024

https://github.com/pingcap/linguflow

LinguFlow, a low-code tool designed for LLM application development, simplifies the building, debugging, and deployment process for developers.

chatgpt gpt llm-framework llm-inference openai

Last synced: 06 Nov 2024

https://github.com/kyegomez/exa

Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and a minimal learning curve.

inference-engine llama2 llama2-7b llamacpp llamas llm-inference llms opensource

Last synced: 22 Dec 2024

https://github.com/waltonfuture/Diff-eRank

Code for https://arxiv.org/abs/2401.17139 (NeurIPS 2024)

evaluation-metrics llm llm-inference machine-learning mllm neurips-2024

Last synced: 26 Nov 2024

https://github.com/hscspring/llama.np

Inference of Llama/Llama2 models in NumPy

llama llama2 llm llm-inference numpy

Last synced: 06 Dec 2024

https://github.com/zrzrzrzrzrzrzr/lm-fly

Accelerating LLM inference frameworks to make LLMs fly

llm llm-inference mlx openvino tensorrt-llm tgi vllm

Last synced: 28 Dec 2024

https://github.com/monocle2ai/monocle

Monocle is a framework for tracing GenAI app code. This repo contains implementation of Monocle for GenAI apps written in Python.

generative-ai linux-foundation llm-agent llm-inference llms observability opentelemetry oss python telemetry tracing

Last synced: 20 Dec 2024

https://github.com/jayzhang42/sled

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433

decoding factuality google large-language-models llama llama2 llama3 llm llm-inference meta openai

Last synced: 22 Dec 2024

https://github.com/arcee-ai/arcee-python

The Arcee client for executing domain-adapted language model routines

ai llm llm-inference llm-training llmops

Last synced: 09 Nov 2024

https://github.com/brave-experiments/melt-public

Codebase for "MELTing Point: Mobile Evaluation of Language Transformers"

android benchmarks energy-consumption ios jetson llamacpp llm-inference llmfarm mlc-llm

Last synced: 22 Dec 2024

https://github.com/commonroad/drplanner

🩺 : Elevate Your Planner, Perfect Your Motion 🌠

diagnosis-tool llm llm-inference motion-planning repairer

Last synced: 13 Nov 2024

https://github.com/tinybiggames/infero

An easy-to-use, high-performance, CUDA-powered LLM inference library.

cuda llamacpp llm-inference win64 windows-10 windows-11

Last synced: 10 Oct 2024

https://github.com/hitz-zentroa/this-is-not-a-dataset

We introduce a large, semi-automatically generated dataset of ~400,000 descriptive sentences about commonsense knowledge that can be true or false, with negation present in various forms in about 2/3 of the corpus. We use it to evaluate LLMs.

benchmark common-sense commonsense decoder huggingface llama llama2 llm llm-inference negation scorer transformer

Last synced: 15 Nov 2024

https://github.com/armbues/SiLLM-examples

Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon

apple-silicon dpo large-language-models llm llm-inference llm-training lora mlx

Last synced: 07 Nov 2024

https://github.com/damo-nlp-sg/multipurpose-chatbot

A chatbot UI for RAG, multimodal, and text-completion use cases (supports Transformers, llama.cpp, MLX, and vLLM).

chatbot-application gradio-interface gradio-python-llm llm-inference

Last synced: 13 Nov 2024

https://github.com/notnaton/microllm

My own implementation for running inference on local LLMs

chatgpt llm llm-inference

Last synced: 27 Oct 2024

https://github.com/woheller69/llama_tk_chat

Simple chat interface for local AI using llama-cpp-python and llama-cpp-agent

gui llama-cpp-agent llama-cpp-python llm-inference

Last synced: 07 Nov 2024

https://github.com/build-on-aws/bedrock-agents-infer-models

Use natural language to run inference on various LLMs via Bedrock agents

bedrock generative-ai llm-inference

Last synced: 07 Nov 2024

https://github.com/woheller69/gpt4all-tk-chat

A Tk-based graphical user interface for GPT4All that uses its Python bindings. Run LLMs in a slimmer environment and leave maximum resources for inference.

ai gpt gpt4all gui-application llm-inference python

Last synced: 07 Nov 2024

https://github.com/allyson-ai/funcmaster

Function Calling LLMs that run locally on device.

llamacpp llm llm-inference python react-native

Last synced: 10 Oct 2024

https://github.com/aigptcode/advanced-prompt-hacking-tester

This code implements an Advanced Prompt Hacking Tester, which allows users to test the responses of an AI system by generating various types of prompts. It includes methods to generate random prompts, contextual adversarial prompts by modifying the original prompts semantically

ai api chatgpt chatgpt-api gemini-api hacking hacking-tool linux llm llm-inference microsoft openai openai-api openai-chatgpt python python3 windows

Last synced: 25 Nov 2024

https://github.com/aidatatools/llm_sentinel

A project (LLM Sentinel) that showcases NVIDIA's NeMo-Guardrails and LangChain for improving LLM safety

llm-inference llms safety

Last synced: 22 Dec 2024

https://github.com/jessonchan/chatalice

ChatAlice is a robust, cross-platform desktop application for macOS, Windows, and Linux. It supports API integration with major large language models (LLMs), notably ChatGPT, Claude, and others.

chatgpt-app claude-ai desktop-app llm-inference openai

Last synced: 12 Nov 2024

https://github.com/prithivsakthiur/strangerai

Turning Ideas to Product - StrangerAI - StrangerZone. Recommended for deployment in Hugging Face Spaces using the Gradio SDK.

api chat-application chatbot chatgpt llm-inference open-source openai openapi

Last synced: 17 Dec 2024

https://github.com/tbogdala/sentient_core

A terminal-style user interface for chatting with AI characters using Llama LLMs for locally processed AI.

ai chat-application ggml llama llamacpp llm llm-inference rust terminal-ui

Last synced: 17 Nov 2024

https://github.com/chriamue/chat-flame-backend

ChatFlameBackend is an innovative backend solution for chat applications, leveraging the power of the Candle AI framework with a focus on the Mistral model

backend-api candle huggingface-inference-endpoint llama2 llm-inference mistral phi rust-lang

Last synced: 15 Dec 2024

https://github.com/tipani86/menagerai

Test various open-source language models alongside ChatGPT and compare the differences.

chatgpt llama2 llm llm-inference opensource streamlit

Last synced: 05 Dec 2024

https://github.com/amazon-science/tokenalign

Token Alignment via Character Matching for Subword Completion (ACL Findings 2024)

llm llm-inference

Last synced: 12 Nov 2024

https://github.com/tbogdala/mindmeld

A simple-to-use, open source GUI for local AI chat on desktop and mobile. Powered by llama.cpp.

ai flutter llamacpp llm llm-inference local-llm

Last synced: 22 Dec 2024

https://github.com/shreyansh26/llm-sampling

A collection of various LLM sampling methods implemented in pure PyTorch

llm llm-inference sampling-methods torch transformers

Last synced: 22 Dec 2024

https://github.com/amlana21/llm-stream-publish

How to stream LLM responses using AWS API Gateway Websockets and Lambda

aws devops llm-inference terraform

Last synced: 25 Dec 2024

https://github.com/unifyai/aibench-llm-endpoints

Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub

benchmark endpoints llm llm-inference python

Last synced: 14 Nov 2024

https://github.com/actualwitch/experiment

🔬 Experiment is an LLM chat UI with advanced tool-use debugging facilities.

anthropic bun chat experiment-tracking inference isomorphic llm llm-inference openai react

Last synced: 24 Dec 2024

https://github.com/tinybiggames/phippsai

A library for local LLM inference using Ollama to build AI tools and agents in Delphi.

library llm-inference local-inference ollama ollama-api win64 windows-10 windows-11

Last synced: 05 Dec 2024

https://github.com/dwyl/rag-elixir-doc

A Livebook to run a Retrieval-Augmented Generation (RAG)-enhanced LLM over the Phoenix_LiveView documentation

cross-encoder elixir embeddings livebook llm-inference rag retrieval-augmented-generation sbert

Last synced: 12 Oct 2024

https://github.com/hrolive/poland-end-to-end-llm-bootcamp

This bootcamp is designed to give NLP researchers an end-to-end overview of the fundamentals of the NVIDIA NeMo framework, a complete solution for building large language models. It also includes hands-on exercises complemented by tutorials, code snippets, and presentations to help researchers get started with NeMo LLM Service and Guardrails.

gpt llama2 llm llm-inference llm-training nemo-guardrails nvidia nvidia-nemo p-tuning prompt-tuning tensorrt triton

Last synced: 09 Nov 2024

https://github.com/nickpotafiy/illama

A fast, lightweight, parallel inference server for Llama LLMs.

exllama exllamav2 flash-attention-2 inference llama llama2 llama3 llm-inference paged-attention server

Last synced: 10 Oct 2024

https://github.com/giuseppebellamacina/vulnerabilitybot

Vulnerability Bot with Database

cybersecurity-tool llm llm-inference

Last synced: 05 Dec 2024

https://github.com/muhtasham/simulator

🚀 A high-performance simulator for LLM inference optimization, modeling compute-bound prefill and memory-bound decode phases. Explore batching strategies, analyze throughput-latency trade-offs, and optimize inference deployments without real model overhead.

llm-inference

Last synced: 14 Dec 2024

https://github.com/amanpriyanshu/api-llm-hub

A static-page vanilla-js interface for various LLM APIs (OpenAI, Claude, Gemini, Together).

anthropic anthropic-claude claude claude-ai gemini gemini-api gpt gpt-3 gpt-4 javascript llm llm-inference llms openai openai-api package togetherai vanilla-javascript vanilla-js

Last synced: 28 Oct 2024

https://github.com/firojalam/llamalens

This repository contains the resources, code, and documentation for LlamaLens, a specialized multilingual large language model (LLM) designed to analyze news and social media content effectively. LlamaLens supports multiple languages, including Arabic, English, and Hindi, and is tailored for diverse tasks such as sentiment analysis, misinformation.

arabic downstream-tasks emotion-detection english hindi llm llm-inference llm-training newsmedia sentiment-classification social-media

Last synced: 28 Dec 2024

https://github.com/bsenst/llm-enhanced-ehr

Contribution to the LabLabAI AI Challenge Hackathon October 2023

ehr-notes langchain-python llm-inference streamlit-ui

Last synced: 19 Oct 2024

https://github.com/nptt9/illama

A fast, lightweight, parallel inference server for Llama LLMs.

exllama exllamav2 flash-attention-2 inference llama llama2 llama3 llm-inference paged-attention server

Last synced: 27 Oct 2024

https://github.com/CentML/llm-inference-bench

A lightweight and extensible LLM inference serving benchmark tool written in Rust.

benchmarking llm-inference llm-serving

Last synced: 10 Nov 2024

https://github.com/shekharp1536/ollama-web

Ollama Web UI is a simple yet powerful web-based interface for interacting with large language models. It offers chat history, voice commands, voice output, model download and management, conversation saving, terminal access, multi-model chat, and more—all in one streamlined platform.

llama llama-cpp llama3 llm-inference ollama ollama-app ollama-chat ollama-client ollama-gui ollama-interface ollama-python ollama-ui ollama-webui python-llm-integration

Last synced: 22 Dec 2024

https://github.com/niansa/libjustlm

Super easy to use library for doing LLaMA/GPT-J stuff! - Mirror of: https://gitlab.com/niansa/libjustlm

ai cpp17 cpp20 gpt-j llama llama2 llm llm-inference mpt python wrapper-library

Last synced: 13 Nov 2024

https://github.com/aigptcode/ai-battle-llama3-vs-qwen2

Welcome to our AI Battle! Ask a question and let our two AI models battle it out

ai android api llama3 llama3-meta-ai llm llm-inference qwen qwen2 windows

Last synced: 25 Nov 2024

https://github.com/atelierarith/docstringtranslationexobackend.jl

Translate Julia's docstrings using `exo`: Run your own AI cluster at home with everyday devices

exo julia julialang llm-inference

Last synced: 03 Dec 2024

https://github.com/kaust-generative-ai/local-deployment-of-generative-ai-models

Training materials on how to deploy generative AI models locally on your laptop or workstation.

ai carpentries-incubator deployment english generative-ai lesson llama-cpp llamafile llm-inference ollama pre-alpha python

Last synced: 17 Dec 2024

https://github.com/amajji/llm-rag-chatbot-with-langchain

Development and deployment of a question-answer LLM model using Llama2 with 7B parameters and RAG with LangChain

ai chatbot chatbot-application cpu db inference langchain llama-index llama2 llm llm-inference question-answering rag retrieval-augmented-generation streamlit streamlit-webapp vector-database

Last synced: 22 Dec 2024

https://github.com/sureshbeekhani/ai-quick-summaries

Developed an AI-powered web app using Streamlit and Google Gemini AI for generating concise summaries from PDFs, images, and text files. The app features real-time text summarization, file upload support, and a user-friendly interface.

chatbot gemini gpt image-and-pdf llm llm-inference python streamlit

Last synced: 07 Dec 2024

https://github.com/muhammad-fiaz/emsugi

EMSUGI is a prediction and analysis project covering factors such as floods, earthquakes, and disease outbreaks in your neighborhood.

ai emergency-management-system flask flask-application gemini gemini-ai gemini-api gemini-client genai huggingface langchain large-language-models llm-inference llms open-source open-source-project opensource python python3 transformers

Last synced: 12 Nov 2024