Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with llm-inference
A curated list of projects in awesome lists tagged with llm-inference.
https://github.com/picovoice/llm-compression-benchmark
LLM Compression Benchmark
llm llm-compression llm-inference
Last synced: 22 Nov 2024
https://github.com/thefcraft/localgpt
Clone of ChatGPT using HTML, CSS, JS, and Flask
chatgpt chatgpt-clone chatgpt-gui flask llm-inference python
Last synced: 12 Jan 2025
https://github.com/firojalam/llamalens
This repository contains the resources, code, and documentation for LlamaLens, a specialized multilingual large language model (LLM) designed to analyze news and social media content effectively. LlamaLens supports multiple languages, including Arabic, English, and Hindi, and is tailored for diverse tasks such as sentiment analysis, misinformation.
arabic downstream-tasks emotion-detection english hindi llm llm-inference llm-training newsmedia sentiment-classification social-media
Last synced: 28 Dec 2024
https://github.com/gurpreetkaurjethra/llms-inference-and-fine-tuning
Estimate Memory Consumption of LLMs Inference and Fine Tuning
fine-tuning generative-ai large-language-models llm-inference llm-training llms memory-allocation
Last synced: 22 Nov 2024
https://github.com/nickpotafiy/illama
A fast, lightweight, parallel inference server for Llama LLMs.
exllama exllamav2 flash-attention-2 inference llama llama2 llama3 llm-inference paged-attention server
Last synced: 10 Oct 2024
https://github.com/giuseppebellamacina/vulnerabilitybot
Vulnerability Bot with Database
cybersecurity-tool llm llm-inference
Last synced: 05 Dec 2024
https://github.com/niansa/libjustlm
Super easy to use library for doing LLaMA/GPT-J stuff! - Mirror of: https://gitlab.com/niansa/libjustlm
ai cpp17 cpp20 gpt-j llama llama2 llm llm-inference mpt python wrapper-library
Last synced: 13 Jan 2025
https://github.com/williamzebrowski/assistant-api
OpenAI Assistant API integrated with Elasticsearch, Logstash & Kibana
ai chatapp chatgpt conversational-ai data elasticsearch kibana llm-inference llms openai rag
Last synced: 11 Oct 2024
https://github.com/aadit3003/llm-rhyme-eval
English and Dutch rhyming datasets (5k word pairs each) for five types of rhymes. Three open-source LLMs (Llama2, Llama3, CrystalChat) are tested on these datasets, with prompt variation.
llama2 llama3 llm llm-evaluation llm-inference nlp orthography phonology rhyme rhyme-analysis
Last synced: 22 Dec 2024
https://github.com/aadit3003/llm-medical-personas
Examination of whether LLMs can maintain consistency over extended multiple text generation for 10 medical personas. 5 novel plausibility metrics proposed, and an ontology of common LLM errors.
ai bart flant5 gen llama2 llama2-7b llm llm-inference maximal-marginal-relevance medical mmr nlp nlp-medical-records question-answering
Last synced: 22 Dec 2024
https://github.com/sureshbeekhani/ai-quick-summaries
Developed an AI-powered web app using Streamlit and Google Gemini AI for generating concise summaries from PDFs, images, and text files. The app features real-time text summarization, file upload support, and a user-friendly interface.
chatbot gemini gpt image-and-pdf llm llm-inference python streamlit
Last synced: 07 Dec 2024
https://github.com/abhaskumarsinha/corpus2gpt
Corpus2GPT: A project enabling users to train their own GPT models on diverse datasets, including local languages and various corpus types, using Keras and compatible with TensorFlow, PyTorch, or JAX backends for subsequent storage or sharing.
attention-mechanism jax keras large-language-models llm llm-inference llm-training python3 pytorch tensorflow
Last synced: 21 Nov 2024
https://github.com/rahulunair/simple_llm_inference
A simple example of LLM inference on Intel GPUs (XPUs)
intel-arc intel-gpu-max intelgpu ipex llm-inference transformers
Last synced: 13 Jan 2025
https://github.com/wtlow003/speculative-sampling
Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"
deepmind llm-inference speculative-decoding speculative-sampling
Last synced: 16 Jan 2025
https://github.com/paulpierre/vllm-docker
test Llama-3.2-11B-Vision-Instruct 4-bit quant quickly on an a100 40GB
docker docker-compose llama llama3 llm llm-inference llms vllm
Last synced: 20 Jan 2025
https://github.com/johnclaw/chatllm.kt
kotlin api wrapper for llm-inference chatllm.cpp
api-wrapper bindings chatbot chatllm cpu-inference gemma ggml inference kotlin llama llm llm-inference llms mistral quantization qwen
Last synced: 20 Jan 2025
https://github.com/aaashrafhabib/advanced-rag-system-
End-to-end advanced RAG project using open-source LLMs and Groq inference
generative-ai langchain llm-inference python rag
Last synced: 15 Jan 2025
https://github.com/mukeshmithrakumar/llm-poc-2024
Popular Large Language Models from scratch - 2024
gpt llama llm llm-inference llm-training transformer
Last synced: 17 Jan 2025
https://github.com/dawid-szewc/perplexity-cli
A simple command-line client for the Perplexity API. Ask questions and receive answers directly from the terminal!
ai bash bash-script llm-inference perplexity perplexity-ai perplexity-api python python3 zsh
Last synced: 02 Dec 2024
https://github.com/drake9098/vulnerabilitybot
A client-server structure for making queries and sending them to an AI model
cybersecurity llm llm-inference
Last synced: 28 Nov 2024
https://github.com/biosfood/intel-llm-guide
A guide on how to run LLMs on Intel CPUs
guide intel llm llm-inference llm-serving machine-learning setup setup-development-environment tutorial
Last synced: 22 Dec 2024
https://github.com/dev-d-gr8/storyscape
A storytelling (generates stories with pictures) generative AI based iOS application based on custom fine tuned LLaMA 3.2 3B-Instruct model on Hindi stories (Provision to generate English stories via call to OpenAI GPT-4o).
app aws django docker generative-ai generative-art ios jenkins llm llm-inference llm-training llmops mobile-development python sagemaker swift swiftui
Last synced: 14 Jan 2025
https://github.com/eternalflame02/single-node-finetuning-of-tiny-llama-using-intel-xeon-spr
The project was undertaken as part of the Intel Unnati Industrial Training program for the year 2024. The primary objective of this project aligns with Problem Statement PS-04: Introduction to GenAI LLM Inference on CPUs and subsequent LLM Model Finetuning for the development of a Custom Chatbot.
intel-unnati llm-finetuning llm-inference python tinyllama
Last synced: 12 Dec 2024
https://github.com/tinybiggames/lumina
Local Generative AI
gen-ai gguf llama-cpp llm-inference local-ai pascal win64 windows-10 windows-11
Last synced: 02 Nov 2024
https://github.com/collab-uniba/irc-setfit-ollama-demo
Issue report classification demo with SetFit and Ollama for NASA's Flight System software repositories
docker issue-management llm-inference ollama-python setfit
Last synced: 10 Jan 2025
https://github.com/sergio11/streamlit_llm_langchain_applications
Explore innovative large language model (LLM) applications with Streamlit-based proofs of concept (POCs). These demos showcase open-source models using Groq for cloud-based inference and LangChain for efficient orchestration. From writing assistants to blog post generators, experience AI-driven tools enhancing productivity and creativity.
chromadb faiss faiss-vector-database groq-ai groq-api langchain langchain-python llama3 llm llm-framework llm-inference llms mistral-7b streamlit tavily
Last synced: 14 Dec 2024
https://github.com/abhaskumarsinha/Corpus2GPT
Corpus2GPT: A project enabling users to train their own GPT models on diverse datasets, including local languages and various corpus types, using Keras and compatible with TensorFlow, PyTorch, or JAX backends for subsequent storage or sharing.
attention-mechanism jax keras large-language-models llm llm-inference llm-training python3 pytorch tensorflow
Last synced: 20 Oct 2024
https://github.com/es7/introduction-to-llms
In this repository I explain the application of Large Language Models (LLMs), from using LLMs in your own application to building an LLM.
computer-vision deep-learning huggingface llm llm-framework llm-inference llm-training machine-learning natural-language-processing prompt-engineering prompt-learning
Last synced: 11 Jan 2025
https://github.com/niansa/discord_llama
Multi-Model and multi-tasking llama Discord Bot - Mirror of: https://gitlab.com/niansa/discord_llama
ai cpp20 discord-bot llama llama2 llamacpp llm llm-inference
Last synced: 13 Jan 2025
https://github.com/projects-mk/chat-with-documents-quickstart
Repository containing code for setting up RAG on your machine. Includes OpenAI as well as Hugging Face LLMs and embedding models.
huggingface langchain-python langfuse llm llm-inference ollama open-source openai rag
Last synced: 13 Nov 2024
https://github.com/picovoice/serverless-picollm
LLM Inference on AWS Lambda
aws-lambda llm llm-compression llm-inference serverless serverless-inference
Last synced: 22 Nov 2024
https://github.com/johnclaw/chatllm.lua
lua api wrapper for llm-inference chatllm.cpp
api-wrapper bindings chatbot chatllm cpu-inference gemma ggml inference llama llm llm-inference llms lua luajit mistral quantization qwen
Last synced: 20 Jan 2025
https://github.com/johnclaw/chatllm.d
D-lang api wrapper for llm-inference chatllm.cpp
api-wrapper bindings chatbot chatllm cpu-inference d-lang d-language dlang gemma ggml inference llama llm llm-inference llms mistral quantization qwen
Last synced: 20 Jan 2025
https://github.com/johnclaw/chatllm.rs
rust api wrapper for llm-inference chatllm.cpp
api-wrapper bindings chatbot chatllm cpu-inference gemma ggml inference llama llm llm-inference llms mistral quantization qwen rust
Last synced: 20 Jan 2025
https://github.com/ashmadev/react-ollama-ui
Awesome UI for interacting with your local LLMs
ai chatbot llm-inference ollama
Last synced: 19 Nov 2024
https://github.com/richardsonlima/synapsense
SynapSense: Python In-Context Learning for Large Language Models. SynapSense is a cutting-edge Python library designed to streamline the implementation of In-Context Learning (ICL) with Large Language Models (LLMs).
ai genai llm llm-agent llm-inference llmops llms
Last synced: 13 Nov 2024
https://github.com/khaledsharif/openrag
openrag = ollama + dspy + chroma
llm-inference rag vector-database
Last synced: 23 Nov 2024
https://github.com/regular-baf/bafchat
Bringing local LLMs to a Minecraft front-end through commands.
ai api gpt llama llm llm-inference minecraft minecraft-mod redpajama
Last synced: 17 Jan 2025
https://github.com/siddhant-k-code/contextify
Python script designed to streamline the process of providing context to Large Language Models (LLMs) from your project files. It's particularly useful when working on coding tasks with LLMs, as it can automatically gather and format relevant code from your project.
context contextify llm llm-inference python
Last synced: 01 Dec 2024
https://github.com/nagababumo/finetune-mistral-using-ludwig-framework
finetuning generative-ai llm llm-inference ludwig mistral
Last synced: 14 Jan 2025
https://github.com/duck4i/hnlp-translate
Translate JSON localizations automatically - Helsinki NLP model
helsinki-nlp llm-inference transformers
Last synced: 24 Dec 2024
https://github.com/kira94-hkz/powerserve
High-speed and easy-to-use LLM serving framework for local deployment
llama llm llm-inference llm-serving npu qwen smallthinker smartphone
Last synced: 20 Jan 2025
https://github.com/djdhairya/pdf-gpt
Pdf-GPT designed by NVIDIA NIM
api artificial-intelligence chatnvi deep-learning gpu langchain llm llm-inference machine-learning machine-learning-algorithms nvidia pypdf
Last synced: 07 Jan 2025
https://github.com/gregyjames/stocksentllm
Fine-tuning an LLM to predict stock sentiment based on headlines.
huggingface huggingface-transformers llm llm-inference llm-training python sentiment sentiment-analysis sentiment-classification stocks
Last synced: 03 Dec 2024
https://github.com/elskow/multilang-saas-paraphrasing-tool
Forked version of https://github.com/alfazh123/ParaFaze with a State-of-the-Art of an over engineering :)
llm-inference nextjs page-router paraphrasing-tool python t5-model
Last synced: 12 Jan 2025
https://github.com/thansen0/fastllm.cpp
A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp.
Last synced: 22 Dec 2024
https://github.com/superjamie/rocswap
llama.cpp + ROCm + llama-swap
ai ai-inference amd amd-gpu amdgpu llamacpp llamaswap llm llm-inference rocm
Last synced: 22 Dec 2024
https://github.com/hherpa/squidblock-ru-docs
SquidBlock is an innovative platform that opens up limitless possibilities for developing and running block-modular LLM systems.
agent-oriented-programming agentic-agi ai autogen chat-application chatbot documentation gpt-35-turbo gpt-4 graph llm llm-agents llm-inference llmops node pipeline rag retrieval-augmented-generation
Last synced: 24 Dec 2024
https://github.com/saherpathan/invoicify-ai-cohere
A Flask application that extracts invoice details from uploaded PDFs and images using LLM inference API
cohereapi flask-application llm-inference ocr pdfplumber python3
Last synced: 20 Jan 2025
https://github.com/zvoverman/yt-video-summarizer
Firefox extension that summarizes YouTube videos using AI
ai javascript llm-inference longformer-models
Last synced: 24 Nov 2024
https://github.com/nazago/meeting-minutes-generator
Script that takes a .wav audio file, performs speech-to-text using OpenAI Whisper, and then uses Llama3 to generate a summary and action points from the transcript
langchain-python llm-inference local-inference meeting-minutes ollama speech-to-text summarization whisper
Last synced: 02 Jan 2025
https://github.com/vicky87883/urlshortner
UrlShortner maps longer URLs to shorter ones. The app is written entirely in Python and uses a PostgreSQL database to store the URL mappings.
django django-rest-framework large-language-models llm-inference postgresql-database url-parser url-shortener url-shortener-microservice
Last synced: 30 Nov 2024
https://github.com/jkevin2010/llms-for-dementia-detection
Fine-tuning Large Language Models (LLMs) for the early detection of Alzheimer's Disease and Related Dementias (ADRD)
alzheimer-disease-prediction chatgpt dementia-detection finetuning gpt-4 llm-inference machine-learning
Last synced: 07 Dec 2024
https://github.com/siris2314/ytsum
Summarize YT videos in one go using Mixtral
distil-whisper distil-whisper-large-v3 llm-inference llms mixtral-8x7b pypi-package togetherai
Last synced: 12 Oct 2024
https://github.com/abhinav330/msc-project
AI-Powered Chatbot for University Websites. This project enhances the usability of university websites by providing an AI-driven chatbot powered by advanced Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG).
chatbot data-science data-visualization finetuning-llms gemma2 llama3 llama3-finetune llm llm-inference mistral-7b nlp ollama phi-3-mini rag research-project
Last synced: 02 Jan 2025
https://github.com/hrolive/large-language-models-on-supercomputers
Comprehensive exploration of LLMs, including cutting-edge techniques and tools such as parameter-efficient fine-tuning (PEFT), quantization, zero redundancy optimizers (ZeRO), fully sharded data parallelism (FSDP), DeepSpeed, and Huggingface accelerate.
deepspeed evaluation-metrics fsdp high-performance-computing hpc huggingface huggingface-transformers jupyter llm llm-inference llm-training monitoring peft python quantization slurm tokenization transformer unsloth
Last synced: 04 Jan 2025
https://github.com/ebowwa/resume-generator
employment hiring llm llm-inference llms resume resume-builder resume-creator
Last synced: 29 Nov 2024
https://github.com/iamaziz/llm-cost-estimator
Estimate the cost of using OpenAI models based on the number of input and output tokens.
cost-estimation llm-inference openai-api
Last synced: 04 Jan 2025
https://github.com/cai991108/machine-learning-and-language-model
This project explores GPT-2 and Llama models through pre-training, fine-tuning, and Chain-of-Thought (CoT) prompting. It includes memory-efficient optimizations (SGD, LoRA, BAdam) and evaluations on math datasets (GSM8K, NumGLUE, StimulEq, SVAMP).
chainofthought finetune-llm gpt2 llama llm llm-inference pretrained-language-model
Last synced: 20 Jan 2025
https://github.com/kristofferv98/agent_nexus
Agentic framework for dynamic function calling across latest LLMs (gpt-4o, gemini-2.0-flash, groq modes, and anthropic models). Converts Python functions into provider-specific schemas for autonomous tool use. Features unified API, JSON schema generation, and integrated tool execution handling.
agent-orchestration agents anthropic function-calling gemini gemini-2-0-flash-exp gemini-tools groq json-schema llm-inference multi-llm openai parallel-processing schema-generation tool-generator tool-integration tools
Last synced: 20 Dec 2024
https://github.com/naveenalla3000/quiz_generator
AI quiz generator
generative-ai langchain llm-inference solara
Last synced: 04 Jan 2025
https://github.com/rfdzan/t5-llm-training
Creating a workflow to train T5 language models
language-model llm-inference llm-training pytorch
Last synced: 28 Dec 2024
https://github.com/sc0v0ne/explorelargelanguagemodels
Explore Large Language Models
gemini google hugging-face huggingface huggingface-transformers llama llama3 llm llm-inference llms meta ollama ollama-api ollama-client
Last synced: 20 Jan 2025
https://github.com/mohammad-nour-alawad/voice-interpreter-for-data-visualization
Django app providing a voice interpreter to manipulate and visualize data.
data-visualization django javascript llm-inference voice-assistant
Last synced: 25 Nov 2024
https://github.com/saritaphd/medical-chatbot-using-llama2
This project is a medical chatbot powered by the open-source Llama 2 model and integrated with Pinecone for efficient vector search. The chatbot is designed to answer user queries based on information extracted from medical documents (PDFs).
chatbot-application generative-ai langchain llama2 llm-inference llm-training pinecone python
Last synced: 22 Dec 2024
https://github.com/omars44/open-assistant-demo
LangChain Open Assistant demo using the Hugging Face Hub (Inference API)
langchain langchain-python llm llm-inference open-assistant
Last synced: 19 Jan 2025
https://github.com/rs-py/howtofinetuneanllm
This is a step-by-step example of how to quickly fine-tune an LLM without access to robust hardware, using simple text data. For a more in-depth treatment, read the article on Medium that is linked below.
fine-tuning huggingface huggingface-transformers llama llm llm-inference
Last synced: 18 Jan 2025
https://github.com/t-mohamed-shafeek/llm-for-language-translation
This repository contains a simple, beginner-level notebook that uses the mBART model to translate English text into Indian languages.
huggingface-transformers language-tra llm llm-inference mbart mbart50 nlp nmt
Last synced: 14 Dec 2024
https://github.com/thansen0/fast-llm-api
A low-latency, fault-tolerant API for accessing LLMs.
Last synced: 22 Dec 2024
https://github.com/ripan-roy/mcts-chain-of-thought
This project shows the implementation of Monte Carlo Tree Search with a large language model to generate and evaluate reasoning chains or chain of thoughts for advanced problem-solving.
chain-of-thought large-language-models llama3 llm-evaluation llm-inference llms monte-carlo-tree-search ollama-api python
Last synced: 12 Jan 2025
https://github.com/riolaf05/langchain-fastapi-rag-platform
A platform to test multiple LLM models inside a RAG workflow to choose the best model for embedding and retrieval and the best prompt according to the use case
artificial-intelligence aws cloud iac iac-terraform infrastructure langchain langchain-python llm llm-inference llms python rag serverless terraform
Last synced: 20 Nov 2024
https://github.com/howardchiang2/easy_mode_llm_inference
Simple LLama.cpp usage
Last synced: 22 Dec 2024
https://github.com/gali1/ollama-cli-or-webui
This project provides a dual-interface tool for generating text responses using large language models.
cli command-line flask huggingface huggingface-transformers interface interfaces llama llm llm-inference natural-language-processing nlp nlp-parsing rest-api service text-generation transformers transformers-models web
Last synced: 22 Dec 2024
https://github.com/mapluisch/llava-websocket-server
Python-based WebSocket for CLI LLaVA inference.
inference llama llama2 llava llm llm-inference python websocket websockets
Last synced: 12 Jan 2025
https://github.com/duck4i/node-llama
LLM inside your Node.JS
llamacpp llm llm-inference local nodejs
Last synced: 05 Jan 2025
https://github.com/hoehrmann/ietf-cert
Experimental autonomous AI LLM & RAG IETF reviewer
ai autonomous ietf llm-inference quality-assurance retrieval-augmented-generation
Last synced: 05 Dec 2024
https://github.com/pathak-ashutosh/clinical-risk-prediction
Clinical Risk Prediction using EHRs
clinical-data clinical-research fine-tuning healthcare large-language-models llm-inference machine-learning nlp python
Last synced: 02 Jan 2025
https://github.com/karan-parmar-007/blog-ai
Whisper AI is an innovative blogging platform that combines advanced artificial intelligence and cloud technologies to revolutionize the way users create, share, and engage with content. With a range of features designed to enhance the blogging experience, Whisper AI empowers users to express themselves freely while ensuring privacy and security.
artificial-intelligence bootstrap django html-css-javascript langchain large-language-models llama2 llamacpp llm-inference python3 security translation
Last synced: 22 Dec 2024
https://github.com/0xnu/fine_tune_llm_docker
Fine-tune large language models (LLMs) using the Hugging Face Transformers library.
docker llm llm-fine-tuning llm-finetuning llm-inference
Last synced: 15 Dec 2024
https://github.com/andrewkchan/deepseek.cpp
CPU inference for the DeepSeek family of large language models in pure C++
cpp deepseek llama llm llm-inference machine-learning transformers
Last synced: 20 Jan 2025
https://github.com/danielrosehill/the-llm-files
A blog about my adventures and discoveries working with large language models (LLMs).
large-language-models llm-benchmarking llm-inference llms
Last synced: 04 Dec 2024
https://github.com/mrseanryan/gpt-function-calling-bare-bones
Function Calling an LLM taking a bare-bones (no libraries) approach
function-calling llm llm-inference
Last synced: 28 Dec 2024
https://github.com/dluc/ai-cronjobs
A collection of simple cronjobs for macOS, leveraging LLMs.
ai cronjob llm-inference macos
Last synced: 09 Dec 2024
https://github.com/chloelavrat/personal-assistant
Access different personal assistants with HuggingChat and Streamlit
assistant-chat-bots huggingchat llm llm-agent llm-inference streamlit summary
Last synced: 02 Jan 2025
https://github.com/fahmiaziz98/large_language_model
This repository contains my practice in learning LLMs, specifically BERT, T5, and GPT-2.
bert fine-tuning gpt-2 huggingface large-language-models llm-inference openai streamlit t5
Last synced: 20 Nov 2024
https://github.com/leozqin/hops
A load-balancing reverse proxy server that enables you to address a fleet of diverse Ollama instances as a single one
llama llm llm-inference load-balancer ollama ollama-api reverse-proxy self-hosted selfhosted
Last synced: 20 Jan 2025
https://github.com/tinybiggames/askllm
LLM API Access for Delphi Delphi™
ai claude-api llm-inference llms openai-api win64 windows-10 windows-11
Last synced: 05 Dec 2024
https://github.com/tybrucechen/llm-based-web-chat-robot
Web Chat Robot based on LLama3.2-1B Model at Server-side Deployment with Continuous Conversation
audio-transcribing llama3 llm llm-inference text-generation
Last synced: 11 Dec 2024
https://github.com/viratsrivastava/lone.alpha.dev-platform
Developer's Alpha version for Rine Platform Software
artificial-intelligence artificial-intelligence-algorithms artificial-neural-networks dev llm-inference llmops mlops
Last synced: 06 Jan 2025
https://github.com/wtlow003/ngram-decoding
(Re)-implementation of "Prompt Lookup Decoding" by Apoorv Saxena, with extended ideas from LLMA Decoding.
llm-inference n-gram ngram-decoding prompt-lookup-decoding speculative-decoding
Last synced: 16 Jan 2025