Projects in Awesome Lists tagged with llm-inference

https://github.com/picovoice/llm-compression-benchmark

LLM Compression Benchmark

llm llm-compression llm-inference

Last synced: 22 Nov 2024

https://github.com/thefcraft/localgpt

Clone of ChatGPT using HTML, CSS, JS, and Flask

chatgpt chatgpt-clone chatgpt-gui flask llm-inference python

Last synced: 12 Jan 2025

https://github.com/firojalam/llamalens

This repository contains the resources, code, and documentation for LlamaLens, a specialized multilingual large language model (LLM) designed to analyze news and social media content effectively. LlamaLens supports multiple languages, including Arabic, English, and Hindi, and is tailored for diverse tasks such as sentiment analysis, misinformation.

arabic downstream-tasks emotion-detection english hindi llm llm-inference llm-training newsmedia sentiment-classification social-media

Last synced: 28 Dec 2024

https://github.com/gurpreetkaurjethra/llms-inference-and-fine-tuning

Estimate Memory Consumption of LLMs Inference and Fine Tuning

fine-tuning generative-ai large-language-models llm-inference llm-training llms memory-allocation

Last synced: 22 Nov 2024

https://github.com/nickpotafiy/illama

A fast, lightweight, parallel inference server for Llama LLMs.

exllama exllamav2 flash-attention-2 inference llama llama2 llama3 llm-inference paged-attention server

Last synced: 10 Oct 2024

https://github.com/giuseppebellamacina/vulnerabilitybot

Vulnerability Bot with Database

cybersecurity-tool llm llm-inference

Last synced: 05 Dec 2024

https://github.com/niansa/libjustlm

Super easy to use library for doing LLaMA/GPT-J stuff! - Mirror of: https://gitlab.com/niansa/libjustlm

ai cpp17 cpp20 gpt-j llama llama2 llm llm-inference mpt python wrapper-library

Last synced: 13 Jan 2025

https://github.com/williamzebrowski/assistant-api

OpenAI Assistant API integrated with Elasticsearch, Logstash & Kibana

ai chatapp chatgpt conversational-ai data elasticsearch kibana llm-inference llms openai rag

Last synced: 11 Oct 2024

https://github.com/aadit3003/llm-rhyme-eval

English and Dutch rhyming datasets (5k word pairs each) for five types of rhymes. Three open-source LLMs (Llama2, Llama3, CrystalChat) are tested on these datasets, with prompt variation.

llama2 llama3 llm llm-evaluation llm-inference nlp orthography phonology rhyme rhyme-analysis

Last synced: 22 Dec 2024

https://github.com/aadit3003/llm-medical-personas

Examination of whether LLMs can maintain consistency over extended multiple text generation for 10 medical personas. 5 novel plausibility metrics proposed, and an ontology of common LLM errors.

ai bart flant5 gen llama2 llama2-7b llm llm-inference maximal-marginal-relevance medical mmr nlp nlp-medical-records question-answering

Last synced: 22 Dec 2024

https://github.com/sureshbeekhani/ai-quick-summaries

Developed an AI-powered web app using Streamlit and Google Gemini AI for generating concise summaries from PDFs, images, and text files. The app features real-time text summarization, file upload support, and a user-friendly interface.

chatbot gemini gpt image-and-pdf llm llm-inference python streamlit

Last synced: 07 Dec 2024

https://github.com/abhaskumarsinha/corpus2gpt

Corpus2GPT: A project enabling users to train their own GPT models on diverse datasets, including local languages and various corpus types, using Keras and compatible with TensorFlow, PyTorch, or JAX backends for subsequent storage or sharing.

attention-mechanism jax keras large-language-models llm llm-inference llm-training python3 pytorch tensorflow

Last synced: 21 Nov 2024

https://github.com/rahulunair/simple_llm_inference

A simple example of LLM inference on Intel GPUs (XPUs)

intel-arc intel-gpu-max intelgpu ipex llm-inference transformers

Last synced: 13 Jan 2025

https://github.com/wtlow003/speculative-sampling

Implementation of Speculative Sampling in "Accelerating Large Language Model Decoding with Speculative Sampling"

deepmind llm-inference speculative-decoding speculative-sampling

Last synced: 16 Jan 2025

https://github.com/paulpierre/vllm-docker

Quickly test a 4-bit quantized Llama-3.2-11B-Vision-Instruct on an A100 40GB

docker docker-compose llama llama3 llm llm-inference llms vllm

Last synced: 20 Jan 2025

https://github.com/johnclaw/chatllm.kt

kotlin api wrapper for llm-inference chatllm.cpp

api-wrapper bindings chatbot chatllm cpu-inference gemma ggml inference kotlin llama llm llm-inference llms mistral quantization qwen

Last synced: 20 Jan 2025

https://github.com/aaashrafhabib/advanced-rag-system-

End To End Advanced Rag Project Using Open Source LLM models and Groq Inferencing

generative-ai langchain llm-inference python rag

Last synced: 15 Jan 2025

https://github.com/mukeshmithrakumar/llm-poc-2024

Popular Large Language Models from scratch - 2024

gpt llama llm llm-inference llm-training transformer

Last synced: 17 Jan 2025

https://github.com/dawid-szewc/perplexity-cli

🧠 A simple command-line client for the Perplexity API. Ask questions and receive answers directly from the terminal! 🚀🚀🚀

ai bash bash-script llm-inference perplexity perplexity-ai perplexity-api python python3 zsh

Last synced: 02 Dec 2024

https://github.com/drake9098/vulnerabilitybot

A client-server structure for making queries and sending them to an AI model

cybersecurity llm llm-inference

Last synced: 28 Nov 2024

https://github.com/biosfood/intel-llm-guide

A guide on how to run LLMs on Intel CPUs

guide intel llm llm-inference llm-serving machine-learning setup setup-development-environment tutorial

Last synced: 22 Dec 2024

https://github.com/dev-d-gr8/storyscape

A generative-AI storytelling iOS application (it generates stories with pictures) built on a custom LLaMA 3.2 3B-Instruct model fine-tuned on Hindi stories, with a provision to generate English stories via a call to OpenAI GPT-4o.

app aws django docker generative-ai generative-art ios jenkins llm llm-inference llm-training llmops mobile-development python sagemaker swift swiftui

Last synced: 14 Jan 2025

https://github.com/eternalflame02/single-node-finetuning-of-tiny-llama-using-intel-xeon-spr

The project was undertaken as part of the Intel Unnati Industrial Training program for the year 2024. The primary objective of this project aligns with Problem Statement PS-04: Introduction to GenAI LLM Inference on CPUs and subsequent LLM Model Finetuning for the development of a Custom Chatbot.

intel-unnati llm-finetuning llm-inference python tinyllama

Last synced: 12 Dec 2024

https://github.com/tinybiggames/lumina

Local Generative AI

gen-ai gguf llama-cpp llm-inference local-ai pascal win64 windows-10 windows-11

Last synced: 02 Nov 2024

https://github.com/collab-uniba/irc-setfit-ollama-demo

Issue report classification demo with SetFit and Ollama for NASA's Flight System software repositories

docker issue-management llm-inference ollama-python setfit

Last synced: 10 Jan 2025

https://github.com/sergio11/streamlit_llm_langchain_applications

Explore innovative Language Model applications (LLMs) with Streamlit-based Proof of Concepts (POCs) 🚀. These demos showcase open-source models using Groq for cloud-based inference and LangChain for efficient orchestration 🌐. From writing assistants to blog post generators, experience AI-driven tools enhancing productivity and creativity 📚💡.

chromadb faiss faiss-vector-database groq-ai groq-api langchain langchain-python llama3 llm llm-framework llm-inference llms mistral-7b streamlit tavily

Last synced: 14 Dec 2024

https://github.com/abhaskumarsinha/Corpus2GPT

Corpus2GPT: A project enabling users to train their own GPT models on diverse datasets, including local languages and various corpus types, using Keras and compatible with TensorFlow, PyTorch, or JAX backends for subsequent storage or sharing.

attention-mechanism jax keras large-language-models llm llm-inference llm-training python3 pytorch tensorflow

Last synced: 20 Oct 2024

https://github.com/es7/introduction-to-llms

In this repository I explain how to apply Large Language Models (LLMs), from using LLMs in our own applications to building an LLM.

computer-vision deep-learning huggingface llm llm-framework llm-inference llm-training machine-learning natural-language-processing prompt-engineering prompt-learning

Last synced: 11 Jan 2025

https://github.com/niansa/discord_llama

Multi-Model and multi-tasking llama Discord Bot - Mirror of: https://gitlab.com/niansa/discord_llama

ai cpp20 discord-bot llama llama2 llamacpp llm llm-inference

Last synced: 13 Jan 2025

https://github.com/projects-mk/chat-with-documents-quickstart

Repository containing code for setting up RAG on your machine. Implements OpenAI as well as HuggingFace LLMs and embedding models.

huggingface langchain-python langfuse llm llm-inference ollama open-source openai rag

Last synced: 13 Nov 2024

https://github.com/picovoice/serverless-picollm

LLM Inference on AWS Lambda

aws-lambda llm llm-compression llm-inference serverless serverless-inference

Last synced: 22 Nov 2024

https://github.com/johnclaw/chatllm.lua

lua api wrapper for llm-inference chatllm.cpp

api-wrapper bindings chatbot chatllm cpu-inference gemma ggml inference llama llm llm-inference llms lua luajit mistral quantization qwen

Last synced: 20 Jan 2025

https://github.com/johnclaw/chatllm.d

D-lang api wrapper for llm-inference chatllm.cpp

api-wrapper bindings chatbot chatllm cpu-inference d-lang d-language dlang gemma ggml inference llama llm llm-inference llms mistral quantization qwen

Last synced: 20 Jan 2025

https://github.com/johnclaw/chatllm.rs

rust api wrapper for llm-inference chatllm.cpp

api-wrapper bindings chatbot chatllm cpu-inference gemma ggml inference llama llm llm-inference llms mistral quantization qwen rust

Last synced: 20 Jan 2025

https://github.com/ashmadev/react-ollama-ui

Awesome UI for interacting with your local LLMs

ai chatbot llm-inference ollama

Last synced: 19 Nov 2024

https://github.com/richardsonlima/synapsense

SynapSense: Python In-Context Learning for Large Language Models. SynapSense is a cutting-edge Python library designed to streamline the implementation of In-Context Learning (ICL) with Large Language Models (LLMs).

ai genai llm llm-agent llm-inference llmops llms

Last synced: 13 Nov 2024

https://github.com/khaledsharif/openrag

openrag = ollama + dspy + chroma

llm-inference rag vector-database

Last synced: 23 Nov 2024

https://github.com/alexlnkp/remi

A basic LLM chatbot named Remi

chatbot llm-inference

Last synced: 19 Dec 2024

https://github.com/regular-baf/bafchat

Bringing local LLMs to a Minecraft front-end through commands.

ai api gpt llama llm llm-inference minecraft minecraft-mod redpajama

Last synced: 17 Jan 2025

https://github.com/siddhant-k-code/contextify

Python script designed to streamline the process of providing context to Large Language Models (LLMs) from your project files. It's particularly useful when working on coding tasks with LLMs, as it can automatically gather and format relevant code from your project.

context contextify llm llm-inference python

Last synced: 01 Dec 2024

https://github.com/nagababumo/finetune-mistral-using-ludwig-framework

finetuning generative-ai llm llm-inference ludwig mistral

Last synced: 14 Jan 2025

https://github.com/duck4i/hnlp-translate

Translate JSON localizations automatically - Helsinki NLP model

helsinki-nlp llm-inference transformers

Last synced: 24 Dec 2024

https://github.com/kira94-hkz/powerserve

High-speed and easy-to-use LLM serving framework for local deployment

llama llm llm-inference llm-serving npu qwen smallthinker smartphone

Last synced: 20 Jan 2025

https://github.com/djdhairya/pdf-gpt

Pdf-GPT designed by NVIDIA NIM

api artificial-intelligence chatnvi deep-learning gpu langchain llm llm-inference machine-learning machine-learning-algorithms nvidia pypdf

Last synced: 07 Jan 2025

https://github.com/gregyjames/stocksentllm

Fine tuning an llm to predict stock sentiment based on headlines.

huggingface huggingface-transformers llm llm-inference llm-training python sentiment sentiment-analysis sentiment-classification stocks

Last synced: 03 Dec 2024

https://github.com/elskow/multilang-saas-paraphrasing-tool

Forked version of https://github.com/alfazh123/ParaFaze with a State-of-the-Art of an over engineering :)

llm-inference nextjs page-router paraphrasing-tool python t5-model

Last synced: 12 Jan 2025

https://github.com/thansen0/fastllm.cpp

A low-latency, fault-tolerant API for accessing LLMs, written in C++ using llama.cpp.

llamacpp llm llm-inference

Last synced: 22 Dec 2024

https://github.com/superjamie/rocswap

llama.cpp + ROCm + llama-swap

ai ai-inference amd amd-gpu amdgpu llamacpp llamaswap llm llm-inference rocm

Last synced: 22 Dec 2024

https://github.com/hherpa/squidblock-ru-docs

SquidBlock is an innovative platform that opens up limitless possibilities for developing and running block-modular LLM systems.

agent-oriented-programming agentic-agi ai autogen chat-application chatbot documentation gpt-35-turbo gpt-4 graph llm llm-agents llm-inference llmops node pipeline rag retrieval-augmented-generation

Last synced: 24 Dec 2024

https://github.com/saherpathan/invoicify-ai-cohere

A Flask application that extracts invoice details from uploaded PDFs and images using LLM inference API

cohereapi flask-application llm-inference ocr pdfplumber python3

Last synced: 20 Jan 2025

https://github.com/zvoverman/yt-video-summarizer

Firefox extension that summarizes YouTube videos using AI

ai javascript llm-inference longformer-models

Last synced: 24 Nov 2024

https://github.com/nazago/meeting-minutes-generator

Script that takes a .wav audio file, performs speech-to-text using OpenAI/Whisper, and then uses Llama3 to produce a summary and action points from the generated transcript

langchain-python llm-inference local-inference meeting-minutes ollama speech-to-text summarization whisper

Last synced: 02 Jan 2025

https://github.com/vicky87883/urlshortner

UrlShortner maps longer URLs to shorter ones. The app is written entirely in Python and uses a PostgreSQL database to map the URLs.

django django-rest-framework large-language-models llm-inference postgresql-database url-parser url-shortener url-shortener-microservice

Last synced: 30 Nov 2024

https://github.com/jkevin2010/llms-for-dementia-detection

Fine-tuning Large Language Models (LLMs) for the early detection of Alzheimer's Disease and Related Dementias (ADRD)

alzheimer-disease-prediction chatgpt dementia-detection finetuning gpt-4 llm-inference machine-learning

Last synced: 07 Dec 2024

https://github.com/siris2314/ytsum

Summarize YT videos in one go using Mixtral

distil-whisper distil-whisper-large-v3 llm-inference llms mixtral-8x7b pypi-package togetherai

Last synced: 12 Oct 2024

https://github.com/abhinav330/msc-project

AI-Powered Chatbot for University Websites. This project enhances the usability of university websites by providing an AI-driven chatbot powered by advanced Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG).

chatbot data-science data-visualization finetuning-llms gemma2 llama3 llama3-finetune llm llm-inference mistral-7b nlp ollama phi-3-mini rag research-project

Last synced: 02 Jan 2025

https://github.com/hrolive/large-language-models-on-supercomputers

Comprehensive exploration of LLMs, including cutting-edge techniques and tools such as parameter-efficient fine-tuning (PEFT), quantization, zero redundancy optimizers (ZeRO), fully sharded data parallelism (FSDP), DeepSpeed, and Huggingface accelerate.

deepspeed evaluation-metrics fsdp high-performance-computing hpc huggingface huggingface-transformers jupyter llm llm-inference llm-training monitoring peft python quantization slurm tokenization transformer unsloth

Last synced: 04 Jan 2025

https://github.com/howardchiang2/vllm

Quickly validate vllm on Colab

llm llm-inference

Last synced: 13 Jan 2025

https://github.com/ebowwa/resume-generator

employment hiring llm llm-inference llms resume resume-builder resume-creator

Last synced: 29 Nov 2024

https://github.com/iamaziz/llm-cost-estimator

Estimate the cost of using OpenAI models based on the number of input and output tokens.

cost-estimation llm-inference openai-api

Last synced: 04 Jan 2025

https://github.com/cai991108/machine-learning-and-language-model

This project explores GPT-2 and Llama models through pre-training, fine-tuning, and Chain-of-Thought (CoT) prompting. It includes memory-efficient optimizations (SGD, LoRA, BAdam) and evaluations on math datasets (GSM8K, NumGLUE, StimulEq, SVAMP).

chainofthought finetune-llm gpt2 llama llm llm-inference pretrained-language-model

Last synced: 20 Jan 2025

https://github.com/kristofferv98/agent_nexus

Agentic framework for dynamic function calling across latest LLMs (gpt-4o, gemini-2.0-flash, groq modes, and anthropic models). Converts Python functions into provider-specific schemas for autonomous tool use. Features unified API, JSON schema generation, and integrated tool execution handling.

agent-orchestration agents anthropic function-calling gemini gemini-2-0-flash-exp gemini-tools groq json-schema llm-inference multi-llm openai parallel-processing schema-generation tool-generator tool-integration tools

Last synced: 20 Dec 2024

https://github.com/naveenalla3000/quiz_generator

AI quiz generator

generative-ai langchain llm-inference solara

Last synced: 04 Jan 2025

https://github.com/rfdzan/t5-llm-training

creating a workflow to train t5 language models

language-model llm-inference llm-training pytorch

Last synced: 28 Dec 2024

https://github.com/sc0v0ne/explorelargelanguagemodels

Explore Large Language Models

gemini google hugging-face huggingface huggingface-transformers llama llama3 llm llm-inference llms meta ollama ollama-api ollama-client

Last synced: 20 Jan 2025

https://github.com/mohammad-nour-alawad/voice-interpreter-for-data-visualization

Django app providing a voice interpreter to manipulate and visualize data.

data-visualization django javascript llm-inference voice-assistant

Last synced: 25 Nov 2024

https://github.com/saritaphd/medical-chatbot-using-llama2

This project is a medical chatbot powered by the open-source Llama 2 model and integrated with Pinecone for efficient vector search. The chatbot is designed to answer user queries based on information extracted from medical documents (PDFs).

chatbot-application generative-ai langchain llama2 llm-inference llm-training pinecone python

Last synced: 22 Dec 2024

https://github.com/omars44/open-assistant-demo

LangChain Open Assistant demo using the Hugging Face Hub (Inference API)

langchain langchain-python llm llm-inference open-assistant

Last synced: 19 Jan 2025

https://github.com/rs-py/howtofinetuneanllm

This is a step-by-step example of how to quickly fine-tune an LLM on simple text data without access to powerful hardware. For a more in-depth treatment, read the article on Medium that is linked below.

fine-tuning huggingface huggingface-transformers llama llm llm-inference

Last synced: 18 Jan 2025

https://github.com/t-mohamed-shafeek/llm-for-language-translation

This repository contains a simple, beginner-level notebook that uses the mBART model to translate English text into Indian languages.

huggingface-transformers language-tra llm llm-inference mbart mbart50 nlp nmt

Last synced: 14 Dec 2024

https://github.com/thansen0/fast-llm-api

A low-latency, fault-tolerant API for accessing LLMs.

llm llm-inference

Last synced: 22 Dec 2024

https://github.com/ripan-roy/mcts-chain-of-thought

This project shows the implementation of Monte Carlo Tree Search with a large language model to generate and evaluate reasoning chains or chain of thoughts for advanced problem-solving.

chain-of-thought large-language-models llama3 llm-evaluation llm-inference llms monte-carlo-tree-search ollama-api python

Last synced: 12 Jan 2025

https://github.com/riolaf05/langchain-fastapi-rag-platform

A platform to test multiple LLM models inside a RAG workflow to choose the best model for embedding and retrieval and the best prompt according to the use case

artificial-intelligence aws cloud iac iac-terraform infrastructure langchain langchain-python llm llm-inference llms python rag serverless terraform

Last synced: 20 Nov 2024

https://github.com/howardchiang2/easy_mode_llm_inference

Simple LLama.cpp usage

llama2 llm llm-inference

Last synced: 22 Dec 2024

https://github.com/gali1/ollama-cli-or-webui

This project provides a dual-interface tool for generating text responses using large language models.

cli command-line flask huggingface huggingface-transformers interface interfaces llama llm llm-inference natural-language-processing nlp nlp-parsing rest-api service text-generation transformers transformers-models web

Last synced: 22 Dec 2024

https://github.com/mapluisch/llava-websocket-server

Python-based WebSocket for CLI LLaVA inference.

inference llama llama2 llava llm llm-inference python websocket websockets

Last synced: 12 Jan 2025

https://github.com/duck4i/node-llama

LLM inside your Node.JS

llamacpp llm llm-inference local nodejs

Last synced: 05 Jan 2025

https://github.com/hoehrmann/ietf-cert

Experimental autonomous AI LLM & RAG IETF reviewer

ai autonomous ietf llm-inference quality-assurance retrieval-augmented-generation

Last synced: 05 Dec 2024

https://github.com/pathak-ashutosh/clinical-risk-prediction

Clinical Risk Prediction using EHRs

clinical-data clinical-research fine-tuning healthcare large-language-models llm-inference machine-learning nlp python

Last synced: 02 Jan 2025

https://github.com/karan-parmar-007/blog-ai

Whisper AI is an innovative blogging platform that combines advanced artificial intelligence and cloud technologies to revolutionize the way users create, share, and engage with content. With a range of features designed to enhance the blogging experience, Whisper AI empowers users to express themselves freely while ensuring privacy and security.

artificial-intelligence bootstrap django html-css-javascript langchain large-language-models llama2 llamacpp llm-inference python3 security translation