Projects in Awesome Lists tagged with llava

https://github.com/ollama/ollama

Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.

gemma gemma2 go golang llama llama2 llama3 llava llm llms mistral ollama phi3

Last synced: 16 Dec 2024

https://github.com/haotian-liu/llava

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V-level capabilities and beyond.

chatbot chatgpt foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava multi-modality multimodal vision-language-model visual-language-learning

Last synced: 16 Dec 2024

https://github.com/sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

cuda inference llama llama2 llama3 llama3-1 llava llm llm-serving moe pytorch transformer vlm

Last synced: 16 Dec 2024

https://github.com/fanghua-yu/supir

SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. Our new online demo is also available at suppixel.ai.

deep-learning diffusion-models llava pytorch pytorch-lightning restoration sdxl stable-diffusion super-resolution

Last synced: 17 Dec 2024

https://github.com/modelscope/ms-swift

Use PEFT or full-parameter training to fine-tune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)

agent deploy dpo internvl liger llama llama3 llava llm lora megatron minicpm-v modelscope multimodal peft pre-training qwen2 qwen2-vl reflection sft

Last synced: 17 Dec 2024

https://github.com/internlm/xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

agent baichuan chatbot chatglm2 chatglm3 conversational-ai internlm large-language-models llama2 llama3 llava llm llm-training mixtral msagent peft phi3 qwen supervised-finetuning

Last synced: 16 Dec 2024

https://github.com/modelscope/data-juicer

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 Providing higher-quality, richer, and more "digestible" data for large models!

chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch sora streamlit

Last synced: 18 Dec 2024

https://github.com/scisharp/llamasharp

A C#/.NET library for efficiently running LLMs (🦙LLaMA/LLaVA) on your local device.

chatbot gpt llama llama-cpp llama2 llama3 llamacpp llava llm multi-modal semantic-kernel

Last synced: 17 Dec 2024

https://github.com/chenking2020/findthechatgpter

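ChatGPT's explosive popularity marked a key step on the road to AGI. This project collects open-source alternatives to ChatGPT, including text LLMs and multimodal LLMs, as a convenient reference for everyone.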

agi alpaca autogpt baichuan belle ceval chatglm chatgpt codi guanaco learderboard linly llama llama2 llava lora minigpt4 self-instruct vicuna wizadlm

Last synced: 21 Dec 2024

https://github.com/open-compass/vlmevalkit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 21 Dec 2024

https://github.com/mbzuai-oryx/video-chatgpt

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining

Last synced: 19 Dec 2024

https://github.com/unum-cloud/uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

bert clip clustering contrastive-learning cross-attention huggingface-transformers image-search language-vision llava multi-lingual multimodal neural-network openai openclip pretrained-models pytorch representation-learning semantic-search transformer vector-search

Last synced: 18 Dec 2024

https://github.com/mbzuai-oryx/llava-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

conversation llama-3-llava llama-3-vision llama3 llama3-llava llama3-vision llava llava-llama3 llava-phi3 llm lmms phi-3-llava phi-3-vision phi3 phi3-llava phi3-vision vision-language

Last synced: 20 Dec 2024

https://github.com/jhc13/taggui

Tag manager and captioner for image datasets

cogvlm florence-2 image-captioning image-tagging llava pyside6 stable-diffusion tag-manager

Last synced: 12 Dec 2024

https://github.com/psychip/machina

Video surveillance system powered by OpenCV, YOLO, and LLaVA

camera llava ollama-api opencv python rtsp yolo

Last synced: 21 Dec 2024

https://github.com/TinyLLaVA/TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

large-multimodal-models llama llava nlp tinyllama transformers vision-language

Last synced: 13 Nov 2024

https://github.com/blaizzy/mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

apple-silicon florence2 idefics llava llm local-ai mlx molmo paligemma pixtral vision-framework vision-language-model vision-transformer

Last synced: 19 Dec 2024

https://github.com/nvlabs/eagle

EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

demo eagle gpt4 huggingface large-language-models llama llama3 llava llm lmm lvlm mllm nvdia

Last synced: 21 Dec 2024

https://github.com/Blaizzy/mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

apple-silicon florence2 idefics llava llm local-ai mlx molmo paligemma pixtral vision-framework vision-language-model vision-transformer

Last synced: 25 Nov 2024

https://github.com/nrl-ai/llama-assistant

AI-powered assistant to help you with your daily tasks, powered by Llama 3.2. It can recognize your voice, process natural language, and perform various actions based on your commands: summarizing text, rephrasing sentences, answering questions, writing emails, and more.

llama llama-3-2 llama3 llava moondream owen personal-assistant private-gpt

Last synced: 15 Dec 2024

https://github.com/jakobdylanc/llmcord

A Discord LLM chat bot that supports any OpenAI-compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

bot chatbot chatgpt discord gpt gpt-4 gpt-4o grok groq llama llama3 llava llm lmstudio mistral ollama oobabooga openai vllm xai

Last synced: 15 Dec 2024

https://github.com/apocas/restai

RestAI is an open-source AIaaS (AI as a Service) platform built on top of LlamaIndex, Ollama, and HF Pipelines. It supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama, with precise embeddings usage and tuning.

embeddings fastapi langchain llama llamaindex llava llm ollama openai openaiapi python rag stable-diffusion transformers

Last synced: 14 Dec 2024

https://github.com/wisconsinaivision/vip-llava

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

chatbot clip cvpr2024 foundation-models gpt-4 gpt-4-vision llama llama2 llava multi-modal vision-language visual-prompting

Last synced: 15 Dec 2024

https://github.com/internlm/internevo

InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

910b deepspeed-ulysses flash-attention gemma internlm internlm2 llama3 llava llm-framework llm-training multi-modal pipeline-parallelism pytorch ring-attention sequence-parallelism tensor-parallelism transformers-models zero3

Last synced: 14 Dec 2024

https://github.com/vietanhdev/llama-assistant

AI-powered assistant to help you with your daily tasks, powered by Llama 3.2. It can recognize your voice, process natural language, and perform various actions based on your commands: summarizing text, rephrasing sentences, answering questions, writing emails, and more.

llama llama-3-2 llama3 llava moondream owen personal-assistant private-gpt

Last synced: 14 Oct 2024

https://github.com/developersdigest/ai-devices

AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more

function-calling gpt-4-vision groq langchain langsmith llama3 llava llm openai serper tts whisper

Last synced: 16 Dec 2024

https://github.com/jakobdylanc/llmcord.py

A Discord LLM chat bot that supports any OpenAI-compatible API. Run a local model with ollama, oobabooga, Jan and more

ai bot chatbot chatgpt discord gpt gpt-4 gpt-4o groq litellm llama llama3 llava llm llmcord lmstudio mistral ollama oobabooga openai

Last synced: 10 Oct 2024

https://github.com/FuxiaoLiu/LRV-Instruction?tab=readme-ov-file

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

chatgpt evaluation evaluation-metrics foundation-models gpt gpt-4 hallucination iclr iclr2024 llama llava multimodal object-detection prompt-engineering vicuna vision vision-and-language vqa

Last synced: 01 Nov 2024

https://github.com/gokayfem/comfyui_vlm_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

comfyui custom-nodes image-captioning img2sfx img2text joytag llava llm mllm nodes phi15 siglip vlm

Last synced: 17 Dec 2024

https://github.com/mbzuai-oryx/videogpt-plus

Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining

Last synced: 20 Dec 2024

https://github.com/paddlepaddle/paddlemix

Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.

aigc blip2 clip controlnet dit eva-clip image-to-text llava minigpt4 multimodal ppdiffusers qwen-vl sd-xl sora stable-diffusion stablevideodiffusion text-to-image text-to-video

Last synced: 20 Dec 2024

https://github.com/rlhf-v/rlaif-v

RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

chatbot gpt-4v llava llava-next minicpm-v multimodal rlaif-v vision-language-learning

Last synced: 17 Sep 2024

https://github.com/gbaptista/ollama-ai

A Ruby gem for interacting with Ollama's API that allows you to run open source AI LLMs (Large Language Models) locally.

ai alpaca bakllava dolphin llama llama2 llava llm mistral mistral-ai mixtral nano-bots ollama ollama-api openorca vicuna

Last synced: 14 Dec 2024

https://github.com/sshh12/multi_token

Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

large-context large-language-models large-multimodal-models llava llm multi-modality multimodal vision-language-model

Last synced: 17 Nov 2024

https://github.com/trzy/llava-cpp-server

LLaVA server (llama.cpp).

llama llama2 llava llm multimodal vision-transformer

Last synced: 17 Dec 2024

https://github.com/zjysteven/lmms-finetune

A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, qwen-vl, qwen2-vl, phi3-v etc.

finetuning foundation-models instruction-tuning large-language-model large-multimodal-models llava llava-next multimodal multimodal-large-language-models qwen-vl vision-language visual-instruction-tuning

Last synced: 21 Dec 2024

https://github.com/mgonzs13/llama_ros

llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2

cpp embeddings ggml gguf gpt langchain llama llamacpp llava llavacpp llm rerank reranking ros2 vlm

Last synced: 21 Dec 2024

https://github.com/tianyi-lab/hallusionbench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark benchmarks gpt-4 gpt-4v hallucination large-language-models large-vision-language-models llava llm lmm vlms

Last synced: 09 Oct 2024

https://github.com/aimagelab/llava-more

LLaVA-MORE: Enhancing Visual Instruction Tuning with LLaMA 3.1

llama3 llama3-1 llama3-vision llava llava-llama3 llms multimodal-llms vision-and-language

Last synced: 16 Dec 2024

https://github.com/shikiw/modality-integration-rate

The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".

chatbot gpt-4o large-multimodal-models llama llava multimodal vision-language-learning vision-language-model

Last synced: 17 Dec 2024

https://github.com/thomas-yanxin/karmavlm

🧘🏻‍♂️KarmaVLM (相生): A family of highly efficient and powerful visual language models.

llama2 llava multimodel qwen2 vision-language-model visual-language-learning vlm

Last synced: 06 Dec 2024

https://github.com/niutrans/vision-llm-alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

alignment dpo llama3-vision llava llm mllm multi-model ppo reward rlhf sft vision

Last synced: 18 Nov 2024

https://github.com/fly-apps/ollama-open-webui

Self-host a ChatGPT-style web interface for Ollama 🦙

ai gemma gpu llama3 llava mistral mixtral ollama ollama-webui

Last synced: 17 Dec 2024

https://github.com/ashleykleynhans/llava-docker

Docker image for LLaVA: Large Language and Vision Assistant

ai chatbot chatgpt docker docker-image foundation-models gpt-4 instruction-tuning llama llama-2 llama2 llava llm multimodal runpod vision-language-model visual-language-learning

Last synced: 25 Nov 2024

https://github.com/buaadreamer/chinese-llava-med

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

ai chinese gpt4v huggingface-datasets llama-factory llava medical minigpt4 mllm multimodal qwen1-5 transformers

Last synced: 06 Dec 2024

https://github.com/kwaivgi/uniaa

Unified Multi-modal IAA Baseline and Benchmark

benchmark dataset image-aesthetic-assessment llava mllm

Last synced: 09 Nov 2024

https://github.com/herrera-luis/vision-core-ai

Demo Python script app that interacts with a llama.cpp server using the Whisper API, a microphone, and a webcam.

bakllava llamacpp llava whisper-ai

Last synced: 02 Dec 2024

https://github.com/wisconsinaivision/yollava

🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant

llava llm llms lmm lmms multi-modal-models personalization personalized

Last synced: 10 Nov 2024

https://github.com/mapluisch/llava-cli-with-multiple-images

LLaVA inference with multiple images at once for cross-image analysis.

image-concatenation image-processing inference llama2 llama2-13b llava lmm lmms pillow python python3 pytorch visual-question-answering vqa

Last synced: 13 Nov 2024

https://github.com/paradoxzw/llava-uhd-better

A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo

large-language-models large-multimodal-models llava multimodal

Last synced: 19 Dec 2024

https://github.com/uminosachi/open-llm-webui

This repository contains a web application designed to execute relatively compact, locally-operated Large Language Models (LLMs).

chatbot ggml gradio huggingface language-model llama llama2 llama3 llava llava-llama3 llm nlp transformers

Last synced: 10 Oct 2024

https://github.com/buaadreamer/mllm-finetuning-demo

Example code for fine-tuning multimodal large language models with LLaMA-Factory (Demo of Finetuning Multimodal LLM with LLaMA-Factory)

finetune-llm huggingface-datasets llama-factory llava lora mllm paligemma pretraining supervised-finetuning transformers yi-vl

Last synced: 06 Dec 2024

https://github.com/HyperMink/inferenceable

Scalable AI Inference Server for CPU and GPU with Node.js | Utilizes llama.cpp and parts of llamafile C/C++ core under the hood.

ai inference llama llama3 llava llms nodejs phi3 vision

Last synced: 29 Nov 2024

https://github.com/blib-la/captain

Your all-in-one platform to build and use AI apps effortlessly on your own computer.

ai artificial-intelligence blip booru-tags captioning-images clip dataset-generation datasets generative-ai human-in-the-loop llava llm lora model-training sdk sdxl stable-diffusion toolkit

Last synced: 06 Dec 2024

https://github.com/buaadreamer/spn4cir

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

acmmm2024 blip blip2 clip composed-image-retrieval cross-modal-retrieval data-generation image-retrieval llama llava memory-bank multi-modal-retrieval multimodal-learning transformer

Last synced: 06 Dec 2024

https://github.com/fmxexpress/ai-vision-chat

Chat with large language models about the contents of an image via this native desktop client for Windows, macOS, and Linux.

ai delphi delphi-sample desktop-app linux llava llm macos replicate-api vicuna vision-language-model windows

Last synced: 15 Oct 2024

https://github.com/ashleykleynhans/supir-docker

Docker image for SUPIR (Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild)

deep-learning diffusion-models docker docker-image llava pytorch pytorch-lightning restoration runpod sdxl stable-diffusion super-resolution upscaling

Last synced: 05 Nov 2024

https://github.com/zjysteven/vlm-visualizer

Visualizing the attention of vision-language models

attention attention-mechanism llava multi-modal vision-language vision-language-model

Last synced: 02 Nov 2024

https://github.com/zhudotexe/kani-vision

Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.

extension gpt-vision kani large-language-models llava multimodal-llm vision-language-model

Last synced: 27 Oct 2024

https://github.com/xyproto/describeimage

Describe images by using LLMs

command-line-utility describe-image large-language-model llava llm llm-manager ollama ollama-client

Last synced: 09 Nov 2024

https://github.com/iamaziz/chat_with_images

Streamlit app to chat with images using Multi-modal LLMs.

llava llms multimodal-llm streamlit

Last synced: 09 Nov 2024

https://github.com/leo5imon/magi

Meme search engine for the real shitposters, powered by AI & Llava 13b.

ai image-classification llava memes replicate search-engine

Last synced: 18 Dec 2024

https://github.com/piotrbania/ai_image_search

AI-assisted image search: checks the images on your hard drive and tries to find ones that match what you are looking for (processing is fully OFFLINE; no data leaves your computer)

ai image-processing imagerecognition llava model nexa

Last synced: 20 Nov 2024

https://github.com/saharmor/monsterbooth

Turn yourself into a Halloween-styled character and get an original roast with the power of AI.

generative-ai gpt4 llava multimodal

Last synced: 13 Nov 2024

https://github.com/cloudmedicio/describe-media-library

Describe images in the WordPress Media Library using a local large language model. Generates titles, captions, descriptions, and alt tags.

accessibility alt-text alt-text-generator apple-silicon computer-vision gpt4 intel-mac linux llama2 llamacpp llava localwp media-library ollama openai plugin search-engine-optimization seo wordpress wp-cli

Last synced: 05 Nov 2024

https://github.com/autodistill/autodistill-llava

LLaVA base model for use with Autodistill.

autodistill computer-vision llava multimodal-llm

Last synced: 08 Nov 2024

https://github.com/ergonomech/ollama-model-interaction

A simple Gradio-based app for interacting with Ollama models, supporting image analysis, text completion, and model pulling

gradio llava ollama ollama-api vision-api

Last synced: 03 Dec 2024

https://github.com/agarzon/ollama-image-caption

caption flux llava ollama stable-diffusion

Last synced: 10 Oct 2024

https://github.com/muhfaridansutariya/llava-1.5-liveness-7b

Resigned Yann-LeCun

gradio llava openai

Last synced: 07 Dec 2024

https://github.com/ugurkantech/archnetai

ArchNetAI is a Python library that leverages the Ollama API for generating AI-powered content.

ai json-schema llama llava ollama ollama-api phi3 python

Last synced: 14 Oct 2024

https://github.com/maheshj01/image-captioning-using-llava-and-llama3

Image caption generator using LLaVA and Llama 3 through the Ollama library

llama3 llava ollama vision

Last synced: 28 Nov 2024

https://github.com/gurpreetkaurjethra/multimodal-ai-app-using-llava-7b

Multimodal AI App using Llava 7B and Gradio

ai generative-ai gradio large-language-models llava llavacpp llm multimodal voice-assistant whisper

Last synced: 22 Nov 2024

https://github.com/ibnaleem/mikael

The open-source repository for Mikael, a Discord chatbot trained on the Mistral and LLaVA language models

artificial-intelligence chatbot discord-bot discord-py gpt-4 large-language-models llava mistral mistral-7b mistral-ai multimodal multimodal-deep-learning

Last synced: 15 Dec 2024

https://github.com/robert-mcdermott/LLM-Image-Classification

Image Classification Testing with LLMs

image-classification llava llm multimodal

Last synced: 20 Oct 2024

https://github.com/ravi-teja-konda/tunedllavadelights

Explore the rich flavors of Indian desserts with TunedLlavaDelights. Using LLaVA fine-tuning, our project unveils detailed nutritional profiles, taste notes, and optimal consumption times for beloved sweets. Dive into a fusion of AI innovation and culinary tradition.

chatgpt dalle2 dessert finetuning gpt4 gpt4v llama2 llava multi-modality multimodal nutrition nutrition-information stable-diffusion tranformers vision-language-learning vision-language-model

Last synced: 15 Nov 2024

https://github.com/rakeshkanneeswaran/luminaide_an_ai_integrated_ide

A personalized IDE developed from scratch, featuring AI integration with models like LLaMA, Mistral, and LLava. Includes a real-time terminal, intuitive file explorer, and Ace Editor to enhance productivity and streamline coding workflows.

express ide llama llama2 llava mistral react websocket

Last synced: 03 Dec 2024

https://github.com/fiqryq/caption-llava

A simple yet effective CLI application built on Node.js that uses LLaVA through Ollama to automatically generate captions for your images.

llava nodejs ollama-api

Last synced: 14 Dec 2024

https://github.com/notyusheng/multimodal-large-language-model

Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.

docker large-language-models llava llm multimodal multimodal-large-language-models ollama pretrained sphinx-doc

Last synced: 09 Nov 2024

https://github.com/dsacms/mural-ollama

Multimodal LLM Mural Assistant with Ollama

ai llava llm llms multimodal mural ollama open-source pyqt6 python

Last synced: 10 Oct 2024

https://github.com/gbenson/llava-on-ec2

LLaVA on EC2

aws ec2 llama llava terraform

Last synced: 19 Dec 2024

https://github.com/yuanze-lin/olympus

The official code for "Olympus: A Universal Task Router for Computer Vision Tasks"

chatbot chatgpt deeplearning foundation-models instruction-tuning llava llms mllms multi-modality multimodal pytorch vision-language-model

Last synced: 14 Dec 2024

https://github.com/hasnain3142/ai-adsense

AI AdSense is a cutting-edge application designed to detect human presence, analyze their features, and generate personalized advertisements using advanced AI technologies.

llava llm nodejs react

Last synced: 17 Nov 2024