Projects in Awesome Lists tagged with clip

https://github.com/mikel-brostrom/boxmot

BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models

boosttrack botsort bytetrack clip deep-learning deepocsort improvedassociation machine-learning mot mots multi-object-tracking multi-object-tracking-segmentation ocsort oriented-bounding-box-tracking osnet segmentation strongsort tensorrt tracking-by-detection yolo

Last synced: 24 Dec 2025

https://github.com/cvhub520/x-anylabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

annotation-tool classification clip deep-learning deeplearning depth-estimation grounding-dino image-segmentation labeling-tool llm matting object-detection onnx paddle pose-estimation pytorch resnet sam vlm yolo

Last synced: 13 May 2025

https://github.com/ofa-sys/chinese-clip

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language

Last synced: 29 Apr 2025

https://github.com/OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

chinese clip computer-vision contrastive-loss coreml-models deep-learning image-text-retrieval multi-modal multi-modal-learning nlp pretrained-models pytorch transformers vision-and-language-pre-training vision-language

Last synced: 02 Apr 2025

https://github.com/marqo-ai/marqo

Unified embedding generation and search engine. Also available in the cloud at cloud.marqo.ai

chatgpt clip deep-learning gpt hacktoberfest hnsw information-retrieval knn large-language-models machine-learning machinelearning multi-modal natural-language-processing search-engine semantic-search tensor-search transformers vector-search vision-language visual-search

Last synced: 07 Jan 2026

https://github.com/easychen/pushdeer

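Open-source push notification service that needs no app: on iOS 14+, scan a code to start using it. Also supports Quick Apps, iOS and Mac clients, Android clients, and custom-built devices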

app clip notification-service push

Last synced: 12 Apr 2025

https://github.com/CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

clip deep-learning deeplearning labeling-tool llm onnx paddle pytorch resnet sam yolo

Last synced: 20 Mar 2025

https://github.com/open-mmlab/mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

beit clip constrastive-learning convnext deep-learning image-classification mae masked-image-modeling mobilenet moco multimodal pretrained-models pytorch resnet self-supervised-learning swin-transformer vision-transformer

Last synced: 24 Dec 2025

https://github.com/yuanzhoulvpi2017/zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

bert chatglm-6b clip gpt gpt2 huggingface-transformers llama llama2 llava nlp pytorch text-generation transformers

Last synced: 14 May 2025

https://github.com/jingyi0000/vlm_survey

Collection of AWESOME vision-language models for vision tasks

clip computer-vision deep-learning knowledge-distillation multi-modal-model survey transfer-learning vision-language-model

Last synced: 14 Oct 2025

https://github.com/pharmapsychotic/clip-interrogator

Image to prompt with BLIP and CLIP

clip pytorch

Last synced: 14 May 2025

https://github.com/rom1504/clip-retrieval

Easily compute CLIP embeddings and build a CLIP retrieval system with them

ai clip deep-learning knn multimodal semantic-search

Last synced: 14 May 2025

https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fknn5.laion.ai&index=laion5B&useMclip=false

Easily compute CLIP embeddings and build a CLIP retrieval system with them

ai clip deep-learning knn multimodal semantic-search

Last synced: 08 May 2025

https://github.com/open-compass/vlmevalkit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 13 May 2025

https://github.com/cambrian-mllm/cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning

Last synced: 14 May 2025

https://github.com/qin2dim/hcaptcha-challenger

🥂 Gracefully face hCaptcha challenge with multimodal large language model.

agent ai-agents captcha captcha-solver captcha-solving chatgpt clip gemini hcaptcha hcaptcha-solver llm openai playwright yolo

Last synced: 13 May 2025

https://github.com/QIN2DIM/hcaptcha-challenger

🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.

clip computer-vision hcaptcha hcaptcha-solver image-segmentation multi-modal multi-modal-learning object-detection onnx onnx-models onnxruntime opencv-python playwright solver yolo yolov5 zero-shot-classification

Last synced: 28 Mar 2025

https://github.com/mbzuai-oryx/video-chatgpt

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining

Last synced: 08 Oct 2025

https://github.com/mbzuai-oryx/Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining

Last synced: 12 Mar 2025

https://github.com/open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 20 Jul 2025

https://github.com/unum-cloud/uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

bert clip clustering contrastive-learning cross-attention huggingface-transformers image-search language-vision llava multi-lingual multimodal neural-network openai openclip pretrained-models pytorch representation-learning semantic-search transformer vector-search

Last synced: 14 May 2025

https://github.com/skalskip/vlms-zero-to-hero

This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.

bert-model clip computer-vision embeddings gpt gpt-2 lora natural-language-processing seq2seq vision-language-model word2vec

Last synced: 06 Oct 2025

https://github.com/EdVince/Stable-Diffusion-NCNN

Stable Diffusion in NCNN with C++, supporting txt2img and img2img

android clip cpp diffusion executable img2img mnn ncnn onnx stable-diffusion tensorrt tnn txt2img

Last synced: 13 Apr 2025

https://github.com/haltakov/natural-language-image-search

Search photos on Unsplash using natural language

clip computer-vision image-search machine-learning photos unsplash

Last synced: 01 Apr 2025

https://github.com/haltakov/natural-language-youtube-search

Search inside YouTube videos using natural language

clip computer-vision machine-learning search youtube

Last synced: 15 Mar 2025

https://github.com/omerbt/text2live

Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)

clip eccv2022 generative-model image-editing image-manipulation single-image single-video text-driven-editing text2live video-editing

Last synced: 13 Apr 2025

https://github.com/omerbt/Text2LIVE

Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)

clip eccv2022 generative-model image-editing image-manipulation single-image single-video text-driven-editing text2live video-editing

Last synced: 28 Mar 2025

https://github.com/hila-chefer/transformer-mm-explainability

[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

clip detr explainability explainable-ai interpretability lxmert transformer transformers visualbert visualization vqa

Last synced: 12 Apr 2025

https://github.com/hila-chefer/Transformer-MM-Explainability

[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

clip detr explainability explainable-ai interpretability lxmert transformer transformers visualbert visualization vqa

Last synced: 03 Apr 2025

https://github.com/ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

activitynet clip didemo lsmdc msrvtt msvd multimodal multimodal-learning multimodality ranking retrieval retrieval-model search video-clip-retrieval video-text-retrieval

Last synced: 03 Apr 2025

https://github.com/eps696/aphantasia

CLIP + FFT/DWT/RGB = text to image/video

clip text-to-image text-to-video

Last synced: 07 Apr 2025

https://github.com/pengsongyou/openscene

[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies

3d-scene-understanding clip cvpr2023 llm matterport3d nuscenes point-cloud-segmentation point-clouds scannet semantic-segmentation

Last synced: 20 Mar 2025

https://github.com/Sense-GVT/DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

big-model clip image-text multi-model self-supervised vision-language-pretraining zero-shot

Last synced: 03 Apr 2025

https://github.com/pablosichert/react-truncate

React component for truncating multi-line spans and adding an ellipsis.

clip ellipsis react truncate

Last synced: 16 May 2025

https://github.com/leondgarse/keras_cv_attention_models

Keras beit,caformer,CMT,CoAtNet,convnext,davit,dino,efficientdet,edgenext,efficientformer,efficientnet,eva,fasternet,fastervit,fastvit,flexivit,gcvit,ghostnet,gpvit,hornet,hiera,iformer,inceptionnext,lcnet,levit,maxvit,mobilevit,moganet,nat,nfnets,pvt,swin,tinynet,tinyvit,uniformer,volo,vanillanet,yolor,yolov7,yolov8,yolox,gpt2,llama2, alias kecam

attention clip coco ddpm detection imagenet keras model recognition segment-anything stable-diffusion tensorflow tf tf2 visualizing

Last synced: 08 Apr 2025

https://github.com/microsoft/llm2clip

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip fundation-models multimodality

Last synced: 11 Apr 2025

https://github.com/v-iashin/video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

audio-features clip feature-extraction i3d ig65m laion multi-gpu optical-flow parallel pytorch r2plus1d raft resnet s3d swin timm vggish video-features visual-features vit

Last synced: 02 Apr 2025

https://github.com/harperreed/photo-similarity-search

Super simple MLX (Apple silicon) CLIP-based photo similarity web app

ai clip ml mlx osx

Last synced: 04 Apr 2025

https://github.com/devhotteok/TwitchLink

Twitch Stream & Video & Clip Downloader/Recorder. This GUI downloader helps you download and record Twitch videos, including broadcasts and VODs.

broadcast clip downloader gui live m3u8 m3u8-downloader recorder stream twitch twitch-downloader video vod

Last synced: 16 May 2025

https://github.com/yangjianxin1/CLIP-Chinese

Chinese CLIP pretrained model

chinese clip

Last synced: 21 Jul 2025

https://github.com/yangjianxin1/clip-chinese

Chinese CLIP pretrained model

chinese clip

Last synced: 06 Apr 2025

https://github.com/greyovo/picquery

🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.

android clip image-text-retrieval image-text-search jetpack-compose material-design-3 openai

Last synced: 16 May 2025

https://github.com/xmed-lab/CLIP_Surgery

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

clip explainability interpretability multilabel multimodal open-vocabulary sam segment-anything segmentation vision-transformer

Last synced: 16 Mar 2025

https://github.com/iceclear/clip-iqa

[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images

clip iqa

Last synced: 06 Apr 2025

https://github.com/microsoft/LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.

clip fundation-models multimodality

Last synced: 10 Aug 2025

https://github.com/OpenGVLab/Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

chatgpt clip llm robotics segment-anything

Last synced: 06 May 2025

https://github.com/Chrisvin/EasyReveal

Android Easy Reveal Library

android android-library clip easy easyreveal library reveal reveal-animations

Last synced: 12 Apr 2025

https://github.com/zcf0508/autocut-client

AutoCut Client

autocut clip electron video vue

Last synced: 16 May 2025

https://github.com/opengvlab/instruct2act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

chatgpt clip llm robotics segment-anything

Last synced: 20 Apr 2025

https://github.com/wisconsinaivision/vip-llava

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

chatbot clip cvpr2024 foundation-models gpt-4 gpt-4-vision llama llama2 llava multi-modal vision-language visual-prompting

Last synced: 06 Apr 2025

https://github.com/baaivision/eve

EVE Series: Encoder-Free Vision-Language Models from BAAI

clip encoder-free-vlm instruction-following large-language-models llm mllm multimodal-large-language-models vision-language-models vlm

Last synced: 12 Apr 2025

https://github.com/poloclub/diffusion-explainer

Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

clip deep-learning generative-model interactive-visualization machine-learning stable-diffusion unet visual-learning visualization

Last synced: 13 May 2025

https://liruiw.github.io/gensim/

Generating Robotic Simulation Tasks via Large Language Models

clip gpt-4 llm pybullet simulation

Last synced: 08 Apr 2025

https://github.com/paddlepaddle/passl

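PASSL includes self-supervised image algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision algorithms such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2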

beit clip convnext cvt deep-learning deit mae moco moco-v2 paddle pixpro pvt self-supervised-learning simclr swav swin-transformer vision-transformer vit xcit

Last synced: 04 Apr 2025

https://github.com/mertyg/vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

blip clip compositionality multimodal pytorch vision-language

Last synced: 25 Sep 2025

https://github.com/baaivision/diva

[ICLR 2025] Diffusion Feedback Helps CLIP See Better

clip diffusion visual-perception

Last synced: 08 Oct 2025

https://github.com/mbzuai-oryx/videogpt-plus

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining

Last synced: 07 Apr 2025

https://github.com/yxuansu/MAGIC

Language Models Can See: Plugging Visual Controls in Text Generation

clip gpt-2 image-captioning multimodal plug-and-play-language-models story-generation text-generation unsupervised-learning zero-shot

Last synced: 27 Apr 2025

https://github.com/j-min/clip-caption-reward

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

clip image-captioning reinforcement-learning vision-and-language

Last synced: 10 Apr 2025

https://github.com/taited/clip-score

Quick scripts to calculate CLIP text-image similarity

batch clip clip-score pytorch

Last synced: 16 May 2025

https://github.com/hila-chefer/targetclip

[ECCV 2022] Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer.

clip computer-graphics eccv2022 image-editing image-generation image-manipulation stylegan2

Last synced: 08 May 2025

https://github.com/kyegomez/navit

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit

Last synced: 16 May 2025

https://github.com/mbzuai-oryx/VideoGPT-plus

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining

Last synced: 10 Aug 2025

https://github.com/chao1224/MoleculeSTM

Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6)

clip computation-chemistry drug-discovery editing foundation-model molecule-editing moleculeclip moleculestm pretraining retrieval

Last synced: 09 May 2025

https://github.com/chao1224/moleculestm

Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6)

clip computation-chemistry drug-discovery editing foundation-model molecule-editing moleculeclip moleculestm pretraining retrieval

Last synced: 13 Apr 2025

https://github.com/zer0int/clip-fine-tune

Fine-tuning code for CLIP models

clip comfyui fine-tune fine-tuning finetune openai sdxl textencoder

Last synced: 28 Apr 2025

https://github.com/haofanwang/natural-language-joint-query-search

Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.

attention clip computer-vision image-retrieval image-search multi-modal-search unsplash visualizations

Last synced: 20 Aug 2025

https://github.com/florent37/flutter-shapeofview

Give a custom shape to any Flutter widget, Material Design 2 ready

arc behavior circle clip dart diagonal elevation flutter material shape star

Last synced: 13 Apr 2025

https://github.com/paddlepaddle/paddlemix

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

aigc blip2 clip controlnet dit eva-clip image-to-text llava minigpt4 multimodal ppdiffusers qwen-vl sd-xl sora stable-diffusion stablevideodiffusion text-to-image text-to-video

Last synced: 04 Apr 2025

https://github.com/seeed-projects/tutorial-of-ai-kit-with-raspberry-pi-from-zero-to-hero

This repository provides a comprehensive step-by-step guide to building AI projects using the Raspberry Pi AI Kit.

clip computer-vision hailo8 instance-segmentation object-detection ollama pose-estimation raspberry-pi

Last synced: 04 Apr 2025

https://github.com/Imageomics/bioclip

This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].

clip computer-vision imageomics knowledge-guided-machine-learning taxonomy

Last synced: 05 Apr 2025

https://github.com/pengtaojiang/segment-anything-clip

Connecting segment-anything's output masks with the CLIP model; Awesome-Segment-Anything-Works

classification clip segment-anything semantic-segmentation

Last synced: 04 Apr 2025

https://github.com/josephrocca/clip-image-sorter

Sort a folder of images according to their similarity with provided text in your browser (uses a browser-ported version of OpenAI's CLIP model and the web's new File System Access API)

clip file-system-access-api openai openai-clip

Last synced: 03 Apr 2025

https://github.com/miccunifi/SEARLE

[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion

circo cirr clip composed-image-retrieval fashion-iq knowledge-distillation multimodal-learning pytorch textual-inversion

Last synced: 03 Apr 2025

https://github.com/laion-ai/scaling-laws-openclip

Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)

clip deep-learning few-shot-learning fine-tuning laion openclip pre-training pytorch scaling-laws transfer-learning zero-shot-classification zero-shot-retrieval

Last synced: 07 May 2025

https://github.com/fcjian/PromptDet

PromptDet: Towards Open-vocabulary Detection using Uncurated Images, ECCV2022

clip computer-vision eccv2022 novel-categories object-detection prompt-learning pseudo-labeling regional-prompt self-training vocabulary web-image zero-shot-learning

Last synced: 15 Jun 2025

https://github.com/minimaxir/imgbeddings

Python package to generate image embeddings with CLIP without PyTorch/TensorFlow

ai clip embeddings image-processing images onnx transformers

Last synced: 09 Apr 2025

https://github.com/ai-forever/ru-clip

CLIP implementation for Russian language

clip computer-vision nlp

Last synced: 20 Jun 2025

https://github.com/eddieoz/youtube-clips-automator

MARCELO: an AI-powered bot to automate the editing and thumbnail creation for your YouTube clips channel

ai audio-processing automation bot clip computer-vision editing thumbnail video video-processing youtube

Last synced: 20 Oct 2025

https://github.com/Shishkebaboo/VodRecovery

The purpose of this script is to obtain videos or clips that are either marked as "sub-only" or have been deleted on Twitch.

broadcast clip clips commad-line commandline console development ffmpeg live m3u8 m3u8-playlist m3u8-videos mp4 python recover twitch twitchclips twitchtv vodrecovery

Last synced: 18 Jul 2025

https://github.com/jamjamjon/usls

A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.

clip cuda florence2 grounding-dino imshow moondream ocr onnx onnxruntime rust-yolo sam sapiens smolvlm tensorrt yolo yolo-rs yolo-rust yolov10 yolov11 yolov8

Last synced: 16 May 2025

https://github.com/ylqi/Count-Anything

This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.

clip count-anything segment-anything

Last synced: 23 Aug 2025