An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with clip

A curated list of projects in awesome lists tagged with clip.

https://github.com/easychen/pushdeer

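Open-source push service that needs no app: on iOS 14+, scan a QR code to start using it. Also supports Quick Apps, iOS and Mac clients, Android clients, and self-built devices.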

app clip notification-service push

Last synced: 12 Apr 2025

https://github.com/CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

clip deep-learning deeplearning labeling-tool llm onnx paddle pytorch resnet sam yolo

Last synced: 20 Mar 2025

https://github.com/yuanzhoulvpi2017/zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

bert chatglm-6b clip gpt gpt2 huggingface-transformers llama llama2 llava nlp pytorch text-generation transformers

Last synced: 14 May 2025

https://github.com/pharmapsychotic/clip-interrogator

Image to prompt with BLIP and CLIP

clip pytorch

Last synced: 14 May 2025

https://github.com/rom1504/clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them

ai clip deep-learning knn multimodal semantic-search

Last synced: 14 May 2025

https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2Fknn5.laion.ai&index=laion5B&useMclip=false

Easily compute clip embeddings and build a clip retrieval system with them

ai clip deep-learning knn multimodal semantic-search

Last synced: 08 May 2025

https://github.com/open-compass/vlmevalkit

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 13 May 2025

https://github.com/qin2dim/hcaptcha-challenger

🥂 Gracefully face hCaptcha challenge with multimodal large language model.

agent ai-agents captcha captcha-solver captcha-solving chatgpt clip gemini hcaptcha hcaptcha-solver llm openai playwright yolo

Last synced: 13 May 2025

https://github.com/mbzuai-oryx/video-chatgpt

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining

Last synced: 08 Oct 2025

https://github.com/mbzuai-oryx/Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

chatbot clip gpt-4 llama llava mulit-modal vicuna video-chatboat video-conversation vision-language vision-language-pretraining

Last synced: 12 Mar 2025

https://github.com/open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks

chatgpt claude clip computer-vision evaluation gemini gpt gpt-4v gpt4 large-language-models llava llm multi-modal openai openai-api pytorch qwen vit vqa

Last synced: 20 Jul 2025

https://github.com/unum-cloud/uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

bert clip clustering contrastive-learning cross-attention huggingface-transformers image-search language-vision llava multi-lingual multimodal neural-network openai openclip pretrained-models pytorch representation-learning semantic-search transformer vector-search

Last synced: 14 May 2025

https://github.com/skalskip/vlms-zero-to-hero

This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.

bert-model clip computer-vision embeddings gpt gpt-2 lora natural-language-processing seq2seq vision-language-model word2vec

Last synced: 06 Oct 2025

https://github.com/EdVince/Stable-Diffusion-NCNN

Stable Diffusion in NCNN with C++, supporting txt2img and img2img

android clip cpp diffusion executable img2img mnn ncnn onnx stable-diffusion tensorrt tnn txt2img

Last synced: 13 Apr 2025

https://github.com/haltakov/natural-language-image-search

Search photos on Unsplash using natural language

clip computer-vision image-search machine-learning photos unsplash

Last synced: 01 Apr 2025

https://github.com/haltakov/natural-language-youtube-search

Search inside YouTube videos using natural language

clip computer-vision machine-learning search youtube

Last synced: 15 Mar 2025

https://github.com/omerbt/text2live

Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)

clip eccv2022 generative-model image-editing image-manipulation single-image single-video text-driven-editing text2live video-editing

Last synced: 13 Apr 2025

https://github.com/omerbt/Text2LIVE

Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)

clip eccv2022 generative-model image-editing image-manipulation single-image single-video text-driven-editing text2live video-editing

Last synced: 28 Mar 2025

https://github.com/hila-chefer/transformer-mm-explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

clip detr explainability explainable-ai interpretability lxmert transformer transformers visualbert visualization vqa

Last synced: 12 Apr 2025

https://github.com/hila-chefer/Transformer-MM-Explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

clip detr explainability explainable-ai interpretability lxmert transformer transformers visualbert visualization vqa

Last synced: 03 Apr 2025

https://github.com/ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

activitynet clip didemo lsmdc msrvtt msvd multimodal multimodal-learning multimodality ranking retrieval retrieval-model search video-clip-retrieval video-text-retrieval

Last synced: 03 Apr 2025

https://github.com/eps696/aphantasia

CLIP + FFT/DWT/RGB = text to image/video

clip text-to-image text-to-video

Last synced: 07 Apr 2025

https://github.com/Sense-GVT/DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

big-model clip image-text multi-model self-supervised vision-language-pretraining zero-shot

Last synced: 03 Apr 2025

https://github.com/pablosichert/react-truncate

React component for truncating multi-line spans and adding an ellipsis.

clip ellipsis react truncate

Last synced: 16 May 2025

https://github.com/leondgarse/keras_cv_attention_models

Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2, alias kecam

attention clip coco ddpm detection imagenet keras model recognition segment-anything stable-diffusion tensorflow tf tf2 visualizing

Last synced: 08 Apr 2025

https://github.com/microsoft/llm2clip

LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.

clip fundation-models multimodality

Last synced: 11 Apr 2025

https://github.com/v-iashin/video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

audio-features clip feature-extraction i3d ig65m laion multi-gpu optical-flow parallel pytorch r2plus1d raft resnet s3d swin timm vggish video-features visual-features vit

Last synced: 02 Apr 2025

https://github.com/harperreed/photo-similarity-search

Super simple MLX (Apple silicon) CLIP-based photo similarity web app

ai clip ml mlx osx

Last synced: 04 Apr 2025

https://github.com/devhotteok/TwitchLink

Twitch Stream & Video & Clip Downloader/Recorder. This GUI downloader helps you download and record Twitch videos, including broadcasts and VODs.

broadcast clip downloader gui live m3u8 m3u8-downloader recorder stream twitch twitch-downloader video vod

Last synced: 16 May 2025

https://github.com/yangjianxin1/CLIP-Chinese

Chinese CLIP pre-trained model

chinese clip

Last synced: 21 Jul 2025

https://github.com/yangjianxin1/clip-chinese

Chinese CLIP pre-trained model

chinese clip

Last synced: 06 Apr 2025

https://github.com/greyovo/picquery

🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.

android clip image-text-retrieval image-text-search jetpack-compose material-design-3 openai

Last synced: 16 May 2025

https://github.com/xmed-lab/CLIP_Surgery

CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks

clip explainability interpretability multilabel multimodal open-vocabulary sam segment-anything segmentation vision-transformer

Last synced: 16 Mar 2025

https://github.com/iceclear/clip-iqa

[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images

clip iqa

Last synced: 06 Apr 2025

https://github.com/microsoft/LLM2CLIP

LLM2CLIP makes SOTA pretrained CLIP model more SOTA ever.

clip fundation-models multimodality

Last synced: 10 Aug 2025

https://github.com/OpenGVLab/Instruct2Act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

chatgpt clip llm robotics segment-anything

Last synced: 06 May 2025

https://github.com/zcf0508/autocut-client

AutoCut Client

autocut clip electron video vue

Last synced: 16 May 2025

https://github.com/opengvlab/instruct2act

Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

chatgpt clip llm robotics segment-anything

Last synced: 20 Apr 2025

https://github.com/wisconsinaivision/vip-llava

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

chatbot clip cvpr2024 foundation-models gpt-4 gpt-4-vision llama llama2 llava multi-modal vision-language visual-prompting

Last synced: 06 Apr 2025

https://liruiw.github.io/gensim/

Generating Robotic Simulation Tasks via Large Language Models

clip gpt-4 llm pybullet simulation

Last synced: 08 Apr 2025

https://github.com/paddlepaddle/passl

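PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, simsiam, SwAV, BEiT, and MAE, as well as basic vision algorithms such as Vision Transformer, DEiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.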

beit clip convnext cvt deep-learning deit mae moco moco-v2 paddle pixpro pvt self-supervised-learning simclr swav swin-transformer vision-transformer vit xcit

Last synced: 04 Apr 2025

https://github.com/mertyg/vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

blip clip compositionality multimodal pytorch vision-language

Last synced: 25 Sep 2025

https://github.com/baaivision/diva

[ICLR 2025] Diffusion Feedback Helps CLIP See Better

clip diffusion visual-perception

Last synced: 08 Oct 2025

https://github.com/mbzuai-oryx/videogpt-plus

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining

Last synced: 07 Apr 2025

https://github.com/j-min/clip-caption-reward

PyTorch code for "Fine-grained Image Captioning with CLIP Reward" (Findings of NAACL 2022)

clip image-captioning reinforcement-learning vision-and-language

Last synced: 10 Apr 2025

https://github.com/taited/clip-score

Quick scripts to calculate CLIP text-image similarity

batch clip clip-score pytorch

Last synced: 16 May 2025

https://github.com/hila-chefer/targetclip

[ECCV 2022] Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer.

clip computer-graphics eccv2022 image-editing image-generation image-manipulation stylegan2

Last synced: 08 May 2025

https://github.com/kyegomez/navit

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

attention-mechanism clip gpt4 multimodal multimodal-deep-learning multimodal-learning multimodality vit

Last synced: 16 May 2025

https://github.com/mbzuai-oryx/VideoGPT-plus

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

chatbot clip dual-encoder gpt4 gpt4o image-encoder llama3 llava multimodal phi-3-mini vicuna video-chatbot video-conversation video-encoder vision-language vision-language-pretraining

Last synced: 10 Aug 2025

https://github.com/chao1224/MoleculeSTM

Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6)

clip computation-chemistry drug-discovery editing foundation-model molecule-editing moleculeclip moleculestm pretraining retrieval

Last synced: 09 May 2025

https://github.com/chao1224/moleculestm

Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6)

clip computation-chemistry drug-discovery editing foundation-model molecule-editing moleculeclip moleculestm pretraining retrieval

Last synced: 13 Apr 2025

https://github.com/haofanwang/natural-language-joint-query-search

Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.

attention clip computer-vision image-retrieval image-search multi-modal-search unsplash visualizations

Last synced: 20 Aug 2025

https://github.com/florent37/flutter-shapeofview

Give a custom shape to any flutter widget, Material Design 2 ready

arc behavior circle clip dart diagonal elevation flutter material shape star

Last synced: 13 Apr 2025

https://github.com/paddlepaddle/paddlemix

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

aigc blip2 clip controlnet dit eva-clip image-to-text llava minigpt4 multimodal ppdiffusers qwen-vl sd-xl sora stable-diffusion stablevideodiffusion text-to-image text-to-video

Last synced: 04 Apr 2025

https://github.com/seeed-projects/tutorial-of-ai-kit-with-raspberry-pi-from-zero-to-hero

This repository provides a comprehensive step-by-step guide to building AI projects using the Raspberry Pi AI Kit.

clip computer-vision hailo8 instance-segmentation object-detection ollama pose-estimation raspberry-pi

Last synced: 04 Apr 2025

https://github.com/Imageomics/bioclip

This is the repository for the BioCLIP model and the TreeOfLife-10M dataset [CVPR'24 Oral, Best Student Paper].

clip computer-vision imageomics knowledge-guided-machine-learning taxonomy

Last synced: 05 Apr 2025

https://github.com/pengtaojiang/segment-anything-clip

Connecting segment-anything's output masks with the CLIP model; Awesome-Segment-Anything-Works

classification clip segment-anything semantic-segmentation

Last synced: 04 Apr 2025

https://github.com/josephrocca/clip-image-sorter

Sort a folder of images according to their similarity with provided text in your browser (uses a browser-ported version of OpenAI's CLIP model and the web's new File System Access API)

clip file-system-access-api openai openai-clip

Last synced: 03 Apr 2025

https://github.com/miccunifi/SEARLE

[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion

circo cirr clip composed-image-retrieval fashion-iq knowledge-distillation multimodal-learning pytorch textual-inversion

Last synced: 03 Apr 2025

https://github.com/laion-ai/scaling-laws-openclip

Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)

clip deep-learning few-shot-learning fine-tuning laion openclip pre-training pytorch scaling-laws transfer-learning zero-shot-classification zero-shot-retrieval

Last synced: 07 May 2025

https://github.com/minimaxir/imgbeddings

Python package to generate image embeddings with CLIP without PyTorch/TensorFlow

ai clip embeddings image-processing images onnx transformers

Last synced: 09 Apr 2025

https://github.com/ai-forever/ru-clip

CLIP implementation for Russian language

clip computer-vision nlp

Last synced: 20 Jun 2025

https://github.com/eddieoz/youtube-clips-automator

MARCELO: an AI-powered bot to automate the editing and thumbnail creation for your YouTube clips channel

ai audio-processing automation bot clip computer-vision editing thumbnail video video-processing youtube

Last synced: 20 Oct 2025

https://github.com/Shishkebaboo/VodRecovery

The purpose of this script is to obtain videos or clips that are either marked as "sub-only" or have been deleted on Twitch.

broadcast clip clips commad-line commandline console development ffmpeg live m3u8 m3u8-playlist m3u8-videos mp4 python recover twitch twitchclips twitchtv vodrecovery

Last synced: 18 Jul 2025

https://github.com/jamjamjon/usls

A Rust library integrated with ONNXRuntime, providing a collection of Computer Vision and Vision-Language models.

clip cuda florence2 grounding-dino imshow moondream ocr onnx onnxruntime rust-yolo sam sapiens smolvlm tensorrt yolo yolo-rs yolo-rust yolov10 yolov11 yolov8

Last synced: 16 May 2025

https://github.com/ylqi/Count-Anything

This method uses Segment Anything and CLIP to ground and count any object that matches a custom text prompt, without requiring any point or box annotation.

clip count-anything segment-anything

Last synced: 23 Aug 2025

https://github.com/hv0905/nekoimagegallery

An AI-powered natural language & reverse Image Search Engine powered by CLIP & qdrant.

clip computer-vision image-search image-search-engine search-engine transformers

Last synced: 06 Apr 2025

https://github.com/HFAiLab/clip-gen

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

clip pytorch text-to-image text2image

Last synced: 03 Apr 2025

https://github.com/ajatt-tools/videoclip

🍗 Easily create videoclips with mpv.

addon ajatt audioclip clip mpv mpv-script videoclip

Last synced: 02 Nov 2025

https://github.com/skalskip/transformers

Everything you need to know about Transformers! 🤖

attention-mechanism clip detr gpt transformers visual-transformer

Last synced: 11 Jul 2025

https://github.com/soulteary/simple-image-search-engine

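A simple image search engine. Build your own image search engine in three steps, and get to grips with vector databases, search by image, and text-to-image search.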

clip docker image-search-engine image-similarity picture-search redis redis-vector-search search-engine vector-database vits

Last synced: 06 Jul 2025

https://github.com/DRSY/MoTIS

[NAACL 2022] Mobile Text-to-Image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)

ai clip cross-modal image-search ios-swift k-means k-means-clustering knn knowledge-distillation lsh naacl random-projection retrieval semantic-search vector-search

Last synced: 08 May 2025

https://github.com/wangrongding/WebCut

🎬 A web-based audio and video editor.

audio audio-editor audio-processing clip cut video video-editor video-processing wasm webcodecs

Last synced: 24 Mar 2025

https://github.com/wangrongding/webcut

🎬 A web-based audio and video editor.

audio audio-editor audio-processing clip cut video video-editor video-processing wasm webcodecs

Last synced: 29 Oct 2025

https://github.com/Ajatt-Tools/videoclip

🍗 Easily create videoclips with mpv.

addon ajatt audioclip clip mpv mpv-script videoclip

Last synced: 10 Jul 2025

https://github.com/tnwei/vqgan-clip-app

Local image generation using VQGAN-CLIP or CLIP guided diffusion

clip deep-learning generative-art guided-diffusion image-generation streamlit text2image vqgan-clip

Last synced: 13 Apr 2025

https://github.com/foolwood/drl

[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

clip interaction-nets text-video-search-engine transformer video-retrieval

Last synced: 30 Aug 2025

https://github.com/nvidia-ai-iot/clip-distillation

Zero-label image classification via OpenCLIP knowledge distillation

clip distillation inference jetson knowledge nvidia qat sparsity tensorrt

Last synced: 13 Oct 2025

https://github.com/aerobounce/trim.lua

Trim mode for mpv — Turn mpv into Lossless Audio / Video Editor

clip concat ffmpeg lossless lua lua-script mpv mpv-script trim video video-editor video-processing

Last synced: 10 Jul 2025

https://github.com/marqo-ai/marqo-fashionclip

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

clip embeddings fashion-classifier fashionclip informationretrieval multimodal recomendations search transformers vectorsearch vision-transformer

Last synced: 30 Jul 2025

https://github.com/pansyjs/video-editing-timeline

Timeline for video editing

clip cut editing timeline video video-clip video-cut video-editing

Last synced: 10 Apr 2025