{"id":19074045,"url":"https://github.com/zer0int/clip-xai-gui","last_synced_at":"2025-10-24T17:51:35.722Z","repository":{"id":229373928,"uuid":"776567914","full_name":"zer0int/CLIP-XAI-GUI","owner":"zer0int","description":"CLIP GUI - XAI app ~ explainable (and guessable) AI with ViT \u0026 ResNet models","archived":false,"fork":false,"pushed_at":"2024-09-13T19:29:36.000Z","size":3629,"stargazers_count":20,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"CLIP-vision","last_synced_at":"2025-04-28T19:52:38.309Z","etag":null,"topics":["attention","attention-visualization","clip","game","gradient-ascent","gui","image-to-text","vision-transformer","vit","xai"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zer0int.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-23T21:31:46.000Z","updated_at":"2025-04-11T08:32:54.000Z","dependencies_parsed_at":"2025-04-18T07:43:25.464Z","dependency_job_id":"108f98ed-bb00-4028-95cb-bdf478e91bcb","html_url":"https://github.com/zer0int/CLIP-XAI-GUI","commit_stats":null,"previous_names":["zer0int/clip-xai-gui"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zer0int/CLIP-XAI-GUI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zer0int%2FCLIP-XAI-GUI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zer0int%2FCLIP-XAI-GUI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zer0int%2FCLIP-XAI-GUI/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zer0int%2FCLIP-XAI-GUI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zer0int","download_url":"https://codeload.github.com/zer0int/CLIP-XAI-GUI/tar.gz/refs/heads/CLIP-vision","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zer0int%2FCLIP-XAI-GUI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":280841087,"owners_count":26400398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-24T02:00:06.418Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","attention-visualization","clip","game","gradient-ascent","gui","image-to-text","vision-transformer","vit","xai"],"created_at":"2024-11-09T01:49:29.662Z","updated_at":"2025-10-24T17:51:35.686Z","avatar_url":"https://github.com/zer0int.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n![CLIP-gui-banner2](https://github.com/zer0int/CLIP-XAI-GUI/assets/132047210/208fce6e-221b-4ff3-b7ee-3795a97b4fb6)\n\n### Change 4/May/2024:\n\n- Added AMP (Automatic Mixed Precision); uses torch.cuda.amp / autocast + GradScaler\n- ViT models are now much smaller - ViT-L/14 fits into 24 GB VRAM!\n- Just do \"python run_clipapp-amp.py\" to launch the GUI / use AMP for a CLIP 
'opinion'.\n\n![before-after](https://github.com/zer0int/CLIP-XAI-GUI/assets/132047210/11b6b703-3f64-42df-9177-31143834d6c2)\n\n-----\n\n## CLIP GUI - XAI app ~ explainable (and guessable) ViT \u0026 ResNet\n\nThis is a GUI for OpenAI's CLIP ViT and ResNet models, where you can:\n- Upload an image, get a CLIP 'opinion' (text) about the image\n- --\u003e Gradient Ascent -\u003e optimize text embeddings for cosine similarity with image embedding -\u003e tokenizer -\u003e CLIP 'opinion' words\n- Guess where CLIP was 'looking' for a given predicted word by setting a ROI (optional) \u0026 see what CLIP was 'looking' at\n- --\u003e \"GradCAM\" - like heatmap of salient features / attention visualization\n\n## Installation \u0026 Running\n\n- **Prerequisite**: [OpenAI CLIP](https://github.com/openai/CLIP)\n- Check / install `requirements.txt`\n- From the console, use \"python run_clipapp.py\" -\u003e GUI\n\n- Default CLIP ViT-B/32 takes ~15 seconds to generate an 'opinion' (RTX 4090), 4 GB VRAM.\n- Gigantic models \u003e\u003e 24 GB VRAM can use NVIDIA Driver CUDA SysMem Fallback Policy to run, but largest models ~ 30 Minutes for 1 opinion (not recommended)\n- You can get a smaller model's \"opinion\" and force that on a bigger model (should work for all \u003e=6 GB VRAM), or add your own words to visualize.\n- Check the console to see what CLIP is \"MatMulling\" about while you wait to get a CLIP opinion.\n- Click the image to place a ROI and \"guess where CLIP was looking\" (gamification, optional).\n- Images and texts are saved to the \"clipapp\" subfolder.\n- Check out the examples in \"image-examples\" to get started with some interesting images (hilarious 'opinion', typographic attack vulnerability, ...).\n- Use square images for best results\n\n\n## Credits / Built On\n\n- [OpenAI / CLIP](https://github.com/openai/CLIP)\n- ViT heatmaps built on: [Transformer-MM-Explainability](https://github.com/hila-chefer/Transformer-MM-Explainability)\n- ResNet heatmaps built 
on: [GradCAM Visualization](https://github.com/kevinzakka/clip_playground)\n- Original CLIP Gradient Ascent script: used with permission from [@advadnoun](https://twitter.com/advadnoun) on Twitter / X\n- Special thanks to GPT-4 for coding 90% of run_clipapp.py =)\n\n## Warning about Bias and Fairness in CLIP Output\n\nCLIP 'opinions' may contain biased rants (especially when the image contains non-English text), slurs, and profanity. Use responsibly / at your own discretion.\nFor more information, refer to the [CLIP Model Card](https://github.com/openai/CLIP/blob/main/model-card.md).\n\n## Known Issues\n- No threading: scripts that invoke models run on the main thread (check the console to verify that the thread is not *actually* hanging)\n\n## Examples\n\n![Screenshot 2024-03-23 163731](https://github.com/zer0int/CLIP-XAI-GUI/assets/132047210/17f4bc5f-51e3-4c87-96b5-682a5fcaa794)\n\n![example_git](https://github.com/zer0int/CLIP-XAI-GUI/assets/132047210/170b20e2-9ce1-4b12-bb86-706af89db156)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzer0int%2Fclip-xai-gui","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzer0int%2Fclip-xai-gui","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzer0int%2Fclip-xai-gui/lists"}