https://github.com/YoadTew/zero-shot-image-to-text
Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
- Host: GitHub
- URL: https://github.com/YoadTew/zero-shot-image-to-text
- Owner: YoadTew
- Created: 2021-11-26T15:56:44.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-09-17T13:35:30.000Z (over 2 years ago)
- Last Synced: 2024-08-01T13:30:04.296Z (7 months ago)
- Language: Python
- Homepage:
- Size: 1.65 MB
- Stars: 259
- Watchers: 7
- Forks: 42
- Open Issues: 12
# PyTorch Implementation of [Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic](https://arxiv.org/abs/2111.14447) [CVPR 2022]
### Check out our follow-up work - [Zero-Shot Video Captioning with Evolving Pseudo-Tokens](https://github.com/YoadTew/zero-shot-video-to-text)!
[[Paper]](https://arxiv.org/abs/2111.14447) [[Notebook]](https://www.kaggle.com/yoavstau/zero-shot-image-to-text/notebook) [[Caption Demo]](https://replicate.com/yoadtew/zero-shot-image-to-text) [[Arithmetic Demo]](https://replicate.com/yoadtew/arithmetic) [[Visual Relations Dataset]](https://drive.google.com/file/d/1hf5_zPI3hfMLNMTllZtWXcjf6ZoSTGcI)

⭐ ***New:*** Run the captioning configuration in the [browser](https://replicate.com/yoadtew/zero-shot-image-to-text) using the replicate.ai UI.
## Approach
[Figure: overview of the approach]

## Example of capabilities
[Figure: example of capabilities]

## Example of Visual-Semantic Arithmetic
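The arithmetic run in the Usage section passes three images with `--arithmetics_weights 1 1 -1`, i.e. woman + king - man. The following is a minimal sketch of the general idea of weighted embedding arithmetic, not the repository's code: the 2-D vectors are hypothetical stand-ins for image embeddings, and the helper names `weighted_sum` and `cosine` are made up for illustration.

```python
import math


def weighted_sum(embeddings, weights):
    """Combine equally sized vectors as sum_k weights[k] * embeddings[k]."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    return [
        sum(w * emb[i] for emb, w in zip(embeddings, weights))
        for i in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
# Real image embeddings are high-dimensional; these only illustrate the math.
concepts = {
    "woman": [0.0, 1.0],
    "king": [1.0, -1.0],
    "man": [0.0, -1.0],
    "queen": [1.0, 1.0],
}

# Same weights as the arithmetic example: woman + king - man.
target = weighted_sum(
    [concepts["woman"], concepts["king"], concepts["man"]],
    [1, 1, -1],
)

# Rank every concept by similarity to the combined vector.
scores = {name: cosine(target, vec) for name, vec in concepts.items()}
best = max(scores, key=scores.get)
print(best)
```

In this toy setup the combined vector lies closest to the "queen" stand-in. In the actual model the combined embedding would guide text generation rather than a nearest-neighbour lookup, so treat this only as intuition for what the weights mean.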
[Figure: example of visual-semantic arithmetic]

## Usage
### To run captioning on a single image:
```bash
python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg"
```

### To run the model on visual arithmetic:
```bash
python run.py \
    --reset_context_delta \
    --end_factor 1.06 \
    --fusion_factor 0.95 \
    --grad_norm_factor 0.95 \
    --run_type arithmetics \
    --arithmetics_imgs "example_images/arithmetics/woman2.jpg" "example_images/arithmetics/king2.jpg" "example_images/arithmetics/man2.jpg" \
    --arithmetics_weights 1 1 -1
```

### To run the model on real-world knowledge:
```bash
python run.py \
    --reset_context_delta --cond_text "Image of" \
    --end_factor 1.04 \
    --caption_img_path "example_images/real_world/simpsons.jpg"
```

### To run the model on OCR:
```bash
python run.py \
    --reset_context_delta --cond_text "Image of text that says" \
    --end_factor 1.04 \
    --caption_img_path "example_images/OCR/welcome_sign.jpg"
```

### For runtime speedup using multiple GPUs, use the `--multi_gpu` flag:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4 python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg" \
    --multi_gpu
```

## Citation
Please cite our work if you use it in your research:
```
@article{tewel2021zero,
title={Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic},
author={Tewel, Yoad and Shalev, Yoav and Schwartz, Idan and Wolf, Lior},
journal={arXiv preprint arXiv:2111.14447},
year={2021}
}
```