https://github.com/YoadTew/zero-shot-image-to-text

Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Host: GitHub
URL: https://github.com/YoadTew/zero-shot-image-to-text
Owner: YoadTew
Created: 2021-11-26T15:56:44.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2022-09-17T13:35:30.000Z (over 2 years ago)
Last Synced: 2024-08-01T13:30:04.296Z (7 months ago)
Language: Python
Homepage:
Size: 1.65 MB
Stars: 259
Watchers: 7
Forks: 42
Open Issues: 12
Metadata Files:
- Readme: README.md

# PyTorch Implementation of [Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic](https://arxiv.org/abs/2111.14447) [CVPR 2022]

### Check out our follow-up work - [Zero-Shot Video Captioning with Evolving Pseudo-Tokens](https://github.com/YoadTew/zero-shot-video-to-text)!

[[Paper]](https://arxiv.org/abs/2111.14447) [[Notebook]](https://www.kaggle.com/yoavstau/zero-shot-image-to-text/notebook) [[Caption Demo]](https://replicate.com/yoadtew/zero-shot-image-to-text) [[Arithmetic Demo]](https://replicate.com/yoadtew/arithmetic) [[Visual Relations Dataset]](https://drive.google.com/file/d/1hf5_zPI3hfMLNMTllZtWXcjf6ZoSTGcI)

⭐ ***New:*** Run the captioning configuration in the [browser](https://replicate.com/yoadtew/zero-shot-image-to-text) using the Replicate UI.

## Approach

![](git_images/Architecture.jpg)

## Example of capabilities

![](git_images/teaser.jpg)

## Example of Visual-Semantic Arithmetic

![](git_images/relations.jpg)

## Usage

### To run captioning on a single image:

```bash
$ python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg"
```

### To run the model on visual arithmetic:

```bash
$ python run.py \
    --reset_context_delta \
    --end_factor 1.06 \
    --fusion_factor 0.95 \
    --grad_norm_factor 0.95 \
    --run_type arithmetics \
    --arithmetics_imgs "example_images/arithmetics/woman2.jpg" "example_images/arithmetics/king2.jpg" "example_images/arithmetics/man2.jpg" \
    --arithmetics_weights 1 1 -1
```
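
The weights `1 1 -1` pair with the three images in order, which reads as woman + king - man. Below is a minimal sketch of that kind of weighted combination. It assumes each image is represented by an embedding vector that is normalized before and after the weighted sum. The toy 3-dimensional vectors and the `normalize` and `combine_embeddings` helpers are hypothetical illustrations, not the repository's implementation.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def combine_embeddings(embeddings, weights):
    """Weighted sum of unit-normalized embeddings, re-normalized at the end."""
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    dim = len(embeddings[0])
    combined = [0.0] * dim
    for emb, weight in zip(embeddings, weights):
        unit = normalize(emb)
        for i in range(dim):
            combined[i] += weight * unit[i]
    return normalize(combined)


# Toy stand-ins for image embeddings (assumed axes: female, male, royal).
woman = [1.0, 0.0, 0.0]
king = [0.0, 1.0, 1.0]
man = [0.0, 1.0, 0.0]

# Same order and signs as --arithmetics_imgs / --arithmetics_weights above.
result = combine_embeddings([woman, king, man], [1, 1, -1])
print(result)
```

In this toy setup the result keeps positive "female" and "royal" components while the "male" component turns negative, which is the intuition behind the woman + king - man example.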

### To run the model on real-world knowledge:

```bash
$ python run.py \
    --reset_context_delta \
    --cond_text "Image of" \
    --end_factor 1.04 \
    --caption_img_path "example_images/real_world/simpsons.jpg"
```

### To run the model on OCR:

```bash
$ python run.py \
    --reset_context_delta \
    --cond_text "Image of text that says" \
    --end_factor 1.04 \
    --caption_img_path "example_images/OCR/welcome_sign.jpg"
```
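
The captioning, real-world knowledge, and OCR commands above differ only in `--cond_text`, `--end_factor`, and the image path. When scripting several runs, the argument list can be assembled programmatically. This is a rough sketch: the `build_caption_command` helper is hypothetical and not part of the repository, and it only uses the flags shown in this README.

```python
import shlex


def build_caption_command(image_path, cond_text=None, end_factor=None):
    """Assemble an argument list for run.py from the flags shown above."""
    cmd = ["python", "run.py", "--reset_context_delta"]
    if cond_text is not None:
        cmd += ["--cond_text", cond_text]
    if end_factor is not None:
        cmd += ["--end_factor", str(end_factor)]
    cmd += ["--caption_img_path", image_path]
    return cmd


# Rebuild the OCR example as an argument list.
ocr_cmd = build_caption_command(
    "example_images/OCR/welcome_sign.jpg",
    cond_text="Image of text that says",
    end_factor=1.04,
)

# shlex.join quotes arguments with spaces, so the line can be pasted into a shell.
print(shlex.join(ocr_cmd))
```

The resulting list can be passed to `subprocess.run(ocr_cmd, check=True)` from the repository root, which avoids shell quoting issues with the conditioning text.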

### For a runtime speedup on multiple GPUs, use the `--multi_gpu` flag:

```bash
$ CUDA_VISIBLE_DEVICES=0,1,2,3,4 python run.py \
    --reset_context_delta \
    --caption_img_path "example_images/captions/COCO_val2014_000000097017.jpg" \
    --multi_gpu
```
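
When launching the command from a Python script rather than a shell, the `CUDA_VISIBLE_DEVICES` prefix has to be passed through the process environment. The following is an illustrative sketch; the `gpu_env` helper is hypothetical and not part of the repository.

```python
import os


def gpu_env(device_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


# Expose the same five devices as the shell example above.
env = gpu_env(range(5), base_env={})
print(env["CUDA_VISIBLE_DEVICES"])  # 0,1,2,3,4
```

Calling `gpu_env(range(5))` without `base_env` copies the current environment instead, and the result can be handed to `subprocess.run(..., env=env)` together with the `--multi_gpu` command.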

## Citation

Please cite our work if you use it in your research:

```
@article{tewel2021zero,
  title={Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic},
  author={Tewel, Yoad and Shalev, Yoav and Schwartz, Idan and Wolf, Lior},
  journal={arXiv preprint arXiv:2111.14447},
  year={2021}
}
```