https://github.com/imkett/zerogen

[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation
https://github.com/imkett/zerogen

captioning controllable-text-generation decoding gpt2 multimodal nlpcc vision-language zero-shot

Last synced: about 1 year ago
JSON representation

[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation

Host: GitHub
URL: https://github.com/imkett/zerogen
Owner: ImKeTT
Created: 2023-06-30T01:34:46.000Z (almost 3 years ago)
Default Branch: main
Last Pushed: 2023-10-07T02:55:34.000Z (over 2 years ago)
Last Synced: 2025-03-26T20:51:29.732Z (over 1 year ago)
Topics: captioning, controllable-text-generation, decoding, gpt2, multimodal, nlpcc, vision-language, zero-shot
Language: Python
Homepage: https://arxiv.org/abs/2306.16649
Size: 2.94 MB
Stars: 12
Watchers: 1
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles

Official PyTorch implementation of ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles (https://arxiv.org/abs/2306.16649), accepted to NLPCC 2023.

![teaser](./teaser.jpg)

## Setup

Make sure you have installed:

```bash

transformers

nltk

scikit-learn

torch

numpy

tqdm

```

## Data and Model Weights

### Data Structure

The [extra data](https://drive.google.com/drive/folders/1XHviYZnrX3KNqSKvUwkoHsxmeSFP5Jgn?usp=sharing) contains:

1. Objects, textual features, ect. for MSCOCO, Flickr30k, Flickr10k, VisNews.

2. The training/test data for Flickr10k and VisNews.

3. `evaluation` suite for captioning and text control evaluations.

4. `npy_data` folder for extracted GloVe features.

### Data Processing and Preparation

For processing these data and obtain the whole test data:

1. For the test data (images and captions) of MSCOCO and Flickr30k, please refer to the downloading details from [this repository](https://github.com/yxuansu/MAGIC). Put the datasets to the path you wish and change the `DATA_DIR` in `config.json` file accordingly.

2. For the test images of ViseNews, please refer to their official [repository](https://github.com/FuxiaoLiu/VisualNews-Repository) to donwload. Move the `visnews` folder to your data path, and images to the same `visnews` directory.

3. Move all files in `flickr30_data_zerogen` and `mscoco_data_zerogen` to the Flicrk30k and MSCOCO folders, respectively.

4. Move `flickr10_data_zerogen` and `visnews_data_zerogen` to data directory.

5. Put the `evaluation` folder to the current directory.

Note that, for all data employed, please follow their licenses for any other purpose.

### Model Weights

| Task               | Weight                                                   |

| :----------------- | :------------------------------------------------------- |

| MSCOCO             | https://huggingface.co/cambridgeltl/magic\_mscoco        |

| Flickr30k          | https://huggingface.co/cambridgeltl/magic\_flickr30k     |

| Flickr10k-romantic | https://huggingface.co/PahaII/ZeroGen-flickr10k-romantic |

| Flickr10k-humor    | https://huggingface.co/PahaII/ZeroGen-flickr10k-humor    |

| VisNews            | https://huggingface.co/PahaII/ZeroGen-visnews            |

## ZeroGen Generation

```bash

TASK=mscoco

LENGTH=16

ALPHA=1.0

BETA=1.0

ETA=0.10

K=45

ALPHA_HAT=2.5

BETA_HAT=1.0

N=1

python run_zerogen.py --alpha ${ALPHA} --beta ${BETA} --eta ${ETA} --k ${K} --condition_method add \

                       --task ${TASK} --decoding_len ${LENGTH} --alpha_scale --alpha_activasize ${ALPHA_HAT}  \

                       --beta_scale --beta_activesize 0.2 --beta_upper ${BETA_HAT} --n_obj ${N} --kw_mode max --k2t

```

Here are recommended parameters for ZeroGen generation:

| Task               | $k$ | $\alpha$ | $\beta$ | $\eta$ | $\hat{\alpha}$ | $\hat{\beta}$ | $N$ | length

| :----------------- | :---- | :---------- | :--------- | :-------- | :----------------- | :---------------- | :---- | :---- |

| MSCOCO             | 45    | 1\.0        | 1\.0       | 0\.10     | 2\.5               | 1\.0              | 1~5   | 16

| Flickr30k          | 25    | 2\.0        | 1\.0       | 0\.10     | 2\.0               | 0\.5              | 1~5   | 16

| Flickr10k-romantic | 45    | 1\.0        | 1\.0       | 0\.10     | 3\.0               | 0\.5              | 1     | 25

| Flickr10k-humor    | 45    | 1\.0        | 1\.0       | 0\.10     | 2\.5               | 0\.5              | 1     | 25

| VisNews            | 5     | 8\.0        | 1\.0       | 0\.65     | 8\.0               | 0\.5              | 40    | 64

We also support the inference of sequence-to-sequence models like [FlanT5](https://huggingface.co/google/flan-t5-base), just add `--seq2seq` flag and specify the model name via `--language_model_name` argument.

## Baseline Models

For [CapDec](https://github.com/DavidHuji/CapDec), [ZeroCap](https://github.com/YoadTew/zero-shot-image-to-text), [MAGIC](https://github.com/yxuansu/MAGIC) baselines in captioning tasks, please refer to their official repositories.

For PPLM+MAGIC baseline in controllable news generation task, we provide a minimal implementation in the `Pplm_Magic` folder.

## Citation

If you find our work useful, please consider cite our paper and star the repo :)

```bibtex

@article{tu2023zerogen,

  title={ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles},

  author={Tu, Haoqin and Yang, Bowen and Zhao, Xianfeng},

  journal={arXiv preprint arXiv:2306.16649},

  year={2023}

}

```

Please [email](tuisaac163@gmail.com) me or open an issue if you have further questions. We thank open sourced codes related to zero-shot captioning and plug-and-play models, which inspired our work!

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/imkett/zerogen

Awesome Lists containing this project

README