{"id":21741212,"url":"https://github.com/imkett/zerogen","last_synced_at":"2025-04-13T03:42:00.869Z","repository":{"id":178605147,"uuid":"660424274","full_name":"ImKeTT/ZeroGen","owner":"ImKeTT","description":"[NLPCC'23] ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles PyTorch Implementation","archived":false,"fork":false,"pushed_at":"2023-10-07T02:55:34.000Z","size":3080,"stargazers_count":12,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T20:51:29.732Z","etag":null,"topics":["captioning","controllable-text-generation","decoding","gpt2","multimodal","nlpcc","vision-language","zero-shot"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2306.16649","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ImKeTT.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-06-30T01:34:46.000Z","updated_at":"2024-12-09T14:41:05.000Z","dependencies_parsed_at":null,"dependency_job_id":"a6f5b9d0-8c1e-4929-b3d4-4b29ad07af3a","html_url":"https://github.com/ImKeTT/ZeroGen","commit_stats":null,"previous_names":["imkett/zerogen"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImKeTT%2FZeroGen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImKeTT%2FZeroGen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImKeTT%2FZeroGen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ImKeTT%2FZeroGen/manifests","owner_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/owners/ImKeTT","download_url":"https://codeload.github.com/ImKeTT/ZeroGen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248660977,"owners_count":21141380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["captioning","controllable-text-generation","decoding","gpt2","multimodal","nlpcc","vision-language","zero-shot"],"created_at":"2024-11-26T06:17:23.692Z","updated_at":"2025-04-13T03:42:00.846Z","avatar_url":"https://github.com/ImKeTT.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles\nOfficial PyTorch implementation of ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles (https://arxiv.org/abs/2306.16649), accepted to NLPCC 2023.\n\n![teaser](./teaser.jpg)\n\n## Setup\n\nMake sure you have installed:\n```bash\ntransformers\nnltk\nscikit-learn\ntorch\nnumpy\ntqdm\n```\n\n## Data and Model Weights\n\n### Data Structure\n\nThe [extra data](https://drive.google.com/drive/folders/1XHviYZnrX3KNqSKvUwkoHsxmeSFP5Jgn?usp=sharing) contains:\n\n1. Objects, textual features, etc. for MSCOCO, Flickr30k, Flickr10k, VisNews.\n2. The training/test data for Flickr10k and VisNews.\n3. The `evaluation` suite for captioning and text control evaluations.\n4. The `npy_data` folder with extracted GloVe features.\n\n### Data Processing and Preparation\n\nTo process the data and obtain the complete test set:\n\n1. 
For the test data (images and captions) of MSCOCO and Flickr30k, please follow the download instructions in [this repository](https://github.com/yxuansu/MAGIC). Put the datasets in a path of your choice and set `DATA_DIR` in the `config.json` file accordingly.\n2. For the test images of VisNews, please download them from the official [repository](https://github.com/FuxiaoLiu/VisualNews-Repository). Move the `visnews` folder to your data path and place the images in the same `visnews` directory.\n3. Move all files in `flickr30_data_zerogen` and `mscoco_data_zerogen` to the Flickr30k and MSCOCO folders, respectively.\n4. Move `flickr10_data_zerogen` and `visnews_data_zerogen` to the data directory.\n5. Put the `evaluation` folder in the current directory.\n\nNote: for all data used here, please follow the respective licenses for any other purpose.\n\n\n### Model Weights\n\n| Task               | Weight                                                   |\n| :----------------- | :------------------------------------------------------- |\n| MSCOCO             | https://huggingface.co/cambridgeltl/magic\\_mscoco        |\n| Flickr30k          | https://huggingface.co/cambridgeltl/magic\\_flickr30k     |\n| Flickr10k-romantic | https://huggingface.co/PahaII/ZeroGen-flickr10k-romantic |\n| Flickr10k-humor    | https://huggingface.co/PahaII/ZeroGen-flickr10k-humor    |\n| VisNews            | https://huggingface.co/PahaII/ZeroGen-visnews            |\n\n## ZeroGen Generation\n\n```bash\nTASK=mscoco\nLENGTH=16\nALPHA=1.0\nBETA=1.0\nETA=0.10\nK=45\nALPHA_HAT=2.5\nBETA_HAT=1.0\nN=1\n\npython run_zerogen.py --alpha ${ALPHA} --beta ${BETA} --eta ${ETA} --k ${K} --condition_method add \\\n                       --task ${TASK} --decoding_len ${LENGTH} --alpha_scale --alpha_activasize ${ALPHA_HAT}  \\\n                       --beta_scale --beta_activesize 0.2 --beta_upper ${BETA_HAT} --n_obj ${N} --kw_mode max --k2t\n```\n\nHere are recommended parameters for ZeroGen generation:\n\n| Task      
         | $k$ | $\\alpha$ | $\\beta$ | $\\eta$ | $\\hat{\\alpha}$ | $\\hat{\\beta}$ | $N$ | length |\n| :----------------- | :---- | :---------- | :--------- | :-------- | :----------------- | :---------------- | :---- | :---- |\n| MSCOCO             | 45    | 1\\.0        | 1\\.0       | 0\\.10     | 2\\.5               | 1\\.0              | 1~5   | 16 |\n| Flickr30k          | 25    | 2\\.0        | 1\\.0       | 0\\.10     | 2\\.0               | 0\\.5              | 1~5   | 16 |\n| Flickr10k-romantic | 45    | 1\\.0        | 1\\.0       | 0\\.10     | 3\\.0               | 0\\.5              | 1     | 25 |\n| Flickr10k-humor    | 45    | 1\\.0        | 1\\.0       | 0\\.10     | 2\\.5               | 0\\.5              | 1     | 25 |\n| VisNews            | 5     | 8\\.0        | 1\\.0       | 0\\.65     | 8\\.0               | 0\\.5              | 40    | 64 |\n\nWe also support inference with sequence-to-sequence models like [FlanT5](https://huggingface.co/google/flan-t5-base): just add the `--seq2seq` flag and specify the model name via the `--language_model_name` argument.\n\n## Baseline Models\n\nFor the [CapDec](https://github.com/DavidHuji/CapDec), [ZeroCap](https://github.com/YoadTew/zero-shot-image-to-text), and [MAGIC](https://github.com/yxuansu/MAGIC) baselines in captioning tasks, please refer to their official repositories.\n\nFor the PPLM+MAGIC baseline in the controllable news generation task, we provide a minimal implementation in the `Pplm_Magic` folder.\n\n## Citation\n\nIf you find our work useful, please consider citing our paper and starring the repo :)\n\n```bibtex\n@article{tu2023zerogen,\n  title={ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles},\n  author={Tu, Haoqin and Yang, Bowen and Zhao, Xianfeng},\n  journal={arXiv preprint arXiv:2306.16649},\n  year={2023}\n}\n```\n\nPlease [email](mailto:tuisaac163@gmail.com) me or open an issue if you have further questions. 
We thank the authors of the open-source code for zero-shot captioning and plug-and-play models, which inspired our work!","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimkett%2Fzerogen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimkett%2Fzerogen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimkett%2Fzerogen/lists"}