{"id":21892956,"url":"https://github.com/chunyuanli/optimus","last_synced_at":"2025-04-06T01:08:13.161Z","repository":{"id":49318441,"uuid":"223615114","full_name":"ChunyuanLI/Optimus","owner":"ChunyuanLI","description":"Optimus: the first large-scale pre-trained VAE language model","archived":false,"fork":false,"pushed_at":"2023-09-06T17:33:07.000Z","size":1114,"stargazers_count":384,"open_issues_count":23,"forks_count":39,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-03-30T00:06:32.235Z","etag":null,"topics":["language-model","pretrained-models","vae","vae-pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChunyuanLI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-11-23T15:59:45.000Z","updated_at":"2025-03-09T22:35:58.000Z","dependencies_parsed_at":"2023-01-25T15:01:14.379Z","dependency_job_id":"a410d860-d35f-4629-b57d-9ba554923fe0","html_url":"https://github.com/ChunyuanLI/Optimus","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChunyuanLI%2FOptimus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChunyuanLI%2FOptimus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChunyuanLI%2FOptimus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChunyuanLI%2FOptimus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChunyuanLI","download_url":"https://codeload.github.com/ChunyuanLI/O
ptimus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247419860,"owners_count":20936012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-model","pretrained-models","vae","vae-pytorch"],"created_at":"2024-11-28T13:00:08.073Z","updated_at":"2025-04-06T01:08:13.131Z","avatar_url":"https://github.com/ChunyuanLI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Optimus: the first pre-trained Big VAE language model \u003cimg src=\"doc/figs/logo_optimus.png\" width=\"100\" align=\"right\"\u003e  \n \nThis repository contains source code necessary to reproduce the results presented in the EMNLP 2020 paper [Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space](https://arxiv.org/abs/2004.04092).\n\n\n|\u003cimg src=\"doc/figs/optimus_scheme.png\" width=\"350\"\u003e | \u003cimg src=\"doc/figs/headfig_optimus.png\" width=\"800\"\u003e \n|-------------------------|:-------------------------:|\n| The network architecture of Optimus: encoder for representation learning and decoder for generation | Sentences are organized and manipulated in a pre-trained compact and smooth latent space \n\n\nFor more on this project, see the [Microsoft Research Blog post](https://www.microsoft.com/en-us/research/blog/a-deep-generative-model-trifecta-three-advances-that-work-towards-harnessing-large-scale-power/).\n\n\n## News\n\nMay 21, 2020: Releasing a [`demo`](http://40.71.23.172:8899/) for latent space manipulation, including sentence interpolation and analogy. 
Check out the [`website`](http://40.71.23.172:8899/).\n\nMay 20, 2020: The latent space manipulation code has been cleaned up and released. See instructions at [`optimius_for_snli.md`](doc/optimius_for_snli.md).\n\nMay 13, 2020: The fine-tuning code for language modeling is released. See instructions at [`optimus_finetune_language_models.md`](doc/optimus_finetune_language_models.md).\n\n## Contents\nThere are four steps to use this codebase to reproduce the results in the paper.\n\n1. [Dependencies](#dependencies)\n2. [Prepare datasets](#prepare-datasets)\n3. [Model training](#Model-training)\n    1. Pre-training on sentences in Wikipedia\n    2. Language Modeling\n    3. Guided Language Generation\n    4. Low-resource Language Understanding\n4. [Collect and plot results](#collect-and-plot-results)\n\n\n## Dependencies\n\nPull the Docker image from Docker Hub: `chunyl/pytorch-transformers:v2`. Please see the instructions at [`doc/env.md`](doc/env.md).\n\nThe project is organized into the following structure, with essential files \u0026 folders shown. `output` saves the model checkpoints.\n```\n├── Optimus\n   └── code\n       ├── examples\n           ├── big_ae\n               ├── modules\n                   ├── vae.py\n                   └── ...\n               ├── run_lm_vae_pretraining_phdist_beta.py\n               ├── run_lm_vae_training.py\n               └── ...\n\t   ├── pytorch_transformers\n               ├── modeling_bert.py\n               ├── modeling_gpt2.py\n               └── ...\n       ├── scripts\n           ├── scripts_docker\n\t   ├── scripts_local\n\t   ├── scripts_philly\n   └── data\n       └── datasets\n           ├── wikipedia_json_64_filtered\n               └── ...\n\t   ├── snli_data\n           └── ...\n   └── output\n       ├── pretrain\n       ├── LM\n       └── ...       \n```\n\n## Prepare Datasets\n\nPlease download or prepare the data by following the instructions at [`data/download_datasets.md`](data/download_datasets.md). 
\n\n## Model Training\n\n**1. Pre-training on sentences in Wikipedia**\n\nWe pre-trained our models on Philly (a Microsoft internal compute cluster); the code is specialized for multi-node, multi-GPU compute on this platform. The main pre-training script is [`run_lm_vae_pretraining_phdist_beta.py`](code/examples/big_ae/run_lm_vae_pretraining_phdist_beta.py). You may need to adjust the distributed training scripts. \n\n**2. Language Modeling**\n\nFor a fair comparison with existing VAE language models, we consider a model with latent dimension 32. The pre-trained model is fine-tuned on four commonly used datasets for one epoch. Please see the details at [`doc/optimus_finetune_language_models.md`](doc/optimus_finetune_language_models.md)\n\n**3. Guided Language Generation**\n\n\n**Latent Space Manipulation** To ensure good performance, we consider a model with latent dimension 768. The pre-trained model is fine-tuned on the SNLI dataset, where sentences show related patterns. Please see the details at [`doc/optimius_for_snli.md`](doc/optimius_for_snli.md)\n\n**4. Low-resource Language Understanding**\n\n## Collect and Plot Results\n\nOnce the networks are trained and the results are saved, we extract the key results using a Python script. The results can be plotted using the included IPython notebook `plots/main_plots.ipynb`.\nStart the IPython Notebook server:\n\n```\n$ cd plots\n$ ipython notebook\n```\n\nSelect the `main_plots.ipynb` notebook and execute the included\ncode. Note that we have copied our extracted results into the notebook, so without modification the script will output the figures in the paper. 
If you've run your own training and wish to plot results, you'll have to organize your results in the same format instead.\n\n\n## Questions?\n\nPlease drop me ([Chunyuan](http://chunyuan.li/)) a line if you have any questions.\n\n\n```\n@inproceedings{li2020_Optimus,\n  title={Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space},\n  author={Li, Chunyuan and Gao, Xiang and Li, Yuan and Li, Xiujun and Peng, Baolin and Zhang, Yizhe and Gao, Jianfeng},\n  booktitle={EMNLP},\n  year={2020}\n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchunyuanli%2Foptimus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchunyuanli%2Foptimus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchunyuanli%2Foptimus/lists"}