{"id":20435479,"url":"https://github.com/jacksonchen1998/cold-start-reinforcement-learning-with-softmax-policy-gradient","last_synced_at":"2026-04-15T21:32:29.067Z","repository":{"id":171693326,"uuid":"648272358","full_name":"jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient","owner":"jacksonchen1998","description":"Unofficial implementation code","archived":false,"fork":false,"pushed_at":"2023-07-06T05:38:51.000Z","size":271,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-05T06:43:12.309Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jacksonchen1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-01T15:29:57.000Z","updated_at":"2023-08-31T04:31:30.000Z","dependencies_parsed_at":"2023-07-27T17:16:49.443Z","dependency_job_id":null,"html_url":"https://github.com/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient","commit_stats":null,"previous_names":["jacksonchen1998/cold-start-reinforcement-learning-with-softmax-policy-gradient"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FCold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/jacksonchen1998%2FCold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FCold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FCold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jacksonchen1998","download_url":"https://codeload.github.com/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jacksonchen1998%2FCold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31861383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"ssl_error","status_checked_at":"2026-04-15T15:24:39.138Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T08:34:44.244Z","updated_at":"2026-04-15T21:32:29.051Z","avatar_url":"https://github.com/jacksonchen1998.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 
Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient\n\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Release](https://img.shields.io/github/v/release/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient)](https://github.com/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/releases/)\n\n[Paper](https://arxiv.org/abs/1709.09346)\n\nThis repository contains an implementation of the reinforcement learning method described in the paper \"Cold-Start Reinforcement Learning with Softmax Policy Gradient\" by Nan Ding and Radu Soricut from Google Inc. The method is based on a softmax value function that eliminates the need for warm-start training and sample variance reduction during policy updates.\n\n## Method\n\n[RNN Encoder Decoder](https://github.com/bentrevett/pytorch-seq2seq/blob/master/3%20-%20Neural%20Machine%20Translation%20by%20Jointly%20Learning%20to%20Align%20and%20Translate.ipynb)\n\n![model](./image/alg.png)\n\n## Requirements\n\nCreate a conda environment using the following command:\n\n```bash\nconda create -n \u003cenv_name\u003e python=3.9\n```\n\nInstall the required packages using the following command:\n\n```bash\nconda install --file requirements.txt\n```\n\n## Program issues\n\nIf `pipeline.py` fails with the following error (raised when PyTorch was installed without CUDA support):\n\n```\nAssertionError: Torch not compiled with CUDA enabled\n```\n\nIn `pipeline.py`, change\n\n```py\nz = torch.cat([z, zt_idx.cuda()[None]], dim=0) # (T, B) token id\n```\n\nto\n\n```py\nz = torch.cat([z, zt_idx[None]], dim=0) # (T, B) token id\n```\n\n## Experiment\n\n### Summarization Task: Headline Generation\n\nDataset:\n- Training: [English Gigaword](https://catalog.ldc.upenn.edu/LDC2003T05)\n- Testing: [DUC 2004](https://duc.nist.gov/duc2004/)\n\nEvaluation: \n[ROUGE-L score](https://arxiv.org/abs/1803.01937)\n\n### Automatic Image-Caption Generation\n\nDataset:\n- Training / Validation: [Microsoft 
COCO](https://cocodataset.org/#home)\n- Testing: [Microsoft COCO](https://cocodataset.org/#home)\n\nEvaluation: \n[CIDEr score](https://arxiv.org/abs/1411.5726) / ROUGE-L score\n\n## Results\n\n### Model loss\n\n![loss](./image/loss.png)\n\n### Model reward (ROUGE-L score)\n\n![reward](./image/reward.png)\n\n## Acknowledgements\n\nWe would like to thank Nan Ding and Radu Soricut for their valuable contributions to the field of reinforcement learning, and for making their paper available to the public. We also acknowledge the TensorFlow team for providing a powerful and flexible deep learning framework.\n\n## Contributors\n\n\u003ca href=\"https://github.com/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient/graphs/contributors\"\u003e\n  \u003cimg src=\"http://contributors.nn.ci/api?repo=jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient\" /\u003e\n\u003c/a\u003e\n\n## Citation\n\n```\n@misc{20230615,\n  author = {Chih-Chun Chen and Pin-Yen Liu and Po-Chuan Chen},\n  title = {Cold-Start Reinforcement Learning with Softmax Policy Gradient},\n  year = {2023},\n  month = {06},\n  note = {Version 1.0},\n  howpublished = {GitHub},\n  url = {https://github.com/jacksonchen1998/Cold-Start-Reinforcement-Learning-with-Softmax-Policy-Gradient}\n}\n```\n\n```\n@misc{ding2017coldstart,\n      title={Cold-Start Reinforcement Learning with Softmax Policy Gradient}, \n      author={Nan Ding and Radu Soricut},\n      year={2017},\n      eprint={1709.09346},\n      archivePrefix={arXiv},\n      
primaryClass={cs.LG}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonchen1998%2Fcold-start-reinforcement-learning-with-softmax-policy-gradient","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacksonchen1998%2Fcold-start-reinforcement-learning-with-softmax-policy-gradient","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonchen1998%2Fcold-start-reinforcement-learning-with-softmax-policy-gradient/lists"}