{"id":13526085,"url":"https://github.com/pemami4911/neural-combinatorial-rl-pytorch","last_synced_at":"2025-04-01T06:31:05.835Z","repository":{"id":44395350,"uuid":"94898419","full_name":"pemami4911/neural-combinatorial-rl-pytorch","owner":"pemami4911","description":"PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning https://arxiv.org/abs/1611.09940","archived":false,"fork":false,"pushed_at":"2018-05-29T13:40:08.000Z","size":348,"stargazers_count":558,"open_issues_count":11,"forks_count":140,"subscribers_count":19,"default_branch":"master","last_synced_at":"2024-11-02T10:34:10.363Z","etag":null,"topics":["neural-combinatorial-optimization","pytorch","reinforcement-learning","seq2seq"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pemami4911.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-20T14:10:01.000Z","updated_at":"2024-10-31T01:50:15.000Z","dependencies_parsed_at":"2022-07-15T03:16:53.539Z","dependency_job_id":null,"html_url":"https://github.com/pemami4911/neural-combinatorial-rl-pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pemami4911%2Fneural-combinatorial-rl-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pemami4911%2Fneural-combinatorial-rl-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pemami4911%2Fneural-combinatorial-rl-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pemami4911%2Fneural-combinatorial-rl-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pemami4911","download_url":"https://codeload.github.com/pemami4911/neural-combinatorial-rl-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246596733,"owners_count":20802885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["neural-combinatorial-optimization","pytorch","reinforcement-learning","seq2seq"],"created_at":"2024-08-01T06:01:25.085Z","updated_at":"2025-04-01T06:31:05.494Z","avatar_url":"https://github.com/pemami4911.png","language":"Python","funding_links":[],"categories":["Paper Implementations","Paper implementations｜论文实现","Paper implementations"],"sub_categories":["Other libraries｜其他库:","Other libraries:"],"readme":"# neural-combinatorial-rl-pytorch\n\nPyTorch implementation of [Neural Combinatorial Optimization with Reinforcement Learning](https://arxiv.org/abs/1611.09940).\n\nI have implemented the basic RL pretraining model with greedy decoding from the paper. An implementation of the supervised learning baseline model is available [here](https://github.com/pemami4911/neural-combinatorial-rl-tensorflow). Instead of a critic network, I obtained the TSP results below using an exponential moving average critic; the critic network is currently commented out in my code. Correspondence with a few others indicated that the exponential moving average critic significantly improves results.
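\n\nThe exponential moving average critic described above serves as the REINFORCE baseline: each sampled solution's reward is compared against a running average of past batch rewards. The snippet below is a minimal, framework-free sketch of that idea under stated assumptions, not the code in this repository. The helper name `ema_advantages`, the decay `beta = 0.9`, and seeding the baseline with the first batch mean are all illustrative assumptions.\n\n```python\n# Minimal sketch, not the repository's implementation: an exponential\n# moving average (EMA) baseline for REINFORCE. The helper name and the\n# default beta = 0.9 are illustrative assumptions.\n\n\ndef ema_advantages(rewards, baseline=None, beta=0.9):\n    # Mean scalar reward of the current batch (e.g. tour lengths for TSP).\n    batch_mean = sum(rewards) / len(rewards)\n    if baseline is None:\n        # First batch: seed the baseline with the batch mean (assumption).\n        baseline = batch_mean\n    else:\n        # EMA update: keep a beta fraction of the old baseline.\n        baseline = beta * baseline + (1.0 - beta) * batch_mean\n    # Advantage of each sample relative to the running baseline.\n    advantages = [r - baseline for r in rewards]\n    return advantages, baseline\n\n\n# Usage: carry the baseline across batches.\nbaseline = None\nfor batch in ([1.0, 3.0], [2.0, 4.0]):\n    advantages, baseline = ema_advantages(batch, baseline)\n    print(advantages, baseline)\n```\n\nIn a PyTorch training loop, these advantages would typically scale the log-probabilities of the sampled solutions in the policy-gradient loss. A larger `beta` gives a smoother baseline that lags further behind an improving policy.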
\n\nDuring training, my implementation uses a stochastic decoding policy in the pointer network, realized via PyTorch's `torch.multinomial()`. At test time, it decodes with beam search (**not yet finished**: only 1 beam, a.k.a. greedy decoding, is supported).\n\nCurrently, a sorting task and the planar symmetric Euclidean TSP are supported.\n\nSee `main.sh` for an example of how to run the code.\n\nUse the `--load_path $LOAD_PATH` and `--is_train False` flags to load a saved model.\n\nTo load a saved model and view the pointer network's attention layer, also use the `--plot_attention True` flag.\n\nPlease feel free to notify me if you encounter any errors, or to submit a pull request to improve this implementation.\n\n## Adding other tasks\n\nThis implementation can be extended to support other combinatorial optimization problems. See `sorting_task.py` and `tsp_task.py` for examples of how to add one. The key requirements are a dataset class and a reward function that takes a sampled solution (selected by the pointer network from the input) and returns a scalar reward. For the sorting task, the agent receives a reward equal to the length of the longest strictly increasing subsequence in the decoded output divided by the sequence length (e.g., `[1, 3, 5, 2, 4] -\u003e 3/5 = 0.6`).\n\n## Dependencies\n\n* Python 3.6 (should be OK with v \u003e= 3.4)\n* PyTorch 0.2 or 0.3\n* tqdm\n* matplotlib\n* [tensorboard_logger](https://github.com/TeamHG-Memex/tensorboard_logger)\n\nPyTorch 0.4 compatibility is available on branch `pytorch-0.4`.\n\n## TSP Results\n\nResults for 1 random seed over 50 epochs (each epoch is 10,000 batches of size 128). After each epoch, I validated performance on 1000 held-out graphs. I used the same hyperparameters as the paper (see `main.sh`). The dashed line shows the value reported in Table 2 of Bello et al. for comparison. The training-reward plots use a log-scale x axis to show how quickly the tour length drops early on.\n\n![TSP 20 Train](img/tsp_20_train_reward.png)\n![TSP 20 Val](img/tsp_20_val_reward.png)\n![TSP 50 Train](img/tsp_50_train_reward.png)\n![TSP 50 Val](img/tsp_50_val_reward.png)\n\n## Sort Results\n\nI trained a model on `sort10` for 4 epochs of 1,000,000 randomly generated samples and tested it on a dataset of 10,000 samples. I then tested the same model on `sort15` and `sort20` to evaluate how well it generalizes to longer sequences.\n\nTest results on 10,000 samples (a reward of 1.0 means the network perfectly sorted the input):\n\n| task | average reward | variance |\n|---|---|---|\n| sort10 | 0.9966 | 0.0005 |\n| sort15 | 0.7484 | 0.0177 |\n| sort20 | 0.5586 | 0.0060 |\n\nExample prediction on `sort10`:\n\n```\ninput: [4, 7, 5, 0, 3, 2, 6, 8, 9, 1]\noutput: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n```\n\n### Attention visualization\n\nPlot the pointer network's attention layer with the argument `--plot_attention True`.\n\n## TODO\n\n* [ ] Add RL pretraining-Sampling\n* [ ] Add RL pretraining-Active Search\n* [ ] Active Search\n* [ ] Asynchronous training à la A3C\n* [X] Refactor `USE_CUDA` variable\n* [ ] Finish implementing beam search decoding to support \u003e 1 beam\n* [ ] Add support for variable-length inputs\n\n## Acknowledgements\n\nSpecial thanks to the repos [devsisters/neural-combinatorial-rl-tensorflow](https://github.com/devsisters/neural-combinatorial-rl-tensorflow) and [MaximumEntropy/Seq2Seq-PyTorch](https://github.com/MaximumEntropy/Seq2Seq-PyTorch) for getting me started, and @ricgama for figuring out that weird bug with `clone()`.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpemami4911%2Fneural-combinatorial-rl-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpemami4911%2Fneural-combinatorial-rl-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpemami4911%2Fneural-combinatorial-rl-pytorch/lists"}