{"id":19843155,"url":"https://github.com/jakegrigsby/supersonic","last_synced_at":"2026-04-11T22:50:37.204Z","repository":{"id":37598600,"uuid":"162171372","full_name":"jakegrigsby/supersonic","owner":"jakegrigsby","description":"Multiworker PPO with random network distillation in eager execution Tensorflow","archived":false,"fork":false,"pushed_at":"2023-03-24T22:43:10.000Z","size":40628,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-11T12:11:34.980Z","etag":null,"topics":["deep-reinforcement-learning","ppo","rnd","sonic","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jakegrigsby.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-17T18:10:00.000Z","updated_at":"2019-06-01T21:59:51.000Z","dependencies_parsed_at":"2024-11-12T12:55:49.207Z","dependency_job_id":null,"html_url":"https://github.com/jakegrigsby/supersonic","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jakegrigsby%2Fsupersonic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jakegrigsby%2Fsupersonic/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jakegrigsby%2Fsupersonic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jakegrigsby%2Fsupersonic/manifests","owner_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/owners/jakegrigsby","download_url":"https://codeload.github.com/jakegrigsby/supersonic/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241214884,"owners_count":19928383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-reinforcement-learning","ppo","rnd","sonic","tensorflow"],"created_at":"2024-11-12T12:37:45.959Z","updated_at":"2026-04-11T22:50:32.155Z","avatar_url":"https://github.com/jakegrigsby.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Supersonic\n### __Multiworker Deep RL Agent for OpenAI Gym and Gym Retro__\n![Sonic Cover Image](supersonic/data/readme_media/readme_cover.jpg)\nSupersonic is an implementation of __[Proximal Policy Optimization with Random Network Distillation](https://arxiv.org/abs/1810.12894)__, written in eager-execution TensorFlow, with support for multiple workers on high-CPU machines or clusters.\n\nTo train an agent on the Sonic level 'Green Hill Zone Act 1':\n```shell\nmpiexec -n 32 python train.py --lvl GreenHillZone.Act1\n```\n\n#### Setup and Installation\n1. Clone this repository with `git clone -r https://github.com/jakegrigsby/supersonic.git`\n\n2. Install the package\n```shell\n    cd supersonic\n    pip install -e .\n```\n\n3. Install other dependencies\n```shell\n    pip install -r requirements.txt\n```\n\n4. If you want to train on Sonic, you'll need to buy the ROMs and install them on your system. See the [Retro Contest details](https://contest.openai.com/2018-1/details/) for more instructions. 
__After installing the ROMs, you can run the included `./set_up_correct_reward_funcs.sh` to swap the default reward function for the one used by the Retro Contest.__\n\n#### Adding New Environments\nSupersonic can run in any OpenAI Gym or Gym Retro environment (assuming you have the ROMs) out of the box. However, it's common practice to use 'wrappers' around the environment, which do things like clip the reward, reshape the observation or convert the frames to grayscale. __Defaults are included for [all of the v0 pixel-only Atari 2600 environments](https://gym.openai.com/envs/#atari) (84x84 grayscaling, normalizing, frameskipping and 'sticky' actions), as well as all of the Sonic levels__ (see `data/sonic-train.csv` and `data/sonic-val.csv` for a list of those options). Support for consecutive Sonic levels is included (see `environment.Gauntlet` and `environment.greenhillzonecomplete`). Defaults are also included for [gym-super-mario-bros](https://github.com/Kautenja/gym-super-mario-bros).\n\n\nTo add your own custom wrappers, write a function in `environment.py` that returns the wrapped environment, using any of the wrappers included in that file (or added by you). Then decorate that function with `env_builder`, passing the key for that environment. This key is what you pass to `--lvl` on the command line to train on the wrapped environment. 
Here's an example:\n```python\n@env_builder('VeryCustomEnvironment-v100')\ndef build_myenv(lvl):\n    env = base_env(lvl)\n    env = WarpFrame(env)\n    env = MaxAndSkipEnv(env, skip=4)\n    env = RewardScaler(env)\n    env = StickyActionEnv(env)\n    env = FrameStackWrapper(env)\n    return env\n```\nYou should then be able to train on your environment by running:\n```shell\nmpiexec -n 4 python train.py --lvl VeryCustomEnvironment-v100\n```\n\n#### Training Agents\nTraining is launched from the command line using the command:\n```shell\nmpiexec -n *num of workers* python train.py --lvl *env name* \\\n    --logdir *path to write logs* --rollouts *num of rollouts*\n```\nWeights are saved in the `weights` directory, in a folder with the same name as the `--logdir` you specify.\n\nAn additional flag, `--render`, can be added if you want to watch training live. It takes an int that determines how many of the parallel environments are rendered. For example, `mpiexec -n 128 python train.py --render 1` trains with 128 workers but renders only 1 of them.\n\nAt this time, Supersonic can only run multiple workers using the CPU version of TensorFlow. It uses synchronous gradient descent to distribute computation and increase performance.\n\n#### Testing Agents\n```shell\npython test.py --lvl *env name* --weights *path to correct weights dir* --episodes *num of episodes*\n```\nAn example would be `python test.py --lvl GreenHillZone.Act1 --weights GreenHillZoneAct1/checkpoint_9500`. The additional flags `--record` (bool) and `--record_path` (str) let gameplay footage be recorded and saved to the specified directory.\n\n#### References\n\n##### Papers:\nSchulman, John, et al. \"Proximal policy optimization algorithms.\" arXiv preprint arXiv:1707.06347 (2017).\n\nBurda, Yuri, et al. \"Exploration by random network distillation.\" arXiv preprint arXiv:1810.12894 (2018).\n\nChen, Jianmin, et al. 
\"Revisiting distributed synchronous SGD.\" arXiv preprint arXiv:1604.00981 (2016).\n\n##### Repositories:\n[openai/random-network-distillation](https://github.com/openai/random-network-distillation)\n\n[jcwleo/random-network-distillation-pytorch](https://github.com/jcwleo/random-network-distillation-pytorch)\n\n[openai/spinningup](https://github.com/openai/spinningup)\n\n[openai/baselines](https://github.com/openai/baselines)\n\n\n--------------------------------------------------------------------\n\n_Developed by students at the University of Virginia, 2019._\n\n_[UVA Data Science Institute](https://datascience.virginia.edu)_\n\n_[UVA Advanced Research Computing Services](https://arcs.virginia.edu)_\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjakegrigsby%2Fsupersonic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjakegrigsby%2Fsupersonic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjakegrigsby%2Fsupersonic/lists"}