{"id":21711263,"url":"https://github.com/distributedsystemsgroup/tensorpong","last_synced_at":"2026-04-13T10:31:59.304Z","repository":{"id":85177381,"uuid":"84537074","full_name":"DistributedSystemsGroup/tensorpong","owner":"DistributedSystemsGroup","description":null,"archived":false,"fork":false,"pushed_at":"2017-07-02T11:09:14.000Z","size":37327,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-20T18:17:44.640Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TeX","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DistributedSystemsGroup.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-03-10T08:25:05.000Z","updated_at":"2017-07-02T11:09:15.000Z","dependencies_parsed_at":"2023-03-07T13:00:34.960Z","dependency_job_id":null,"html_url":"https://github.com/DistributedSystemsGroup/tensorpong","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/DistributedSystemsGroup/tensorpong","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2Ftensorpong","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2Ftensorpong/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2Ftensorpong/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2Ftensorpong/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DistributedSystemsGroup","download_url":"https://codeload.github.com/DistributedSystemsGroup/tensorpong/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSystemsGroup%2Ftensorpong/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31748994,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T09:16:15.125Z","status":"ssl_error","status_checked_at":"2026-04-13T09:16:05.023Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-25T23:20:49.634Z","updated_at":"2026-04-13T10:31:59.280Z","avatar_url":"https://github.com/DistributedSystemsGroup.png","language":"TeX","funding_links":[],"categories":[],"sub_categories":[],"readme":"# universe-starter-agent\n\n## Preface\n\nThis repo is a clone of the original [universe-starter-agent](https://github.com/openai/universe-starter-agent) by [openai](https://openai.com/). 
It has been cloned as a starting point for the Semester Project \"Learning to Play Atari Pong with TensorFlow on OpenAI Universe\" for the Spring 2017 Semester in [Eurecom](http://www.eurecom.fr/en/) being developed by [Daniele Reda](http://www.github.com/rdednl) with professor [Pietro Michiardi](https://github.com/michiard) as supervisor.\n\n## universe-starter-agent\n\nThe codebase implements a starter agent that can solve a number of `universe` environments.\nIt contains a basic implementation of the [A3C algorithm](https://arxiv.org/abs/1602.01783), adapted for real-time environments.\n\n## Dependencies\n\n* Python 2.7 or 3.5\n* [six](https://pypi.python.org/pypi/six) (for py2/3 compatibility)\n* [TensorFlow](https://www.tensorflow.org/) 0.12\n* [tmux](https://tmux.github.io/) (the start script opens up a tmux session with multiple windows)\n* [htop](https://hisham.hm/htop/) (shown in one of the tmux windows)\n* [gym](https://pypi.python.org/pypi/gym)\n* gym[atari]\n* [universe](https://pypi.python.org/pypi/universe)\n* [opencv-python](https://pypi.python.org/pypi/opencv-python)\n* [numpy](https://pypi.python.org/pypi/numpy)\n* [scipy](https://pypi.python.org/pypi/scipy)\n\n## Getting Started\n\n```\nconda create --name universe-starter-agent python=3.5\nsource activate universe-starter-agent\n\nbrew install tmux htop cmake      # On Linux use sudo apt-get install -y tmux htop cmake\n\npip install gym[atari]\npip install universe\npip install six\npip install tensorflow\nconda install -y -c https://conda.binstar.org/menpo opencv3\nconda install -y numpy\nconda install -y scipy\n```\n\n\nAdd the following to your `.bashrc` so that you'll have the correct environment when the `train.py` script spawns new bash shells\n```source activate universe-starter-agent```\n\n### Atari Pong\n\n`python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong`\n\nThe command above will train an agent on Atari Pong using ALE simulator.\nIt will see two workers 
that will be learning in parallel (`--num-workers` flag) and will output intermediate results into the given directory.\n\nThe code will launch the following processes:\n* worker-0 - a process that runs policy gradient\n* worker-1 - a process identical to worker-0, except that it sees different random noise from the environment\n* ps - the parameter server, which synchronizes the parameters among the different workers\n* tb - a TensorBoard process for convenient display of the learning statistics\n\nOnce you start the training process, it will create a tmux session with a window for each of these processes. You can connect to them by typing `tmux a` in the console.\nOnce in the tmux session, you can see all your windows with `ctrl-b w`.\nTo switch to window number 0, type: `ctrl-b 0`. See the tmux documentation for more commands.\n\nTo access TensorBoard and see various monitoring metrics of the agent, open [http://localhost:12345/](http://localhost:12345/) in a browser.\n\nUsing 16 workers, the agent should be able to solve `PongDeterministic-v3` (not VNC) within 30 minutes (often less) on an `m4.10xlarge` instance.\nUsing 32 workers, the agent is able to solve the same environment in 10 minutes on an `m4.16xlarge` instance.\nIf you run this experiment on a high-end MacBook Pro, the above job will take just under 2 hours to solve Pong.\n\nAdd the `--visualise` flag if you want to visualise the worker using `env.render()`, as follows:\n\n`python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise`\n\n![pong](https://github.com/openai/universe-starter-agent/raw/master/imgs/tb_pong.png \"Pong\")\n\nFor best performance, the number of workers should not exceed the number of available CPU cores.\n\nYou can stop the experiment with the `tmux kill-session` command.\n\n### Playing games over remote desktop\n\nThe main difference from the previous experiment is that now we are going to play the game through the VNC protocol.\nThe VNC environments 
are hosted on the EC2 cloud and have an interface that's different from a conventional Atari Gym\nenvironment; luckily, with the help of several wrappers (which are used within the `envs.py` file)\nthe experience for the agent should be similar to playing locally. The problem itself is more difficult\nbecause the observations and actions are delayed by network latency.\n\nMore interestingly, you can also peek at what the agent is doing with a VNC viewer.\n\nNote that the default behavior of `train.py` is to start the remotes on a local machine. Take a look at https://github.com/openai/universe/blob/master/doc/remotes.rst for documentation on managing your remotes. Pass the additional `-r` flag to point to pre-existing instances.\n\n### VNC Pong\n\n`python train.py --num-workers 2 --env-id gym-core.PongDeterministic-v3 --log-dir /tmp/vncpong`\n\n_Peeking into the agent's environment with TurboVNC_\n\nYou can use your system viewer with `open vnc://localhost:5900` (or `open vnc://${docker_ip}:5900`), or connect TurboVNC to that IP and port.\nThe VNC password is `\"openai\"`.\n\n![pong](https://github.com/openai/universe-starter-agent/raw/master/imgs/vnc_pong.png \"Pong over VNC\")\n\n##### Important caveats\n\nOne of the novel challenges in using Universe environments is that\nthey operate in *real time*, and in addition, it takes time for the\nenvironment to transmit the observation to the agent.  This delay\ncreates lag: the greater the lag, the harder it is to solve the\nenvironment with today's RL algorithms.  Thus, to get the best\npossible results it is necessary to reduce the lag, which can be\nachieved by having both the environments and the agent live\non the same high-speed computer network.  So for example, if you have\na fast local network, you could host the environments on one set of\nmachines, and the agent on another machine that can speak to the\nenvironments with low latency.  
Alternatively, you can run the\nenvironments and the agent in the same EC2/Azure region.  Other\nconfigurations tend to have greater lag.\n\nTo keep track of your lag, look for the phrase `reaction_time` in\nstderr.  If you run both the agent and the environment on nearby\nmachines on the cloud, your `reaction_time` should be as low as 40ms.\nThe `reaction_time` statistic is printed to stderr because we wrap our\nenvironment with the `Logger` wrapper, as done\n[here](\u003chttps://github.com/openai/universe-starter-agent/blob/master/envs.py#L32\u003e).\n\nGenerally speaking, environments that are most affected by lag are\ngames that place a lot of emphasis on reaction time.  For example,\nthis agent is able to solve VNC Pong\n(`gym-core.PongDeterministic-v3`) in under 2 hours when both the agent\nand the environment are co-located on the cloud, but this agent had\ndifficulty solving VNC Pong when the environment was on the cloud\nwhile the agent was not.  This issue affects environments that place\ngreat emphasis on reaction time.\n\n#### A note on tuning\n\nThis implementation has been tuned to do well on VNC Pong, and we do not guarantee\nits performance on other tasks.  It is meant as a starting point.\n\n#### Playing flash games\n\nYou may run the following command to launch the agent on the game Neon Race:\n\n`python train.py --num-workers 2 --env-id flashgames.NeonRace-v0 --log-dir /tmp/neonrace`\n\n_What the agent sees when playing Neon Race_\n(you can connect to this view via the [note](#vnc-pong) above)\n![neon](https://github.com/openai/universe-starter-agent/raw/master/imgs/neon_race.png \"Neon Race\")\n\nGetting 80% of the maximal score takes between 1 and 2 hours with 16 workers, and getting to 100% of the score\ntakes about 12 hours.  
Also, flash games are run at 5fps by default, so it should be possible to productively\nuse 16 workers on a machine with 8 (and possibly even 4) cores.\n\n#### Next steps\n\nNow that you have seen an example agent, develop agents of your own.  We hope that you will find\ndoing so to be an exciting and an enjoyable task.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdistributedsystemsgroup%2Ftensorpong","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdistributedsystemsgroup%2Ftensorpong","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdistributedsystemsgroup%2Ftensorpong/lists"}