{"id":15898740,"url":"https://github.com/sintefneodroid/agent","last_synced_at":"2025-09-10T18:37:42.553Z","repository":{"id":41329897,"uuid":"98278680","full_name":"sintefneodroid/agent","owner":"sintefneodroid","description":"Examples of agents for neodroid environments 💡","archived":false,"fork":false,"pushed_at":"2025-02-03T20:21:40.000Z","size":23171,"stargazers_count":10,"open_issues_count":15,"forks_count":6,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-03-01T00:36:24.771Z","etag":null,"topics":["agent","deep-learning","dqn","droid","estimate","game-development","hacktoberfest","imitation-learning","machine-learning","ml","neo","neodroid","obstructions","ppo","python","pytorch","reinforcement-learning","rl","unity","unity3d"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/Neodroid/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sintefneodroid.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["cnheider"],"patreon":"cnheider","open_collective":"cnheider","ko_fi":"cnheider","custom":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null}},"created_at":"2017-07-25T07:39:43.000Z","updated_at":"2025-01-18T10:43:18.000Z","dependencies_parsed_at":"2023-12-21T18:27:08.445Z","dependency_job_id":"8bef9c3b-34ea-4909-8a21-e1e357617d75","html_url":"https://github.com/sintefneodroid/agent","commit_stats":{"total_commits":200,"total_committers":7,"mean_commits":"28.5
71428571428573","dds":"0.17500000000000004","last_synced_commit":"21e3564696062b67151b013fd5e47df46cf44aa5"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintefneodroid%2Fagent","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintefneodroid%2Fagent/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintefneodroid%2Fagent/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sintefneodroid%2Fagent/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sintefneodroid","download_url":"https://codeload.github.com/sintefneodroid/agent/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244077836,"owners_count":20394353,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","deep-learning","dqn","droid","estimate","game-development","hacktoberfest","imitation-learning","machine-learning","ml","neo","neodroid","obstructions","ppo","python","pytorch","reinforcement-learning","rl","unity","unity3d"],"created_at":"2024-10-06T10:08:22.124Z","updated_at":"2025-03-20T17:30:32.867Z","avatar_url":"https://github.com/sintefneodroid.png","language":"Python","readme":"\u003c!--![header](.github/images/header.png)--\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/images/header.png\" alt='header' /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003eAgent\u003c/h1\u003e\n\n\u003c!--# 
Agent--\u003e\n\nThis repository will host all initial machine learning efforts applying the [Neodroid](https://github.com/sintefneodroid/) platform.\n\n---\n\n_[Neodroid](https://github.com/sintefneodroid) is developed with support from Research Council of Norway Grant #262900. ([https://www.forskningsradet.no/prosjektbanken/#/project/NFR/262900](https://www.forskningsradet.no/prosjektbanken/#/project/NFR/262900))_\n\n---\n\n| [![Build Status](https://travis-ci.org/sintefneodroid/agent.svg?branch=master)](https://travis-ci.org/sintefneodroid/agent)  | [![Coverage Status](https://coveralls.io/repos/github/sintefneodroid/agent/badge.svg?branch=master)](https://coveralls.io/github/sintefneodroid/agent?branch=master)  | [![GitHub Issues](https://img.shields.io/github/issues/sintefneodroid/agent.svg?style=flat)](https://github.com/sintefneodroid/agent/issues)  |  [![GitHub Forks](https://img.shields.io/github/forks/sintefneodroid/agent.svg?style=flat)](https://github.com/sintefneodroid/agent/network) | [![GitHub Stars](https://img.shields.io/github/stars/sintefneodroid/agent.svg?style=flat)](https://github.com/sintefneodroid/agent/stargazers) |[![GitHub License](https://img.shields.io/github/license/sintefneodroid/agent.svg?style=flat)](https://github.com/sintefneodroid/agent/blob/master/LICENSE.md) |\n|---|---|---|---|---|---|\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca href=\"https://www.python.org/\"\u003e\n\u003cimg alt=\"python\" src=\".github/images/python.svg\" height=\"40\" align=\"left\"\u003e\n\u003c/a\u003e\n\u003ca href=\"https://opencv.org/\" style=\"float:center;\"\u003e\n\u003cimg alt=\"opencv\" src=\".github/images/opencv.svg\" height=\"40\" align=\"center\"\u003e\n\u003c/a\u003e\n\u003ca href=\"http://pytorch.org/\"style=\"float: right;\"\u003e\n\u003cimg alt=\"pytorch\" src=\".github/images/pytorch.svg\" height=\"40\" align=\"right\" \u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003ca 
href=\"http://www.numpy.org/\"\u003e\n\u003cimg alt=\"numpy\" src=\".github/images/numpy.svg\" height=\"40\" align=\"left\"\u003e\n\u003c/a\u003e\n\u003ca href=\"https://github.com/tqdm/tqdm\" style=\"float:center;\"\u003e\n\u003cimg alt=\"tqdm\" src=\".github/images/tqdm.gif\" height=\"40\" align=\"center\"\u003e\n\u003c/a\u003e\n\u003ca href=\"https://matplotlib.org/\" style=\"float: right;\"\u003e\n\u003cimg alt=\"matplotlib\" src=\".github/images/matplotlib.svg\" height=\"40\" align=\"right\" /\u003e\n\u003c/a\u003e\n\u003c/p\u003e\n\n# Contents Of This Readme\n\n- [Algorithms](#algorithms)\n- [Requirements](#requirements)\n- [Usage](#usage)\n- [Results](#results)\n    - [Target Point Estimator](#target-point-estimator)\n    - [Perfect Information Navigator](#perfect-information-navigator)\n- [Contributing](#contributing)\n- [Other Components](#other-components-of-the-neodroid-platform)\n\n# Algorithms\n\n- [REINFORCE (PG)](agent/agents/model_free/policy_optimisation/pg_agent.py)\n- [DQN](agent/agents/model_free/q_learning/dqn_agent.py)\n- [DDPG](agent/agents/model_free/hybrid/ddpg_agent.py)\n- [PPO](agent/agents/model_free/hybrid/ppo_agent.py)\n- TRPO, GA, EVO, IMITATION...\n\n## **Algorithms Implemented**\n\n1. *Deep Q Learning (DQN)* \u003csub\u003e\u003csup\u003e ([Mnih et al. 2013](https://arxiv.org/pdf/1312.5602.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DQN with Fixed Q Targets* \u003csub\u003e\u003csup\u003e ([Mnih et al. 2013](https://arxiv.org/pdf/1312.5602.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Double DQN (DDQN)* \u003csub\u003e\u003csup\u003e ([Hado van Hasselt et al. 2015](https://arxiv.org/pdf/1509.06461.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DDQN with Prioritised Experience Replay* \u003csub\u003e\u003csup\u003e ([Schaul et al. 2016](https://arxiv.org/pdf/1511.05952.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Dueling DDQN* \u003csub\u003e\u003csup\u003e ([Wang et al. 
2016](http://proceedings.mlr.press/v48/wangf16.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *REINFORCE* \u003csub\u003e\u003csup\u003e ([Williams et al. 1992](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Deep Deterministic Policy Gradients (DDPG)* \u003csub\u003e\u003csup\u003e ([Lillicrap et al. 2016](https://arxiv.org/pdf/1509.02971.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Twin Delayed Deep Deterministic Policy Gradients (TD3)* \u003csub\u003e\u003csup\u003e ([Fujimoto et al. 2018](https://arxiv.org/abs/1802.09477)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Soft Actor-Critic (SAC \u0026 SAC-Discrete)* \u003csub\u003e\u003csup\u003e ([Haarnoja et al. 2018](https://arxiv.org/pdf/1812.05905.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Asynchronous Advantage Actor Critic (A3C)* \u003csub\u003e\u003csup\u003e ([Mnih et al. 2016](https://arxiv.org/pdf/1602.01783.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Synchronous Advantage Actor Critic (A2C)*\n1. *Proximal Policy Optimisation (PPO)* \u003csub\u003e\u003csup\u003e ([Schulman et al. 2017](https://openai-public.s3-us-west-2.amazonaws.com/blog/2017-07/ppo/ppo-arxiv.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DQN with Hindsight Experience Replay (DQN-HER)* \u003csub\u003e\u003csup\u003e ([Andrychowicz et al. 2018](https://arxiv.org/pdf/1707.01495.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *DDPG with Hindsight Experience Replay (DDPG-HER)* \u003csub\u003e\u003csup\u003e ([Andrychowicz et al. 2018](https://arxiv.org/pdf/1707.01495.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Hierarchical-DQN (h-DQN)* \u003csub\u003e\u003csup\u003e ([Kulkarni et al. 2016](https://arxiv.org/pdf/1604.06057.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Stochastic NNs for Hierarchical Reinforcement Learning (SNN-HRL)* \u003csub\u003e\u003csup\u003e ([Florensa et al. 2017](https://arxiv.org/pdf/1704.03012.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. 
*Diversity Is All You Need (DIAYN)* \u003csub\u003e\u003csup\u003e ([Eysenbach et al. 2018](https://arxiv.org/pdf/1802.06070.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n\n## **Environments Implemented**\n\n1. *Bit Flipping Game* \u003csub\u003e\u003csup\u003e (as described in [Andrychowicz et al. 2018](https://arxiv.org/pdf/1707.01495.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Four Rooms Game* \u003csub\u003e\u003csup\u003e (as described in [Sutton et al. 1998](http://www-anw.cs.umass.edu/~barto/courses/cs687/Sutton-Precup-Singh-AIJ99.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Long Corridor Game* \u003csub\u003e\u003csup\u003e (as described in [Kulkarni et al. 2016](https://arxiv.org/pdf/1604.06057.pdf)) \u003c/sup\u003e\u003c/sub\u003e\n1. *Ant-{Maze, Push, Fall}* \u003csub\u003e\u003csup\u003e (as described in [Nachum et al. 2018](https://arxiv.org/pdf/1805.08296.pdf) and their accompanying [code](https://github.com/tensorflow/models/tree/master/research/efficient-hrl)) \u003c/sup\u003e\u003c/sub\u003e\n\n# Requirements\n\n- pytorch\n- tqdm\n- Pillow\n- numpy\n- matplotlib\n- torchvision\n- torch\n- Neodroid\n- pynput\n\n(Optional)\n\n- visdom\n- gym\n\nTo install these, use the command:\n\n````bash\npip3 install -r requirements.txt\n````\n\n# Usage\n\nExport the Python path to the repo root so the utilities module can be used:\n\n````bash\nexport PYTHONPATH=/path-to-repo/\n````\n\nFor training an agent, use:\n\n````bash\npython3 procedures/train_agent.py\n````\n\nFor testing a trained agent, use:\n\n````bash\npython3 procedures/test_agent.py\n````\n\n# Results\n\n## Target Point Estimator\n\nUsing depth, segmentation, and RGB images to estimate the location of a target point in an environment.\n\n### [REINFORCE (PG)](agent/agents/model_free/policy_optimisation/pg_agent.py)\n\n### [DQN](agent/agents/model_free/q_learning/dqn_agent.py)\n\n### [DDPG](agent/agents/model_free/hybrid/ddpg_agent.py)\n\n### [PPO](agent/agents/model_free/hybrid/ppo_agent.py)\n\n### GA, EVO, 
IMITATION...\n\n## Perfect Information Navigator\n\nThe agent has access to perfect location information about the obstructions and the target in the environment; the objective is to navigate to the target without colliding with the obstructions.\n\n### [REINFORCE (PG)](agent/agents/model_free/policy_optimisation/pg_agent.py)\n\n### [DQN](agent/agents/model_free/q_learning/dqn_agent.py)\n\n### [DDPG](agent/agents/model_free/hybrid/ddpg_agent.py)\n\n### [PPO](agent/agents/model_free/hybrid/ppo_agent.py)\n\n### GA, EVO, IMITATION...\n\n# Contributing\n\nSee guidelines for contributing [here](.github/CONTRIBUTING.md).\n\n# Licensing\n\nThis project is licensed under the Apache V2 License. See [LICENSE](LICENSE.md) for more information.\n\n# Citation\n\nFor citation, you may use the following BibTeX entry:\n\n````\n@misc{neodroid-agent,\n  author = {Heider, Christian},\n  title = {Neodroid Platform Agents},\n  year = {2018},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/sintefneodroid/agent}},\n}\n````\n\n# Other Components Of the Neodroid Platform\n\n- [neo](https://github.com/sintefneodroid/neo)\n- [droid](https://github.com/sintefneodroid/droid)\n\n# Authors\n\n* **Christian Heider Nielsen** - [cnheider](https://github.com/cnheider)\n\nOther [contributors](https://github.com/sintefneodroid/agent/contributors) to this project are listed here.\n","funding_links":["https://github.com/sponsors/cnheider","https://patreon.com/cnheider","https://opencollective.com/cnheider","https://ko-fi.com/cnheider"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsintefneodroid%2Fagent","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsintefneodroid%2Fagent","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsintefneodroid%2Fagent/lists"}