{"id":21989744,"url":"https://github.com/iandanforth/deeprl-nav","last_synced_at":"2026-04-28T09:36:25.423Z","repository":{"id":142059085,"uuid":"146319088","full_name":"iandanforth/deeprl-nav","owner":"iandanforth","description":"Deep Reinforcement Learning Nanodegree Navigation Project","archived":false,"fork":false,"pushed_at":"2019-03-17T16:49:02.000Z","size":489,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-04-03T04:16:47.848Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iandanforth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-08-27T15:43:57.000Z","updated_at":"2019-03-17T16:49:03.000Z","dependencies_parsed_at":null,"dependency_job_id":"820f04a4-771f-4959-a9a2-c39571100baa","html_url":"https://github.com/iandanforth/deeprl-nav","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/iandanforth/deeprl-nav","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iandanforth%2Fdeeprl-nav","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iandanforth%2Fdeeprl-nav/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iandanforth%2Fdeeprl-nav/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iandanforth%2Fdeeprl-nav/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners
/iandanforth","download_url":"https://codeload.github.com/iandanforth/deeprl-nav/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iandanforth%2Fdeeprl-nav/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32375619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T09:24:15.638Z","status":"ssl_error","status_checked_at":"2026-04-28T09:24:15.071Z","response_time":56,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-29T19:32:48.960Z","updated_at":"2026-04-28T09:36:25.402Z","avatar_url":"https://github.com/iandanforth.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Project 1: Navigation\n\n\u003cp align=center\u003e\n\t\u003cimg width=80% src=\"images/hero.png\"/\u003e\n\u003c/p\u003e\n\n### Introduction\n\nThis is the first Unity-based project in the Udacity Deep Reinforcement Learning Nanodegree.\n\n\nIn this project we trained a DQN reinforcement learning agent to reach a score of +13 on \naverage over 100 episodes in the Udacity Deep Reinforcement Learning Nanodegree Bananas \nenvironment. 
(A simplified version of the [Banana Collectors Unity-ML environment](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Learning-Environment-Examples.md#banana-collector).)\n\nIn this environment, positive reward is accumulated by running into yellow \"good\" bananas while \navoiding blue \"bad\" bananas, which return -1 reward. An episode ends after a fixed interval of 300 \nsteps.\n\n### Report\n\nIn addition to adapting provided code to reach this score, we contribute two useful components. \nThe first is a [simple wrapper class](peel.py) for the provided Unity environment which makes it \ndirectly compatible with the existing class DQN code which was designed for an OpenAI Gym \ninterface.\n\nThe second and more important contribution is to establish human baselines for this environment. \nFinally, we propose a simple alternative measure for declaring this environment \"solved\" which \nbetter measures the ability of an agent.\n\nFor details please read the [full report](Report.md).\n\n#### Environment Description\n\n(Text slightly modified from the official course description)\n\nA reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided \nfor collecting a blue banana.  Thus, the goal of your agent is to collect as many yellow \nbananas as possible while avoiding blue bananas.  \n\nThe state space has 37 dimensions and contains the agent's velocity, along with ray-based \nperception of objects around the agent's forward direction.  \n\nRay Perception (35)\n\n7 rays projecting from the agent at the following angles (and returned in this order):\n\n[20, 90, 160, 45, 135, 70, 110] # 90 is directly in front of the agent\n\nRay (5)\n\nEach ray is projected into the scene. If it encounters one of four detectable objects,\nthe value at that position in the array is set to 1. 
Finally there is a distance measure\nwhich is a fraction of the ray length.\n\n[Banana, Wall, BadBanana, Agent, Distance]\n\nexample\n\n[0, 1, 1, 0, 0.2]\n\nThere is a BadBanana detected 20% of the way along the ray and a wall behind it.\n\nVelocity of Agent (2)\n\n- Left/right velocity (usually near 0)\n- Forward/backward velocity (0-11.2)\n\nGiven this information, the agent has to learn how to best select actions.  Four discrete actions \nare available, corresponding to:\n- **`0`** - move forward.\n- **`1`** - move backward.\n- **`2`** - turn left.\n- **`3`** - turn right.\n\nThe task is episodic, and in order to solve the environment, your agent must get an average score \nof +13 over 100 consecutive episodes. We discuss the utility of this metric in our report.\n\n### Getting Started\n\n#### Installation\n\nThis project has numerous dependencies and assumes you have a working environment according to the\nUdacity Deep Reinforcement Learning Nanodegree instructions. If not:\n\n[Install Dependencies Now](https://github.com/udacity/deep-reinforcement-learning#dependencies)\n\nAll code for this project is executed from the command line so you can skip Jupyter setup if you'd like.\n\nNext follow the instructions from the [Navigation Project README](https://github.com/udacity/deep-reinforcement-learning/blob/master/p1_navigation/README.md) which we have partially copied below.\n\n1. Download the environment from one of the links below.  
You need only select the environment that matches your operating system:\n    - Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)\n    - Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)\n    - Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)\n    - Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)\n    \n    (_For Windows users_) Check out [this link](https://support.microsoft.com/en-us/help/827218/how-to-determine-whether-a-computer-is-running-a-32-bit-version-or-64) if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.\n\n    (_For AWS_) If you'd like to train the agent on AWS (and have not [enabled a virtual screen](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-on-Amazon-Web-Service.md)), then please use [this link](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux_NoVis.zip) to obtain the environment.\n\n2. Place the file in the `unity/` directory of this repository and unzip (or decompress) the file.\n\n__PLEASE NOTE__ : While we are confident that this can be made to work under other environments,\nthis specific instruction was only tested under Windows 10, as it is the only local CUDA-capable\nmachine available to us.\n\n### Instructions\n\nThe following instructions assume you have a properly installed environment (e.g. 
a conda env) and are\nrunning these commands from a command line where that environment has been activated.\n\n#### Freeplay\n\nIf you're on Windows, you can play the environment yourself by running the freeplay script.\n\n`python freeplay.py`\n\n#### Review\n\nTo watch a pre-trained agent perform, run the review script.\n\n`python review.py checkpoints\\checkpoint-454.pth --graphics`\n\nLeave off `--graphics` to run in headless mode and speed up the review.\n\n#### Train\n\nTo retrain an agent from scratch using the provided code, run the train script.\n\n`python train.py`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiandanforth%2Fdeeprl-nav","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiandanforth%2Fdeeprl-nav","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiandanforth%2Fdeeprl-nav/lists"}