{"id":26658277,"url":"https://github.com/ltbringer/drlnd_navigation_project","last_synced_at":"2025-03-25T09:19:12.666Z","repository":{"id":37214531,"uuid":"154420242","full_name":"ltbringer/DRLND_Navigation_Project","owner":"ltbringer","description":"Deep Q learning implementation to solve the navigation project","archived":false,"fork":false,"pushed_at":"2023-02-15T21:31:27.000Z","size":104990,"stargazers_count":2,"open_issues_count":13,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2023-03-03T00:07:31.201Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ltbringer.png","metadata":{"files":{"readme":"Readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-10-24T01:32:03.000Z","updated_at":"2022-02-10T09:08:03.000Z","dependencies_parsed_at":"2023-01-31T04:16:48.183Z","dependency_job_id":null,"html_url":"https://github.com/ltbringer/DRLND_Navigation_Project","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltbringer%2FDRLND_Navigation_Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltbringer%2FDRLND_Navigation_Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltbringer%2FDRLND_Navigation_Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ltbringer%2FDRLND_Navigation_Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ltbringer","download_url":"https://codeload.github.com/ltbringe
r/DRLND_Navigation_Project/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245431684,"owners_count":20614184,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-25T09:19:11.877Z","updated_at":"2025-03-25T09:19:12.659Z","avatar_url":"https://github.com/ltbringer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DRLND Navigation Project\n\n## Install\n1. Unzip the environment for your machine:\n    - Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)\n    - Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)\n    - Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)\n    - Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)\n2. Create a virtual environment:\n    - `virtualenv -p /usr/bin/python\u003cversion\u003e \u003cproject\u003e`\n    - `conda create -n \u003cproject\u003e python=\u003cversion\u003e`\n3. Install dependencies `$ pip install -r requirements.txt`\n\n## Objective\n\n![banana_env_gif](https://github.com/AmreshVenugopal/DRLND_Navigation_Project/blob/master/banana.gif?raw=true?raw=true \"Banana environment\")\n\nA reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. 
Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.\n\nThe state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:\n\n- 0 - move forward.\n- 1 - move backward.\n- 2 - turn left.\n- 3 - turn right.\n\nThe task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.\n\n## Usage\n\n### Simple Training Example\n```\n$ python main.py --env-path=/path/to/unzipped_banana_env\n```\n\n### Simple Test Example\n\nThis repo includes a `checkpoint.pth` file containing weights\nthat can be loaded directly into the model:\n```\n$ python play.py --env-path=/path/to/unzipped_banana_env --model-path=/path/to/checkpoint.pth\n```\n\n### Flags\nTraining supports many flags. If a flag is **not provided** or\n**is of an incorrect type**, its **default is used**.\n\n```\nusage: main.py [-h] [--env-path ENV_PATH] [--model-path MODEL_PATH]\n               [--episodes EPISODES] [--time-steps TIME_STEPS]\n               [--qualify-score QUALIFY_SCORE] [--score-window SCORE_WINDOW]\n               [--buffer-size BUFFER_SIZE] [--batch-size BATCH_SIZE]\n               [--gamma GAMMA] [--lr LR] [--eps-start EPS_START]\n               [--eps-decay EPS_DECAY] [--eps-end EPS_END] [--tau TAU]\n               [--update-every UPDATE_EVERY] [--fc1_units FC1_UNITS]\n               [--fc2_units FC2_UNITS] [--seed SEED]\n\nTeach an agent to pick up yellow bananas from blue implementing Deep-Q\nLearning\n\noptional arguments:\n  -h, --help            Show this help message and exit\n\n  --env-path ENV_PATH   Path to the unity environment\n\n  --model-path MODEL_PATH\n                        Path to a trained q-network's checkpoint(.pth) file\n\n  
--episodes EPISODES   Number of episodes for which the agent must be trained\n\n  --time-steps TIME_STEPS\n                        Number of steps to be taken in an episode\n\n  --qualify-score QUALIFY_SCORE\n                        Score at which the training must stop\n\n  --score-window SCORE_WINDOW\n                        Number of episodes for which the qualify-score should\n                        be maintained as average\n\n  --buffer-size BUFFER_SIZE\n                        Number of episodes to keep in memory (for experience\n                        replay)\n\n  --batch-size BATCH_SIZE\n                        Number of samples/batch for training the Q network\n\n  --gamma GAMMA         Discount factor of the rewards\n\n  --lr LR               Learning rate\n\n  --eps-start EPS_START\n                        Initial epsilon for epsilon greedy\n\n  --eps-decay EPS_DECAY\n                        The value by which the initial epsilon must decay\n                        over-time\n\n  --eps-end EPS_END     The minimum value of epsilon beyond which there should\n                        be no decay\n\n  --tau TAU             The degree of influence the target network has on the main/local network\n\n  --update-every UPDATE_EVERY\n                        Number of time-steps in an episode after which the Q\n                        network should be updated\n\n  --fc1_units FC1_UNITS\n                        Neurons in the first fully connected layer\n\n  --fc2_units FC2_UNITS\n                        Neurons in the second fully connected layer\n\n  --seed SEED           Random seed to ensure same results\n\n```\n\n## Report\nThe experiment conducted has a detailed report 
[here](https://github.com/AmreshVenugopal/DRLND_Navigation_Project/blob/master/Report.md)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fltbringer%2Fdrlnd_navigation_project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fltbringer%2Fdrlnd_navigation_project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fltbringer%2Fdrlnd_navigation_project/lists"}