Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/shilpakancharla/deep-rl-lunar-lander
Using a deep Q-network and searching for optimal hyperparameters to solve the lunar lander problem provided by OpenAI Gym.
deep-reinforcement-learning hyperparameter-optimization lunar-lander openai-gym openai-gym-environments
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/shilpakancharla/deep-rl-lunar-lander
- Owner: shilpakancharla
- Created: 2021-03-27T20:57:24.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-05-03T00:39:18.000Z (over 3 years ago)
- Last Synced: 2024-01-29T23:12:10.704Z (11 months ago)
- Topics: deep-reinforcement-learning, hyperparameter-optimization, lunar-lander, openai-gym, openai-gym-environments
- Language: Python
- Homepage:
- Size: 2.52 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Applications of Reinforcement Learning: Lunar Lander Simulation
## Abstract
The purpose of this reinforcement learning experiment is to investigate optimal hyperparameter values for a deep Q-network (DQN) on the Lunar Lander problem provided by OpenAI Gym. LunarLander-v2 is an environment with uncertainty, and this investigation searches for parameters that maximize the mean reward within 400 or fewer episodes. A deep neural network is designed for the agent, and the simulation is run with several combinations of reinforcement learning hyperparameters. Using a network with two hidden layers, the agent converged to a mean reward of 200 with epsilon = 0.9, epsilon decay = 0.995, alpha (learning rate) = 0.001, and gamma (discount factor) = 0.99 in a little over 250 episodes. A comparative analysis of the different parameter settings is also performed, and the results and model architecture from this experiment are compared to other experiments that apply the DQN method to the Lunar Lander problem.
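The abstract's hyperparameters can be illustrated with a minimal sketch, not the repository's actual code: the per-episode epsilon-greedy decay and the discounted one-step Q-learning target. The values for epsilon, the decay rate, and gamma come from the abstract; the epsilon floor of 0.01 is an assumption (a lower bound is common in DQN implementations but is not stated above).

```python
EPSILON_START = 0.9    # initial exploration rate (from the abstract)
EPSILON_DECAY = 0.995  # per-episode multiplicative decay (from the abstract)
EPSILON_MIN = 0.01     # assumed lower bound, not stated in the abstract
GAMMA = 0.99           # discount factor (from the abstract)

def epsilon_at(episode: int) -> float:
    """Exploration rate after `episode` decay steps, floored at EPSILON_MIN."""
    return max(EPSILON_MIN, EPSILON_START * EPSILON_DECAY ** episode)

def td_target(reward: float, next_q_values: list, done: bool) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').
    The bootstrap term is dropped on terminal transitions."""
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)
```

At this decay rate epsilon is still roughly a quarter of its starting value after 250 episodes, so the agent continues to explore throughout the training window in which the abstract reports convergence.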