https://github.com/leo27945875/td3-ant-v2
deep-learning mujoco-environments reinforcement-learning td3-pytorch
- Host: GitHub
- URL: https://github.com/leo27945875/td3-ant-v2
- Owner: leo27945875
- Created: 2021-08-25T17:50:34.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2021-09-18T09:03:47.000Z (over 3 years ago)
- Last Synced: 2025-01-29T12:48:00.380Z (4 months ago)
- Topics: deep-learning, mujoco-environments, reinforcement-learning, td3-pytorch
- Language: Jupyter Notebook
- Homepage:
- Size: 16.8 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
# TD3

## Set the parameters in the "Set the parameters" block of `TD3_Ant.ipynb`:
* env_name = Name of the environment
* seed = Random seed
* start_timesteps = Timestep at which training of the TD3 model starts
* eval_freq = Frequency of evaluation
* max_timesteps = Maximum timesteps
* save_models = Whether to save the model
* expl_noise = Exploration noise
* batch_size = Batch size
* discount = Discount factor
* tau = Coefficient for the smooth (soft) update of the target networks, as in the TD3 paper
* policy_noise = Policy noise to do target policy smoothing
* noise_clip = Maximum value of policy noise
* policy_freq = Frequency to update the actor and target networks

## Model Training:
After setting all the parameters, click the "Run All" button to train the TD3 model and evaluate it. The model file is saved in the folder "./pytorch_models", and the reward records of the evaluations are saved in "./results/[environment name]".
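
As a rough illustration of the parameter block described above, the sketch below fills in the listed names with values in the range commonly used for TD3. The values, the `"Ant-v2"` environment name, and the `check_params` helper are assumptions for illustration only; they are not taken from `TD3_Ant.ipynb`.

```python
# Hypothetical example values; the notebook's actual defaults may differ.
params = {
    "env_name": "Ant-v2",          # assumed from the repository name
    "seed": 0,
    "start_timesteps": int(1e4),
    "eval_freq": int(5e3),
    "max_timesteps": int(1e6),
    "save_models": True,
    "expl_noise": 0.1,
    "batch_size": 256,
    "discount": 0.99,
    "tau": 0.005,
    "policy_noise": 0.2,
    "noise_clip": 0.5,
    "policy_freq": 2,
}


def check_params(p):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if p["start_timesteps"] >= p["max_timesteps"]:
        problems.append("start_timesteps should be below max_timesteps")
    if not 0.0 < p["tau"] <= 1.0:
        problems.append("tau should lie in (0, 1]")
    if not 0.0 <= p["discount"] <= 1.0:
        problems.append("discount should lie in [0, 1]")
    if p["policy_noise"] < 0 or p["noise_clip"] < 0:
        problems.append("policy_noise and noise_clip should be non-negative")
    if p["policy_freq"] < 1:
        problems.append("policy_freq should be at least 1")
    return problems


print(check_params(params))
```

A quick check like this before clicking "Run All" can catch an inconsistent setting before a long training run starts.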
## Results:

* DP: Delayed Policy updates
* TPS: Target Policy Smoothing
* CDQ: Clipped Double Q-learning
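
The three components abbreviated above can be sketched in a few lines. This is a minimal plain-Python illustration of the ideas, not the notebook's PyTorch code: the function names, the `max_action` bound, and the representation of the critics as callables are all assumptions made for the example.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, done, next_state, next_action, q1_target, q2_target,
               discount=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """Compute the critic target for one transition (hypothetical helper)."""
    # TPS: perturb the target action with clipped Gaussian noise.
    smoothed = []
    for a in next_action:
        noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
        smoothed.append(clip(a + noise, -max_action, max_action))
    # CDQ: take the smaller of the two target critics' estimates.
    q = min(q1_target(next_state, smoothed), q2_target(next_state, smoothed))
    # No bootstrapping from terminal states.
    return reward + (0.0 if done else discount * q)


def should_update_actor(iteration, policy_freq=2):
    """DP: update the actor and target networks only every policy_freq steps."""
    return iteration % policy_freq == 0


def soft_update(target_weights, source_weights, tau=0.005):
    """Move each target weight a fraction tau towards the source weight."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_weights, source_weights)]
```

Here `policy_noise`, `noise_clip`, `policy_freq`, `discount`, and `tau` play the roles described in the parameter list; disabling one component (for example, using a single critic instead of the minimum of two) corresponds to removing the matching step.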