https://github.com/kohlerhector/tree-mbpo
Study Model-Based Policy Optimization (MBPO) by varying the model estimator class (e.g., decision trees vs. MLPs)
decision-tree mbpo mbrl mlp rl sac scikit-learn stable-baselines3
- Host: GitHub
- URL: https://github.com/kohlerhector/tree-mbpo
- Owner: KohlerHECTOR
- Created: 2024-02-03T15:09:41.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-07T10:40:46.000Z (over 1 year ago)
- Last Synced: 2024-12-30T19:49:23.165Z (10 months ago)
- Topics: decision-tree, mbpo, mbrl, mlp, rl, sac, scikit-learn, stable-baselines3
- Language: Python
- Homepage:
- Size: 3.09 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: readme.md
README
###### For Tree-Based-Exploration see: https://github.com/KohlerHECTOR/TREX-Tree-Reward-EXploration
## Only continuous actions are supported
Install scikit-learn and Stable-Baselines3 (SB3):
```pip3 install -r requirements.txt```



### Available models are decision trees, trees selected by cross-validation (best CV trees), and MLPs
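As a rough illustration of what swapping the model estimator class can look like, here is a minimal scikit-learn sketch. It is not taken from this repository: the `make_model` helper, the `"cv-tree"` key, the `max_depth` grid, the synthetic transitions, and the choice to predict the reward alongside the next state are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor


def make_model(kind, seed=0):
    """Return an unfitted regressor for a (hypothetical) model-kind string."""
    if kind == "tree":
        return DecisionTreeRegressor(random_state=seed)
    if kind == "cv-tree":
        # Choose the tree depth by 3-fold cross-validation (illustrative grid).
        grid = {"max_depth": [2, 4, 8, None]}
        return GridSearchCV(DecisionTreeRegressor(random_state=seed), grid, cv=3)
    if kind == "mlp":
        # Two hidden layers of 64 units each.
        return MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=seed)
    raise ValueError(f"unknown model kind: {kind}")


# Synthetic transitions from a made-up linear system, standing in for replay data.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))
actions = rng.uniform(-1.0, 1.0, size=(200, 1))
next_states = states + 0.1 * actions
rewards = -np.abs(states[:, :1])

inputs = np.hstack([states, actions])        # model input: (s, a)
targets = np.hstack([next_states, rewards])  # model output: (s', r)

# Fit one model per estimator class on the same data and predict a few transitions.
models = {kind: make_model(kind).fit(inputs, targets) for kind in ("tree", "cv-tree", "mlp")}
predictions = {kind: model.predict(inputs[:5]) for kind, model in models.items()}
```

All three estimators share the `fit`/`predict` interface, so the rest of a model-based loop would not need to know which class it is using.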
### Available policy optimization algorithms are SAC and TD3
Launch MBPO for 100 iterations on InvertedPendulum with decision trees as the model estimator and SAC as the policy optimizer.
Results are saved in 'Experience_Results/pendul-tree-sac/':
```python3 experience.py InvertedPendulum-v4 tree sac 100 pendul-tree-sac```
Launch MBPO for 100 iterations on InvertedPendulum with a 2x64 MLP (two hidden layers of 64 units) as the model estimator and SAC as the policy optimizer.
Results are saved in 'Experience_Results/pendul-mlp-sac/':
```python3 experience.py InvertedPendulum-v4 mlp sac 100 pendul-mlp-sac```
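To run several such configurations in one go, a small helper can generate the command lines. The sketch below assumes the positional argument order seen in the two commands above (environment, model, algorithm, iterations, run name); the `build_command` and `sweep` helpers and the `<prefix>-<model>-<algo>` naming are hypothetical and not part of the repository.

```python
import itertools


def build_command(env_id, model, algo, iterations, run_name):
    """Build the argument list for one experience.py run."""
    return ["python3", "experience.py", env_id, model, algo, str(iterations), run_name]


def sweep(env_id, models, algos, iterations, prefix):
    """Return one command per (model, algo) pair, named '<prefix>-<model>-<algo>'."""
    return [
        build_command(env_id, model, algo, iterations, f"{prefix}-{model}-{algo}")
        for model, algo in itertools.product(models, algos)
    ]


commands = sweep("InvertedPendulum-v4", ["tree", "mlp"], ["sac"], 100, "pendul")
for command in commands:
    # Print each command as it would be typed in a shell.
    print(" ".join(command))
```

Each entry of `commands` can then be passed to `subprocess.run(command, check=True)` from the repository root to launch the runs one after another.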
Save comparison plots in 'Experience_Results/Comparison-date-time/':
```python3 compare_experiences.py pendul-tree-sac pendul-mlp-sac```
Save plots of the results in 'Experience_Results/pendul-tree-sac/':
```python3 plot_experience.py pendul-tree-sac```
MBPO: https://arxiv.org/abs/1906.08253

