https://github.com/kohlerhector/trex-tree-reward-exploration
Uses tree estimators of the MDP model to group similar transitions into leaves, then counts leaf visits to drive count-based exploration.
- Host: GitHub
- URL: https://github.com/kohlerhector/trex-tree-reward-exploration
- Owner: KohlerHECTOR
- Created: 2024-02-06T17:41:25.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-02-07T16:49:36.000Z (11 months ago)
- Last Synced: 2024-11-08T04:15:25.567Z (about 2 months ago)
- Topics: decision-trees, drl, exploration, rl, scikit-learn, stable-baselines3
- Language: Python
- Homepage:
- Size: 436 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
##### Exploration and Model Learning Tricks following this [blog](https://lilianweng.github.io/posts/2020-06-07-exploration-drl/):
###### For Tree-Model-Based-Policy-Exploration see: https://github.com/KohlerHECTOR/Tree-MBPO
- Normalize observations, cf. [RND-PPO](https://arxiv.org/abs/1810.12894) (*Implemented*).
- In practice, normalizing rewards does not seem to work, as running means do not incorporate sparse high rewards well (*Implemented*).
- Bisimilarity measure, cf. [MICO](https://arxiv.org/pdf/2106.08229.pdf): two pairs $(s, a)$ should have a similar bisimilarity measure if they lead to similar $(r, s\_{next})$ (*Not Implemented*).
- Non-episodic returns, cf. [RND-PPO](https://arxiv.org/abs/1810.12894): $\texttt{env.reset()}$ is called only after a policy update is performed (*Not Implemented*).

##### Tree models and counters.
- Given a maximum depth $D$, a tree can map any continuous space to at most $2^D$ discrete, countable leaves. Using [Decision Trees](https://scikit-learn.org/stable/modules/tree.html), one can learn: $S \times A \rightarrow \{R \times S\}^{2^D}$, $S \times A \rightarrow \{S\}^{2^D}$, $S \times A \rightarrow \{R\}^{2^D}$
- Trees can therefore serve both as predictive models and as discrete counters, when equipped with a dictionary memory whose keys are leaf labels and whose values are leaf visit counts.
- Trees can be used for both count-based and prediction-based exploration: $r^{i}_t = N_t(\text{TreeLeaf}(s_t, a_t))^{-\frac{1}{2}}$ or $r^{i}_t =\text{TreeScore}(s_t, a_t, r_t, s\_{{next}_t})^{-\frac{1}{2}}$, or even a combination of both: $r^{i}_t = N_t(\text{TreeLeaf}(s_t, a_t))^{-\frac{1}{2}} \times \text{TreeScore}(s_t, a_t, r_t, s\_{{next}_t})$

![](MountainCar.png)
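
The count-based bonus $N_t(\text{TreeLeaf}(s_t, a_t))^{-\frac{1}{2}}$ described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the class name `TreeLeafCounter`, its methods, the toy transition data, and the choice to reset the counts whenever the tree is refit are all assumptions made for the example. It relies only on scikit-learn's `DecisionTreeRegressor` and its `apply` method, which returns the index of the leaf each sample falls into.

```python
from collections import Counter

import numpy as np
from sklearn.tree import DecisionTreeRegressor


class TreeLeafCounter:
    """Hypothetical helper: count visits to the leaves of a fitted tree model."""

    def __init__(self, max_depth=4, random_state=0):
        # A tree of depth D has at most 2**D leaves, so the counter stays small.
        self.tree = DecisionTreeRegressor(max_depth=max_depth, random_state=random_state)
        self.counts = Counter()  # key: leaf index, value: number of visits

    def fit(self, states, actions, targets):
        # Learn S x A -> targets (rewards, next states, or both stacked).
        self.tree.fit(np.hstack([states, actions]), targets)
        # Assumption: leaf indices are not comparable across refits, so reset.
        self.counts.clear()
        return self

    def intrinsic_reward(self, states, actions):
        # TreeLeaf(s_t, a_t): index of the leaf each (state, action) pair reaches.
        leaves = self.tree.apply(np.hstack([states, actions]))
        rewards = np.empty(len(leaves))
        for i, leaf in enumerate(leaves):
            self.counts[leaf] += 1
            rewards[i] = self.counts[leaf] ** -0.5  # N_t(leaf)^(-1/2)
        return rewards


# Toy transitions (made up for illustration): 2-D states, one discrete action.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(256, 2))
actions = rng.integers(0, 3, size=(256, 1)).astype(float)
next_states = states + 0.1 * (actions - 1.0)

counter = TreeLeafCounter(max_depth=3).fit(states, actions, next_states)
first = counter.intrinsic_reward(states[:1], actions[:1])
second = counter.intrinsic_reward(states[:1], actions[:1])
```

Visiting the same leaf twice lowers the bonus from $1$ to $1/\sqrt{2}$, so rarely visited groups of transitions keep a higher intrinsic reward. A prediction-based variant would replace or multiply this bonus by a per-transition model score, as in the formulas above.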