
https://github.com/kohlerhector/trex-tree-reward-exploration

Uses tree estimators of the MDP model, then counts the leaves that group similar transitions to do count-based exploration.

decision-trees drl exploration rl scikit-learn stable-baselines3



Host: GitHub
URL: https://github.com/kohlerhector/trex-tree-reward-exploration
Owner: KohlerHECTOR
Created: 2024-02-06T17:41:25.000Z (almost 1 year ago)
Default Branch: main
Last Pushed: 2024-02-07T16:49:36.000Z (12 months ago)
Last Synced: 2024-11-08T04:15:25.567Z (3 months ago)
Topics: decision-trees, drl, exploration, rl, scikit-learn, stable-baselines3
Language: Python
Homepage:
Size: 436 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

##### Exploration and Model Learning Tricks following this [blog](https://lilianweng.github.io/posts/2020-06-07-exploration-drl/):

###### For Tree-Model-Based-Policy-Exploration see: https://github.com/KohlerHECTOR/Tree-MBPO

- Normalize observations, cf. [RND-PPO](https://arxiv.org/abs/1810.12894) (*Implemented*).

- In practice, normalizing rewards does not seem to work, as running means do not incorporate sparse high rewards well (*Implemented*).

- Bisimilarity measure, cf. [MICO](https://arxiv.org/pdf/2106.08229.pdf): two pairs $(s, a)$ should have a similar bisimilarity measure if they lead to similar $(r, s\_{next})$ (*Not Implemented*).

- Non-episodic returns, cf. [RND-PPO](https://arxiv.org/abs/1810.12894): `env.reset()` is called only after a policy update is performed (*Not Implemented*).
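
The observation-normalization trick from the list above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code of this repository: `RunningMeanStd` and `normalize_obs` are hypothetical names, and the clipping range of 5 is an assumption.

```python
import numpy as np


class RunningMeanStd:
    """Running mean and variance, merged batch by batch (hypothetical helper)."""

    def __init__(self, shape, epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon  # small prior count avoids division by zero

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        # Merge the stored statistics with the batch statistics.
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean, self.var, self.count = new_mean, m2 / total, total


def normalize_obs(obs, rms, clip=5.0):
    """Whiten an observation with the running statistics, then clip (assumed range)."""
    return np.clip((obs - rms.mean) / np.sqrt(rms.var + 1e-8), -clip, clip)


# Usage on synthetic observations (made-up data, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=(1000, 2))
rms = RunningMeanStd(shape=(2,))
rms.update(data)
normalized = normalize_obs(data, rms)
```

The same running statistics applied to rewards would be dominated by the many small rewards, which is one way to read the remark above about sparse high rewards.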

##### Tree models and counters.

- Given a maximum depth $D$, a tree can map any continuous space to at most $2^D$ discrete, countable leaves. Using [Decision Trees](https://scikit-learn.org/stable/modules/tree.html), one can learn: $S \times A \rightarrow \{R \times S\}^{2^D}$, $S \times A \rightarrow \{S\}^{2^D}$, $S \times A \rightarrow \{R\}^{2^D}$

- Trees can therefore serve both as predictive models and as discrete counters, when equipped with a dictionary memory whose keys are leaf labels and whose values are leaf visit counts.

- Trees can be used for both count-based and prediction-based exploration: $r^{i}_t = N_t(\text{TreeLeaf}(s_t, a_t))^{-\frac{1}{2}}$ or $r^{i}_t =\text{TreeScore}(s_t, a_t, r_t, s\_{{next}_t})^{-\frac{1}{2}}$ or even a combination of both: $r^{i}_t = N_t(\text{TreeLeaf}(s_t, a_t))^{-\frac{1}{2}} \times \text{TreeScore}(s_t, a_t, r_t, s\_{{next}_t})$
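
The count-based term $N_t(\text{TreeLeaf}(s_t, a_t))^{-\frac{1}{2}}$ can be sketched with scikit-learn, whose fitted trees expose the leaf index of a sample through `apply`. This is a minimal sketch under stated assumptions, not the code of this repository: the transition data is synthetic, `count_bonus` is a hypothetical helper, and incrementing the counter before computing the bonus is an assumption. It covers only the count-based term, not $\text{TreeScore}$.

```python
from collections import defaultdict

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic transitions (assumption): 2-D states, 1-D actions.
states = rng.uniform(-1, 1, size=(500, 2))
actions = rng.uniform(-1, 1, size=(500, 1))
next_states = states + 0.1 * actions
rewards = (next_states[:, 0] > 0.9).astype(float)

# Learn S x A -> R x S with a tree of maximum depth D.
X = np.hstack([states, actions])
Y = np.hstack([rewards[:, None], next_states])
D = 4
tree = DecisionTreeRegressor(max_depth=D, random_state=0).fit(X, Y)

# Dictionary memory: keys are leaf labels, values are visit counts.
counts = defaultdict(int)


def count_bonus(state_action):
    """Return N(leaf)^(-1/2) after recording one visit to the leaf."""
    leaf = int(tree.apply(state_action.reshape(1, -1))[0])
    counts[leaf] += 1
    return counts[leaf] ** -0.5


# Visiting the same (s, a) three times shrinks the bonus.
bonuses = [count_bonus(X[0]) for _ in range(3)]
```

Because the tree has at most $2^D$ leaves, the dictionary stays small whatever the dimension of the state-action space.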

![](MountainCar.png)