https://github.com/eczy/jit-discretization
- Host: GitHub
- URL: https://github.com/eczy/jit-discretization
- Owner: eczy
- Created: 2020-10-22T22:44:34.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-10-29T23:42:08.000Z (over 4 years ago)
- Last Synced: 2024-10-27T17:37:56.904Z (3 months ago)
- Language: Python
- Size: 180 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Just-in-time (JIT) improvement on discretization planning
Can you come up with a process (algorithm) that successively refines a continuous-space MDP solution? You should be able to interrupt the process at any time and get back a reasonable policy; the more computation it is allowed, however, the better the resulting policy. For example, you could start by using a coarse grid to compute a no-lookahead, 1-nearest-neighbor policy. Then, given more time, you can add gridpoints to refine the resolution. Can you build on past computations to improve the policy?
What if you’d like to use more nearest neighbors? Or take additional lookahead steps? Can you algorithmically determine “the best” next refinement to take given the current policy / value function and saved computation?
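The questions above can be made concrete with a small sketch. It is not the repository's code: the function names (`solve`, `upsample`, `anytime_policies`), the discount factor, the sweep count, and the nearest-neighbor warm start are all assumptions chosen for illustration. Only the Mountain Car dynamics and bounds come from Sutton & Barto. Each refinement reuses the previous value function as a starting point, and the generator can be abandoned at any time while keeping the last policy it produced.

```python
import math

# Mountain Car bounds and dynamics as given in Sutton & Barto.
X_MIN, X_MAX = -1.2, 0.5
V_MIN, V_MAX = -0.07, 0.07
ACTIONS = (-1, 0, 1)


def step(x, v, a):
    """One step of the dynamics; returns (x', v', reached_goal)."""
    v = min(max(v + 0.001 * a - 0.0025 * math.cos(3 * x), V_MIN), V_MAX)
    x = x + v
    if x <= X_MIN:  # left wall: clamp position, reset velocity
        x, v = X_MIN, 0.0
    return x, v, x >= X_MAX


def grid(lo, hi, n):
    """n evenly spaced gridpoints covering [lo, hi] (assumes n >= 2)."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def nearest(lo, hi, n, value):
    """Index of the gridpoint nearest to value (1-nearest-neighbor)."""
    i = round((value - lo) / (hi - lo) * (n - 1))
    return min(max(i, 0), n - 1)


def solve(res, V, sweeps, gamma=0.99):
    """Run in-place value-iteration sweeps on a res x res grid, starting from V."""
    xs, vs = grid(X_MIN, X_MAX, res), grid(V_MIN, V_MAX, res)
    policy = [[0] * res for _ in range(res)]
    for _ in range(sweeps):
        for i, x in enumerate(xs):
            for j, v in enumerate(vs):
                if i == res - 1:  # gridpoints at the goal position are terminal
                    V[i][j] = 0.0
                    continue
                best = None
                for a in ACTIONS:
                    x2, v2, done = step(x, v, a)
                    q = -1.0  # reward of -1 per step
                    if not done:
                        i2 = nearest(X_MIN, X_MAX, res, x2)
                        j2 = nearest(V_MIN, V_MAX, res, v2)
                        q += gamma * V[i2][j2]
                    if best is None or q > best:
                        best, policy[i][j] = q, a
                V[i][j] = best
    return V, policy


def upsample(V, old, new):
    """Warm start: each fine gridpoint copies the value of its nearest coarse one."""
    scale = (old - 1) / (new - 1)
    return [[V[round(i * scale)][round(j * scale)] for j in range(new)]
            for i in range(new)]


def anytime_policies(resolutions=(8, 16, 32), sweeps=20):
    """Yield (resolution, V, policy) after each refinement; stop iterating to interrupt."""
    V, prev = None, None
    for res in resolutions:
        V = [[0.0] * res for _ in range(res)] if V is None else upsample(V, prev, res)
        V, policy = solve(res, V, sweeps)
        prev = res
        yield res, V, policy
```

Iterating with `for res, V, policy in anytime_policies(): ...` and breaking out when the time budget is spent leaves the most recent policy in hand. The discount factor is there so that values stay bounded even when 1-nearest-neighbor snapping maps a state back onto itself, which can happen on coarse grids.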
---
This repo applies this discretization procedure to the MountainCar problem from Sutton & Barto. Currently, it only considers an iterative increase in discretization resolution. The knn and lookahead parameters are implemented in the code, but a scheduling mechanism for them has not been implemented; adding one would be trivial.
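One possible shape for such a schedule is sketched below. Everything here is hypothetical: `refinement_schedule`, its parameters, the round-robin order, and the upper limits are illustrative assumptions, not part of this repo. The factor-of-4 resolution step simply mirrors the resolutions plotted in this README (16, 64, 256, 1024).

```python
from itertools import cycle


def refinement_schedule(resolution=16, knn=1, lookahead=1,
                        max_resolution=1024, max_knn=4, max_lookahead=4):
    """Yield (resolution, knn, lookahead), refining one parameter per step."""
    yield resolution, knn, lookahead
    for param in cycle(("resolution", "knn", "lookahead")):
        # Stop once every parameter has reached its (assumed) upper limit.
        if (resolution >= max_resolution and knn >= max_knn
                and lookahead >= max_lookahead):
            return
        if param == "resolution" and resolution < max_resolution:
            resolution = min(resolution * 4, max_resolution)
        elif param == "knn" and knn < max_knn:
            knn += 1
        elif param == "lookahead" and lookahead < max_lookahead:
            lookahead += 1
        else:
            continue  # this parameter is already at its limit; try the next
        yield resolution, knn, lookahead
```

A round-robin order is only the simplest choice. Picking "the best" next refinement from the current value function, as the problem statement asks, would replace the fixed `cycle` with a selection rule.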
## Episode Length Decays as Resolution Increases
![Resolution vs Steps](assets/resolution_v_steps.png)
## Value Function Visualizations
- x-axis: discretized position of the car
  - left to right on plot -> left to right position in env
- y-axis: discretized velocity of the car
  - top to bottom on plot -> negative to positive velocity in env
- colorbar: calculated value of discretized state
### Resolution 16
![Value function at resolution 16](assets/res_16_knn_1_lookahead_1.png)
### Resolution 64
![Value function at resolution 64](assets/res_64_knn_1_lookahead_1.png)
### Resolution 256
![Value function at resolution 256](assets/res_256_knn_1_lookahead_1.png)
### Resolution 1024
![Value function at resolution 1024](assets/res_1024_knn_1_lookahead_1.png)
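
To read a specific cell off the plots, the axis conventions listed under Value Function Visualizations can be written as an index-to-state mapping. This is a sketch under assumptions: `cell_to_state` is a hypothetical helper, and it assumes evenly spaced gridpoints that include both bounds, which may not match how the repo places its gridpoints. The bounds are the Mountain Car ranges from Sutton & Barto.

```python
X_MIN, X_MAX = -1.2, 0.5      # position bounds (Sutton & Barto)
V_MIN, V_MAX = -0.07, 0.07    # velocity bounds (Sutton & Barto)


def cell_to_state(row, col, resolution):
    """Map a plot cell to (position, velocity).

    Columns run left to right with increasing position; rows run top to
    bottom from negative to positive velocity. Assumes resolution >= 2.
    """
    position = X_MIN + (X_MAX - X_MIN) * col / (resolution - 1)
    velocity = V_MIN + (V_MAX - V_MIN) * row / (resolution - 1)
    return position, velocity
```

Under these assumptions, the top-left cell of any plot corresponds to the leftmost position with the most negative velocity, and the bottom-right cell to the goal position with the most positive velocity.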