# subgoal-mcts

Partial code for the Master's thesis *Subgoal-Based Temporal Abstractions in Monte-Carlo Tree Search* (2018) by Jan Peter.

The underlying algorithm was developed together with Christian Meyer during a university project in 2016.

---

To address the problem of efficiently generating macro-actions from a subgoal predicate in deterministic domains, we propose an extension to *Monte-Carlo Tree Search (MCTS)* that dynamically samples an approximate set of available macro-actions locally for each state. This is done by modifying the expansion phase of MCTS to generate random sequences of primitive actions until a subgoal is reached. Once the likelihood of discovering further macro-actions that terminate in a distinct subgoal drops below a threshold, the set of macro-actions is considered complete and is used for further planning. By treating this macro-action sampling as a Bernoulli trial, we do not make any assumptions about the number of macro-actions possible in each state and only rely on a desired *action coverage* along with a tolerated error.
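A minimal sketch of this modified expansion step might look as follows; it is not the repository's actual code. It assumes a deterministic generative model `step(state, action) -> (next_state, reward)`, a primitive-action enumerator `actions(state)`, and a subgoal predicate `is_subgoal(state)`, all of which are illustrative names. The stopping rule is one plausible reading of the Bernoulli-trial idea: if undiscovered macro-actions still accounted for more than a `1 - coverage` fraction of rollouts, the chance of `n_stop` consecutive failures would be at most `coverage ** n_stop`, so sampling stops once that bound falls below the tolerated error `delta`.

```python
import math
import random

def sample_macro_actions(state, step, actions, is_subgoal,
                         coverage=0.9, delta=0.05, max_len=50):
    """Sample macro-actions (primitive-action sequences ending in a subgoal)
    from `state` until further distinct subgoals are unlikely to be found.

    Each rollout is treated as a Bernoulli trial whose success means
    "terminates in a subgoal not seen before". Stop after `n_stop`
    consecutive failures, where coverage ** n_stop <= delta.
    """
    n_stop = math.ceil(math.log(delta) / math.log(coverage))
    macros = {}    # terminating subgoal state (assumed hashable) -> (actions, return)
    failures = 0   # consecutive rollouts that found no new subgoal

    while failures < n_stop:
        s, seq, ret = state, [], 0.0
        # Random sequence of primitive actions until a subgoal is reached.
        for _ in range(max_len):
            a = random.choice(actions(s))
            s, r = step(s, a)
            seq.append(a)
            ret += r
            if is_subgoal(s):
                break
        else:
            failures += 1   # no subgoal within max_len steps: count as a failure
            continue

        if s not in macros:
            macros[s] = (seq, ret)   # new distinct subgoal discovered
            failures = 0
        else:
            failures += 1            # rediscovered an already-known subgoal
    return macros
```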

This method exploits the fact that in most hierarchical domains, it is more important to choose among subgoals than to behave optimally while pursuing one. Moreover, macro-actions can be gradually refined by replacing them with the instance that has a higher cumulative reward each time the corresponding subgoal is rediscovered.
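This refinement rule can be bolted onto the rediscovery branch of the sketch above: whenever a rollout reaches an already-known subgoal, keep whichever instance has the higher cumulative reward. The helper below and the layout of the hypothetical `macros` dictionary are illustrative assumptions, not the thesis code.

```python
def refine(macros, subgoal_state, new_seq, new_return):
    """Keep, for each subgoal, the macro-action instance with the highest
    cumulative reward observed so far (gradual refinement on rediscovery)."""
    old = macros.get(subgoal_state)
    if old is None or new_return > old[1]:
        macros[subgoal_state] = (new_seq, new_return)
```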

Altogether, building on the well-known benefits of MCTS, our approach aims to achieve the following properties:

**Generality** Only a subgoal predicate is needed in addition to the generative model of the domain. Unlike other methods, no information is needed about how to achieve a subgoal, from where it is reachable, how valuable it is, or how many macro-actions are available from each state.

**Scalability** The size of the domain's state and action spaces is irrelevant to the size of the internal lookahead tree: its branching factor depends only on the number of subgoals actually reachable from each state, and its depth on the specified horizon and the length of each macro-action.

**Robustness** The performance of the algorithm is insensitive to a loose definition of subgoals. This enables the use of heuristics that can be generated with little domain knowledge, possibly using a subgoal discovery method.

**Efficiency** Sampling macro-actions can be implemented efficiently and adds little overhead to the base MCTS algorithm. Further speedup can be achieved by running multiple instances of the algorithm in parallel, each with a low action coverage.
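One way to read the parallelism remark, again as a non-authoritative sketch rather than the thesis implementation: run several independent samplers with a low coverage target each and merge their macro-action sets, keeping the best instance per subgoal. It reuses the hypothetical `sample_macro_actions` and `refine` helpers from above.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_macro_actions(state, step, actions, is_subgoal,
                           n_workers=4, coverage=0.6, delta=0.05):
    """Run several low-coverage samplers concurrently and merge their results,
    keeping the higher-reward macro-action when two reach the same subgoal.

    A thread pool keeps the sketch simple; for real CPU speedup a process
    pool or a native implementation would be needed.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda _: sample_macro_actions(state, step, actions, is_subgoal,
                                           coverage=coverage, delta=delta),
            range(n_workers)))

    merged = {}
    for macros in results:
        for subgoal_state, (seq, ret) in macros.items():
            refine(merged, subgoal_state, seq, ret)
    return merged
```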