https://github.com/thiagopbueno/tf-mdp

Probabilistic planning in continuous state-action MDPs in TensorFlow.
Host: GitHub
URL: https://github.com/thiagopbueno/tf-mdp
Owner: thiagopbueno
License: gpl-3.0
Created: 2018-08-20T13:36:26.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2022-06-21T21:24:03.000Z (about 4 years ago)
Last Synced: 2025-09-22T18:44:03.595Z (10 months ago)
Topics: mdp, planning, rddl, tensorflow
Language: Python
Homepage: http://tf-mdp.rtfd.io/
Size: 427 KB
Stars: 12
Watchers: 0
Forks: 3
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

# tf-mdp [![Py Versions][py-versions.svg]][pypi-project] [![PyPI version][pypi-version.svg]][pypi-version] [![Build Status][travis.svg]][travis-project] [![Documentation Status][rtd-badge.svg]][rtd-badge] [![License: GPL v3][license.svg]][license]

Probabilistic planning in continuous state-action MDPs using TensorFlow.

**tf-mdp** is an implementation based on the paper:

> Thiago P. Bueno; Leliane N. de Barros; Denis D. Mauá; Scott Sanner
>
> **[Deep Reactive Policies for Planning in Stochastic Nonlinear Domains](https://aaai.org/ojs/index.php/AAAI/article/view/4744)**
>
> In AAAI, 2019.

# Quickstart

**tf-mdp** is a Python 3.6+ package available on PyPI.

```text
$ pip3 install tf-mdp
```

Please make sure a working TensorFlow installation is available on your system before pip-installing this package.
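As a quick pre-install check, the following minimal sketch tests whether a `tensorflow` module can be located on the current Python import path. The helper name `module_available` is hypothetical and not part of tf-mdp; note that finding the module does not guarantee that it imports and runs correctly.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper, not part of tf-mdp. It only locates the module;
    it does not import it or verify that it works.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("tensorflow"):
        print("TensorFlow found: ok to run `pip3 install tf-mdp`.")
    else:
        print("TensorFlow not found: install it before tf-mdp.")
```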

# Features

**tf-mdp** solves discrete-time continuous state-action MDPs.

The domains/instances are specified using the [RDDL][rddl] language.

It is built on the following packages from the Python 3 RDDL toolkit:

- [pyrddl][pyrddl]: RDDL lexer/parser.

- [rddlgym][rddlgym]: A toolkit for working with RDDL domains.

- [rddl2tf][rddl2tf]: RDDL2TensorFlow compiler.

- [tf-rddlsim][tf-rddlsim]: An RDDL simulator running in TensorFlow.

Please refer to each project's documentation for further details.
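To make "discrete-time continuous state-action MDP" concrete, here is a toy sketch in plain Python. It is an illustration only and does not use tf-mdp, RDDL, or TensorFlow: the dynamics, the quadratic reward, the linear policy, and all names (`rollout`, `policy_gain`, `noise_std`) are assumptions made up for this example.

```python
import random


def rollout(policy_gain, horizon, x0=1.0, noise_std=0.0, seed=0):
    """Simulate one trajectory of a toy 1-D MDP and return its total reward.

    State x and action a are real numbers (continuous); time advances in
    discrete steps. All modelling choices here are illustrative assumptions:
      policy:     a = -policy_gain * x          (reactive: depends only on x)
      transition: x' = x + a + Gaussian noise   (stochastic)
      reward:     r = -(x')**2                  (penalize distance from 0)
    """
    rng = random.Random(seed)
    x = x0
    total_reward = 0.0
    for _ in range(horizon):
        a = -policy_gain * x
        x = x + a + rng.gauss(0.0, noise_std)
        total_reward += -(x ** 2)
    return total_reward


if __name__ == "__main__":
    # Compare two policy gains on the same noisy trajectory seed.
    for gain in (0.1, 0.9):
        print(gain, rollout(gain, horizon=40, noise_std=0.1, seed=42))
```

A planner for such a problem searches for policy parameters (here, a single gain) that maximize the expected total reward over the horizon.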

# Usage

```text
$ tfmdp --help
usage: tfmdp [-h] [-l LAYERS [LAYERS ...]]
             [-a {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}]
             [-iln] [-b BATCH_SIZE] [-hr HORIZON] [-e EPOCHS]
             [-lr LEARNING_RATE]
             [-opt {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}]
             [-lfn {linear,mse}] [-ld LOGDIR] [-v]
             rddl

Probabilistic planning in continuous state-action MDPs using TensorFlow.

positional arguments:
  rddl                  RDDL file or rddlgym domain id

optional arguments:
  -h, --help            show this help message and exit
  -l LAYERS [LAYERS ...], --layers LAYERS [LAYERS ...]
                        number of units in each hidden layer in policy network
  -a {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}, --activation {none,sigmoid,tanh,relu,relu6,crelu,elu,selu,softplus,softsign}
                        activation function for hidden layers in policy
                        network
  -iln, --input-layer-norm
                        input layer normalization flag
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        number of trajectories in a batch (default=256)
  -hr HORIZON, --horizon HORIZON
                        number of timesteps (default=40)
  -e EPOCHS, --epochs EPOCHS
                        number of timesteps (default=200)
  -lr LEARNING_RATE, --learning-rate LEARNING_RATE
                        optimizer learning rate (default=0.001)
  -opt {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}, --optimizer {Adadelta,Adagrad,Adam,GradientDescent,ProximalGradientDescent,ProximalAdagrad,RMSProp}
                        loss optimizer (default=RMSProp)
  -lfn {linear,mse}, --loss-fn {linear,mse}
                        loss function (default=linear)
  -ld LOGDIR, --logdir LOGDIR
                        log directory for data summaries (default=/tmp/tfmdp)
  -v, --verbose         verbosity mode
```
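When launching runs from a script, the command line can be assembled programmatically. The sketch below is a hypothetical helper (`build_tfmdp_command` is not part of tf-mdp) that uses only flags listed in the help text above; it places the positional `rddl` argument first so that the variable-length `-l` option cannot absorb it.

```python
from typing import List, Optional, Sequence


def build_tfmdp_command(
    rddl: str,
    layers: Optional[Sequence[int]] = None,
    input_layer_norm: bool = False,
    activation: Optional[str] = None,
    batch_size: Optional[int] = None,
    horizon: Optional[int] = None,
    epochs: Optional[int] = None,
    learning_rate: Optional[float] = None,
    loss_fn: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list for the `tfmdp` CLI (hypothetical helper).

    Options left as None/False are omitted, so the CLI defaults apply.
    """
    cmd = ["tfmdp", rddl]
    if layers:
        cmd += ["-l"] + [str(units) for units in layers]
    if input_layer_norm:
        cmd.append("-iln")
    if activation is not None:
        cmd += ["-a", activation]
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    if horizon is not None:
        cmd += ["-hr", str(horizon)]
    if epochs is not None:
        cmd += ["-e", str(epochs)]
    if learning_rate is not None:
        cmd += ["-lr", str(learning_rate)]
    if loss_fn is not None:
        cmd += ["-lfn", loss_fn]
    if verbose:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    cmd = build_tfmdp_command("Navigation-v2", layers=[256, 128], epochs=200)
    print(" ".join(cmd))
    # To actually run it (requires tf-mdp to be installed):
    #   import subprocess; subprocess.run(cmd, check=True)
```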

# Examples

```text
$ tfmdp Reservoir-20 -l 2048 -iln -a elu -b 256 -hr 40 -e 200 -lr 0.001 -lfn mse -v
Running tf-mdp v0.5.4 ...
>> RDDL:   Reservoir-20
>> logdir: /tmp/tfmdp
>> Policy Net:
layers = [2048]
activation = elu
input  layer norm = True
>> Hyperparameters:
epochs        = 200
learning rate = 0.001
batch size    = 256
horizon       = 40
>> Optimization:
optimizer     = RMSProp
loss function = mse
>> Loading model ...
Done in 0.018952 sec.
>> Optimizing...
2021-06-23 22:56:18.873731: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:56:18.895765: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:56:18.896462: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x46628b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:56:18.896514: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch   199: loss = 1201677952.000000
Done in 28.525183 sec.
>> Performance:
total reward = -3653.9695, reward per timestep = -91.3492
```

```text
$ tfmdp HVAC-3 -l 256 128 64 32 -iln -a elu -b 256 -hr 40 -e 200 -lr 0.0001 -lfn mse -v
Running tf-mdp v0.5.4 ...
>> RDDL:   HVAC-3
>> logdir: /tmp/tfmdp
>> Policy Net:
layers = [256,128,64,32]
activation = elu
input  layer norm = True
>> Hyperparameters:
epochs        = 200
learning rate = 0.0001
batch size    = 256
horizon       = 40
>> Optimization:
optimizer     = RMSProp
loss function = mse
>> Loading model ...
Done in 0.017646 sec.
>> Optimizing...
2021-06-23 22:54:05.766434: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:54:05.787832: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:54:05.788607: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x49a4d00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:54:05.788690: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch   199: loss = 103798661120.0000000
Done in 15.748765 sec.
>> Performance:
total reward = -315724.4688, reward per timestep = -7893.1117
```

```text
$ tfmdp Navigation-v2 -l 256 128 64 32 -a elu -b 128 -hr 20 -e 200 -lr 0.001 -lfn mse -v
Running tf-mdp v0.5.4 ...
>> RDDL:   Navigation-v2
>> logdir: /tmp/tfmdp
>> Policy Net:
layers = [256,128,64,32]
activation = elu
input  layer norm = False
>> Hyperparameters:
epochs        = 200
learning rate = 0.001
batch size    = 128
horizon       = 20
>> Optimization:
optimizer     = RMSProp
loss function = mse
>> Loading model ...
Done in 0.012209 sec.
>> Optimizing...
2021-06-23 22:50:59.732002: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-06-23 22:50:59.751959: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2199995000 Hz
2021-06-23 22:50:59.752494: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5bc6a20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-06-23 22:50:59.752514: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Epoch   199: loss = 6452.3613285
Done in 6.466699 sec.
>> Performance:
total reward = -78.3427, reward per timestep = -3.9171
```
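In the three runs above, the logged "reward per timestep" appears to be the total reward divided by the horizon (for example, -3653.9695 / 40 ≈ -91.3492). The sketch below parses the final performance line and checks that relationship; the regular expression and the helper name `parse_performance` are assumptions based only on the log format shown here, not on tf-mdp's source.

```python
import re

# Pattern inferred from the ">> Performance:" lines in the example logs.
PERFORMANCE_RE = re.compile(
    r"total reward = (?P<total>-?\d+(?:\.\d+)?), "
    r"reward per timestep = (?P<per_step>-?\d+(?:\.\d+)?)"
)


def parse_performance(line):
    """Return (total_reward, reward_per_timestep) parsed from a log line."""
    match = PERFORMANCE_RE.search(line)
    if match is None:
        raise ValueError(f"not a performance line: {line!r}")
    return float(match["total"]), float(match["per_step"])


# (domain, horizon, performance line) copied from the example runs.
RUNS = [
    ("Reservoir-20", 40, "total reward = -3653.9695, reward per timestep = -91.3492"),
    ("HVAC-3", 40, "total reward = -315724.4688, reward per timestep = -7893.1117"),
    ("Navigation-v2", 20, "total reward = -78.3427, reward per timestep = -3.9171"),
]

if __name__ == "__main__":
    for name, horizon, line in RUNS:
        total, per_step = parse_performance(line)
        # Compare total / horizon with the logged per-timestep value.
        print(f"{name}: total/horizon = {total / horizon:.4f}, logged = {per_step}")
```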

# Documentation

Please refer to [https://tf-mdp.readthedocs.io/][readthedocs] for the code documentation.

# Support

If you are having issues with tf-mdp, please let me know at: [thiago.pbueno@gmail.com](mailto:thiago.pbueno@gmail.com).

# License

Copyright (c) 2018-2021 Thiago Pereira Bueno All Rights Reserved.

tf-mdp is free software: you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

tf-mdp is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser
General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with tf-mdp. If not, see http://www.gnu.org/licenses/.

[pyrddl]: https://github.com/thiagopbueno/pyrddl

[rddl2tf]: https://github.com/thiagopbueno/rddl2tf

[rddl]: http://users.cecs.anu.edu.au/~ssanner/IPPC_2011/RDDL.pdf

[rddlgym]: https://github.com/thiagopbueno/rddlgym

[tf-rddlsim]: https://github.com/thiagopbueno/tf-rddlsim

[py-versions.svg]: https://img.shields.io/pypi/pyversions/tf-mdp.svg?logo=python&logoColor=white

[pypi-project]: https://pypi.org/project/tf-mdp

[pypi-version.svg]: https://badge.fury.io/py/tf-mdp.svg

[pypi-version]: https://badge.fury.io/py/tf-mdp

[travis.svg]: https://img.shields.io/travis/thiagopbueno/tf-mdp/master.svg?logo=travis

[travis-project]: https://travis-ci.com/thiagopbueno/tf-mdp

[rtd-badge.svg]: https://readthedocs.org/projects/tf-mdp/badge/?version=latest

[rtd-badge]: https://tf-mdp.readthedocs.io/en/latest/?badge=latest

[license.svg]: https://img.shields.io/badge/License-GPL%20v3-blue.svg

[license]: https://github.com/thiagopbueno/tf-mdp/blob/master/LICENSE