https://github.com/mtrazzi/meta_rl
The Tensorflow code and a DeepMind Lab wrapper for my article "Meta-Reinforcement Learning" on FloydHub.
https://github.com/mtrazzi/meta_rl
blogpost deepmind-lab machine-learning neuroscience paper-implementations psychology-experiments reinforcement-learning tensorflow
Last synced: 8 months ago
JSON representation
The Tensorflow code and a DeepMind Lab wrapper for my article "Meta-Reinforcement Learning" on FloydHub.
- Host: GitHub
- URL: https://github.com/mtrazzi/meta_rl
- Owner: mtrazzi
- License: mit
- Created: 2019-02-11T15:29:56.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-03-28T12:10:16.000Z (about 7 years ago)
- Last Synced: 2025-05-31T14:09:12.144Z (about 1 year ago)
- Topics: blogpost, deepmind-lab, machine-learning, neuroscience, paper-implementations, psychology-experiments, reinforcement-learning, tensorflow
- Language: Python
- Homepage:
- Size: 5.63 MB
- Stars: 37
- Watchers: 2
- Forks: 3
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
⚠**DISCLAIMER**⚠
This is the git submodule for the Harlow task for my article [Meta-Reinforcement Learning](https://blog.floydhub.com/author/michaeltrazzi/) on FloydHub.
- For the main repository for the Harlow task (with more information about the task) see [here](https://github.com/mtrazzi/harlow).
- For the two-step task see [here](https://github.com/mtrazzi/two-step-task).
To get started, check out the parent [`README.md`](https://github.com/mtrazzi/harlow#getting-started).
# Discussion
I answer questions and give more informations here:
- [Hacker news](https://news.ycombinator.com/item?id=19503985)
- [r/MachineLearning](https://www.reddit.com/r/MachineLearning/comments/b688id/p_reimplementing_deepminds_metarl_papers/)
# Directory structure
``` bash
meta-rl
├── harlow.py # main file that implements the DeepMind Lab wrapper, processes the frames and run the trainings, initializing a DeepMind Lab environment.
└── meta_rl
├── worker.py # implements the class `Worker`, that contains the method `work` to collect training data and `train` to train the networks on this training data.
└── ac_network.py # implements the class `AC_Network`, where we initialize all the networks & the loss function.
```
# Branches
- [`master`](https://github.com/mtrazzi/meta_rl): for this branch, the frames are pre-processed, on a dataset of 42 pictures of students from 42 (cf. the FloydHub blog for more details). Our model achieved 40% performance on this simplified version of the Harlow task.
- [`dev`](https://github.com/mtrazzi/meta_rl/tree/dev): for this branch, we implemented a stacked LSTM + a convolutional network, to have exactly the same setup as in [Wang et al, 2018 Nature Neuroscience](https://www.nature.com/articles/s41593-018-0147-8). Here is the reward curve we obtained:
- [`monothread2pixel`](https://github.com/mtrazzi/meta_rl/tree/monothread2pixel): here, we used for our dataset only a black image and a white image. We pre-processed those two images so our agent only sees a one-hot, that is either [0,1] or [1,0]. Here is the resulting reward curve after training:
- [`multiprocessing`](https://github.com/mtrazzi/meta_rl/tree/multiprocessing): I implemented multiprocessing using Python’s library multiprocessing. However, it appeared that Tensorflow doesn’t allow to use multiprocessing after having imported tensorflow, so that multiprocessing branch came to a dead end.
- [`ray`](https://github.com/mtrazzi/meta_rl/tree/ray): we also tried multiprocessing with ray another multiprocessing library. However, it didn’t work out because DeepMind was not pickable, i.e. it couldn’t be serialized using pickle.
# Todo
On branch `master`:
- [ ] train with more episodes (for instance 20-50k) to see if some seeds keep learning.
- [ ] train with different seeds, to see if some seeds can reach > 40% performance.
- [ ] train with more units in the LSTM (for instance > 100 instead of 48), to see if it can keep learning after 10k episodes.
- [ ] train with more images (for instance 1000).
For multi-threading (e.g. in `dev`):
- [ ] support for [distributed tensorflow](https://www.tensorflow.org/guide/distribute_strategy) on multiple GPUs.
- [ ] get rid of CPython's global interpreter lock by connecting Tensorflow's C API with DeepMind Lab C API.
For multiprocessing:
- [ ] in [`multiprocessing`](https://github.com/mtrazzi/meta_rl/tree/multiprocessing) branch, try to import tensorflow _after_ the multiprocessing calls.
- [ ] in [`ray`](https://github.com/mtrazzi/meta_rl/tree/ray), try to make the DeepMind Lab environment pickable (for instance by looking at how OpenAI made their physics engine [mujoco-py](https://github.com/openai/mujoco-py) pickable.
# Support
- We support Python3.6.
- The branch `master` was tested on FloydHub's instances (using `Tensorflow 1.12` and `CPU`). To change for `GPU`, change `tf.device("/cpu:0")` with `tf.device("/device:GPU:0")` in [`harlow.py`](https://github.com/mtrazzi/meta_rl/blob/master/harlow.py).
## Pip
All the pip packages should be either installed on FloydHub or installed with [`install.sh`](https://github.com/mtrazzi/harlow/blob/master/install.sh).
However, if you want to run this repository on your machine, here are the requirements:
```
numpy==1.16.2
tensorflow==1.12.0
six==1.12.0
scipy==1.2.1
skimage==0.0
setuptools==40.8.0
Pillow==5.4.1
```
Additionally, for the branch `ray` you might need to do (`pip install ray`) and for the branch `multiprocessing` you would need to install `multiprocessing` with (`pip install multiprocessing`).
# Credits
This work uses [awjuliani's Meta-RL implementation](https://github.com/awjuliani/Meta-RL).
I couldn't have done without my dear friend [Kevin Costa](https://github.com/kcosta42), and the additional details provided kindly by [Jane Wang](http://www.janexwang.com/).