{"id":17027975,"url":"https://github.com/mtrazzi/meta_rl","last_synced_at":"2025-10-14T14:42:58.098Z","repository":{"id":111512155,"uuid":"170152718","full_name":"mtrazzi/meta_rl","owner":"mtrazzi","description":"The Tensorflow code and a DeepMind Lab wrapper for my article \"Meta-Reinforcement Learning\" on FloydHub.","archived":false,"fork":false,"pushed_at":"2019-03-28T12:10:16.000Z","size":5904,"stargazers_count":37,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-31T14:09:12.144Z","etag":null,"topics":["blogpost","deepmind-lab","machine-learning","neuroscience","paper-implementations","psychology-experiments","reinforcement-learning","tensorflow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mtrazzi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-02-11T15:29:56.000Z","updated_at":"2022-11-13T06:58:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"6a25cce7-fc0e-4e4d-bab9-1d1351297e53","html_url":"https://github.com/mtrazzi/meta_rl","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mtrazzi/meta_rl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtrazzi%2Fmeta_rl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtrazzi%2Fmeta_rl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtrazzi%2Fmeta_rl/releases","manifests_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtrazzi%2Fmeta_rl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mtrazzi","download_url":"https://codeload.github.com/mtrazzi/meta_rl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtrazzi%2Fmeta_rl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279019157,"owners_count":26086682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blogpost","deepmind-lab","machine-learning","neuroscience","paper-implementations","psychology-experiments","reinforcement-learning","tensorflow"],"created_at":"2024-10-14T07:51:49.564Z","updated_at":"2025-10-14T14:42:58.093Z","avatar_url":"https://github.com/mtrazzi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"/img/meta_rl_glitch.gif\" alt=\"Harlow Task\"\u003e\n\u003c/p\u003e\n\n⚠**DISCLAIMER**⚠\nThis is the git submodule for the Harlow task for my article [Meta-Reinforcement Learning](https://blog.floydhub.com/author/michaeltrazzi/) on FloydHub.\n- For the main repository for the Harlow task (with more information about the task) see [here](https://github.com/mtrazzi/harlow).\n- For the two-step task 
see [here](https://github.com/mtrazzi/two-step-task).\n\nTo get started, check out the parent [`README.md`](https://github.com/mtrazzi/harlow#getting-started).\n\n# Discussion\n\nI answer questions and give more information here:\n- [Hacker News](https://news.ycombinator.com/item?id=19503985)\n- [r/MachineLearning](https://www.reddit.com/r/MachineLearning/comments/b688id/p_reimplementing_deepminds_metarl_papers/)\n\n# Directory structure\n\n``` bash\nmeta-rl\n├── harlow.py                 # main file: implements the DeepMind Lab wrapper, initializes a DeepMind Lab environment, processes the frames and runs the training.\n└── meta_rl\n    ├── worker.py             # implements the class `Worker`, which contains the method `work` to collect training data and the method `train` to train the networks on this data.\n    └── ac_network.py         # implements the class `AC_Network`, where we initialize all the networks \u0026 the loss function.\n```\n\n# Branches\n\n- [`master`](https://github.com/mtrazzi/meta_rl): for this branch, the frames are pre-processed, using a dataset of 42 pictures of students from 42 (cf. the FloydHub blog for more details). Our model achieved 40% performance on this simplified version of the Harlow task.\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"/img/reward_cuve_5_seeds_42_images.png\" alt=\"img/reward_cuve_5_seeds_42_images.png\"\u003e\n\u003c/p\u003e\n\n- [`dev`](https://github.com/mtrazzi/meta_rl/tree/dev): for this branch, we implemented a stacked LSTM + a convolutional network, to have exactly the same setup as in [Wang et al., 2018 Nature Neuroscience](https://www.nature.com/articles/s41593-018-0147-8). 
Here is the reward curve we obtained:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"/img/conv_plus_stacked_lstm.png\" alt=\"img/conv_plus_stacked_lstm.png\"\u003e\n\u003c/p\u003e\n\n- [`monothread2pixel`](https://github.com/mtrazzi/meta_rl/tree/monothread2pixel): here, our dataset consists of only a black image and a white image. We pre-processed those two images so our agent only sees a one-hot vector, either [0,1] or [1,0]. Here is the resulting reward curve after training:\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"/img/monothread2pixels.png\" alt=\"img/monothread2pixels.png\"\u003e\n\u003c/p\u003e\n\n- [`multiprocessing`](https://github.com/mtrazzi/meta_rl/tree/multiprocessing): I implemented multiprocessing using Python’s `multiprocessing` library. However, it appeared that multiprocessing cannot be used after tensorflow has been imported, so this branch came to a dead end.\n\n- [`ray`](https://github.com/mtrazzi/meta_rl/tree/ray): we also tried multiprocessing with ray, another multiprocessing library. However, it didn’t work out because the DeepMind Lab environment was not picklable, i.e. it couldn’t be serialized using pickle.\n\n# Todo\n\nOn branch `master`:\n- [ ] train with more episodes (for instance 20-50k) to see if some seeds keep learning.\n- [ ] train with different seeds, to see if some seeds can reach \u003e 40% performance.\n- [ ] train with more units in the LSTM (for instance \u003e 100 instead of 48), to see if it can keep learning after 10k episodes.\n- [ ] train with more images (for instance 1000).\n\nFor multi-threading (e.g. 
in `dev`):\n- [ ] support for [distributed tensorflow](https://www.tensorflow.org/guide/distribute_strategy) on multiple GPUs.\n- [ ] get rid of CPython's global interpreter lock by connecting Tensorflow's C API with the DeepMind Lab C API.\n\nFor multiprocessing:\n- [ ] in the [`multiprocessing`](https://github.com/mtrazzi/meta_rl/tree/multiprocessing) branch, try to import tensorflow _after_ the multiprocessing calls.\n- [ ] in the [`ray`](https://github.com/mtrazzi/meta_rl/tree/ray) branch, try to make the DeepMind Lab environment picklable (for instance by looking at how OpenAI made their physics engine [mujoco-py](https://github.com/openai/mujoco-py) picklable).\n\n\n# Support\n\n- We support Python 3.6.\n\n- The branch `master` was tested on FloydHub's instances (using `Tensorflow 1.12` and `CPU`). To switch to `GPU`, replace `tf.device(\"/cpu:0\")` with `tf.device(\"/device:GPU:0\")` in [`harlow.py`](https://github.com/mtrazzi/meta_rl/blob/master/harlow.py).\n\n## Pip\n\nAll the pip packages should be either already installed on FloydHub or installed with [`install.sh`](https://github.com/mtrazzi/harlow/blob/master/install.sh).\n\nHowever, if you want to run this repository on your machine, here are the requirements:\n```\nnumpy==1.16.2\ntensorflow==1.12.0\nsix==1.12.0\nscipy==1.2.1\nskimage==0.0\nsetuptools==40.8.0\nPillow==5.4.1\n```\n\nAdditionally, for the branch `ray` you might need to run `pip install ray`, and for the branch `multiprocessing` you would need to install `multiprocessing` with `pip install multiprocessing`.\n\n# Credits\n\nThis work uses [awjuliani's Meta-RL implementation](https://github.com/awjuliani/Meta-RL).\n\nI couldn't have done it without my dear friend [Kevin Costa](https://github.com/kcosta42), and the additional details kindly provided by [Jane 
Wang](http://www.janexwang.com/).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtrazzi%2Fmeta_rl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmtrazzi%2Fmeta_rl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtrazzi%2Fmeta_rl/lists"}