{"id":20663782,"url":"https://github.com/vita-group/diffses","last_synced_at":"2025-04-19T15:57:01.687Z","repository":{"id":107045804,"uuid":"575626332","full_name":"VITA-Group/DiffSES","owner":"VITA-Group","description":"[TPAMI] \"Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search\", Wenqing Zheng*, S P Sharan*, Zhiwen Fan, Kevin Wang, Yihan Xi, Atlas Wang","archived":false,"fork":false,"pushed_at":"2023-01-04T19:04:01.000Z","size":70315,"stargazers_count":15,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-29T09:51:12.070Z","etag":null,"topics":["interpretable-machine-learning","neurosymbolic","reinforcement-learning","symbolic-regression"],"latest_commit_sha":null,"homepage":"https://vita-group.github.io/DiffSES/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VITA-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-07T23:58:04.000Z","updated_at":"2024-10-15T08:43:39.000Z","dependencies_parsed_at":"2023-03-13T14:38:29.343Z","dependency_job_id":null,"html_url":"https://github.com/VITA-Group/DiffSES","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDiffSES","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDiffSES/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Gr
oup%2FDiffSES/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VITA-Group%2FDiffSES/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VITA-Group","download_url":"https://codeload.github.com/VITA-Group/DiffSES/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249731752,"owners_count":21317343,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["interpretable-machine-learning","neurosymbolic","reinforcement-learning","symbolic-regression"],"created_at":"2024-11-16T19:19:50.644Z","updated_at":"2025-04-19T15:57:01.681Z","avatar_url":"https://github.com/VITA-Group.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n\n# Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search\n\n**[Wenqing Zheng](http://wenqing-zheng.github.io)\\*, [S P Sharan](https://github.com/Syzygianinfern0)\\*, [Zhiwen Fan](https://zhiwenfan.github.io), [Kevin Wang](), [Yihan Xi](), [Atlas Wang](https://www.ece.utexas.edu/people/faculty/atlas-wang)**\n\n\u003c!-- **Accepted at [NeurIPS 2022](https://neurips.cc/virtual/2022/poster/54408)** --\u003e\n\n| [```Website```](https://vita-group.github.io/DiffSES) | [```Arxiv```](https://arxiv.org/abs/2212.14849) |\n:------------------------------------------------------:|:-----------------------------------------------:|\n\n\u003cimg src=\"docs/static/figures/v2demo1.png\" width=\"768\"\u003e\n\n\u003c/div\u003e\n\n---\n\n# Introduction\n\nLearning efficient 
and interpretable policies has been a challenging task in reinforcement learning (RL), particularly\nin the visual RL setting with complex scenes. While deep neural networks have achieved competitive performance, the\nresulting policies are often over-parameterized black boxes that are difficult to interpret and deploy efficiently. More\nrecent symbolic RL frameworks have shown that high-level domain-specific programming logic can be designed to handle\nboth policy learning and symbolic planning. However, these approaches often rely on human-coded primitives with little\nfeature learning, and when applied to high-dimensional continuous observations such as visual scenes, they can suffer\nfrom scalability issues and perform poorly when images have complicated compositions and object interactions.\nTo address these challenges, we propose Differentiable Symbolic Expression Search (DiffSES), a novel symbolic learning\napproach that discovers discrete symbolic policies using partially differentiable optimization. 
By using object-level\nabstractions instead of raw pixel-level inputs, DiffSES is able to leverage the simplicity and scalability advantages of\nsymbolic expressions, while also incorporating the strengths of neural networks for feature learning and optimization.\nOur experiments demonstrate that DiffSES is able to generate symbolic policies that are more interpretable and scalable\nthan state-of-the-art symbolic RL methods, even with a reduced amount of symbolic prior knowledge.\n\n\u003cdiv align=\"center\"\u003e\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\n\u003cimg src=\"docs/static/figures/flow-1.png\" width=\"768\"\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003cth\u003e\nInference procedure of the learned symbolic policy\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n# Results\n\n\u003cdiv align=\"center\"\u003e\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\n\u003cimg src=\"docs/static/figures/teaser-1.png\" width=\"768\"\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003cth\u003e\nA subset of our trained environments\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\nHere is the comparison of the models in a transfer learning setting. In this setting, the teacher DRL model is trained\nin AdventureIsland3, and the symbolic agent is learned based on it. Then both agents are applied to AdventureIsland2\nwithout fine-tuning. 
The performance of the symbolic policy drops less than that of the DRL model.\n\n\u003cdiv align=\"center\"\u003e\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\n\u003cvideo src=\"https://user-images.githubusercontent.com/31875325/209475899-301e7ace-cc24-4ad9-8aa5-c83950d5d1e8.mp4\"\u003e\n\u003c/video\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003cth\u003e\nSymbolic policies enable ease of transfer owing to the disentanglement of control policies and feature extraction steps.\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd\u003e\n\u003cimg src=\"docs/static/figures/symbolic-policy-pong.png\" width=\"768\"\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003cth\u003e\nVisualization of a trained DiffSES policy\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\n---\n\n# Usage\n\n## Stage I - Neural Policy Learning\n\n\u003e Training a Visual RL agent as a teacher\n\n- Stable Baselines3 for PPO training\n\nRun `train_visual_rl.py` with the appropriate environment selected. Refer to the Stable Baselines Zoo for additional\nconfiguration options. Use the wrappers provided in `retro_utils.py` for running on retro environments with multiple\nlives and stages.\n\nThe trained model will be saved in the `logs/` folder (along with TensorBoard logs).\n\n## Stage II - Symbolic Fitting\n\n\u003e Distillation of the teacher agent into a symbolic student\n\n- gplearn on an offline dataset of the teacher's actions\n\n### Part A: Training a self-supervised object detector\n\nTraining images for multiple Atari environments can be\nfound [here](https://drive.google.com/file/d/1vzFVFhJZDZMkJ8liROtIyzOiUY42r4TZ/view). If you would like to run on\ncustom/other environments, consider generating them using the provided script `save_frames.py`. 
We then proceed to train\nthe OD module using these frames.\n\nFor more training parameters, refer to the scripts and the SPACE project's documentation.\n\n```shell\ncd space/\npython main.py --task train --config configs/atari_spaceinvaders.yaml resume True device 'cuda:0'\n```\n\nThis should generate weights in the `space/output/logs` folder. Pretrained models from SPACE are\navailable [here](https://drive.google.com/file/d/1gUvLTfy5pKeLa6k3RT8GiEXWiGG8XzzD/view).\n\n### Part B: Generating the offline dataset\n\nSave the teacher model's behavior (state-action pairs) along with the OD module's output for all such states. This creates a JSON\nfile of the following form. `sample.json` contains a dummy dataset for demonstration purposes.\n\n```\n[\n  {\n    \"state\": [\n      {\n        \"type\": int,\n        \"x_velocity\": float,\n        \"y_velocity\": float,\n        \"x_position\": int,\n        \"y_position\": int\n      },\n      {\n        \"type\": int,\n        \"x_velocity\": float,\n        ...\n      }\n      ...\n    ],\n    \"teacher_action\": int\n  }\n  ...\n]\n```\n\n### Part C: Symbolic distillation\n\nWe use gplearn's symbolic regression API in `distill_teacher.py` to train a symbolic tree to mimic the teacher's\nactions. The operators are as defined in the file and can easily be extended with additional operators through the simple\ngplearn APIs. Please check `see/judges.py` for a few sample implementations of operators. The operands are the states\nas stored in the JSON. We recommend running this experiment numerous times to achieve good performance, as convergence of\nsuch a random search is not guaranteed. 
Please refer to `gplearn_optuna.py` for a sample of automating such\na search on random data.\n\n## Stage III - Fine-tuning Symbolic Tree\n\n\u003e Neural Guided Differentiable Search\n\nLastly, our symbolic fine-tuning stage consists of `symbolic_finetuning.py`, which uses a custom implementation of gplearn\nmodified to support the following:\n\n- **RL-style training:** rewards serve as the fitness metric rather than MSE with respect to the teacher's behavior.\n- **Differentiable constant optimization:** a new mutation scheme in which the constants are made differentiable, the\n  tree acts as the policy network for a PPO agent, and optimization is performed on those constants.\n- **Soft expert supervision in loss:** an add-on to the previous bullet: an extra loss term that penalizes\n  the difference between the teacher's action and the symbolic tree's prediction.\n\nBefore running that file, please run `pip install -e .` inside the custom implementation of gplearn to install the\nlocal version instead of the prebuilt wheels from PyPI. 
Similar to [Part 2.3](#part-c--symbolic-distillation), we\nrecommend running this experiment numerous times to achieve\nacceptable convergence.\n\n# Citation\n\nIf you find our code implementation helpful for your own research or work, please cite our paper.\n\n```bibtex\n@article{zheng2022symbolic,\n  title={Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search},\n  author={Zheng, Wenqing and Sharan, SP and Fan, Zhiwen and Wang, Kevin and Xi, Yihan and Wang, Zhangyang},\n  journal={arXiv preprint arXiv:2212.14849},\n  year={2022}\n}\n```\n\n# Contact\n\nFor any queries, please [raise an issue](https://github.com/VITA-Group/DiffSES/issues/new) or\ncontact [Wenqing Zheng](mailto:w.zheng@utexas.edu).\n\n# License\n\nThis project is open-sourced under the [MIT License](LICENSE).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvita-group%2Fdiffses","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvita-group%2Fdiffses","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvita-group%2Fdiffses/lists"}