{"id":20054850,"url":"https://github.com/thu-ml/srpo","last_synced_at":"2025-05-05T13:31:44.509Z","repository":{"id":199676328,"uuid":"703414203","full_name":"thu-ml/SRPO","owner":"thu-ml","description":"Codes accompanying the paper \"Score Regularized Policy Optimization through Diffusion Behavior\" (ICLR 2024).","archived":false,"fork":false,"pushed_at":"2024-02-10T12:46:24.000Z","size":606,"stargazers_count":28,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-05-22T11:33:58.845Z","etag":null,"topics":["behavior-regularization","d4rl","diffusion","generative","offline","reinforcement-learning","rl","score-based-models","srpo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thu-ml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-11T07:44:56.000Z","updated_at":"2024-05-15T02:11:36.000Z","dependencies_parsed_at":"2024-11-13T12:44:14.100Z","dependency_job_id":"98914b53-bd41-48ff-8a18-ad6869eb75fc","html_url":"https://github.com/thu-ml/SRPO","commit_stats":{"total_commits":13,"total_committers":2,"mean_commits":6.5,"dds":0.3076923076923077,"last_synced_commit":"7ddae7a9be7681a6d135a69f206fda2e9bbe39f4"},"previous_names":["thu-ml/srpo"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FSRPO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FSRPO/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/thu-ml%2FSRPO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thu-ml%2FSRPO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thu-ml","download_url":"https://codeload.github.com/thu-ml/SRPO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252506444,"owners_count":21759048,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["behavior-regularization","d4rl","diffusion","generative","offline","reinforcement-learning","rl","score-based-models","srpo"],"created_at":"2024-11-13T12:44:03.539Z","updated_at":"2025-05-05T13:31:44.179Z","avatar_url":"https://github.com/thu-ml.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Score Regularized Policy Optimization through Diffusion Behavior\n\nHuayu Chen, Cheng Lu, Zhengyi Wang, Hang Su, Jun Zhu\n\n![image info](./SRPO.PNG)\n\n## D4RL experiments\n\n### Requirements\nInstallations of [PyTorch](https://pytorch.org/), [MuJoCo](https://github.com/deepmind/mujoco), and [D4RL](https://github.com/Farama-Foundation/D4RL) are needed.\n\n### Running\nDownload the pretrained behavior and critic checkpoints from [here](https://drive.google.com/drive/folders/1N0qC6lakTtwLa7oE0B_9jHfwCj65Irxx?usp=drive_link) and store them under `./SRPO_model_factory/`.\n\nYou can also choose to pretrain the behavior and the critic model yourself. 
Run the following commands to train the behavior model and the critic, respectively:\n\n```.bash\nTASK=\"halfcheetah-medium-v2\"; seed=0; python3 -u train_behavior.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}\n```\n\n```.bash\nTASK=\"halfcheetah-medium-v2\"; seed=0; python3 -u train_critic.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed}\n```\n\n\nFinally, train the policy:\n\n```.bash\nTASK=\"halfcheetah-medium-v2\"; seed=0; python3 -u train_policy.py --expid ${TASK}-baseline-seed${seed} --env $TASK --seed ${seed} --actor_load_path ./SRPO_model_factory/${TASK}-baseline-seed${seed}/behavior_ckpt200.pth --critic_load_path ./SRPO_model_factory/${TASK}-baseline-seed${seed}/critic_ckpt150.pth\n```\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-ml%2Fsrpo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthu-ml%2Fsrpo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthu-ml%2Fsrpo/lists"}