{"id":22704978,"url":"https://github.com/akensert/ddpg-optimal-scouting-runs","last_synced_at":"2025-06-27T15:07:16.763Z","repository":{"id":210252252,"uuid":"726090372","full_name":"akensert/ddpg-optimal-scouting-runs","owner":"akensert","description":"Deep deterministic policy gradient algorithm for the selection of optimal gradient scouting runs","archived":false,"fork":false,"pushed_at":"2023-12-01T16:11:25.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-27T15:06:34.448Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/akensert.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-12-01T14:13:49.000Z","updated_at":"2023-12-01T15:10:40.000Z","dependencies_parsed_at":"2023-12-01T16:43:36.285Z","dependency_job_id":null,"html_url":"https://github.com/akensert/ddpg-optimal-scouting-runs","commit_stats":null,"previous_names":["akensert/ddpg-optimal-scouting-runs"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/akensert/ddpg-optimal-scouting-runs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fddpg-optimal-scouting-runs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fddpg-optimal-scouting-runs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fddpg-optimal-scouting-runs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fddpg-optimal-scouti
ng-runs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/akensert","download_url":"https://codeload.github.com/akensert/ddpg-optimal-scouting-runs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/akensert%2Fddpg-optimal-scouting-runs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262279128,"owners_count":23286550,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-10T09:08:37.295Z","updated_at":"2025-06-27T15:07:16.737Z","avatar_url":"https://github.com/akensert.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Twin-delayed DDPG for optimal selection of gradient scouting runs\n\n## About\nAn attempt to implement and train a DDPG agent to select optimal scouting runs for a given compound. The scouting runs were performed in a simulator, using well-studied retention models. \n\n\u003e Caution: the agent is not meant to be used in real practice. \n\nThe goal of this project was two-fold:\n\n1. To successfully develop and train a reinforcement learning (RL) agent to perform scouting runs based on feedback. The feedback is computed based on a reward function which takes into consideration the accuracy of the retention models (fitted to the data from the scouting runs) and the run-time.\n2. If the agent learns well, gain insight into which scouting runs are optimal given a certain compound. Are the choices of the agent what we expect? 
Are there any surprises?\n\n## Room for improvement\nAlthough occasionally converging to reasonable solutions, the training is unstable. One of the main reasons for this is likely the way the rewards are calculated; as mentioned, the rewards are based on retention model fittings, which are very sensitive to the data points obtained (what scouting runs were made). (This could also be an issue when later evaluating the performance.) Below are some suggestions for improving the DDPG algorithm:\n\n1. Modify the reward function, including better scaling (e.g., between -1.0 and 1.0)\n2. Scale actions between -1.0 and 1.0, and states between e.g. 0.0 and 1.0. Caution: the scaling needs to be reversed in the environment.\n3. Fine-tune the hyperparameters of the DDPG agent\n    - E.g. discount factor,\n    - learning rate,\n    - action noise,\n    - and tau.\n4. Improve the architecture of the neural networks, as well as their hyperparameters\n    - E.g. better initialization,\n    - regularization,\n    - number of layers and units.\n5. Replace the existing buffer with a prioritized experience replay buffer.\n\n## Requirements\n* Python 3.10\n    * jupyter (version 1.0.0)\n    * tensorflow (version 2.13.0)\n    * matplotlib (version 3.7.2)\n    * tqdm (version 4.45.0)\n    * gymnasium (version 0.26.2)\n    \n\u003e See setup.py for more detail on which packages are installed and in which versions.\n\n## Setup and run\n1. Navigate to the desired location.\n2. Clone the repository: e.g., `git clone git@github.com:pharmanlysis/ddpg-optimal-scouting-runs.git`\n3. Install the package (set up the repository): `pip install -e .`\n4. Navigate to the source code (in `src/`) to study and possibly modify the code. \n5. Navigate to scripts (`../scripts/`) and train the agent in the environment, via `python main.py`\n6. 
Navigate to root (`../`) and track the training progression via tensorboard: `tensorboard --logdir logs/`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakensert%2Fddpg-optimal-scouting-runs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakensert%2Fddpg-optimal-scouting-runs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakensert%2Fddpg-optimal-scouting-runs/lists"}