https://github.com/MolecularAI/PaRoutes
Home of the PaRoutes framework for benchmarking multi-step retrosynthesis predictions.
- Host: GitHub
- URL: https://github.com/MolecularAI/PaRoutes
- Owner: MolecularAI
- License: apache-2.0
- Created: 2022-02-24T09:26:01.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-08-05T06:44:21.000Z (9 months ago)
- Last Synced: 2024-09-26T02:01:30.181Z (7 months ago)
- Language: Python
- Size: 1.49 MB
- Stars: 68
- Watchers: 5
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- top-pharma50 - **MolecularAI/PaRoutes** - step retrosynthesis predictions. Stars: 66. Forks: 5. Language: Python. License: Apache License 2.0. Last: 2022-11-23 14:27:04. (Ranked by starred repositories)
README
PaRoutes is a framework for benchmarking multi-step retrosynthesis methods,
i.e. route predictions.

It provides:

* A curated reaction dataset for building one-step retrosynthesis models
* Two sets of 10,000 routes
* Two sets of stock molecules to use as stop-criterion for the search
* Scripts to compute route quality and route diversity metrics

## Prerequisites

Before you begin, ensure you have met the following requirements:
* Linux, Windows or macOS platforms are supported - as long as the dependencies are supported on these platforms.
* You have installed [anaconda](https://www.anaconda.com/) or [miniconda](https://docs.conda.io/en/latest/miniconda.html) with Python 3.7 to 3.9
The tool has been developed on a Linux platform.
## Installation
First clone the repository using Git.

Then execute the following commands in the root of the repository:

    conda env create -f env.yml
    conda activate paroutes-env
    python data/download_data.py

Now all the dependencies and datasets are set up.

## Usage
### Performing route predictions
PaRoutes provides a list of targets and stock molecules in SMILES format
for two sets, **n1** and **n5**.

For **n1** you find in the `data/` folder of the repository:

* `n1-targets.txt` - the target molecules
* `n1-stock.txt` - the stock molecules

For **n5** you find in the `data/` folder of the repository:

* `n5-targets.txt` - the target molecules
* `n5-stock.txt` - the stock molecules

For more information on the files in the `data/` folder, please read the README file for that folder.

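As a minimal sketch of how these files might be read before starting a search, the snippet below loads a target list and a stock set. It assumes that each file holds one SMILES string per line, which is not confirmed here (check the README in the `data/` folder). The helper names `read_smiles` and `load_benchmark_set` are hypothetical and not part of PaRoutes.

```python
from pathlib import Path
from typing import List, Set, Tuple


def read_smiles(path: Path) -> List[str]:
    """Read a text file, assuming one SMILES string per line; skip blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def load_benchmark_set(data_dir: str, set_name: str) -> Tuple[List[str], Set[str]]:
    """Load targets and stock for a set name such as "n1" or "n5".

    Hypothetical helper: the file naming follows the list above,
    the one-SMILES-per-line layout is an assumption.
    """
    folder = Path(data_dir)
    targets = read_smiles(folder / f"{set_name}-targets.txt")
    # A set gives constant-time membership checks for the stop criterion.
    stock = set(read_smiles(folder / f"{set_name}-stock.txt"))
    return targets, stock
```

A call such as `load_benchmark_set("data", "n1")` would return the target list and the stock set. Note that a plain string comparison against the stock only works if your software writes SMILES in the same form as the stock file.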
### Analysing predictions
The predicted routes exported by your software need to be converted to a format
that can be read by the analysis tool. This format is outlined in `analysis/README.md`.

The following command for analysis assumes:

1. The current directory is the root of the `paroutes` repo
2. Your route predictions for the **n1** targets, in JSON format, are located at `~/output_routes.json`

Then you can type

    python analysis/route_quality.py --routes ~/output_routes.json --references data/n1-routes.json --output ~/route_analyses.csv

to calculate the route quality metrics. It will print out how many of the targets were solved and the top-1, top-5 and top-10 accuracies (by default). For further details have a look in the `data/README.md` file.
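To illustrate what a top-N accuracy means, here is a minimal sketch that is not taken from `route_quality.py`. It assumes that for each target you already know the 1-based rank at which the reference route appears among the predictions, or `None` if it was not found, and that the accuracy is computed over all targets. The function name `top_n_accuracy` is hypothetical, and how the actual script matches a predicted route to the reference is not covered.

```python
from typing import Dict, Optional, Sequence


def top_n_accuracy(
    ranks: Sequence[Optional[int]], n_values: Sequence[int] = (1, 5, 10)
) -> Dict[int, float]:
    """Fraction of targets whose reference route is ranked within the top N.

    ranks: one entry per target, the 1-based rank of the reference route,
    or None when the reference route was not among the predictions.
    """
    total = len(ranks)
    if total == 0:
        return {n: 0.0 for n in n_values}
    return {
        n: sum(1 for rank in ranks if rank is not None and rank <= n) / total
        for n in n_values
    }


# Four targets: reference found at rank 1, rank 3, not found, rank 7.
print(top_n_accuracy([1, 3, None, 7]))  # {1: 0.25, 5: 0.5, 10: 0.75}
```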
To perform clustering on the same dataset, you can type

    python analysis/route_clusters.py --routes ~/output_routes.json --model data/chembl_10k_route_distance_model.ckpt --min_density 2 --output ~/cluster_analyses.json

The script will print out the average number of clusters formed for each target. For further details have a look in the `data/README.md` file.
## Benchmark results
### Results published in the PaRoutes publication (Genheden et al. 2022)
| Search method | Route set | Solved targets | Top-1 | Top-5 | Top-10 | Routes extracted | Number of clusters |
|:----------------|:------------|-----------------:|--------:|--------:|---------:|-------------------:|---------------------:|
| Mcts | set-n1 | 9714 | 0.20 | 0.55 | 0.61 | 273 | 68 |
| Mcts | set-n5 | 9676 | 0.09 | 0.34 | 0.42 | 272 | 77 |
| Retro* | set-n1 | 9726 | 0.17 | 0.48 | 0.54 | 264 | 68 |
| Retro* | set-n5 | 9703 | 0.08 | 0.30 | 0.38 | 149 | 39 |
| DFPN | set-n1 | 8475 | 0.19 | 0.33 | 0.33 | 6 | 2 |
| DFPN            | set-n5      |             7382 |    0.08 |    0.14 |     0.14 |                  6 |                    2 |

### Results with the 2.0 version

| Search method   | Route set   |   Solved targets |   Top-1 |   Top-5 |   Top-10 |   Routes extracted |   Number of clusters |
|:----------------|:------------|-----------------:|--------:|--------:|---------:|-------------------:|---------------------:|
| Mcts | set-n1 | 9716 | 0.2372 | 0.5107 | 0.5414 | 306 | 109 |
| Mcts | set-n5 | 9689 | 0.1237 | 0.3584 | 0.4056 | 311 | 113 |
| Retro* | set-n1 | 9728 | 0.2027 | 0.4516 | 0.4847 | 154 | 31 |
| Retro* | set-n5 | 9729 | 0.1143 | 0.3365 | 0.3897 | 138 | 26 |
| DFPN | set-n1 | 7786 | 0.1705 | 0.2456 | 0.246 | 5 | 2 |
| DFPN            | set-n5      |             6730 |  0.0753 |  0.1146 |   0.1151 |                  5 |                    2 |

**Notes**

- "Top-N" refers to the accuracy, i.e. the capability to recover the reference route among the top-N ranked routes
- "Routes extracted" and "Number of clusters" are medians over all targets

## Contributing

We welcome contributions, in the form of issues or pull requests.
If you have a question or want to report a bug, please submit an issue.
To contribute with code to the project, follow these steps:
1. Fork this repository.
2. Create a branch: `git checkout -b `.
3. Make your changes and commit them: `git commit -m ''`
4. Push to the remote branch: `git push`
5. Create the pull request.

Please use the ``black`` package for formatting, and follow the ``pep8`` style guide.

## Contributors
* [@SGenheden](https://www.github.com/SGenheden)
* [@EBjerrum](https://www.github.com/EBjerrum)

Yasmine Nahal is acknowledged for the creation of the PaRoutes logo.

The contributors have limited time for support questions, but please do not hesitate to submit an issue (see above).
## License
The software is licensed under the Apache 2.0 license (see LICENSE file), and is free and provided as-is.
## References
Genheden, S.; Bjerrum, E. PaRoutes: Towards a Framework for Benchmarking Retrosynthesis Route Predictions. Digit. Discov. 2022, 1 (4), 527–539. https://doi.org/10.1039/D2DD00015F.