
https://github.com/MolecularAI/route-distances


# route-distances

[![License](https://img.shields.io/github/license/MolecularAI/route-distances)](https://github.com/MolecularAI/route-distances/blob/master/LICENSE)
[![Tests](https://github.com/MolecularAI/route-distances/workflows/tests/badge.svg)](https://github.com/MolecularAI/route-distances/actions?workflow=tests)
[![codecov](https://codecov.io/gh/MolecularAI/route-distances/branch/master/graph/badge.svg)](https://codecov.io/gh/MolecularAI/route-distances)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)

This repository contains tools and routines to calculate distances between synthesis routes and to cluster them.

This repository is mainly intended for developers and researchers. If you want a fully functional tool that is easy to use,
please consider looking into the *AiZynthFinder* project.

## Prerequisites

Before you begin, ensure you have met the following requirements:

* Linux, Windows or macOS platforms are supported - as long as the dependencies are supported on these platforms.

* You have installed [anaconda](https://www.anaconda.com/) or [miniconda](https://docs.conda.io/en/latest/miniconda.html) with Python 3.9 to 3.11.

The tool has been developed on a Linux platform, but the software has been tested on Windows 10 and macOS Catalina.
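
The supported Python range listed above can be checked before installing. The following is a minimal sketch; the helper name `is_supported` and the two constants are hypothetical and not part of the `route_distances` package.

```python
import sys

# Supported range taken from the prerequisites above (inclusive on both ends).
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    if not is_supported():
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        print(f"Python {found} is outside the supported 3.9 to 3.11 range")
```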

## Installation

### For users

Set up your Python environment and then run:

    pip install route-distances

### For developers

First clone the repository using Git.

Then execute the following commands in the root of the repository:

    conda env create -f conda-env.yml
    conda activate routes-env
    poetry install

The `route_distances` package is now installed in editable mode.

## Usage

The package installs the `cluster_aizynth_output` command-line tool, which
calculates distances and clusters for AiZynthFinder output:

    cluster_aizynth_output --files finder_output1.hdf5 finder_output2.hdf5 --output finder_distances.hdf5 --nclusters 0 --model ted

This performs tree edit distance (TED) calculations and writes two columns to the output file: `distance_matrix`, holding the distances, and `cluster_labels`, holding the cluster label of each route.
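
Once a distance matrix and its cluster labels are available, a common follow-up is to pick one representative route per cluster. The sketch below is illustrative only: it assumes the matrix has already been loaded as a nested list (loading from the HDF5 file is not shown), the numbers are made up, and `cluster_medoids` is a hypothetical helper rather than a function of the `route_distances` package.

```python
from collections import defaultdict


def cluster_medoids(distance_matrix, cluster_labels):
    """Map each cluster label to the index of its medoid route.

    The medoid is the route with the smallest summed distance to the
    routes in the same cluster (ties go to the lowest index).
    """
    # Group route indices by their cluster label
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)

    medoids = {}
    for label, indices in members.items():
        medoids[label] = min(
            indices,
            key=lambda i: sum(distance_matrix[i][j] for j in indices),
        )
    return medoids


# Made-up symmetric 5x5 distance matrix and labels, for illustration only
distances = [
    [0.0, 1.0, 4.0, 8.0, 9.0],
    [1.0, 0.0, 2.0, 7.0, 8.0],
    [4.0, 2.0, 0.0, 6.0, 7.0],
    [8.0, 7.0, 6.0, 0.0, 3.0],
    [9.0, 8.0, 7.0, 3.0, 0.0],
]
labels = [0, 0, 0, 1, 1]

print(cluster_medoids(distances, labels))  # prints {0: 1, 1: 3}
```

Route 1 is chosen for cluster 0 because its summed distance to routes 0, 1 and 2 is the smallest (3.0); the two routes in cluster 1 tie, so the lower index wins.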

An ML model for fast predictions can be found here: [https://zenodo.org/record/4925903](https://zenodo.org/record/4925903).

Pass the downloaded model checkpoint to the `cluster_aizynth_output` tool with `--model`:

    cluster_aizynth_output --files finder_output1.hdf5 finder_output2.hdf5 --output finder_distances.hdf5 --nclusters 0 --model chembl_10k_route_distance_model.ckpt
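
When the same command has to be run over many result files, it can help to assemble the argument list in Python. This is a minimal sketch that uses only the flags shown in the commands above; `build_cluster_command` and its default values are hypothetical and simply mirror the examples.

```python
import shlex


def build_cluster_command(files, output, model="ted", nclusters=0):
    """Assemble the argument list for a cluster_aizynth_output call.

    Hypothetical helper; the defaults mirror the example commands above.
    """
    return [
        "cluster_aizynth_output",
        "--files", *files,
        "--output", output,
        "--nclusters", str(nclusters),
        "--model", model,
    ]


command = build_cluster_command(
    ["finder_output1.hdf5", "finder_output2.hdf5"],
    "finder_distances.hdf5",
    model="chembl_10k_route_distance_model.ckpt",
)

# Print the command as a shell-safe string for inspection
print(shlex.join(command))

# To execute it, pass the list to subprocess.run(command, check=True)
```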

For further details, please consult the [documentation](https://molecularai.github.io/route-distances/).

## Development

### Testing

The tests use the ``pytest`` package, which is installed by `poetry`.

Run the tests using:

    pytest -v


### Documentation generation

The documentation is generated by Sphinx from hand-written tutorials and docstrings.

The HTML documentation can be generated with:

    invoke build-docs

## Contributing

We welcome contributions, in the form of issues or pull requests.

If you have a question or want to report a bug, please submit an issue.

To contribute with code to the project, follow these steps:

1. Fork this repository.
2. Create a branch: `git checkout -b <branch_name>`.
3. Make your changes and commit them: `git commit -m '<commit_message>'`
4. Push to the remote branch: `git push`
5. Create the pull request.

Please use the ``black`` package for formatting, and follow the ``pep8`` style guide.

## Contributors

* Samuel Genheden

The contributors have limited time for support questions, but please do not hesitate to submit an issue (see above).

## License

The software is licensed under the MIT license (see LICENSE file), and is free and provided as-is.

## References

1. Genheden S, Engkvist O, Bjerrum E (2021) Clustering of synthetic routes using tree edit distance. J. Chem. Inf. Model. 61:3899–3907 [https://doi.org/10.1021/acs.jcim.1c00232](https://doi.org/10.1021/acs.jcim.1c00232)
2. Genheden S, Engkvist O, Bjerrum E (2022) Fast prediction of distances between synthetic routes with deep learning. Mach. Learn. Sci. Technol. 3:015018 [https://doi.org/10.1088/2632-2153/ac4a91](https://doi.org/10.1088/2632-2153/ac4a91)