Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/giotto-ai/molecule_bond_prediction
Predict scalar coupling in molecules
https://github.com/giotto-ai/molecule_bond_prediction
molecular-structures topological-data-analysis
Last synced: about 1 month ago
JSON representation
Predict scalar coupling in molecules
- Host: GitHub
- URL: https://github.com/giotto-ai/molecule_bond_prediction
- Owner: giotto-ai
- License: other
- Created: 2019-12-02T11:47:53.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-03-14T04:56:42.000Z (almost 4 years ago)
- Last Synced: 2023-03-06T12:57:01.490Z (almost 2 years ago)
- Topics: molecular-structures, topological-data-analysis
- Language: Python
- Homepage:
- Size: 5.68 MB
- Stars: 12
- Watchers: 3
- Forks: 5
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Predicting Molecular Bond Strengths with Topological Machine Learning
## What is it?
This repository showcases the core functionalities of
[giotto-learn](https://github.com/giotto-ai/giotto-learn), a Python library for
topological machine learning. The accompanying blog post can be found [here](https://medium.com/p/getting-started-with-giotto-learn-a-python-library-for-topological-machine-learning-451d88d2c4bc?source=email-22cbcf87e250--writer.postDistributed&sk=c5dffceb84a1108ea916f136f519258e).This demo is based on the [Predicting Molecular
Properties](https://www.kaggle.com/c/champs-scalar-coupling/overview)
competition on Kaggle, where the task is to predict the bond strength between atoms in molecules.## Getting started
The easiest way to get started is to create a conda environment as follows:
```
conda create python=3.7 --name molecule -y
conda activate molecule
pip install -r requirements.txt
```## Results
The scoring function is described on Kaggle and is calculated as follows:
where:
* ![T](https://render.githubusercontent.com/render/math?math=T) is the number of coupling types
* ![n_t](https://render.githubusercontent.com/render/math?math=n_t) is the number of observations of type t
* ![y_i](https://render.githubusercontent.com/render/math?math=y_i) is the actual coupling value for this sample
* ![](https://render.githubusercontent.com/render/math?math=%5Chat%7By_i%7D) is the predicted coupling value for this sampleThe figure below summarizes the results and gives a comparison of the results
with and without topological features.
## External code
The following Kaggle notebooks were used for this project:* For non-topological features: https://www.kaggle.com/robertburbidge/distance-features
* For plotting molecules (but adapted): https://www.kaggle.com/mykolazotko/3d-visualization-of-molecules-with-plotly## Some related publications
To get an introduction to the application of topological data analysis to
machine learning, see:
* An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists: https://arxiv.org/pdf/1710.04019.pdfThe idea to use topological data analysis for predictions on molecules is not new. Below you can find some interesting papers related to this:
* Persistent-Homology-based Machine Learning and its Applications – A Survey: https://arxiv.org/abs/1811.00252 (esp. section 5)
* Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening: https://arxiv.org/pdf/1708.08135.pdfThe following papers were used to get some inspiration for the feature creation:
* The Ring of Algebraic Functions on Persistence Bar Codes: https://arxiv.org/pdf/1304.0530.pdf
* A topological approach for protein classification: https://arxiv.org/pdf/1510.00953.pdf