https://github.com/dls5-omics/multimolecule
Accelerate Molecular Biology Research with Machine Learning
- Host: GitHub
- URL: https://github.com/dls5-omics/multimolecule
- Owner: DLS5-Omics
- License: agpl-3.0
- Created: 2024-02-27T07:14:35.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2026-02-25T20:39:49.000Z (4 months ago)
- Last Synced: 2026-02-25T22:40:39.018Z (4 months ago)
- Topics: ai4science, machine-learning, molecular-biology
- Language: Python
- Homepage: https://multimolecule.danling.org/
- Size: 35.5 MB
- Stars: 50
- Watchers: 3
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: license-faq.md
README
# [MultiMolecule](https://multimolecule.danling.org)
> [!TIP]
> Accelerate Molecular Biology Research with Machine Learning
[DOI](https://doi.org/10.5281/zenodo.15119050)
[Codacy grade](https://app.codacy.com/gh/DLS5-Omics/multimolecule/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_grade)
[Codacy coverage](https://app.codacy.com/gh/DLS5-Omics/multimolecule/dashboard?utm_source=gh&utm_medium=referral&utm_content=&utm_campaign=Badge_coverage)
[Codecov](https://codecov.io/gh/DLS5-Omics/multimolecule)
[PyPI](https://pypi.org/project/multimolecule)
[Documentation](https://multimolecule.danling.org)
[License: AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0)
## Introduction
Welcome to MultiMolecule (浦原), a foundational library designed to accelerate scientific research in molecular biology through machine learning.
MultiMolecule provides a comprehensive yet flexible set of tools for researchers aiming to leverage AI with ease, focusing on biomolecular data (RNA, DNA, and protein).
## Overview
MultiMolecule is built with flexibility and ease of use in mind.
Its modular design allows you to utilize only the components you need, integrating seamlessly into your existing workflows without adding unnecessary complexity.
- [`data`](data): Smart [`Dataset`][multimolecule.data.Dataset] that automatically infers tasks, including their level (sequence, token, contact) and type (classification, regression). Provides multi-task datasets and samplers to facilitate multi-task learning without additional configuration.
- [`datasets`](datasets): A collection of widely-used biomolecular datasets.
- [`modules`](modules): Modular neural network building blocks, including [embeddings](modules/embeddings), [heads](modules/heads), and criteria (loss functions) for constructing custom models.
- [`models`](models): Implementations of state-of-the-art pre-trained models in molecular biology.
- [`tokenisers`](tokenisers): Tokenizers that convert DNA, RNA, protein, and other sequences into one-hot encodings.
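To make two of the ideas above concrete, here is a minimal, library-independent sketch of one-hot encoding a nucleotide sequence and of guessing a task's level and type from its labels. It does not use or reproduce MultiMolecule's API: `RNA_ALPHABET`, `one_hot`, and `infer_task` are hypothetical names, and the heuristics are assumptions chosen for illustration only.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical minimal alphabet; the library's real vocabulary may differ.
RNA_ALPHABET = "ACGU"


def one_hot(sequence: str, alphabet: str = RNA_ALPHABET) -> List[List[int]]:
    """Encode each symbol as a one-hot row; unknown symbols stay all-zero."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    encoded = []
    for symbol in sequence.upper():
        row = [0] * len(alphabet)
        if symbol in index:
            row[index[symbol]] = 1
        encoded.append(row)
    return encoded


def _flatten(values: Sequence) -> Iterator:
    """Yield every scalar inside arbitrarily nested lists or tuples."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from _flatten(value)
        else:
            yield value


def infer_task(labels: Sequence) -> Tuple[str, str]:
    """Guess (level, type) from labels using simple, assumed heuristics.

    Level: scalar per sample -> sequence, 1-D per sample -> token,
    2-D per sample -> contact.
    Type: any float label -> regression, otherwise classification.
    """
    sample = labels[0]
    if not isinstance(sample, (list, tuple)):
        level = "sequence"
    elif sample and isinstance(sample[0], (list, tuple)):
        level = "contact"
    else:
        level = "token"
    is_regression = any(isinstance(v, float) for v in _flatten(labels))
    return level, "regression" if is_regression else "classification"


if __name__ == "__main__":
    print(one_hot("ACGU"))
    print(infer_task([[0, 1, 1], [1, 0, 0]]))
```

For the library's actual behavior, refer to the documentation of the `data` and `tokenisers` modules.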
## Installation
Install the most recent stable version from PyPI:
```shell
pip install multimolecule
```
Install the latest development version from source:
```shell
pip install git+https://github.com/DLS5-Omics/MultiMolecule
```
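After installing, you may want to confirm which version ended up in your environment. The following is a small sketch that relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of MultiMolecule.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "multimolecule") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "multimolecule is not installed")
```

This reads the distribution metadata rather than importing the package, so it works even if an optional dependency is missing.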
## Citation
> [!NOTE]
> The artifacts distributed in this repository are part of the MultiMolecule project.
> If you use MultiMolecule in your research, you must cite the MultiMolecule project as follows:
```bibtex
@software{chen_2024_12638419,
  author    = {Chen, Zhiyuan and Zhu, Sophia Y.},
  title     = {MultiMolecule},
  doi       = {10.5281/zenodo.12638419},
  publisher = {Zenodo},
  url       = {https://doi.org/10.5281/zenodo.12638419},
  year      = 2024,
  month     = may,
  day       = 4
}
```
## License
We believe openness is the foundation of research.
MultiMolecule is licensed under the [GNU Affero General Public License](license.md).
For additional terms and clarifications, please refer to our [License FAQ](license-faq.md).
Please join us in building an open research community.
`SPDX-License-Identifier: AGPL-3.0-or-later`