https://github.com/datamol-io/molfeat
molfeat - the hub for all your molecular featurizers
- Host: GitHub
- URL: https://github.com/datamol-io/molfeat
- Owner: datamol-io
- License: apache-2.0
- Created: 2023-03-13T19:39:29.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2025-04-01T15:50:32.000Z (about 1 month ago)
- Last Synced: 2025-04-11T00:34:26.540Z (28 days ago)
- Language: Python
- Homepage: https://molfeat.datamol.io
- Size: 11.2 MB
- Stars: 202
- Watchers: 9
- Forks: 22
- Open Issues: 13
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: .github/SECURITY.md
Awesome Lists containing this project
- best-of-atomistic-machine-learning - GitHub - 23% open · ⏱️ 27.11.2024 (General Tools)
README
molfeat - the hub for all your molecular featurizers
---
Molfeat is a hub of molecular featurizers. It supports a wide variety of out-of-the-box molecular featurizers and can be easily extended to include your own custom featurizers.
- Fast, with a simple and efficient API.
- Unify pre-trained molecular embeddings and hand-crafted featurizers in a single package.
- Easily add your own featurizers through plugins.
- Benefit from increased performance through a trouble-free caching system.

Visit our website at <https://molfeat.datamol.io>.
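This README does not describe how Molfeat's caching works internally. As a rough, standard-library-only illustration of the general idea (not Molfeat's implementation), the hypothetical `CachedFeaturizer` below runs an expensive featurizer once per distinct input and serves repeats from a dictionary. `toy_featurizer` is a stand-in for a real featurizer.

```python
from typing import Callable, Dict, List

Features = List[int]


class CachedFeaturizer:
    """Hypothetical wrapper that memoizes a featurizer's output per input string."""

    def __init__(self, featurize: Callable[[str], Features]) -> None:
        self._featurize = featurize
        self._cache: Dict[str, Features] = {}
        self.misses = 0  # how many times the wrapped featurizer actually ran

    def __call__(self, smiles: str) -> Features:
        # Keyed on the raw SMILES string; a real cache would normally key on a
        # canonical form so that different spellings of one molecule share an entry.
        if smiles not in self._cache:
            self.misses += 1
            self._cache[smiles] = self._featurize(smiles)
        return self._cache[smiles]


def toy_featurizer(smiles: str) -> Features:
    # Stand-in for an expensive featurizer: count the characters "C", "N", "O".
    return [smiles.count(symbol) for symbol in "CNO"]


featurizer = CachedFeaturizer(toy_featurizer)
batch = ["CCO", "CCN", "CCO", "O=C=O"]
features = [featurizer(smiles) for smiles in batch]
print(features)           # [[2, 0, 1], [2, 1, 0], [2, 0, 1], [1, 0, 2]]
print(featurizer.misses)  # 3
```

Only three of the four inputs reach the featurizer, because the repeated `"CCO"` is served from the cache.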
## Installation
### Installing Molfeat
Use mamba:
```bash
mamba install -c conda-forge molfeat
```

_**Tip:** You can replace `mamba` with `conda`._
_**Note:** We highly recommend using a [Conda Python distribution](https://github.com/conda-forge/miniforge) to install Molfeat. The package is also pip installable if you need it: `pip install molfeat`._
### Optional dependencies
Not all featurizers in the Molfeat core package are supported by default. Some featurizers require additional dependencies. If you try to use a featurizer that requires additional dependencies, Molfeat will raise an error and tell you which dependencies are missing and how to install them.
- To install `dgl`: run `mamba install -c dglteam "dgl<=2.0"` (the version is pinned because `dgl>2.0.0` has an issue related to GraphBolt)
- To install `dgllife`: run `mamba install -c conda-forge dgllife`
- To install `fcd_torch`: run `mamba install -c conda-forge fcd_torch`
- To install `pyg`: run `mamba install -c conda-forge pytorch_geometric`
- To install `graphormer-pretrained`: run `mamba install -c conda-forge graphormer-pretrained`
- To install `map4`: see
- To install `bio-embeddings`: run `mamba install -c conda-forge 'bio-embeddings >=0.2.2'`

If you install Molfeat using pip, you can install optional dependencies together with the main package. For example, `pip install "molfeat[all]"` installs all the compatible optional dependencies for small-molecule featurization. Other extras include `molfeat[dgl]`, `molfeat[graphormer]`, `molfeat[transformer]`, `molfeat[viz]`, and `molfeat[fcd]`. See the [optional-dependencies](https://github.com/datamol-io/molfeat/blob/main/pyproject.toml#L60) section of `pyproject.toml` for more information.
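Molfeat itself reports missing dependencies when you use a featurizer that needs them. If you would rather check up front which optional backends are present in an environment, a minimal standard-library sketch could look like the following. The `check_optional` helper and the install-name to import-name mapping in `OPTIONAL_MODULES` are assumptions for illustration, not part of Molfeat; `graphormer-pretrained` and `bio-embeddings` are left out because their import names are not given here.

```python
import importlib.util
from typing import Dict

# Assumed mapping from the install names listed above to their top-level
# import names (e.g. pytorch_geometric is imported as torch_geometric).
# Check each project's documentation before relying on these.
OPTIONAL_MODULES: Dict[str, str] = {
    "dgl": "dgl",
    "dgllife": "dgllife",
    "fcd_torch": "fcd_torch",
    "pytorch_geometric": "torch_geometric",
    "map4": "map4",
}


def check_optional(modules: Dict[str, str]) -> Dict[str, bool]:
    """Hypothetical helper: report which modules can be found, without importing them."""
    return {
        install_name: importlib.util.find_spec(import_name) is not None
        for install_name, import_name in modules.items()
    }


for name, available in check_optional(OPTIONAL_MODULES).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

`importlib.util.find_spec` only locates a top-level module, so the check is cheap and does not trigger heavy imports.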
### Installing Plugins
The functionality of Molfeat can be extended through plugins. The use of a plugin system ensures that the core package remains easy to install and as light as possible, while making it easy to extend its functionality with plug-and-play components. Additionally, it ensures that plugins can be developed independently from the core package, removing the bottleneck of a central party that reviews and approves new plugins. Consult the molfeat documentation for more details on how to [create](docs/developers/create-plugin.md) your own plugins.
This does mean that installation instructions differ from plugin to plugin: consult each plugin's documentation for details.
## API tour
```python
import datamol as dm
from molfeat.calc import FPCalculator
from molfeat.trans import MoleculeTransformer
from molfeat.store.modelstore import ModelStore

# Load some dummy data
data = dm.data.freesolv().sample(100).smiles.values

# Featurize a single molecule
calc = FPCalculator("ecfp")
calc(data[0])

# Define a parallelized featurization pipeline
# (n_jobs=-1 runs on all available CPU cores)
mol_transf = MoleculeTransformer(calc, n_jobs=-1)
mol_transf(data)

# Easily save and load featurizers
mol_transf.to_state_yaml_file("state_dict.yml")
mol_transf = MoleculeTransformer.from_state_yaml_file("state_dict.yml")
mol_transf(data)

# List all available featurizers
store = ModelStore()
store.available_models

# Find a featurizer and learn how to use it
model_card = store.search(name="ChemBERTa-77M-MLM")[0]
model_card.usage()
```

## How to cite
Please cite Molfeat if you use it in your research: [Zenodo DOI](https://zenodo.org/badge/latestdoi/613548667).
## Contribute
See [developers](docs/developers/) for a comprehensive guide on how to contribute to `molfeat`. `molfeat` is a community-led
initiative, and whether you're a first-time contributor or an open-source veteran, this project greatly benefits from your contributions.
To learn more about the community and the [datamol.io](https://datamol.io/) ecosystem, please see [community](docs/community/).

## Maintainers
- @cwognum
- @maclandrol
- @hadim

## License
Molfeat is released under the Apache-2.0 license. See [LICENSE](LICENSE).