{"id":15681422,"url":"https://github.com/jacksonburns/fastprop","last_synced_at":"2025-04-19T12:48:04.230Z","repository":{"id":221438334,"uuid":"717804553","full_name":"JacksonBurns/fastprop","owner":"JacksonBurns","description":"Fast Molecular Property Prediction with mordredcommunity","archived":false,"fork":false,"pushed_at":"2025-04-11T00:50:27.000Z","size":9810,"stargazers_count":22,"open_issues_count":0,"forks_count":4,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-12T08:32:44.045Z","etag":null,"topics":["chemistry","machine-learning","qspr"],"latest_commit_sha":null,"homepage":"https://jacksonburns.github.io/fastprop/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JacksonBurns.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-11-12T16:48:12.000Z","updated_at":"2025-04-11T18:29:36.000Z","dependencies_parsed_at":"2024-02-14T03:20:33.644Z","dependency_job_id":"499e8632-ee3c-463c-9da4-fca2c5303fa9","html_url":"https://github.com/JacksonBurns/fastprop","commit_stats":null,"previous_names":["jacksonburns/fastprop"],"tags_count":2,"template":false,"template_full_name":"JacksonBurns/blank-python-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JacksonBurns%2Ffastprop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JacksonBurns%2Ffastprop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JacksonBurns%2Ffastprop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/JacksonBurns%2Ffastprop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JacksonBurns","download_url":"https://codeload.github.com/JacksonBurns/fastprop/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249698637,"owners_count":21312242,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chemistry","machine-learning","qspr"],"created_at":"2024-10-03T16:54:17.959Z","updated_at":"2025-04-19T12:48:04.199Z","avatar_url":"https://github.com/JacksonBurns.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e  \n  \u003cimg alt=\"fastprop Logo\" height=\"400\" src=\"https://raw.githubusercontent.com/JacksonBurns/fastprop/main/fastprop_logo.png\"\u003e\n\u003c/p\u003e\n\u003ch2 align=\"center\"\u003eMolecular Property Prediction with \u003ca href=\"https://github.com/JacksonBurns/mordred-community\"\u003emordredcommunity\u003c/a\u003e\u003c/h2\u003e\n\u003ch3 align=\"center\"\u003eFast, Scalable, and \u003c500 LOC\u003c/h3\u003e\n \n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"GitHub Repo Stars\" src=\"https://img.shields.io/github/stars/JacksonBurns/fastprop?style=social\"\u003e\n  \u003cimg alt=\"PyPI - Downloads\" src=\"https://img.shields.io/pypi/dm/fastprop\"\u003e\n  \u003cimg alt=\"PyPI\" src=\"https://img.shields.io/pypi/v/fastprop\"\u003e\n  \u003cimg alt=\"PyPI - License\" src=\"https://img.shields.io/github/license/JacksonBurns/fastprop\"\u003e\n\u003c/p\u003e\n\n# Announcements\n## 
alphaXiv Paper\nThe companion academic paper describing `fastprop` is freely available online at [alphaXiv](https://www.alphaxiv.org/abs/2404.02058).\nThe source for the paper is stored in this repository under the `paper` directory.\n\n## Initial Release :tada:\n`fastprop` version 1 is officially released, meaning the API is now stable and ready for production use!\nPlease try `fastprop` on your datasets and let us know what you think.\nFeature requests and bug reports are **very** appreciated!\n\n# Installing `fastprop`\n`fastprop` supports Mac, Windows, and Linux on Python versions 3.8 or newer.\nInstalling with `pip` is the best way to get `fastprop`, but if you need to check out a specific GitHub branch or you want to contribute to `fastprop`, a source installation is recommended.\nPending interest from users, a `conda` package will be added.\n\nCheck out the demo notebook for a quick intro to `fastprop` via Google Colab - runs in your browser, GPU included, no install required: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JacksonBurns/fastprop/blob/main/fastprop_demo.ipynb)\n\n## `pip` [recommended]\n`fastprop` is available via PyPI with `pip install fastprop`.\n\nTo make extending `fastprop` easier and keep the installation size down, dependencies required for hyperparameter optimization and SHAP analysis are _optional_.\nThey can be installed with `pip install fastprop[hopt]`, `pip install fastprop[shap]`, or `pip install fastprop[shap,hopt]` to install them both.\nIf you want to use `fastprop` but not write new code on top of it, you may want to install these now - you can always do so later, however, and `fastprop` will remind you.\n\n## Source\nTo install `fastprop` from GitHub directly you can:\n 1. Run `pip install git+https://github.com/JacksonBurns/fastprop.git@main` to install from the `main` branch (or specify any other branch you like).\n 2. 
Clone the repository with `git clone https://github.com/JacksonBurns/fastprop.git`, navigate to `fastprop` with `cd fastprop`, and run `pip install .`\n\nTo contribute to `fastprop` please follow [this tutorial](https://opensource.com/article/19/7/create-pull-request-github) (or something similar) to set up a forked version of `fastprop` and open a pull request (similar to above option 2).\nAll contributions are appreciated!\nSee [Developing `fastprop`](#developing-fastprop) for more details.\n\n# About `fastprop`\n`fastprop` is a package for performing deep-QSPR (Quantitative Structure-Property Relationship) with minimal user intervention.\nBy passing in a list of SMILES strings, `fastprop` will automatically generate and cache a set of molecular descriptors using [`mordredcommunity`](https://github.com/JacksonBurns/mordred-community) and train an FNN to predict the corresponding properties.\nSee the `examples` and `benchmarks` directories to see how to run training - the rest of this documentation will focus on how you can run, configure, and customize `fastprop`.\n\n## `fastprop` Framework\nThere are four distinct steps in `fastprop` that define its framework:\n 1. Featurization - transform the input molecules (as SMILES strings) into an array of molecular descriptors which are saved\n 2. Preprocessing - clean the descriptors by removing or imputing missing values then rescaling the remainder\n 3. Training - send the processed input to the neural network, which is a simple FNN (sequential fully-connected layers with an activation function between), optionally limiting the inputs to +/-3 standard deviations to aid in extrapolation\n 4. Prediction - save the trained model for future use\n\n## Configurable Parameters\n 1. 
Featurization\n    - Input CSV file: comma separated values (CSV) file (with headers) containing SMILES strings representing the molecules and the targets\n    - SMILES column name: name of the column containing the SMILES strings\n    - Target column name(s): name(s) of the columns containing the targets\n\n    _and_\n    - Which `mordred` descriptors to calculate: 'all' or 'optimized' (a smaller set of descriptors; faster, but less accurate).\n    - Enable/Disable caching of calculated descriptors: `fastprop` will by default cache calculated descriptors based on the input filename and warn the user when it loads descriptors from the file rather than calculating on-the-fly\n\n    _or_\n    - Load precomputed descriptors: filepath to where descriptors are already cached either manually or by `fastprop`\n 2. Preprocessing\n    - standardize: call `rdkit`'s `rdMolStandardize.Cleanup` function on the input molecules before calculating descriptors (`False` by default)\n    - _not configurable_: `fastprop` will always rescale input features, set invariant and missing features to zero, and impute missing values with the per-feature mean\n 3. 
Training\n    - Number of Repeats: How many times to split/train/test on the dataset (increments random seed by 1 each time).\n\n    _and_\n    - Number of FNN layers (default 2; repeated fully connected layers of hidden size)\n    - Hidden Size: number of neurons per FNN layer (default 1800)\n    - Clamp Input: Enable/Disable input clamp to +/-3 (winsorization) to aid in extrapolation (default False).\n\n    _or_\n    - Hyperparameter optimization: runs hyperparameter optimization to identify the optimal number of layers and hidden size\n\n    _generic NN training parameters_\n    - Output Directory\n    - Learning rate\n    - Batch size\n    - Problem type (one of: regression, binary, multiclass (start labels from 0), multilabel)\n    - Training, Validation, and Testing fraction (set testing to zero to use all data for training and validation)\n 4. Prediction\n    - Input SMILES: either a single SMILES or file of SMILES strings on individual lines\n    - Output format: filepath to write the results or nothing, defaults to stdout\n\n# Using `fastprop`\n`fastprop` can be run from the command line or as a Python module.\nRegardless of the method of use, the parameters described in [Configurable Parameters](#configurable-parameters) can be modified.\n\n## Command Line\nAfter installation, `fastprop` is accessible from the command line via `fastprop subcommand`, where `subcommand` is one of `train`, `predict`, or `shap`.\n - `train` takes in the parameters described in [Configurable Parameters](#configurable-parameters) sections 1, 2, and 3 (featurization, preprocessing, and training) and trains `fastprop` model(s) on the input data.\n - `predict` uses the output of a call to `train` to make predictions on arbitrary SMILES strings.\n - `shap` performs SHAP analysis on a trained model to determine which of the input features are important.\n\nTry `fastprop --help` or `fastprop subcommand --help` for more information and see below.\n\n\u003e [!TIP]\n\u003e `fastprop` will 
use all of your CPUs for descriptor calculation by default - set the `MORDRED_NUM_PROC` environment variable to some other number to change this behavior.\n\n### Configuration File [recommended]\nSee `examples/example_fastprop_train_config.yaml` for a configuration file that shows all options that can be configured during training.\nIt covers everything shown in the [Configurable Parameters](#configurable-parameters) section.\n\n### Arguments\nAll of the options shown in the [Configuration File](#configuration-file-recommended) section can also be passed as command line flags instead of being written to a file.\nWhen passing the arguments, replace all `_` (underscore) with `-` (hyphen), e.g. `fastprop train --number-epochs 100`.\nSee `fastprop train --help` or `fastprop predict --help` for more information.\n\n`fastprop shap` and `fastprop predict` have only a couple of arguments and so do not use configuration files.\n\n## Python Module\n\n### Example\nSee `examples/fastprop_computational_adme_demo.ipynb`, `benchmarks/quantumscents/quantumscents.py`, and `benchmarks/fubrain/delta_fubrain.py`.\n\n### Package Structure\nThis section documents where the various modules and functions used in `fastprop` are located.\nCheck each file listed for more information, as each contains additional inline documentation useful for development as a Python module.\nTo use the core `fastprop` model and dataloaders in your own work, consider looking at `shap.py` or `train.py`, which show how to import and instantiate the relevant classes.\n\n#### `fastprop`\n - `defaults`: contains the function `init_logger` used to initialize loggers in different submodules, as well as the default configuration for training.\n - `model`: the model itself and a convenience function for training.\n - `metrics`: wraps a number of common loss and score functions.\n - `descriptors`: functions for calculating descriptors.\n - `data`: functions for cleaning and scaling data.\n - `io`: functions for loading data from 
files.\n\n#### `fastprop.cli`\n`fastprop_cli` contains all the CLI code, which is unlikely to be useful when called from a script.\nIf you wish to extend the CLI, check the inline documentation there.\n\n# Benchmarks\nThe `benchmarks` directory contains the scripts needed to perform the studies (see `benchmarks/README.md` for more detail; they are a great way to learn how to use `fastprop`).\nTo just see the results, check out [`paper/paper.pdf`](https://github.com/JacksonBurns/fastprop/blob/main/paper/paper.pdf) (or `paper/paper.md` for the plain text version).\n\n# Relationship to Chemprop\nIn addition to having a similar name, `fastprop` and Chemprop do similar things: map chemical structures to their corresponding properties in a user-friendly way using machine learning.\nI ([@JacksonBurns](https://github.com/jacksonburns)) am also a developer of Chemprop so some code is inevitably shared between the two (`fastprop` to Chemprop and vice versa).\n\n`fastprop` _feels_ a lot like Chemprop but without a lot of the clutter.\nThe `fast` in `fastprop` (both in usage and execution time) comes from the basic architecture, the use of caching, and the reduced configurability of `fastprop` (i.e. 
I hope you like MSE loss for regression tasks, because that's the only training metric `fastprop` will use via the CLI).\n\n# Developing `fastprop`\nBug reports, feature requests, and pull requests are welcome and encouraged!\nFollow [this tutorial from GitHub](https://docs.github.com/en/get-started/exploring-projects-on-github/contributing-to-a-project) to get started.\n\n`fastprop` is built around PyTorch Lightning, which defines a rigid API for implementing models that is followed here.\nSee the [section on the package layout](#python-module) for information on where all the other functions are, and check out the docstrings and inline comments in each file for more information on what each does.\n\nNote that the `pyproject.toml` defines optional `dev` and `bmark` packages, which will get you set up with the same dependencies used for CI and benchmarking.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonburns%2Ffastprop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjacksonburns%2Ffastprop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjacksonburns%2Ffastprop/lists"}