{"id":16511336,"url":"https://github.com/BAMeScience/equitrain","last_synced_at":"2025-05-05T09:31:55.939Z","repository":{"id":229070664,"uuid":"775357077","full_name":"pbenner/equitrain","owner":"pbenner","description":"Generic training script for Equiformer V2","archived":false,"fork":false,"pushed_at":"2024-04-24T11:46:41.000Z","size":282,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-05-01T23:40:46.325Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pbenner.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-21T08:36:52.000Z","updated_at":"2024-05-07T11:30:27.526Z","dependencies_parsed_at":"2024-03-21T22:27:47.261Z","dependency_job_id":"dc1cbd53-ea5f-4277-98a3-7f586a060300","html_url":"https://github.com/pbenner/equitrain","commit_stats":null,"previous_names":["pbenner/equitrain"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fequitrain","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fequitrain/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fequitrain/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pbenner%2Fequitrain/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pbenner","download_url":"https://codeload.github.com/pbenner/e
quitrain/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224438581,"owners_count":17311263,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T15:59:51.300Z","updated_at":"2025-05-05T09:31:55.902Z","avatar_url":"https://github.com/pbenner.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Equitrain: A Unified Framework for Training and Fine-tuning Machine Learning Interatomic Potentials\n\nEquitrain is an open-source software package designed to simplify the training and fine-tuning of machine learning universal interatomic potentials (MLIPs). Equitrain addresses the challenges posed by the diverse and often complex training codes specific to each MLIP by providing a unified and efficient framework. 
This allows researchers to focus on model development rather than implementation details.\n\n---\n\n## Key Features\n\n- **Unified Framework**: Train and fine-tune MLIPs using a consistent interface.\n- **Flexible Model Wrappers**: Support for different MLIP architectures through model-specific wrappers.\n- **Efficient Preprocessing**: Automated preprocessing with options for computing statistics and managing data.\n- **GPU/Node Scalability**: Seamless integration with multi-GPU and multi-node environments using `accelerate`.\n- **Extensive Resources**: Includes scripts for dataset preparation, initial model setup, and training workflows.\n\n---\n\n## Installation\n\n`equitrain` can be installed in your environment with:\n\n```bash\npip install equitrain\n```\n\n**Note!** Until the package is fully deployed on PyPI, it can only be installed by following the development instructions below.\n\n\n### Development\n\nTo install the package for development purposes, first clone the repository:\n\n```bash\ngit clone https://github.com/BAMeScience/equitrain.git\ncd equitrain/\n```\n\nCreate a virtual environment (either with `conda` or `virtualenv`). 
Note that we use Python 3.10 to create the environment.\n\n**Using `virtualenv`**\n\nCreate and activate the environment:\n\n```bash\npython3.10 -m venv .venv\nsource .venv/bin/activate\n```\n\nMake sure `pip` is up-to-date:\n\n```bash\npip install --upgrade pip\n```\n\nWe recommend using `uv` for fast installation of the package:\n\n```bash\npip install uv\nuv pip install -e '.[dev,docu]'\n```\n\n* The `-e` flag installs the package in editable mode.\n* The `[dev]` optional dependencies install a set of packages used for formatting, typing, and testing.\n* The `[docu]` optional dependencies install the packages for launching the documentation page.\n\n**Using `conda`**\n\nCreate the environment with the settings file `environment.yml`:\n\n```bash\nconda env create -f environment.yml\n```\n\nAnd activate it:\n\n```bash\nconda activate equitrain\n```\n\nThis will automatically install the dependencies. If you want the optional dependencies installed, run the following from the repository root:\n\n```bash\npip install -e '.[dev,docu]'\n```\n\nAlternatively, you can create a `conda` environment with Python 3.10 and then follow the `virtualenv` installation steps above, starting from the `pip` upgrade:\n\n```bash\nconda create -n equitrain python=3.10 setuptools pip\nconda activate equitrain\n```\n\n---\n\n## Quickstart Guide\n\n### 1. 
Preprocessing Data\n\nPreprocess data files to compute necessary statistics and prepare for training:\n\n#### Command Line:\n\n```bash\nequitrain-preprocess \\\n    --train-file=\"data-train.xyz\" \\\n    --valid-file=\"data-valid.xyz\" \\\n    --compute-statistics \\\n    --atomic-energies=\"average\" \\\n    --output-dir=\"data\" \\\n    --r-max 4.5\n```\n\n\u003c!-- TODO: change this following a notebook style --\u003e\n#### Python Script:\n\n```python\nfrom equitrain import get_args_parser_preprocess, preprocess\n\ndef test_preprocess():\n    args = get_args_parser_preprocess().parse_args()\n    args.train_file         = 'data.xyz'\n    args.valid_file         = 'data.xyz'\n    args.output_dir         = 'test_preprocess/'\n    args.compute_statistics = True\n    # Compute atomic energies\n    args.atomic_energies    = \"average\"\n    # Cutoff radius for computing graphs\n    args.r_max = 4.5\n\n    preprocess(args)\n\nif __name__ == \"__main__\":\n    test_preprocess()\n```\n\n---\n\n### 2. 
Training a Model\n\nTrain a model using the prepared dataset and specify the MLIP wrapper:\n\n#### Command Line:\n\n```bash\nequitrain -v \\\n    --train-file data/train.h5 \\\n    --valid-file data/valid.h5 \\\n    --output-dir result \\\n    --model mace.model \\\n    --model-wrapper 'mace' \\\n    --epochs 10 \\\n    --tqdm\n```\n\n\u003c!-- TODO: change this following a notebook style --\u003e\n#### Python Script:\n\n```python\nfrom equitrain import get_args_parser_train, train\nfrom equitrain.model_wrappers import MaceWrapper\n\ndef test_train_mace():\n    args = get_args_parser_train().parse_args()\n    args.train_file  = 'data/train.h5'\n    args.valid_file  = 'data/valid.h5'\n    args.output_dir  = 'test_train_mace'\n    args.epochs      = 10\n    args.batch_size  = 64\n    args.lr          = 0.01\n    args.verbose     = 1\n    args.tqdm        = True\n    args.model       = MaceWrapper(args, \"mace.model\")\n\n    train(args)\n\nif __name__ == \"__main__\":\n    test_train_mace()\n```\n\n---\n\n### 3. Making Predictions\n\nUse a trained model to make predictions on new data:\n\n\u003c!-- TODO: change this following a notebook style --\u003e\n#### Python Script:\n\n```python\nfrom equitrain import get_args_parser_predict, predict\nfrom equitrain.model_wrappers import MaceWrapper\n\ndef test_mace_predict():\n    args = get_args_parser_predict().parse_args()\n    args.predict_file = 'data/valid.h5'\n    args.batch_size   = 64\n    args.model        = MaceWrapper(args, \"mace.model\")\n\n    energy_pred, forces_pred, stress_pred = predict(args)\n\n    print(energy_pred)\n    print(forces_pred)\n    print(stress_pred)\n\nif __name__ == \"__main__\":\n    test_mace_predict()\n```\n\n---\n\n## Advanced Features\n\n### Multi-GPU and Multi-Node Training\n\nEquitrain supports multi-GPU and multi-node training using `accelerate`. 
Example scripts are available in the `resources/training` directory.\n\n### Dataset Preparation\n\nEquitrain provides scripts for downloading and preparing popular datasets such as Alexandria and MPTraj. These scripts can be found in the `resources/data` directory.\n\n### Pretrained Models\n\nInitial model examples and configurations can be accessed in the `resources/models` directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBAMeScience%2Fequitrain","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBAMeScience%2Fequitrain","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBAMeScience%2Fequitrain/lists"}