{"id":19984489,"url":"https://github.com/intellabs/matsciml","last_synced_at":"2025-04-04T18:08:41.853Z","repository":{"id":63439221,"uuid":"536298092","full_name":"IntelLabs/matsciml","owner":"IntelLabs","description":"Open MatSci ML Toolkit is a framework for prototyping and scaling out deep learning models for materials discovery supporting widely used materials science datasets, and built on top of PyTorch Lightning, the Deep Graph Library, and PyTorch Geometric.","archived":false,"fork":false,"pushed_at":"2024-10-29T21:07:41.000Z","size":37247,"stargazers_count":145,"open_issues_count":30,"forks_count":21,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-10-29T23:37:39.145Z","etag":null,"topics":["ai","dgl","pytorch","pytorch-lightning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IntelLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":"Security.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-13T20:27:28.000Z","updated_at":"2024-10-29T21:07:46.000Z","dependencies_parsed_at":"2024-02-12T23:30:46.788Z","dependency_job_id":"b7698dcf-8e2b-4935-8cdb-206b11a50fe2","html_url":"https://github.com/IntelLabs/matsciml","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2Fmatsciml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2Fmatsciml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
IntelLabs%2Fmatsciml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2Fmatsciml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IntelLabs","download_url":"https://codeload.github.com/IntelLabs/matsciml/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247226215,"owners_count":20904465,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","dgl","pytorch","pytorch-lightning"],"created_at":"2024-11-13T04:19:10.963Z","updated_at":"2025-04-04T18:08:41.836Z","avatar_url":"https://github.com/IntelLabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003ch1 align=\"center\"\u003eOpen MatSci ML Toolkit : A Broad, Multi-Task Benchmark for Solid-State Materials Modeling\u003c/h1\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![Documentation](https://readthedocs.org/projects/matsciml/badge/?version=latest)](https://matsciml.readthedocs.io/en/latest/?badge=latest)\n[![Datasets on 
Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.10768743.svg)](https://doi.org/10.5281/zenodo.10768743)\n[![lightning](https://img.shields.io/badge/Lightning-v2.4.0%2B-792ee5?logo=pytorchlightning)](https://lightning.ai/docs/pytorch/1.8.6)\n[![pytorch](https://img.shields.io/badge/PyTorch-v2.4.0%2B-red?logo=pytorch)](https://pytorch.org/get-started/locally/)\n[![dgl](https://img.shields.io/badge/DGL-v2.0%2B-blue?logo=dgl)](https://docs.dgl.ai/en/latest/)\n[![pyg](https://img.shields.io/badge/PyG-2.4.0%2B-red?logo=pyg)](https://pytorch-geometric.readthedocs.io/en/2.3.1/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![matsciml-preprint](https://img.shields.io/badge/TMLR-Open_MatSciML_Toolkit-blue)](https://openreview.net/forum?id=QBMyDZsPMd)\n[![hpo-paper](https://img.shields.io/badge/OpenReview-AI4Mat_2022_HPO-blue)](https://openreview.net/forum?id=_7bEq9JQKIJ)\n\n\u003c/div\u003e\n\nThis is the implementation of the MatSci ML benchmark, which includes ~1.5 million ground-state materials collected from various datasets, as well as integration of the OpenCatalyst dataset, supporting diverse data formats (point cloud, DGL graphs, PyG graphs), learning methods (single task, multi-task, multi-data), and deep learning models. 
Primary project contributors include: Santiago Miret (Intel Labs), Kin Long Kelvin Lee (Intel AXG), Carmelo Gonzales (Intel Labs), Mikhail Galkin (Intel Labs), Marcel Nassar (Intel Labs), Matthew Spellings (Vector Institute).\n\n### News\n\n- [2024/08/23] [Readthedocs](https://matsciml.readthedocs.io/en/latest/) is now online!\n- [2023/09/27] Release of [pre-packaged lmdb-based datasets](https://zenodo.org/record/8381476) from v1.0.0 via Zenodo.\n- [2023/08/31] Initial release of the MatSci ML Benchmark with integration of ~1.5 million ground state materials.\n- [2023/07/31] The Open MatSci ML Toolkit : A Flexible Framework for Deep Learning on the OpenCatalyst Dataset paper is accepted into TMLR. See previous version for code related to the benchmark.\n\n### Introduction\n\nThe MatSci ML Benchmark contains diverse sets of tasks (energy prediction, force prediction, property prediction) across a broad range of datasets (OpenCatalyst Project [1], Materials Project [2], LiPS [3], OQMD [4], NOMAD [5], Carolina Materials Database [6]). Most of the data is related to the energy prediction task, as energy is the most commonly tracked property for materials systems in the literature. The codebase supports single-task learning, as well as multi-task learning (training one model for multiple tasks within a dataset) and multi-data learning (training a model across multiple datasets with a common property). 
Additionally, we provide a generative materials pipeline that applies diffusion models (CDVAE [7]) to generate new unit cells.\n\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./docs/MatSci-ML-Benchmark-Table.png\"/\u003e\n\u003c/p\u003e\n\nThe package follows the original design principles of the Open MatSci ML Toolkit, including:\n- Ease of use for new ML researchers and practitioners that want to get started on interacting with the OpenCatalyst dataset.\n- Scalable computation of experiments leveraging [PyTorch Lightning](https://www.pytorchlightning.ai/) across different computation capabilities (laptop, server, cluster) and hardware platforms (CPU, GPU, XPU) without sacrificing performance in the compute and modeling.\n- Integrating support for [DGL](https://docs.dgl.ai/en/0.9.x/) and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/) for rapid GNN development.\n\nThe examples outlined in the next section show how to get started with Open MatSci ML Toolkit using simple Python scripts, Jupyter notebooks, or the PyTorch Lightning CLI for a simple training run on a portable subset of the original dataset (dev-set) that can be run on a laptop. Subsequently, we scale our example Python script to large compute systems, including distributed data parallel training (multiple GPUs on a single node) and multi-node training (multiple GPUs across multiple nodes) in a computing cluster. Leveraging both PyTorch Lightning and DGL capabilities, we can enable the compute and experiment scaling with minimal additional complexity.\n\n### Installation\n\n- `Docker`: We provide a Dockerfile inside the `docker` folder that can be used to build a container using standard docker commands.\n- `mamba`: We have included a `mamba` specification that provides a complete out-of-the-box installation. 
Run `mamba env create -n matsciml --file conda.yml`, which will install all dependencies and `matsciml` as an editable install.\n- `pip`: In this case, we assume you are bringing your own virtual environment. Depending on what hardware platform you have, you can copy-paste the following commands; because of the absolute mess that is modern Python packaging, these commands include the URLs for binary distributions of PyG and DGL graph backends.\n\nFor CPU only (good for local laptop development):\n\n```console\npip install -f https://data.pyg.org/whl/torch-2.4.0+cpu.html -f https://data.dgl.ai/wheels/torch-2.4/repo.html -e './[all]'\n```\n\nFor XPU usage, you will need to install PyTorch separately first, followed by `matsciml`; note that the PyTorch version is lower\nas 2.3.1 is the latest XPU binary distributed.\n\n```console\npip install torch==2.3.1+cxx11.abi torchvision==0.18.1+cxx11.abi torchaudio==2.3.1+cxx11.abi intel-extension-for-pytorch==2.3.110+xpu oneccl_bind_pt==2.3.100+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/\npip install -f https://data.pyg.org/whl/torch-2.3.0+cpu.html -f https://data.dgl.ai/wheels/torch-2.3/repo.html -e './[all]'\n```\n\nFor CUDA usage, substitute the index links with your particular toolkit version (e.g. 12.1 below):\n\n```console\npip install -f https://data.dgl.ai/wheels/torch-2.4/cu121/repo.html -f https://data.pyg.org/whl/torch-2.4.0+cu121.html -e './[all]'\n```\n\nAdditionally, for a development install, one can specify the extra packages like `black` and `pytest` with `pip install './[dev]'`. These can be\nadded to the commit workflow by running `pre-commit install` to generate `git` hooks.\n\n### Intel XPU capabilities\n\n\u003e[!NOTE]\n\u003e As of PyTorch 2.4+, XPU support has been upstreamed to PyTorch and starting from `torch\u003e=2.5.0` onwards, should be available as a `pip` install.\n\u003e We will update the instructions accordingly when it does. 
We recommend consulting the [PyTorch documentation](https://pytorch.org/docs/main/notes/get_start_xpu.html)\n\u003e for updates and instructions on how to get started with XPU use. In the meantime, please consult [this page](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) to see how to set up PyTorch on XPUs.\n\nThe module `matsciml.lightning.xpu` implements interfaces for Intel XPU to Lightning abstractions, including\nthe `XPUAccelerator` and two strategies for deployment (single XPU/tile and distributed data parallel).\nBecause we use PyTorch Lightning, there aren't many marked differences in running on Intel XPU, or GPUs\nfrom other vendors. The abstractions we mentioned are registered in the various Lightning registries,\nand should be accessible simply through `pl.Trainer` arguments, e.g.:\n\n```python\ntrainer = pl.Trainer(accelerator='xpu')\n```\n\nThe one major difference is for distributed data parallelism: Intel XPUs use the oneCCL communication\nbackend, which replaces `nccl`, `gloo`, or other backends typically passed to `torch.distributed`.\nPlease see `examples/devices` for single XPU/tile and DDP use cases.\n\n**NOTE**: Currently there is a hard-coded `torch.cuda.stream` context in PyTorch Lightning's `DDPStrategy`.\nThis [issue](https://github.com/Lightning-AI/pytorch-lightning/issues/19766) has been created to see if the maintainers would be happy to patch\nit so that the `cuda.Stream` context is only used if a CUDA device is being used. 
If you encounter\na `RuntimeError: Tried to instantiate dummy base class Stream`, please just set `ctx = nullcontext()`\nin the line of code that raises the exception.\n\n## Examples\n\nThe `examples` folder contains simple, unit scripts that demonstrate how to use the pipeline in specific ways:\n\n\u003cdetails\u003e\n\u003csummary\u003e\nGet started with different datasets with \"devsets\"\n\u003c/summary\u003e\n\n```bash\n# Materials project\npython examples/datasets/materials_project/single_task_devset.py\n\n# Carolina materials database\npython examples/datasets/carolina_db/single_task_devset.py\n\n# NOMAD\npython examples/datasets/nomad/single_task_devset.py\n\n# OQMD\npython examples/datasets/oqmd/single_task_devset.py\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nRepresentation learning with symmetry pretraining\n\u003c/summary\u003e\n\n```bash\n# uses the devset for synthetic point group point clouds\npython examples/tasks/symmetry/single_symmetry_example.py\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nExample notebook-based development and testing\n\u003c/summary\u003e\n\n```bash\njupyter notebook examples/devel-example.ipynb\n```\n\u003c/details\u003e\n\nFor more advanced use cases:\n\n\u003cdetails\u003e\n\u003csummary\u003e\nCheck out materials generation with CDVAE\n\u003c/summary\u003e\n\nCDVAE [7] is a latent diffusion model that trains a VAE on the reconstruction\nobjective, adds Gaussian noise to the latent variable, and learns to predict\nthe noise. The noised and generated features include lattice parameters,\natom composition, and atom coordinates.\nThe generation process is based on annealed Langevin dynamics.\n\nCDVAE is implemented in the `GenerationTask` and we provide a custom data\nsplit from the Materials Project bounded by 25 atoms per structure.\nThe process is split into 3 parts with 3 respective scripts found in\n`examples/model_demos/cdvae/`.\n1. 
Training CDVAE on the reconstruction and denoising objectives: `cdvae.py`\n2. Sampling the structures (from scratch or by reconstructing the test set): `cdvae_inference.py`\n3. Evaluating the sampled structures: `cdvae_metrics.py`\n\nThe sampling procedure takes some time (about 5-8 hours for 10000 structures\ndepending on the hardware) due to the Langevin dynamics.\nThe default hyperparameters of CDVAE components correspond to those from the\noriginal paper and can be found in `cdvae_configs.py`.\n\n\n```bash\n# training\npython examples/model_demos/cdvae/cdvae.py --data_path \u003cpath/to/splits\u003e\n\n# sampling 10,000 structures from scratch\npython examples/model_demos/cdvae/cdvae_inference.py --model_path \u003cpath/to/checkpoint\u003e --data_path \u003cpath/to/splits\u003e --tasks gen\n\n# evaluating the sampled structures\npython examples/model_demos/cdvae/cdvae_metrics.py --root_path \u003cpath/to/generated_samples\u003e --data_path \u003cpath/to/splits\u003e --tasks gen\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nMultiple tasks trained using the same dataset\n\u003c/summary\u003e\n\n```bash\n# this script requires modification as you'll need to download the materials\n# project dataset, and point L24 to the folder where it was saved\npython examples/tasks/multitask/single_data_multitask_example.py\n```\n\nUtilizes Materials Project data to train property regression and material classification jointly\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nMultiple tasks trained using multiple datasets\n\u003c/summary\u003e\n\n```bash\npython examples/tasks/multitask/three_datasets.py\n```\n\nTrain regression tasks against IS2RE, S2EF, and LiPS datasets jointly\n\u003c/details\u003e\n\n\n### Data Pipeline\n\nIn the `scripts` folder you will find two scripts needed to download and preprocess datasets: the `download_datasets.py` can be used to obtain Carolina DB, Materials Project, NOMAD, and OQMD datasets, while the 
`download_ocp_data.py` preserves the original Open Catalyst script.\n\nIn the current release, we have implemented interfaces to a number of large scale materials science datasets. Under the hood, the data structures pulled from each dataset have been homogenized, and the only real interaction layer for users is through the `MatSciMLDataModule`, a subclass of `LightningDataModule`.\n\n```python\nfrom matsciml.lightning.data_utils import MatSciMLDataModule\n\n# no configuration needed, although one can specify the batch size and number of workers\ndevset_module = MatSciMLDataModule.from_devset(dataset=\"MaterialsProjectDataset\")\n```\n\nThis will let you springboard into development without needing to worry about _how_ to wrangle with the datasets; just grab a batch and go! With the exception of Open Catalyst, datasets will typically return point cloud representations; we provide a flexible transform interface to interconvert between representations and frameworks:\n\n\u003cdetails\u003e\n\u003csummary\u003e\nFrom point clouds to DGL graphs\n\u003c/summary\u003e\n\n```python\nfrom matsciml.datasets.transforms import PointCloudToGraphTransform\n\n# make the materials project dataset emit DGL graphs, based on an atom-atom distance cutoff of 10\ndevset = MatSciMLDataModule.from_devset(\n    dataset=\"MaterialsProjectDataset\",\n    dset_kwargs={\"transforms\": [PointCloudToGraphTransform(backend=\"dgl\", cutoff_dist=10.)]}\n)\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nBut I want to use PyG?\n\u003c/summary\u003e\n\n```python\nfrom matsciml.datasets.transforms import PointCloudToGraphTransform\n\n# change the backend argument to obtain PyG graphs\ndevset = MatSciMLDataModule.from_devset(\n    dataset=\"MaterialsProjectDataset\",\n    dset_kwargs={\"transforms\": [PointCloudToGraphTransform(backend=\"pyg\", cutoff_dist=10.)]}\n)\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\nWhat else can I configure with 
`MatSciMLDataModule`?\n\u003c/summary\u003e\n\nDatasets beyond devsets can be configured through class arguments:\n\n```python\ndevset = MatSciMLDataModule(\n    dataset=\"MaterialsProjectDataset\",\n    train_path=\"/path/to/training/lmdb/folder\",\n    batch_size=64,\n    num_workers=4,     # configure data loader instances\n    dset_kwargs={\"transforms\": [PointCloudToGraphTransform(backend=\"pyg\", cutoff_dist=10.)]},\n    val_split=\"/path/to/val/lmdb/folder\"\n)\n```\n\nIn particular, `val_split` and `test_split` can point to their LMDB folders, _or_ just a float between [0,1] to do quick, uniform splits. The rest, including distributed sampling, will be taken care of for you under the hood.\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\nHow do I compose multiple datasets?\n\u003c/summary\u003e\n\nGiven the amount of configuration involved, composing multiple datasets takes a little more work but we have tried to make it as seamless as possible. The main difference from the single dataset case is replacing `MatSciMLDataModule` with `MultiDataModule` from `matsciml.lightning.data_utils`, configuring each dataset manually, and passing them collectively into the data module:\n\n```python\nfrom matsciml.datasets import MaterialsProjectDataset, OQMDDataset, MultiDataset\nfrom matsciml.lightning.data_utils import MultiDataModule\n\n# configure training only here, but same logic extends to validation/test splits\ntrain_dset = MultiDataset(\n  [\n    MaterialsProjectDataset(\"/path/to/train/materialsproject\"),\n    OQMDDataset(\"/path/to/train/oqmd\")\n  ]\n)\n\n# this configures the actual data module passed into Lightning\ndatamodule = MultiDataModule(\n  batch_size=32,\n  num_workers=4,\n  train_dataset=train_dset\n)\n```\n\nWhile it does require a bit of extra work, this was to ensure flexibility in how you can compose datasets. We welcome feedback on the user experience! 
😃\n\n\u003c/details\u003e\n\n### Task abstraction\n\nIn Open MatSci ML Toolkit, tasks effectively form learning objectives: at a high level, a task takes an encoding model/backbone that ingests a structure to predict one or several properties, or classify a material. In the single task case, there may be multiple _targets_ and the neural network architecture may be fluid, but there is only _one_ optimizer. Under this definition, multi-task learning comprises multiple tasks and optimizers operating jointly through _a single embedding_.\n\n\n## References\n- [1] Chanussot, L., Das, A., Goyal, S., Lavril, T., Shuaibi, M., Riviere, M., Tran, K., Heras-Domingo, J., Ho, C., Hu, W. and Palizhati, A., 2021. Open catalyst 2020 (OC20) dataset and community challenges. ACS Catalysis, 11(10), pp.6059-6072.\n- [2] Jain, A., Ong, S.P., Hautier, G., Chen, W., Richards, W.D., Dacek, S., Cholia, S., Gunter, D., Skinner, D., Ceder, G. and Persson, K.A., 2013. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1(1).\n- [3] Batzner, S., Musaelian, A., Sun, L., Geiger, M., Mailoa, J.P., Kornbluth, M., Molinari, N., Smidt, T.E. and Kozinsky, B., 2022. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 13(1), p.2453.\n- [4] Kirklin, S., Saal, J.E., Meredig, B., Thompson, A., Doak, J.W., Aykol, M., Rühl, S. and Wolverton, C., 2015. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies. npj Computational Materials, 1(1), pp.1-15.\n- [5] Draxl, C. and Scheffler, M., 2019. The NOMAD laboratory: from data sharing to artificial intelligence. Journal of Physics: Materials, 2(3), p.036001.\n- [6] Zhao, Y., Al‐Fahdi, M., Hu, M., Siriwardane, E.M., Song, Y., Nasiri, A. and Hu, J., 2021. High‐throughput discovery of novel cubic crystal materials using deep generative neural networks. 
Advanced Science, 8(20), p.2100566.\n- [7] Xie, T., Fu, X., Ganea, O.E., Barzilay, R. and Jaakkola, T.S., 2021, October. Crystal Diffusion Variational Autoencoder for Periodic Material Generation. In International Conference on Learning Representations.\n\n## Contributing\n\nPlease refer to the [developers guide](https://matsciml.readthedocs.io/en/latest/developers.html) for how to contribute to the Open MatSciML Toolkit.\n\n\n## Citations\n\nIf you use Open MatSci ML Toolkit in your technical work or publication, we would appreciate it if you cite the Open MatSci ML Toolkit paper in TMLR:\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\nMiret, S.; Lee, K. L. K.; Gonzales, C.; Nassar, M.; Spellings, M. The Open MatSci ML Toolkit: A Flexible Framework for Machine Learning in Materials Science. Transactions on Machine Learning Research, 2023.\n\u003c/summary\u003e\n\n```bibtex\n@article{openmatscimltoolkit,\n  title = {The Open {{MatSci ML}} Toolkit: {{A}} Flexible Framework for Machine Learning in Materials Science},\n  author = {Miret, Santiago and Lee, Kin Long Kelvin and Gonzales, Carmelo and Nassar, Marcel and Spellings, Matthew},\n  year = {2023},\n  journal = {Transactions on Machine Learning Research},\n  issn = {2835-8856}\n}\n```\n\n\u003c/details\u003e\n\nIf you use v1.0.0, please cite our paper:\n\n\u003cdetails\u003e\n\n\u003csummary\u003e\nLee, K. L. K., Gonzales, C., Nassar, M., Spellings, M., Galkin, M., \u0026 Miret, S. (2023). MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling. 
arXiv preprint arXiv:2309.05934.\n\u003c/summary\u003e\n\n```bibtex\n@article{lee2023matsciml,\n  title={MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling},\n  author={Lee, Kin Long Kelvin and Gonzales, Carmelo and Nassar, Marcel and Spellings, Matthew and Galkin, Mikhail and Miret, Santiago},\n  journal={arXiv preprint arXiv:2309.05934},\n  year={2023}\n}\n```\n\n\u003c/details\u003e\n\n\nPlease cite datasets used in your work as well. You can find additional descriptions and details regarding each dataset [here](matsciml/datasets/DATASETS.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fmatsciml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintellabs%2Fmatsciml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fmatsciml/lists"}