{"id":13935534,"url":"https://github.com/huzongxiang/MatDGL","last_synced_at":"2025-07-19T20:33:19.789Z","repository":{"id":216179890,"uuid":"445102078","full_name":"huzongxiang/MatDGL","owner":"huzongxiang","description":"MatDGL is a neural network package that allows researchers to train custom models for crystal modeling tasks. It aims to accelerate the research and application of material science.","archived":false,"fork":false,"pushed_at":"2024-07-30T09:21:28.000Z","size":57309,"stargazers_count":63,"open_issues_count":0,"forks_count":12,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-11-15T07:54:14.156Z","etag":null,"topics":["deep-learning","graph","machine-learning","massagepassing","materials","neural-networks","pretrain","tensorflow","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/huzongxiang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-06T08:50:33.000Z","updated_at":"2024-07-30T09:21:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"62e038bb-80b1-4265-854a-27189dc4cc6e","html_url":"https://github.com/huzongxiang/MatDGL","commit_stats":null,"previous_names":["huzongxiang/matdgl","huzongxiang/crystalnetwork"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huzongxiang%2FMatDGL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huzongxiang%2FMatDGL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/huzongxiang%2FMatDGL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/huzongxiang%2FMatDGL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/huzongxiang","download_url":"https://codeload.github.com/huzongxiang/MatDGL/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226677076,"owners_count":17666002,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","graph","machine-learning","massagepassing","materials","neural-networks","pretrain","tensorflow","transformer"],"created_at":"2024-08-07T23:01:51.398Z","updated_at":"2024-11-27T03:30:49.684Z","avatar_url":"https://github.com/huzongxiang.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"![](https://img.shields.io/badge/license-MIT-red)\n![](https://img.shields.io/badge/build-passing-brightgreen)\n![](https://img.shields.io/pypi/v/matdgl)\n![](https://img.shields.io/pypi/dm/matdgl)\n![](https://img.shields.io/badge/python-3.8-blue)\n![](https://img.shields.io/badge/tensorflow-2.10.0-red)\n![](https://img.shields.io/github/stars/huzongxiang/MatDGL?style=social)\n\n# MatDGL(Material Deep Graph Learning)\nMatDGL is a neural network package that allows researchers to train custom models for material modeling tasks. It aims to accelerate the research and application of material science. 
It provides users with a series of state-of-the-art models and supports their own innovative research.\n\n## Table of Contents\n\n* [Highlights](#highlights)\n* [Installation](#installation)\n* [Usage](#usage)\n* [Framework](#matdgl-framework)\n* [Implemented-models](#implemented-models)\n* [Contributors](#contributors)\n* [References](#references)\n* [Contact](#Contact)\n\n\u003ca name=\"Highlights\"\u003e\u003c/a\u003e\n## Highlights\n+ Easy to install.\n+ Three steps to a quick test.\n+ Flexible and adaptable to the user's training task.\n\n\u003ca name=\"Installation\"\u003e\u003c/a\u003e\n## Installation\n\nMatDGL can be installed easily with Anaconda, as follows:\n\n+ Create a new conda environment named \"matdgl\", then activate it:    \n    ```bash\n    conda create -n matdgl python=3.8\n    conda activate matdgl\n    ```  \n   A fresh conda environment is necessary to avoid bugs caused by version conflicts.   \n \n+ Configure dependencies of matdgl:\n    ```bash\n    conda install -c conda-forge tensorflow-gpu\n    ```\n\n+ Install pymatgen:  \n    ```bash\n    conda install --channel conda-forge pymatgen  \n    ```    \n\n+ Install other dependencies:  \n    ```bash\n    conda install --channel conda-forge mendeleev  \n    conda install --channel conda-forge graphviz  \n    conda install --channel conda-forge pydot  \n    conda install --channel conda-forge scikit-learn\n    ```   \n\n+ Install matdgl:  \n    ```bash\n    pip install matdgl\n    ```  \n  \n\n\u003ca name=\"Usage\"\u003e\u003c/a\u003e\n## Usage\n### Quick start\nMatDGL is very easy to use!  \nJust ***three steps*** are enough to run a quick test with matdgl:\n+ **download test data**  \nGet the test data from https://github.com/huzongxiang/MatDGL/tree/main/datas/    \nThere are four JSON files in datas: dataset_classification.json, dataset_multiclassification.json, dataset_regression.json  \nand dataset_pretrain.json.    
\n+ **prepare workdir**  \nDownload the data and put it in your training work directory. The test.py file should also be placed in that directory.  \n\t```\n\tworkdir\n\t│   test.py\n\t│\n\t└───datas\n\t\t│   dataset_classification.json\n\t\t│   dataset_multiclassification.json\n\t\t│   dataset_regression.json\n\t\t│   dataset_pretrain.json\n\t``` \n+ **run command**  \nRun the command:  \n\t```bash\n\tpython test.py\n\t```  \nYou have finished your test multi-classification training! The training results and model weights are saved in /results and /models, respectively.  \n\n### Understanding the training script  \nYou can use matdgl with only the training scripts provided in user_easy_trainscript, but understanding the script will help you customize your training task!   \n     \n+ **get datas**  \nGet the directory of the running training script. The script reads data from 'workdir/datas/', then saves results and models to 'workdir/results/' and 'workdir/models/'.  \n\t```python\n\tfrom pathlib import Path\n\tModulePath = Path(__file__).parent.absolute() # workdir\n\t```  \n\n+ **feed training data**   \nThe Dataset module reads data from 'ModulePath/datas/dataset.json'. 'task_type' selects regression/classification/multi-classification, and 'data_path' is the path of the training data.  \n\t```python\n\tfrom matdgl.data import Dataset\n\tdataset = Dataset(task_type='multiclassfication', data_path=ModulePath)\n\t```  \n\n+ **generator**  \nThe GraphGenerator module feeds data into the model during training. It splits the data into train, valid and test sets, transforms the structure data into labelled graphs, and returns three generators.\nBATCH_SIZE is the batch size during training, DATA_SIZE is the number of samples to use from the entire dataset, and CUTOFF is the cutoff for graph edges in the crystal.   
\n\t```python\n\tfrom matdgl.data.generator import GraphGenerator\n\tBATCH_SIZE = 128\n\tDATA_SIZE = None\n\tCUTOFF = 2.5\n\tGenerators = GraphGenerator(dataset, data_size=DATA_SIZE, batch_size=BATCH_SIZE, cutoff=CUTOFF)\n\ttrain_data = Generators.train_generator\n\tvalid_data = Generators.valid_generator\n\ttest_data = Generators.test_generator\n\n\t# if the task is multi-classification, define the variable multiclassification\n\tmulticlassification = Generators.multiclassification  \n\t```  \n\n+ **building model**  \nThe GNN module defines a training framework that accepts a series of models. MatDGL provides a series of mainstream models to suit your needs.  \n\t```python\n\tfrom matdgl.models import GNN\n\tfrom matdgl.models.gnnmodel import MpnnBaseModel, TransformerBaseModel, CgcnnModel, GraphAttentionModel\n\n\tgnn = GNN(model=MpnnBaseModel,\n\t\tatom_dim=16,\n\t\tbond_dim=64,\n\t\tnum_atom=118,\n\t\tstate_dim=16,\n\t\tsp_dim=230,\n\t\tunits=32,\n\t\tedge_steps=1,\n\t\tmessage_steps=1,\n\t\ttransform_steps=1,\n\t\tnum_attention_heads=8,\n\t\tdense_units=64,\n\t\toutput_dim=64,\n\t\treadout_units=64,\n\t\tdropout=0.0,\n\t\treg0=0.00,\n\t\treg1=0.00,\n\t\treg2=0.00,\n\t\treg3=0.00,\n\t\treg_rec=0.00,\n\t\tbatch_size=BATCH_SIZE,\n\t\tspherical_harmonics=True,\n\t\tregression=dataset.regression,\n\t\toptimizer='Adam',\n\t\t)\n\t```\n\n+ **training**  \nUse the model's train function to train. Common training parameters can be set; workdir is the directory of the training script, where the model's results are saved during training. If test_data is given, the model will also predict on test_data.  \n\t```python\n\tgnn.train(train_data, valid_data, test_data, epochs=700, lr=3e-3, warm_up=True, load_weights=False, verbose=1, checkpoints=None, save_weights_only=True, workdir=ModulePath)\n\t```\n\n+ **prediction**  \nThe simplest way to predict is to use the script predict.py in /user_easy_train_scripts.  \nAlternatively, use the predict_datas function to predict.  
\n\t```python\n\tgnn.predict_datas(test_data, workdir=ModulePath)    # predict on test data with labels\n\ty_pred_keras = gnn.predict(datas)                   # predict on new data without labels\n\t```\n\n+ **preparing your custom datas**  \nIf you have your own structures (and labels), note that Dataset accepts the pymatgen.core.Structure type, so you should convert your POSCAR or cif files to pymatgen.core.Structure objects.  \n\t```python\n\timport os\n\tfrom pymatgen.core.structure import Structure\n\tstructures = []                                      # your structure list\n\tfor cif in os.listdir(cif_path):                     # cif_path: directory holding your cif/POSCAR files\n\t\tstructures.append(Structure.from_file(os.path.join(cif_path, cif)))    # works for POSCAR too\n\n\t# construct your dataset\n\tfrom matdgl.data import Dataset\n\tdataset = Dataset(task_type='my_classification', data_path=ModulePath)  # task_type could be my_regression, my_classification, my_multiclassification\n\tdataset.prepare_x(structures)\n\tdataset.prepare_y(labels)   # labels are used to train the model; they can be None when predicting on new data without labels\n\n\t# alternatively, you can construct the dataset as follows\n\tdataset.structures = structures\n\tdataset.labels = labels\n\n\t# save your structures and labels to the dataset in dataset_my*.json\n\tdataset.save_datasets(structures, labels)\n\n\t# when predicting on new data without labels, Generators has no attribute multiclassification, so assign a definite value\n\tGenerators = GraphGenerator(dataset, data_size=DATA_SIZE, batch_size=BATCH_SIZE, cutoff=CUTOFF)     # dataset.labels is None\n\tGenerators.multiclassification = 5\n\tmulticlassification = Generators.multiclassification  # multiclassification = 5\n\t```\n\n+ **models provided by matdgl**  \n MatDGL provides GraphModel, MpnnBaseModel, TransformerBaseModel, MpnnModel, TransformerModel, DirectionalMpnnModel, DirectionalTransformerModel and the CGCNN model to suit your needs. TransformerModel, GraphModel and MpnnModel are different models. 
TransformerModel is a graph transformer. MpnnModel is a message passing neural network. GraphModel is a combination of TransformerModel and MpnnModel. MpnnBaseModel and TransformerBaseModel do not take the directional information of the crystal into account, so they run faster. MpnnBaseModel is the fastest model, and its accuracy is sufficient for most tasks. TransformerModel can achieve the highest accuracy in most tasks. The CGCNN model is the crystal graph convolutional neural network model. The GraphAttentionModel is the graph attention neural network.  \n\t```python\n\tfrom matdgl.models import GNN\n\tfrom matdgl.models.gnnmodel import MpnnBaseModel, TransformerBaseModel, DirectionalMpnnModel, DirectionalTransformerModel, MpnnModel, TransformerModel, GraphModel, CgcnnModel, GraphAttentionModel\n\t```\n\n+ **customize your model and training**  \nThe GNN module provides a flexible training framework that accepts a user-defined tensorflow.keras.models.Model. You can define your own model and train it following the example below.  
\n\t```python\n\tfrom tensorflow.keras.models import Model\n\tfrom tensorflow.keras import layers\n\tfrom matdgl.layers import MessagePassing\n\tfrom matdgl.layers import PartitionPadding\n\n\tdef MyModel(\n\t\tbond_dim,\n\t\tatom_dim=16,\n\t\tnum_atom=118,\n\t\tstate_dim=16,\n\t\tsp_dim=230,\n\t\tunits=32,\n\t\tmessage_steps=1,\n\t\treadout_units=64,\n\t\tbatch_size=16,\n\t\t):\n\t\tatom_features = layers.Input((), dtype=\"int32\", name=\"atom_features_input\")\n\t\tatom_features_ = layers.Embedding(num_atom, atom_dim, dtype=\"float32\", name=\"atom_features\")(atom_features)\n\t\tbond_features = layers.Input((bond_dim,), dtype=\"float32\", name=\"bond_features\")\n\t\tlocal_env = layers.Input((6,), dtype=\"float32\", name=\"local_env\")\n\t\tstate_attrs = layers.Input((), dtype=\"int32\", name=\"state_attrs_input\")   \n\t\tstate_attrs_ = layers.Embedding(sp_dim, state_dim, dtype=\"float32\", name=\"state_attrs\")(state_attrs)\n\n\t\tpair_indices = layers.Input((2,), dtype=\"int32\", name=\"pair_indices\")\n\n\t\tatom_graph_indices = layers.Input(\n\t\t(), dtype=\"int32\", name=\"atom_graph_indices\"\n\t\t)\n\n\t\tbond_graph_indices = layers.Input(\n\t\t(), dtype=\"int32\", name=\"bond_graph_indices\"\n\t\t)\n\n\t\tpair_indices_per_graph = layers.Input((2,), dtype=\"int32\", name=\"pair_indices_per_graph\")\n\n\t\t# the edge input defined above is bond_features\n\t\tx = MessagePassing(message_steps)(\n\t\t[atom_features_, bond_features, state_attrs_, pair_indices,\n\t\t\tatom_graph_indices, bond_graph_indices]\n\t\t)\n\n\t\t# x[0] holds the updated atom features; pad them per graph, then pool\n\t\tx = PartitionPadding(batch_size)([x[0], atom_graph_indices])\n\t\tx = layers.BatchNormalization()(x)\n\t\tx = layers.GlobalAveragePooling1D()(x)\n\t\tx = layers.Dense(readout_units, activation=\"relu\", name='readout0')(x)\n\t\tx = layers.Dense(1, activation=\"sigmoid\", name='final')(x)\n\n\t\tmodel = Model(\n\t\tinputs=[atom_features, bond_features, local_env, state_attrs, pair_indices, atom_graph_indices,\n\t\t\t\t\tbond_graph_indices, pair_indices_per_graph],\n\t\toutputs=[x],\n\t\t)\n\t\treturn 
model\n\n\tfrom matdgl.models import GNN\n\tgnn = GNN(model=MyModel,     \n\t\tatom_dim=16,\n\t\tbond_dim=64,\n\t\tnum_atom=118,\n\t\tstate_dim=16,\n\t\tsp_dim=230,\n\t\tunits=32,\n\t\tmessage_steps=1,\n\t\treadout_units=64,\n\t\tbatch_size=16,\n\t\toptimizer='Adam',\n\t\tregression=False,\n\t\tmulticlassification=None,)\n\tgnn.train(train_data, valid_data, test_data, epochs=700, lr=3e-3, warm_up=True, load_weights=False, verbose=1, checkpoints=None, save_weights_only=True, workdir=ModulePath)  \n\t```  \n\tYou can also use the edge (bond) features as your model output.   \n\t```python\n\tfrom tensorflow.keras.regularizers import l2\n\tfrom matdgl.layers import EdgeMessagePassing\n\tdef MyModel(\n\t\tbond_dim,\n\t\tatom_dim=16,\n\t\tnum_atom=118,\n\t\tstate_dim=16,\n\t\tsp_dim=230,\n\t\tunits=32,\n\t\tedge_steps=1,\n\t\tmessage_steps=1,\n\t\treadout_units=64,\n\t\tbatch_size=16,\n\t\treg0=0.0,\n\t\tspherical_harmonics=True,\n\t\t):\n\t\tatom_features = layers.Input((), dtype=\"int32\", name=\"atom_features_input\")\n\t\tatom_features_ = layers.Embedding(num_atom, atom_dim, dtype=\"float32\", name=\"atom_features\")(atom_features)\n\t\tbond_features = layers.Input((bond_dim,), dtype=\"float32\", name=\"bond_features\")\n\t\tlocal_env = layers.Input((6,), dtype=\"float32\", name=\"local_env\")\n\t\tstate_attrs = layers.Input((), dtype=\"int32\", name=\"state_attrs_input\")   \n\t\tstate_attrs_ = layers.Embedding(sp_dim, state_dim, dtype=\"float32\", name=\"state_attrs\")(state_attrs)\n\n\t\tpair_indices = layers.Input((2,), dtype=\"int32\", name=\"pair_indices\")\n\n\t\tatom_graph_indices = layers.Input(\n\t\t(), dtype=\"int32\", name=\"atom_graph_indices\"\n\t\t)\n\n\t\tbond_graph_indices = layers.Input(\n\t\t(), dtype=\"int32\", name=\"bond_graph_indices\"\n\t\t)\n\n\t\tpair_indices_per_graph = layers.Input((2,), dtype=\"int32\", name=\"pair_indices_per_graph\")\n\n\t\t# edge_steps, reg0 and spherical_harmonics are passed in as keyword arguments above\n\t\tx = EdgeMessagePassing(units,\n\t\t\t\t\tedge_steps,\n\t\t\t\t\tkernel_regularizer=l2(reg0),\n\t\t\t\t\tsph=spherical_harmonics\n\t\t\t\t\t)([bond_features, local_env, pair_indices])\n\n\t\tx = PartitionPadding(batch_size)([x[1], 
bond_graph_indices])\n\t\tx = layers.BatchNormalization()(x)\n\t\tx = layers.GlobalAveragePooling1D()(x)\n\t\tx = layers.Dense(readout_units, activation=\"relu\", name='readout0')(x)\n\t\tx = layers.Dense(readout_units//2, activation=\"relu\", name='readout1')(x)\n\t\tx = layers.Dense(1, name='final')(x)\n\n\t\tmodel = Model(\n\t\tinputs=[atom_features, bond_features, local_env, state_attrs, pair_indices, atom_graph_indices,\n\t\t\t\t\tbond_graph_indices, pair_indices_per_graph],\n\t\toutputs=[x],\n\t\t)\n\t\treturn model\n\t```  \n\n\tThe GNN module has some basic parameters that must be defined, although your model does not have to use them:  \n\t```python\n\tclass GNN:\n\t    def __init__(self,\n\t\tmodel: Model,\n\t\tatom_dim=16,\n\t\tbond_dim=32,\n\t\tnum_atom=118,\n\t\tstate_dim=16,\n\t\tsp_dim=230,\n\t\tbatch_size=16,\n\t\tregression=True,\n\t\toptimizer = 'Adam',\n\t\tmulticlassification=None,\n\t\t**kwargs,\n\t\t):\n\t\t\"\"\"\n\t\tpass\n\t\t\"\"\"  \n\t```  \n\n\n\u003ca name=\"MatDGL-framework\"\u003e\u003c/a\u003e\n## Framework  \nMatDGL \n\n\n\u003ca name=\"Implemented-models\"\u003e\u003c/a\u003e\n## Implemented-models  \nThe currently supported GNN models are:\n* **GCN** from Kipf and Welling: [Semi-Supervised Classification with Graph Convolutional Networks](https://arxiv.org/abs/1609.02907) (ICLR 2017)  \n* **GAT** from Veličković *et al.*: [Graph Attention Networks](https://arxiv.org/abs/1710.10903) (ICLR 2018)  \n* **GN** from Battaglia *et al.*: [Relational inductive biases, deep learning, and graph networks](https://arxiv.org/pdf/1806.01261v1)   \n* **Transformer** from Vaswani *et al.*: [Attention Is All You Need](https://arxiv.org/pdf/1706.03762) (NIPS 2017)  \n\n\n\u003ca name=\"Contributors\"\u003e\u003c/a\u003e\n## Contributors\nZongxiang Hu\n\n\n\u003ca name=\"References\"\u003e\u003c/a\u003e\n## References\n\n\n\u003ca name=\"Contact\"\u003e\u003c/a\u003e\n## Contact\nPlease contact me if you have any questions.  
\nMail: huzongxiang@yahoo.com  \nWechat: voodoozx2015\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuzongxiang%2FMatDGL","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhuzongxiang%2FMatDGL","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhuzongxiang%2FMatDGL/lists"}