{"id":19005742,"url":"https://github.com/fedml-ai/spreadgnn","last_synced_at":"2025-10-12T00:32:40.018Z","repository":{"id":45543183,"uuid":"374843887","full_name":"FedML-AI/SpreadGNN","owner":"FedML-AI","description":"SpreadGNN: Serverless Multi-Task Learning Framework for Graph Neural Networks. Accepted to AAAI22.","archived":false,"fork":false,"pushed_at":"2022-08-24T16:07:10.000Z","size":62,"stargazers_count":47,"open_issues_count":2,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-10-07T18:43:58.725Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FedML-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-08T01:20:51.000Z","updated_at":"2025-04-08T03:26:29.000Z","dependencies_parsed_at":"2022-08-03T02:45:58.867Z","dependency_job_id":null,"html_url":"https://github.com/FedML-AI/SpreadGNN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/FedML-AI/SpreadGNN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FedML-AI%2FSpreadGNN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FedML-AI%2FSpreadGNN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FedML-AI%2FSpreadGNN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FedML-AI%2FSpreadGNN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FedML-AI","download_url":"https://codeload.github.com/FedML-AI/SpreadGNN/tar.gz/refs/heads/main",
"sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FedML-AI%2FSpreadGNN/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279009508,"owners_count":26084609,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T18:29:04.309Z","updated_at":"2025-10-12T00:32:40.002Z","avatar_url":"https://github.com/FedML-AI.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SpreadGNN: Decentralized Multi-Task Federated Learning for Graph Neural Networks on Molecular Data\n\nThis repository is the official implementation of SpreadGNN: Decentralized Multi-Task Federated Learning for Graph Neural Networks on Molecular Data, which was accepted to AAAI'22.\n\n## 1. Introduction\n\n\nGraph Neural Networks (GNNs) are the methods of choice for graph machine learning problems thanks to their ability to learn state-of-the-art representations from graph-structured data. However, centralizing a massive amount of real-world graph data for GNN training is prohibitive due to user-side privacy concerns, regulatory restrictions, and commercial competition. 
Federated Learning is the de facto standard for collaboratively training machine learning models over many distributed edge devices without centralizing the data. Nevertheless, training graph neural networks in a federated setting is not well defined and poses both statistical and systems challenges. This work proposes SpreadGNN, a novel multi-task federated training framework that, for the first time in the literature, can operate with partial labels and without a central server. SpreadGNN extends federated multi-task learning to realistic serverless settings for GNNs and uses a novel optimization algorithm with a convergence guarantee, Decentralized Periodic Averaging SGD (DPA-SGD), to solve decentralized multi-task learning problems. We empirically demonstrate the efficacy of our framework on a variety of non-I.I.D. distributed graph-level molecular property prediction datasets with partial labels. Our results show that SpreadGNN outperforms GNN models trained with a federated learning system that depends on a central server, even in constrained topologies.\n\n\n## 2. Installation\n\n\n```bash\nconda create -n spreadgnn python=3.7\nconda activate spreadgnn\nconda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia\nconda install -c anaconda mpi4py grpcio\nconda install scikit-learn numpy h5py setproctitle networkx\npip install -r requirements.txt\ncd FedML; git submodule init; git submodule update; cd ../;\npip install -r FedML/requirements.txt\n```\n\n\n## 3. Data Preparation\nFor each dataset you want to use, run the `.sh` script located in that dataset's folder under `data`.\nFor more datasets, visit http://moleculenet.ai/\n\n\n## 4. 
Experiments\n\n\n### Distributed/Federated Molecule Property Classification experiments\n```\nsh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider \"./../../../data/sider/\" 0\n\n# Run in the background\nnohup sh run_fedavg_distributed_pytorch.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 sider \"./../../../data/sider/\" 0 \u003e ./fedavg-graphsage.log 2\u003e\u00261 \u0026\n```\n\n### Distributed/Federated Molecule Property Regression experiments\n```\nsh run_fedavg_distributed_reg.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 freesolv \"./../../../data/freesolv/\" 0\n\n# Run in the background\nnohup sh run_fedavg_distributed_reg.sh 6 1 1 1 graphsage homo 150 1 1 0.0015 256 256 0.3 256 256 freesolv \"./../../../data/freesolv/\" 0 \u003e ./fedavg-graphsage.log 2\u003e\u00261 \u0026\n```\n\n#### Arguments for Distributed/Federated Training\nThis is the ordered list of positional arguments for the FedAvg scripts above (`run_fedavg_distributed_pytorch.sh` and `run_fedavg_distributed_reg.sh`). Note that there are additional parameters for this setting.\n```\nCLIENT_NUM=$1 -\u003e Number of clients in the distributed/federated setting\nWORKER_NUM=$2 -\u003e Number of workers\nSERVER_NUM=$3 -\u003e Number of servers\nGPU_NUM_PER_SERVER=$4 -\u003e Number of GPUs per server\nMODEL=$5 -\u003e Model name\nDISTRIBUTION=$6 -\u003e Dataset distribution: homo for IID splitting. 
hetero for non-IID splitting.\nROUND=$7 -\u003e Number of distributed/federated learning rounds\nEPOCH=$8 -\u003e Number of epochs to train clients' local models\nBATCH_SIZE=$9 -\u003e Batch size\nLR=${10} -\u003e Learning rate\nSAGE_DIM=${11} -\u003e Dimensionality of GraphSAGE embedding\nNODE_DIM=${12} -\u003e Dimensionality of node embeddings\nSAGE_DR=${13} -\u003e Dropout rate applied between GraphSAGE layers\nREAD_DIM=${14} -\u003e Dimensionality of readout embedding\nGRAPH_DIM=${15} -\u003e Dimensionality of graph embedding\nDATASET=${16} -\u003e Dataset name (see the data folder for all available datasets)\nDATA_DIR=${17} -\u003e Dataset directory\nCI=${18}\n```\n\n### FedGMTL Classification experiments\n\n```\nsh run_fedgmtl.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 sider \"./../../../data/sider/\" 1 0\n```\n\n### FedGMTL Regression experiments\n\n```\nsh run_fedgmtl_reg.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 qm8 \"./../../../data/qm8/\" 1 0\n```\n\n#### Arguments for FedGMTL\nThis is the ordered list of positional arguments for the FedGMTL scripts (`run_fedgmtl.sh` and `run_fedgmtl_reg.sh`). Note that this setting takes additional parameters compared with the FedAvg setting.\n```\nCLIENT_NUM=$1 -\u003e Number of clients in the distributed/federated setting\nWORKER_NUM=$2 -\u003e Number of workers\nSERVER_NUM=$3 -\u003e Number of servers\nGPU_NUM_PER_SERVER=$4 -\u003e Number of GPUs per server\nMODEL=$5 -\u003e Model name\nDISTRIBUTION=$6 -\u003e Dataset distribution: homo for IID splitting. 
hetero for non-IID splitting.\nPARTITION_ALPHA=$7 -\u003e Alpha parameter of the Dirichlet distribution used for non-IID partitioning\nROUND=$8 -\u003e Number of distributed/federated learning rounds\nEPOCH=$9 -\u003e Number of epochs to train clients' local models\nBATCH_SIZE=${10} -\u003e Batch size\nLR=${11} -\u003e Learning rate\nTASK_W=${12} -\u003e Task-relationship regularizer weight\nTASK_W_DECAY=${13} -\u003e Decay for the task-relationship regularizer\nWD=${14} -\u003e Weight decay coefficient\nHIDDEN_DIM=${15} -\u003e Dimensionality of GNN hidden layer\nNODE_DIM=${16} -\u003e Dimensionality of node embeddings\nDR=${17} -\u003e Dropout rate applied between GraphSAGE layers\nREAD_DIM=${18} -\u003e Dimensionality of readout embedding\nGRAPH_DIM=${19} -\u003e Dimensionality of graph embedding\nMASK_TYPE=${20} -\u003e Mask scenario (0, 1, 2)\nDATASET=${21} -\u003e Dataset name\nDATA_DIR=${22} -\u003e Dataset directory\nCI=${23}\n```\n\n### SpreadGNN Classification experiments\n\n```\nsh run_spreadgnn.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 sider \"./../../../data/sider/\" 1 0\n```\n\n### SpreadGNN Regression experiments\n\n```\nsh run_spreadgnn_reg.sh 8 8 1 1 graphsage hetero 0.5 70 1 1 0.0015 0.3 1 0 64 64 0.3 64 64 1 qm8 \"./../../../data/qm8/\" 1 0\n```\n\n#### Arguments for SpreadGNN\nThis is the ordered list of positional arguments for the SpreadGNN scripts (`run_spreadgnn.sh` and `run_spreadgnn_reg.sh`). Note that this setting takes additional parameters compared with the FedAvg setting.\n```\nCLIENT_NUM=$1 -\u003e Number of clients in the distributed/federated setting\nWORKER_NUM=$2 -\u003e Number of workers\nSERVER_NUM=$3 -\u003e Number of servers\nGPU_NUM_PER_SERVER=$4 -\u003e Number of GPUs per server\nMODEL=$5 -\u003e Model name\nDISTRIBUTION=$6 -\u003e Dataset distribution: homo for IID splitting. 
hetero for non-IID splitting.\nPARTITION_ALPHA=$7 -\u003e Alpha parameter of the Dirichlet distribution used for non-IID partitioning\nROUND=$8 -\u003e Number of distributed/federated learning rounds\nEPOCH=$9 -\u003e Number of epochs to train clients' local models\nBATCH_SIZE=${10} -\u003e Batch size\nLR=${11} -\u003e Learning rate\nTASK_W=${12} -\u003e Task-relationship regularizer weight\nTASK_W_DECAY=${13} -\u003e Decay for the task-relationship regularizer\nWD=${14} -\u003e Weight decay coefficient\nHIDDEN_DIM=${15} -\u003e Dimensionality of GNN hidden layer\nNODE_DIM=${16} -\u003e Dimensionality of node embeddings\nDR=${17} -\u003e Dropout rate applied between GraphSAGE layers\nREAD_DIM=${18} -\u003e Dimensionality of readout embedding\nGRAPH_DIM=${19} -\u003e Dimensionality of graph embedding\nMASK_TYPE=${20} -\u003e Mask scenario (0, 1, 2)\nDATASET=${21} -\u003e Dataset name\nDATA_DIR=${22} -\u003e Dataset directory\nPERIOD=${23} -\u003e Communication period for parameter exchange\nCI=${24}\n```\n\n\n## 5. Code Structure of SpreadGNN\n\n- `FedML`: a Git submodule added with `git submodule add https://github.com/FedML-AI/FedML`.\n\n- `data`: contains data download scripts and stores the downloaded datasets.\n\n- `data_preprocessing`: data loaders.\n\n- `model`: advanced molecular ML models.\n\n- `trainer`: define your own `trainer.py` by inheriting from the base class in `FedML/fedml-core/trainer/fedavg_trainer.py`. Some tasks can share the same trainer.\n\n- `experiments/distributed`:\n    1. `experiments` is the entry point for training. It contains experiments for different platforms.\n    2. Every experiment integrates four building blocks: `FedML` (federated optimizers), `data_preprocessing`, `model`, and `trainer`.\n\n\n## 6. Update FedML Submodule\n```\ncd FedML\ngit checkout master \u0026\u0026 git pull\ncd ..\ngit add FedML\ngit commit -m \"updating submodule FedML to latest\"\ngit push\n```\n\n\n## 7. 
Citation\n```\n\n@misc{he2021spreadgnn,\n      title={SpreadGNN: Decentralized Multi-Task Federated Learning for Graph Neural Networks on Molecular Data}, \n      author={Chaoyang He and Emir Ceyani and Keshav Balasubramanian and Murali Annavaram and Salman Avestimehr},\n      year={2021},\n      eprint={2106.02743},\n      archivePrefix={arXiv},\n      primaryClass={cs.LG}\n}\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffedml-ai%2Fspreadgnn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffedml-ai%2Fspreadgnn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffedml-ai%2Fspreadgnn/lists"}