# Nextflow Graph Machine Learning

[![Validate Pipeline](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/validation.yml/badge.svg)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/validation.yml) [![Generate Documentation](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/docs.yml/badge.svg)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/docs.yml) [![pages-build-deployment](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/pages/pages-build-deployment)
[![CodeQL](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/github-code-scanning/codeql/badge.svg?branch=main)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/github-code-scanning/codeql)

Website: [Nextflow Graph Machine Learning](https://jbris.github.io/nextflow-graph-machine-learning/)

*A Nextflow pipeline demonstrating how to train graph neural networks for gene regulatory network reconstruction using DREAM5 data.*

# Table of contents

- [Nextflow Graph Machine Learning](#nextflow-graph-machine-learning)
- [Table of contents](#table-of-contents)
- [Introduction](#introduction)
- [The Nextflow pipeline](#the-nextflow-pipeline)
- [Python Environment](#python-environment)
  - [MLOps](#mlops)
- [ArangoDB](#arangodb)

# Introduction

The purpose of this project is to provide a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, for gene regulatory network (GRN) reconstruction using graph neural networks (GNNs). In practice, GRN reconstruction is framed as an unsupervised link prediction problem.

[For developing GNNs, we use PyTorch Geometric.](https://pytorch-geometric.readthedocs.io/en/latest/)

# The Nextflow pipeline

[Nextflow is used to orchestrate the GRN reconstruction pipeline.](https://www.nextflow.io/)

The pipeline consists of the following steps:

1. Exploratory data analysis: View the GRN and calculate summary statistics.
2. Processing: Process the graph feature matrix and edge list, and remove the disconnected subgraph.
3. ArangoDB import: Import the graph into ArangoDB.
4. GNN training: Train a GNN using GraphSAGE convolutional layers.
5. GNN training: Train a variational graph autoencoder and save the learned node embeddings.

[Run nextflow.sh to execute the full pipeline.](scripts/nextflow.sh)

[Run clean_nf.sh to remove the output and log files produced by the Nextflow run.](scripts/clean_nf.sh)

# Python Environment

[Python dependencies are specified in this requirements.txt file.](services/python/requirements.txt)

These dependencies are installed when building the following Docker image: `ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0`

Execute the following command to pull the image: `docker pull ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0`

## MLOps

* [A Docker Compose file is provided to launch an MLOps stack.](docker-compose.yml)
* [See the .env file for Docker environment variables.](.env)
* [Run docker_up.sh to launch the Docker services.](scripts/docker_up.sh)
* [DVC is included for data version control.](https://dvc.org/)
* [MLflow is available for experiment tracking.](https://mlflow.org/)
* [MinIO is available for storing experiment artifacts.](https://min.io/)

# ArangoDB

[This pipeline provides a simple demonstration of saving graph data to, and retrieving it from, ArangoDB, with NetworkX integration.](https://docs.arangodb.com/3.11/data-science/adapters/arangodb-networkx-adapter/)
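ArangoDB stores a graph as documents: vertices are identified by `_key`, and edge documents link vertices through `_from` and `_to` handles of the form `collection/key`. The sketch below is a dependency-free illustration of that data model, not the NetworkX adapter's actual implementation. It accepts `(source, target, attributes)` triples, such as those yielded by NetworkX's `G.edges(data=True)`. The function names, the `genes` collection name, and the sample gene identifiers are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# (source, target, attributes), e.g. one item of networkx's G.edges(data=True)
Edge = Tuple[str, str, Dict[str, Any]]


def to_arango_documents(
    edges: Iterable[Edge],
    vertex_collection: str = "genes",  # hypothetical collection name
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Convert edge triples into ArangoDB-style vertex and edge documents."""
    vertices: Dict[str, Dict[str, Any]] = {}
    edge_docs: List[Dict[str, Any]] = []
    for source, target, attrs in edges:
        # Register each endpoint once, keyed by its node identifier.
        for node in (source, target):
            vertices.setdefault(node, {"_key": node})
        # Copy the edge attributes and add the "collection/key" handles.
        doc = dict(attrs)
        doc["_from"] = f"{vertex_collection}/{source}"
        doc["_to"] = f"{vertex_collection}/{target}"
        edge_docs.append(doc)
    return list(vertices.values()), edge_docs


def from_arango_documents(edge_docs: Iterable[Dict[str, Any]]) -> List[Edge]:
    """Rebuild edge triples from ArangoDB-style edge documents."""
    edges: List[Edge] = []
    for doc in edge_docs:
        # Strip the "collection/" prefix to recover the vertex keys.
        source = doc["_from"].split("/", 1)[1]
        target = doc["_to"].split("/", 1)[1]
        # Drop system attributes (_key, _id, _rev, _from, _to).
        attrs = {k: v for k, v in doc.items() if not k.startswith("_")}
        edges.append((source, target, attrs))
    return edges


if __name__ == "__main__":
    # Hypothetical regulatory edges (transcription factor -> target gene).
    grn_edges = [("G1", "G2", {"weight": 0.9}), ("G1", "G3", {"weight": 0.4})]
    vertex_docs, edge_docs = to_arango_documents(grn_edges)
    print(vertex_docs)
    print(edge_docs)
    print(from_arango_documents(edge_docs) == grn_edges)
```

Filtering on the leading underscore keeps the round trip lossless for user attributes while discarding the system attributes that ArangoDB adds to stored documents.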