Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jbris/nextflow-graph-machine-learning
A Nextflow pipeline demonstrating how to train graph neural networks for gene regulatory network reconstruction using DREAM5 data.
https://github.com/jbris/nextflow-graph-machine-learning
deep-learning docker docker-compose dream5 gene-regulatory-network gene-regulatory-network-inference gene-regulatory-networks graph-neural-network graph-neural-networks graphsage machine-learning minio mlflow mlops nextflow nextflow-pipeline nextflow-pipelines variational-autoencoder variational-inference
Last synced: 4 days ago
JSON representation
A Nextflow pipeline demonstrating how to train graph neural networks for gene regulatory network reconstruction using DREAM5 data.
- Host: GitHub
- URL: https://github.com/jbris/nextflow-graph-machine-learning
- Owner: JBris
- License: mit
- Created: 2023-08-29T21:54:05.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-02T11:52:11.000Z (8 months ago)
- Last Synced: 2024-04-02T13:15:29.941Z (8 months ago)
- Topics: deep-learning, docker, docker-compose, dream5, gene-regulatory-network, gene-regulatory-network-inference, gene-regulatory-networks, graph-neural-network, graph-neural-networks, graphsage, machine-learning, minio, mlflow, mlops, nextflow, nextflow-pipeline, nextflow-pipelines, variational-autoencoder, variational-inference
- Language: Python
- Homepage: https://jbris.github.io/nextflow-graph-machine-learning/
- Size: 5.69 MB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# Nextflow Graph Machine Learning
[![Validate Pipeline](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/validation.yml/badge.svg)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/validation.yml) [![Generate Documentation](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/docs.yml/badge.svg)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/docs.yml) [![pages-build-deployment](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/pages/pages-build-deployment)
[![CodeQL](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/github-code-scanning/codeql/badge.svg?branch=main)](https://github.com/JBris/nextflow-graph-machine-learning/actions/workflows/github-code-scanning/codeql)Website: [Nextflow Graph Machine Learning](https://jbris.github.io/nextflow-graph-machine-learning/)
*A Nextflow pipeline demonstrating how to train graph neural networks for gene regulatory network reconstruction using DREAM5 data.*
# Table of contents
- [Nextflow Graph Machine Learning](#nextflow-graph-machine-learning)
- [Table of contents](#table-of-contents)
- [Introduction](#introduction)
- [The Nextflow pipeline](#the-nextflow-pipeline)
- [Python Environment](#python-environment)
- [MLOps](#mlops)
- [ArangoDB](#arangodb)# Introduction
The purpose of this project is to provide a simple demonstration of how to construct a Nextflow pipeline, with MLOps integration, for performing gene regulatory network (GRN) reconstruction using graph neural networks (GNNs). In practice, GRN reconstruction is an unsupervised link prediction problem.
[For developing GNNs, we use PyTorch Geometric.](https://pytorch-geometric.readthedocs.io/en/latest/)
# The Nextflow pipeline
[Nextflow has been included to orchestrate the GRN reconstruction pipeline.](https://www.nextflow.io/)
The pipeline is composed of the following steps:
1. Exploratory data analysis: View the GRN and calculate some summary statistics.
2. Processing: Process the graph feature matrix and edge list. Remove the disconnected subgraph.
3. ArangoDB Importing: Import the graph into ArangoDB.
4. GNN training: Train a GNN using SAGE convolutional layers.
5. GNN training: Train a variational autoencoder GNN, and save the neural embeddings.[Run nextflow.sh to execute the full pipeline.](scripts/nextflow.sh)
[Run clean_nf.sh to clean up the output and logging files from the Nextflow run.](scripts/clean_nf.sh)
# Python Environment
[Python dependencies are specified in this requirements.txt file.](services/python/requirements.txt).
These dependencies are installed during the build process for the following Docker image: ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0
Execute the following command to pull the image: *docker pull ghcr.io/jbris/nextflow-graph-machine-learning:1.0.0*
## MLOps
* [A Docker compose file has been provided to launch an MLOps stack.](docker-compose.yml)
* [See the .env file for Docker environment variables.](.env)
* [The docker_up.sh script can be executed to launch the Docker services.](scripts/docker_up.sh)
* [DVC is included for data version control.](https://dvc.org/)
* [MLFlow is available for experiment tracking.](https://mlflow.org/)
* [MinIO is available for storing experiment artifacts.](https://min.io/)# ArangoDB
[This pipeline provides a simple demonstration for saving and retrieving graph data to ArangoDB, combined with NetworkX usage and integration.](https://docs.arangodb.com/3.11/data-science/adapters/arangodb-networkx-adapter/)