{"id":37608549,"url":"https://github.com/krishnanlab/obnb","last_synced_at":"2026-01-16T10:14:55.337Z","repository":{"id":58689198,"uuid":"173994002","full_name":"krishnanlab/obnb","owner":"krishnanlab","description":"A Python toolkit for setting up benchmarking dataset using biomedical networks","archived":false,"fork":false,"pushed_at":"2025-10-20T23:40:58.000Z","size":2790,"stargazers_count":22,"open_issues_count":67,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-21T01:24:46.686Z","etag":null,"topics":["benchmark-datasets","computational-biology","machine-learning","network-biology"],"latest_commit_sha":null,"homepage":"https://obnb.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/krishnanlab.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2019-03-05T17:49:58.000Z","updated_at":"2025-02-18T13:50:00.000Z","dependencies_parsed_at":"2022-09-06T05:43:03.577Z","dependency_job_id":"282da3d3-ad10-460f-9b56-17f2e8aef671","html_url":"https://github.com/krishnanlab/obnb","commit_stats":null,"previous_names":[],"tags_count":15,"template":false,"template_full_name":null,"purl":"pkg:github/krishnanlab/obnb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fobnb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fobnb/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fobnb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fobnb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/krishnanlab","download_url":"https://codeload.github.com/krishnanlab/obnb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/krishnanlab%2Fobnb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478049,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark-datasets","computational-biology","machine-learning","network-biology"],"created_at":"2026-01-16T10:14:55.208Z","updated_at":"2026-01-16T10:14:55.326Z","avatar_url":"https://github.com/krishnanlab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/obnb.svg)](https://badge.fury.io/py/obnb)\n[![Documentation Status](https://readthedocs.org/projects/obnb/badge/?version=latest)](https://obnb.readthedocs.io/en/latest/?badge=latest)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Code style: 
black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)](https://pycqa.github.io/isort/)\n\n[![Tests](https://github.com/krishnanlab/obnb/actions/workflows/tests.yml/badge.svg)](https://github.com/krishnanlab/obnb/actions/workflows/tests.yml)\n[![Test Examples](https://github.com/krishnanlab/obnb/actions/workflows/examples.yml/badge.svg)](https://github.com/krishnanlab/obnb/actions/workflows/examples.yml)\n[![Test Data](https://github.com/krishnanlab/obnb/actions/workflows/test_data.yml/badge.svg)](https://github.com/krishnanlab/obnb/actions/workflows/test_data.yml)\n\n# Open Biomedical Network Benchmark\n\nThe Open Biomedical Network Benchmark (OBNB) is a comprehensive resource for setting up benchmarking graph datasets using _biomedical networks_ and _gene annotations_.\nOur goal is to accelerate the adoption of advanced graph machine learning techniques, such as graph neural networks and graph embeddings, in network biology, to gain novel insights into genes' function, trait, and disease associations.\nTo make this adoption convenient, OBNB also provides dataset objects compatible with popular graph deep learning frameworks, including [PyTorch Geometric (PyG)](https://github.com/pyg-team/pytorch_geometric) and [Deep Graph Library (DGL)](https://github.com/dmlc/dgl).\n\nA comprehensive benchmarking study with a wide range of graph neural networks and graph embedding methods on OBNB datasets can be found in our benchmarking repository [`obnbench`](https://github.com/krishnanlab/obnbench).\n\n## Package usage\n\n### Construct default datasets\n\nWe provide a high-level dataset constructor to help users easily set up benchmarking graph datasets\nfor a combination of network and label. 
In particular, the dataset is set up with a study-bias\nholdout split (6/2/2), where the 60% most well-studied genes, ranked by the number of\nassociated PubMed publications, are used for training, the 20% least studied genes are used for\ntesting, and the remaining 20% of genes are used for validation. For more customizable data loading\nand processing options, see the [customized dataset construction](#customized-dataset-construction)\nsection below.\n\n```python\nfrom obnb.dataset import OpenBiomedNetBench\nfrom obnb.util.version import get_available_data_versions\n\nroot = \"datasets\"  # save dataset and cache under the datasets/ directory\nversion = \"current\"  # use the last archived version\n# Optionally, set version to a specific data version number\n# Or, set version to \"latest\" to download the latest data from source and process it from scratch\n\n# Download and process network/label data. Use the adjacency matrix as the ML feature\ndataset = OpenBiomedNetBench(root=root, graph_name=\"BioGRID\", label_name=\"DisGeNET\",\n                             version=version, graph_as_feature=True, use_dense_graph=True)\n\n# Check the specific archive data version used\nprint(dataset.version)\n\n# Check all available stable archive data versions\nprint(get_available_data_versions())\n```\n\nUsers can also load the dataset objects into ones that are compatible with PyG or DGL (see below).\n\n#### PyG dataset\n\n```python\nfrom obnb.dataset import OpenBiomedNetBenchPyG\ndataset = OpenBiomedNetBenchPyG(root, \"BioGRID\", \"DisGeNET\")\n```\n\n**Note**: requires installing PyG first (see [installation 
instructions](https://www.dgl.ai/pages/start.html))\n\n### Evaluating standard models\n\nEvaluation of simple machine learning methods such as logistic regression and label propagation\ncan be done easily using the trainer objects.\n\n```python\nfrom obnb.model_trainer import SupervisedLearningTrainer, LabelPropagationTrainer\n\nsl_trainer = SupervisedLearningTrainer()\nlp_trainer = LabelPropagationTrainer()\n```\n\nThen, use the `fit_and_eval` method of the trainer to evaluate a given ML model over all tasks\nin a one-vs-rest setting.\n\n```python\nfrom sklearn.linear_model import LogisticRegression\nfrom obnb.model.label_propagation import OneHopPropagation\n\n# Initialize models\nsl_mdl = LogisticRegression(penalty=\"l2\", solver=\"lbfgs\")\nlp_mdl = OneHopPropagation()\n\n# Evaluate the models over all tasks\nsl_results = sl_trainer.fit_and_eval(sl_mdl, dataset)\nlp_results = lp_trainer.fit_and_eval(lp_mdl, dataset)\n```\n\n### Evaluating GNN models\n\nTraining and evaluation of Graph Neural Network (GNN) models can be done in a very similar fashion.\n\n```python\nfrom torch_geometric.nn import GCN\nfrom obnb.model_trainer.gnn import SimpleGNNTrainer\n\n# Use one-hot encoded log degrees as node features by default\ndataset = OpenBiomedNetBench(root=root, graph_name=\"BioGRID\", label_name=\"DisGeNET\",\n                             auto_generate_feature=\"OneHotLogDeg\", version=version)\n\n# Train and evaluate a GCN\ngcn_mdl = GCN(in_channels=1, hidden_channels=64, num_layers=5, out_channels=dataset.num_tasks)\ngcn_trainer = SimpleGNNTrainer(device=\"cuda\", metric_best=\"apop\")\ngcn_results = gcn_trainer.train(gcn_mdl, dataset)\n```\n\n### Customized dataset construction\n\n#### Load network and labels\n\n```python\nfrom obnb import data\n\n# Load processed BioGRID data from archive.\ng = data.BioGRID(root, version=version)\n\n# Load DisGeNET gene set collections.\nlsc = data.DisGeNET(root, version=version)\n```\n\n#### Setting up data and 
splits\n\n```python\nfrom obnb.util.converter import GenePropertyConverter\nfrom obnb.label.split import RatioPartition\n\n# Load PubMed count gene property converter and use it to set up\n# 6/2/2 study-bias based train/val/test splits\npubmedcnt_converter = GenePropertyConverter(root, name=\"PubMedCount\")\nsplitter = RatioPartition(0.6, 0.2, 0.2, ascending=False,\n                          property_converter=pubmedcnt_converter)\n```\n\n#### Filter labeled data based on network genes and splits\n\n```python\nfrom obnb.label import filters\n\n# Apply in-place filters to the labelset collection\nlsc.iapply(\n    filters.Compose(\n        # Only use genes that are present in the network\n        filters.EntityExistenceFilter(list(g.node_ids)),\n        # Remove any labelsets with fewer than 50 network genes\n        filters.LabelsetRangeFilterSize(min_val=50),\n        # Make sure each split has at least 10 positive examples\n        filters.LabelsetRangeFilterSplit(min_val=10, splitter=splitter),\n    ),\n)\n```\n\n#### Combine into dataset\n\n```python\nfrom obnb import Dataset\ndataset = Dataset(graph=g, feature=g.to_dense_graph().to_feature(), label=lsc, splitter=splitter)\n```\n\n## Installation\n\nOBNB can be installed easily via pip from [PyPI](https://pypi.org/project/obnb/):\n\n```bash\npip install obnb\n```\n\n### Install with extension modules (optional)\n\nOBNB provides interfaces with several other packages for network feature extraction, such as\n[PecanPy](https://github.com/krishnanlab/PecanPy) and [GraPE](https://github.com/AnacletoLAB/grape).\nTo enable those extensions, install `obnb` with the `ext` extra option enabled:\n\n```bash\npip install obnb[ext]\n```\n\n### Install graph deep learning libraries (optional)\n\nFollow the installation instructions for [PyG](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html) or [DGL](https://www.dgl.ai/pages/start.html) to set up the graph deep learning library of your choice.\n\nAlternatively, we also provide an 
[installation script](install.sh) that helps you install the graph deep-learning dependencies in a new conda environment `obnb`:\n\n```bash\ngit clone https://github.com/krishnanlab/obnb \u0026\u0026 cd obnb\nsource install.sh cu117  # other options are [cpu,cu118]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrishnanlab%2Fobnb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkrishnanlab%2Fobnb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkrishnanlab%2Fobnb/lists"}