{"id":25651097,"url":"https://github.com/dcolinmorgan/pynb","last_synced_at":"2025-10-05T13:11:00.005Z","repository":{"id":277309713,"uuid":"931933578","full_name":"dcolinmorgan/pyNB","owner":"dcolinmorgan","description":"python version of NestBoot-FDR","archived":false,"fork":false,"pushed_at":"2025-07-15T12:13:36.000Z","size":834,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-16T03:09:02.047Z","etag":null,"topics":["bootstrap","grn","stability"],"latest_commit_sha":null,"homepage":"https://academic.oup.com/bioinformatics/article/35/6/1026/5086392","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcolinmorgan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-13T04:59:07.000Z","updated_at":"2025-03-17T05:28:48.000Z","dependencies_parsed_at":"2025-07-15T14:17:04.350Z","dependency_job_id":"0dfbc655-2795-4018-8136-1509742c3f3d","html_url":"https://github.com/dcolinmorgan/pyNB","commit_stats":null,"previous_names":["dcolinmorgan/pynb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dcolinmorgan/pyNB","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcolinmorgan%2FpyNB","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcolinmorgan%2FpyNB/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcolinmorgan%2FpyNB/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcolinmorgan%2Fp
yNB/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcolinmorgan","download_url":"https://codeload.github.com/dcolinmorgan/pyNB/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcolinmorgan%2FpyNB/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272847349,"owners_count":25003177,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-30T02:00:09.474Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bootstrap","grn","stability"],"created_at":"2025-02-23T16:17:29.818Z","updated_at":"2025-10-05T13:10:59.937Z","avatar_url":"https://github.com/dcolinmorgan.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Network Bootstrap FDR\n\nA Python implementation of NB-FDR (Network Bootstrap False Discovery Rate) analysis for network inference. This package implements an algorithm to estimate bootstrap support for network links by comparing measured networks against a shuffled (null) dataset. It computes key metrics such as assignment fractions, evaluates overlap between inferred links, and determines a bootstrap support cutoff at the desired false discovery rate.\n\n```n.b. this package is not meant to run network inference, only to compute the FDR based on the inferred networks from multiple bootstrap runs. 
However, installing the [workflow] extra also installs the tools needed to reproduce the figure below (i.e. snakemake \u0026 scenic+) ```\n\n## Overview\n\nIn high-throughput network analysis, bootstrapping is used to assess the stability of inferred links. NB-FDR leverages bootstrap iterations to compute the assignment fraction (i.e. the frequency at which a link is inferred) and compares these results against a null distribution obtained from shuffled data. The difference between the measured and shuffled distributions determines the bootstrap support cutoff that meets a target FDR.\n\nKey features of this package include:\n- **Computation of Assignment Fractions:** For both measured and null networks based on bootstrap runs.\n- **Comparison Between Measured and Null Distributions:** To determine a support metric that approximates (1 - FDR).\n- **Export of Results:** Summary statistics are saved as a text file.\n- **Visualization:** A dual-axis plot displays the bootstrap support metric (left y-axis) and normalized link frequencies (right y-axis) for both measured and shuffled data.\n- **Modular Design:** Clear separation of source code, tests, examples, and configuration.\n- **Snakemake Workflow:** Automated analysis pipeline for processing multiple samples.\n- **SCENIC+ Integration:** Optional integration with scenicplus for comprehensive gene regulatory network analysis.\n\n## Analysis Output as Figure\n\n![Analysis Output](output/analysis_plot.png)\n\n## Package Structure\n```\nnetwork_bootstrap/\n├── pyproject.toml         # Build and dependency configuration\n├── README.md              # Package overview and usage guide\n├── src/\n│   └── network_bootstrap/\n│       ├── __init__.py\n│       ├── nb_fdr.py      # Core implementation of NB-FDR analysis\n│       ├── utils.py       # Utility functions for network analysis\n│       └── workflow/      # Snakemake workflow for automated analysis\n│           ├── __init__.py\n│           ├── Snakefile\n│           ├── config/\n│           │   └── 
config.yaml\n│           └── scripts/\n│               ├── compute_assign_frac.py\n│               ├── nb_fdr_analysis.py\n│               ├── generate_plots.py\n│               └── compute_density.py\n├── tests/\n│   ├── __init__.py\n│   └── test_network_bootstrap.py   # Pytest-based tests\n└── examples/\n    ├── basic_usage.py     # Example script demonstrating package usage\n    └── run_workflow.py    # Example script for running the workflow\n```\n\n## Installation\n\nThe recommended way to install the package is to use a virtual environment. For example:\n\n```bash\npython -m venv venv\nsource venv/bin/activate           # On Windows use: venv\\Scripts\\activate\npip install -e \".[dev]\"            # For development\npip install -e \".[workflow]\"       # For Snakemake workflow and SCENIC+ capabilities\n```\n\nThis installs all required dependencies including `numpy`, `pandas`, `matplotlib`, and `pytest`. If you install with the `workflow` extra, you'll also get `snakemake` and `scenicplus` for running the automated analysis pipeline and gene regulatory network analysis.\n\n## Usage\n\n### Basic API Usage\n\nA complete working example is provided in the `examples/basic_usage.py` file. In summary, the workflow is as follows:\n\n1. **Process Input Data:**  \n   Load CSV files containing network data. Each file should include columns `gene_i`, `gene_j`, `run`, and `link_value` where `run` indicates the bootstrap run number.\n\n2. **Compute Assignment Fractions:**  \n   Use the `compute_assign_frac()` method to calculate the frequency (Afrac) and sign fraction (Asign_frac) for each network link.\n\n3. **Merge Measured and Null Data:**  \n   Combine the calculated metrics for the normal and shuffled networks.\n\n4. **Run NB-FDR Analysis:**  \n   Call the `nb_fdr()` method to compute core network metrics, which returns a `NetworkResults` dataclass.\n\n5. 
**Export and Visualize Results:**  \n   - **Text Summary:** Use `export_results()` to generate a text file summary.\n   - **Visualization:** Use `plot_analysis_results()` to create a dual-axis plot. The left y-axis displays a support metric (calculated as the difference in link frequencies between measured and null data normalized by the measured frequency, approximating (1 - FDR)), while the right y-axis shows normalized link frequency distributions.\n\nExample:\n\n```python\nfrom pathlib import Path\nfrom network_bootstrap.nb_fdr import NetworkBootstrap\nimport pandas as pd\n\ndef process_network_data(data_path: str, is_null: bool = False) -\u003e pd.DataFrame:\n    \"\"\"Process raw network data from a CSV file.\"\"\"\n    df = pd.read_csv(data_path)\n    # Pull the integer run index out of the run label\n    df['run'] = df.run.str.extract(r'(\\d+)').astype(int)\n    # Keep only runs numbered below 65\n    return df[df['run'] \u003c 65].sort_values('run')\n\ndef main() -\u003e None:\n    \"\"\"Main execution function.\"\"\"\n    # Load data\n    normal_data = process_network_data('../data/normal_data.gz')\n    null_data = process_network_data('../data/null_data.gz', is_null=True)\n    \n    # Initialize analyzer\n    nb = NetworkBootstrap()\n    \n    # Run NB-FDR analysis\n    results = nb.nb_fdr(\n        normal_df=normal_data,\n        shuffled_df=null_data,\n        init=64,\n        data_dir=Path(\"output\"),\n        fdr=0.05,\n        boot=8\n    )\n    \n    # Print key results\n    print(f\"Network sparsity: {(results.xnet != 0).mean():.3f}\")\n    print(f\"Node count: {results.xnet.shape[0]}\")\n    print(f\"Edge count: {results.xnet.sum():.3f}\")\n    print(f\"False positive rate: {results.fp_rate:.3f}\")\n    print(f\"Support threshold: {results.support:.3f}\")\n\n    # Export results and plot analysis\n    nb.export_results(results, Path(\"output/results.txt\"))\n    \n    # Re-create merged DataFrame for plotting\n    agg_normal = nb.compute_assign_frac(normal_data, 64, 8)\n    agg_normal.rename(columns={'Afrac': 'Afrac_norm', 
'Asign_frac': 'Asign_frac_norm'}, inplace=True)\n    agg_shuffled = nb.compute_assign_frac(null_data, 64, 8)\n    agg_shuffled.rename(columns={'Afrac': 'Afrac_shuf', 'Asign_frac': 'Asign_frac_shuf'}, inplace=True)\n    merged = pd.merge(agg_normal, agg_shuffled, on=['gene_i', 'gene_j'])\n    \n    nb.plot_analysis_results(merged, Path(\"output/analysis_plot.png\"), bins=32)\n\nif __name__ == '__main__':\n    main()\n```\n\n### Using the Snakemake Workflow\n\nThe package includes a Snakemake workflow for automating analysis of multiple samples. To use it:\n\n1. **Create Workflow Directory:**\n\n```python\nfrom network_bootstrap import create_workflow_directory\n\n# Create a directory with Snakefile and config.yaml\nworkflow_dir = create_workflow_directory(\"my_workflow\", overwrite=True)\n```\n\n2. **Prepare Input Data:**\n\nOrganize your input data in the format expected by the workflow:\n- Place normal data files at: `\u003coutput_dir\u003e/data/\u003csample\u003e/normal_data.csv`\n- Place shuffled data files at: `\u003coutput_dir\u003e/data/\u003csample\u003e/shuffled_data.csv`\n\n3. **Edit Configuration:**\n\nModify the `config/config.yaml` file to specify samples and parameters.\n\n4. **Run the Workflow:**\n\n```python\nfrom network_bootstrap import run_workflow\n\n# Dry run to check that everything is set up correctly\nrun_workflow(\"my_workflow\", dry_run=True)\n\n# Actual run with 4 cores\nrun_workflow(\"my_workflow\", cores=4)\n```\n\nAlternatively, you can run the workflow directly with the `snakemake` command:\n\n```bash\ncd my_workflow\nsnakemake --cores 4\n```\n\n5. 
**Examine Results:**\n\nThe workflow generates:\n- Assignment fraction data in `\u003coutput_dir\u003e/processed/\u003csample\u003e/`\n- Analysis results in `\u003coutput_dir\u003e/results/\u003csample\u003e/`\n- Plots in `\u003coutput_dir\u003e/plots/\u003csample\u003e/`\n- Network density information in `\u003coutput_dir\u003e/density/`\n\n### Integration with SCENIC+\n\nThe package can be used in conjunction with SCENIC+ for comprehensive gene regulatory network analysis. When you install the package with the `workflow` extra dependencies, you'll have access to SCENIC+ functionality that can be used to:\n\n1. Run network inference using SCENIC+ methods\n2. Evaluate networks with bootstrapped FDR through our NB-FDR implementation\n3. Visualize and analyze results within a unified framework\n\nTo use SCENIC+ with NB-FDR:\n\n1. **Install the package with workflow dependencies:**\n   ```bash\n   pip install -e \".[workflow]\"\n   ```\n\n2. **Create a custom Snakefile that combines SCENIC+ and NB-FDR:**\n   You can adapt the example Snakefile in `src/network_bootstrap/workflow/Snakefile` and the SCENIC+ Snakefile to create a workflow that:\n   - Runs SCENIC+ to infer networks\n   - Uses bootstrapping for multiple iterations\n   - Runs NB-FDR to assess stability and significance\n   - Produces integrated reports and visualizations\n\n3. 
**Recommended directory structure for SCENIC+ integration:**\n   ```\n   project/\n   ├── config/\n   │   └── config.yaml       # Combined configuration\n   ├── data/\n   │   ├── reference/        # Reference files for SCENIC+\n   │   └── input/            # Input files\n   ├── results/\n   │   ├── scenic/           # SCENIC+ results\n   │   └── nb_fdr/           # NB-FDR results\n   └── Snakefile             # Combined workflow file\n   ```\n\n## Testing\n\nTo run the tests with pytest, simply execute:\n\n```bash\npytest\n```\n\nThis command will run all tests contained in the `tests/` directory.\n\n## Contributing\n\nContributions and feedback are welcome! Please open issues or submit pull requests on GitHub.\n\n## References\n\n- [CancerGRN Analysis Example](https://dcolin.shinyapps.io/cancergrn/)\n- [Bioinformatics Article](https://academic.oup.com/bioinformatics/article/35/6/1026/5086392)\n- [SCENIC+ Documentation](https://scenicplus.readthedocs.io/)\n\n## License\n\nThis project is licensed under the MIT License.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcolinmorgan%2Fpynb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcolinmorgan%2Fpynb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcolinmorgan%2Fpynb/lists"}