{"id":30181251,"url":"https://github.com/multiomics-analytics-group/instanexus","last_synced_at":"2025-08-12T08:07:51.378Z","repository":{"id":308862548,"uuid":"1009235405","full_name":"Multiomics-Analytics-Group/InstaNexus","owner":"Multiomics-Analytics-Group","description":null,"archived":false,"fork":false,"pushed_at":"2025-08-08T09:51:32.000Z","size":14481,"stargazers_count":0,"open_issues_count":2,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-08-08T11:21:15.321Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Multiomics-Analytics-Group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-26T19:51:19.000Z","updated_at":"2025-08-08T09:51:36.000Z","dependencies_parsed_at":"2025-08-08T11:33:01.197Z","dependency_job_id":null,"html_url":"https://github.com/Multiomics-Analytics-Group/InstaNexus","commit_stats":null,"previous_names":["multiomics-analytics-group/instanexus"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/Multiomics-Analytics-Group/InstaNexus","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Multiomics-Analytics-Group%2FInstaNexus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Multiomics-Analytics-Group%2FInstaNexus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Multiomics-Analytics-Group%2FInstaNexus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/Multiomics-Analytics-Group%2FInstaNexus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Multiomics-Analytics-Group","download_url":"https://codeload.github.com/Multiomics-Analytics-Group/InstaNexus/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Multiomics-Analytics-Group%2FInstaNexus/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270024697,"owners_count":24514054,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-12T02:00:09.011Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-12T08:07:46.663Z","updated_at":"2025-08-12T08:07:51.362Z","avatar_url":"https://github.com/Multiomics-Analytics-Group.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/instanexus_logo 2.svg\" width=\"600\" alt=\"InstaNexus logo\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\u003cem\u003eA de novo protein sequencing workflow\u003c/em\u003e\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/environment-conda-blue\" alt=\"Conda\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/license-MIT-green\" alt=\"License\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/python-3.9+-blue\" 
alt=\"Python\"\u003e\n\u003c/p\u003e\n\n---\n\n## Table of Contents\n- [Introduction](#introduction)\n- [Features](#features)\n- [Workflow Diagram](#workflow-diagram)\n- [Repository Structure](#repository-structure)\n- [Prerequisites and Installation](#prerequisites-and-installation)\n- [Getting Started](#getting-started)\n- [Hyperparameter Optimization](#hyperparameter-optimization)\n- [License](#license)\n- [Acknowledgments](#acknowledgments)\n- [References](#references)\n\n---\n\n## Introduction\n\nInstaNexus is a generalizable, end-to-end workflow for direct protein sequencing, tailored to reconstruct full-length protein therapeutics such as antibodies and nanobodies. It integrates AI-driven de novo peptide sequencing with optimized assembly and scoring strategies to maximize accuracy, coverage, and functional relevance.\n\nThis pipeline enables robust reconstruction of critical protein regions, advancing applications in therapeutic discovery, immune profiling, and protein engineering.\n\n---\n\n## Features\n\n- 🧬 Supports De Bruijn Graph and Greedy-based assembly\n- ⚗️ Handles multiple protease digestions (Trypsin, LysC, GluC, etc.)\n- 🧹 Integrated contaminant removal and confidence filtering\n- 🧩 Clustering, alignment, and consensus sequence reconstruction\n- 🔗 Integrates with external tools:\n  - [MMseqs2](https://github.com/soedinglab/MMseqs2) for fast clustering\n  - [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/) for high-quality alignment\n- 📊 Output-ready for downstream analysis and visualization\n\n---\n\n## Workflow Diagram\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"images/instanexus_panel.png\" width=\"900\" alt=\"InstaNexus Workflow\"\u003e\n\u003c/p\u003e\n\n---\n\n## Repository Structure\n\n| File / Folder       | Description                                                                  |\n|---------------------|------------------------------------------------------------------------------|\n| `environment.linux.yml`        
| Conda environment definition with required dependencies for Linux |\n| `environment.osx-arm64.yaml`   | Conda environment definition with required dependencies for macOS (Apple Silicon) |\n| `README.md`         | Project documentation                                                        |\n| `examples/`         |                                                                              |\n| `fasta/`            | Known contaminants and example FASTA sequences                               |\n| `images/`           | Logos and workflow diagrams (PNG, SVG, PDF)                                  |\n| `inputs/`           | Example datasets (e.g., BSA, antibody, nanobody)                             |\n| `json/`             | JSON metadata for peptide color coding and analysis                          |\n| `notebooks/`        | Jupyter notebooks for visualization and exploration                          |\n| `src/`              | Core scripts to run the InstaNexus pipeline                                  |\n\n---\n\n## Prerequisites and Installation\n\n- [Conda](https://docs.conda.io/en/latest/)\n- [MMseqs2](https://github.com/soedinglab/MMseqs2)\n- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)\n\n\u003e [!IMPORTANT]\n\u003e MMseqs2 and Clustal Omega are available through Conda, but compatibility depends on your system architecture.\n\u003e - 🔍 [Clustal Omega on Anaconda.org](https://anaconda.org/search?q=clustalo)\n\n---\n\n## Getting Started\n\nFollow these steps to clone the repository and set up the environment using Conda:\n\n### 1. Clone the repository\n\nClone the repository and change into its directory:\n\n```bash\ngit clone https://github.com/Multiomics-Analytics-Group/InstaNexus.git\ncd InstaNexus\n```\n\n### 2. Create the Conda environment\n\nCreate the `instanexus` Conda environment on Linux:\n\n```bash\nconda env create -f environment.linux.yml\n```\n\nCreate the `instanexus` Conda environment on macOS (Apple Silicon):\n\n```bash\nconda env create -f environment.osx-arm64.yaml\n```\n\n### 3. 
Activate the environment\n\n```bash\nconda activate instanexus\n```\n\n---\n\n## Hyperparameter Optimization\n\nTo launch the hyperparameter grid search, run the following command from the project root (the folder containing `src/` and `json/`):\n\n```bash\npython -m src.opt.gridsearch\n```\n\n**Adjusting Parameters**\n\nGrid search parameters for both the De Bruijn graph (`dbg`) and greedy (`greedy`) assembly methods are defined in:\n\n```text\njson/gridsearch_params.json\n```\n\nTo test more (or fewer) combinations, edit the arrays for each parameter in this file.\n\n---\n\n## License\n\nThis project is licensed under the [MIT License](LICENSE).\n\n---\n\n## Acknowledgments\n\nInstaNexus was developed at **DTU Biosustain** and **DTU Bioengineering**.\n\nWe are grateful to the **DTU Bioengineering Proteomics Core Facility** for maintenance and operation of mass spectrometry instrumentation.\n\nWe also thank the **Informatics Platform at DTU Biosustain** for their support during the development and optimization of InstaNexus.\n\nSpecial thanks to the users and developers of:\n- [MMseqs2](https://github.com/soedinglab/MMseqs2)\n- [Clustal Omega](https://www.ebi.ac.uk/Tools/msa/clustalo/)\n\n---\n\n## References\n\n1. Steinegger, M. \u0026 Söding, J. **MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets**. *Nature Biotechnology* 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988\n2. Sievers, F., et al. **Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega**. *Molecular Systems Biology* 7, 539 (2011). https://doi.org/10.1038/msb.2011.75\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmultiomics-analytics-group%2Finstanexus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmultiomics-analytics-group%2Finstanexus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmultiomics-analytics-group%2Finstanexus/lists"}