{"id":30740029,"url":"https://github.com/openmined/biovault","last_synced_at":"2026-01-23T02:53:43.006Z","repository":{"id":311716128,"uuid":"1044062260","full_name":"OpenMined/biovault","owner":"OpenMined","description":null,"archived":false,"fork":false,"pushed_at":"2026-01-13T07:11:37.000Z","size":2369,"stargazers_count":7,"open_issues_count":2,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-13T09:49:13.467Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenMined.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"openmined"}},"created_at":"2025-08-25T06:08:15.000Z","updated_at":"2026-01-09T01:24:47.000Z","dependencies_parsed_at":"2025-09-11T08:27:52.834Z","dependency_job_id":"1aefb56f-ed8c-4e15-8ed5-886d068b92bc","html_url":"https://github.com/OpenMined/biovault","commit_stats":null,"previous_names":["openmined/biovault"],"tags_count":140,"template":false,"template_full_name":null,"purl":"pkg:github/OpenMined/biovault","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2Fbiovault","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2Fbiovault/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2Fbiovault/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/OpenMined%2Fbiovault/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenMined","download_url":"https://codeload.github.com/OpenMined/biovault/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMined%2Fbiovault/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28561842,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T03:31:16.861Z","status":"ssl_error","status_checked_at":"2026-01-19T03:31:15.069Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-03T23:47:48.465Z","updated_at":"2026-01-19T05:00:50.737Z","avatar_url":"https://github.com/OpenMined.png","language":"Rust","funding_links":["https://github.com/sponsors/openmined"],"categories":[],"sub_categories":[],"readme":"# BioVault\n\nBioVault is a free, open-source, permissionless network for collaborative genomics.\n\nBuilt with end-to-end encryption, secure enclaves, and data visitation, BioVault lets researchers and participants share insights without ever sharing raw data.\n\nhttps://biovault.net/\n\n## Quick Install (One-liner)\n\n```bash\ncurl -sSL https://raw.githubusercontent.com/openmined/biovault/main/install.sh | bash\n```\n\n## Prerequisites\n- [SyftBox](https://syftbox.net)\n- 
[Nextflow](https://www.nextflow.io)\n  - [Java 17+](https://openjdk.java.net/)\n- [Docker](https://www.docker.com) (optional)\n\n## Setup\nRun `bv check` and make sure you have the dependencies listed below.\n```\nbv check\n\nBioVault Dependency Check\n=========================\n\nChecking java...  (version 23)✓ Found\nChecking docker... ✓ Found (running)\nChecking nextflow... ✓ Found\nChecking syftbox... ✓ Found\n\n=========================\n✓ All dependencies satisfied!\n```\n\n## Automatic Setup\nYou can run `bv setup` on some systems, such as macOS and Google Colab, and `bv` will help you install the dependencies.\n\n## SyftBox\nSyftBox requires setup and authentication.\n\n## Tutorials:\n- [1) Hello World](tutorials/1_hello_world.md)\n- [2) Submit Your Project](tutorials/2_submit_your_project.md)\n- [3) Create a Biobank](tutorials/3_create_biobank.md)\n\n## Documentation\n\n- [Development Guide](DEV.md) - Setup and testing instructions\n- [Security](SECURITY.md) - How BioVault protects your data with SyftBox permissions\n\n## Development\n\nFor development setup and commands, see [DEV.md](DEV.md).\n\n\n\n## CLI Overview\n\nThe `bv` CLI provides commands to manage BioVault projects, data, messaging, and utilities.\n\nGlobal flags\n- `-v, --verbose` Increase log verbosity\n- `--config \u003cpath\u003e` Use a specific config file\n\nTop-level commands\n- `bv update` Check for updates and install the latest version\n- `bv init [email]` Initialize a new BioVault repo; email is optional (detected from `SYFTBOX_EMAIL` if omitted)\n- `bv info` Show system information\n- `bv check` Check for required dependencies\n- `bv setup` Set up the environment for known systems (e.g., Google Colab)\n- `bv project create [--name \u003cname\u003e] [--folder \u003cpath\u003e]` Create a new project scaffold\n- `bv run \u003cproject_folder\u003e \u003cparticipant_source\u003e [--test] [--download] [--dry-run] [--with-docker=\u003cbool\u003e] [--work-dir \u003cdir\u003e] [--resume]`\n  - 
`participant_source` can be a local file path, Syft URL, or HTTP URL (with optional `#fragment`)\n  - `--with-docker` defaults to `true`\n- `bv sample-data fetch [--participant-ids id1,id2,...] [--all]` Fetch sample data\n- `bv sample-data list` List available sample data\n- `bv participant add [--id \u003cID\u003e] [--aligned \u003cfile\u003e]` Add a participant record\n- `bv participant list` List participants\n- `bv participant delete \u003cID\u003e` Delete a participant\n- `bv participant validate [--id \u003cID\u003e]` Validate participant files (all if omitted)\n- `bv biobank list` List biobanks in SyftBox\n- `bv biobank publish [--participant-id \u003cID\u003e] [--all] [--http-relay-servers host1,host2,...]` Publish participants\n- `bv biobank unpublish [--participant-id \u003cID\u003e] [--all]` Unpublish participants\n- `bv config email \u003cemail\u003e` Set email address\n- `bv config syftbox [--path \u003cconfig.json\u003e]` Set SyftBox config path\n- `bv fastq combine \u003cinput_folder\u003e \u003coutput_file\u003e [--validate] [--no-prompt] [--stats-format tsv|yaml|json]` Combine/validate FASTQ files\n- `bv submit \u003cproject_path\u003e \u003cdestination\u003e` Submit a project (destination is datasite email or full Syft URL)\n- `bv samplesheet create \u003cinput_dir\u003e \u003coutput_file\u003e [--file_filter \u003cpattern\u003e] [--extract_cols \u003cpattern\u003e] [--ignore]` Create sample sheet CSV from files\n\nInbox and messaging\n- `bv inbox` Interactive inbox (default; uses single-key shortcuts)\n  - Shortcuts: `?`/`h` Help, `n` New, `s` Sync, `v` Change view, `q` Quit, `1..5` Tabs (Inbox, Sent, All, Unread, Projects)\n  - Arrow keys navigate; Enter opens the selected message or Quit\n- `bv inbox --plain [--sent] [--all] [--unread] [--projects] [--type \u003ctext|project|request\u003e] [--from \u003csender\u003e] [--search \u003cterm\u003e]`\n  - Non-interactive list output with filters\n- `bv message send \u003crecipient\u003e 
\u003cmessage\u003e [-s|--subject \u003csubject\u003e]` Send a message\n- `bv message reply \u003cmessage_id\u003e \u003cbody\u003e` Reply to a message\n- `bv message read \u003cmessage_id\u003e` Read a specific message\n- `bv message delete \u003cmessage_id\u003e` Delete a message\n- `bv message list [--unread]` List messages (optionally only unread)\n- `bv message thread \u003cthread_id\u003e` View a message thread\n- `bv message sync` Sync messages (check for new and update ACKs)\n\nExamples\n- Initialize and set email: `bv init you@example.com`\n- Create a new project: `bv project create --name demo --folder ./demo`\n- Run a project with test data: `bv run ./demo participants.yaml --test --download`\n- Combine FASTQs: `bv fastq combine ./fastq_pass ./combined/output.fastq.gz --validate`\n- Interactive inbox: `bv inbox` (press `?` for shortcuts)\n- Plain inbox list: `bv inbox --plain --unread`\n- Create sample sheet from genotype files:\n  ```bash\n  # Extract participant IDs from filenames matching a pattern\n  bv samplesheet create test_dir output.csv --extract_cols=\"{participant_id}_X_X_GSAv3-DTC_GRCh38-{date}.txt\"\n\n  # Example with files: 103704_X_X_GSAv3-DTC_GRCh38-07-01-2025.txt\n  # Produces CSV:\n  # participant_id,genotype_file_path\n  # 103704,/absolute/path/test_dir/103704_X_X_GSAv3-DTC_GRCh38-07-01-2025.txt\n  ```\n\n## File Import Workflow\n\nThe `bv files` commands provide a flexible workflow for importing genomic data files with automatic participant ID extraction and file type detection.\n\n### Complete Import Example\n\n#### 1. Scan directory to see what file types are available\n\n```bash\nbv files scan /path/to/data\n```\n\nOutput:\n```\n📊 Scan Results: /path/to/data\n\nExtensions Found:\n  .txt  323 files    6701.8 MB\n  .csv  4 files    32.6 MB\n\nTotal: 332 files\n```\n\n#### 2. 
Suggest patterns for extracting participant IDs from file paths\n\n```bash\nbv files suggest-patterns /path/to/data --ext .txt\n```\n\nOutput:\n```\n🔍 Detected Patterns:\n\n1. {parent} - Directory name as participant ID\n   Example: huE922FC/...\n   Sample extractions:\n     huE922FC/... → participant ID: huE922FC\n     huBF0F93/... → participant ID: huBF0F93\n```\n\n#### 3. Preview import with dry-run\n\n```bash\nbv files import /path/to/data --ext .txt --pattern {parent} --dry-run\n```\n\nOutput shows sample participant ID extractions without importing.\n\n#### 4. Export file list to CSV with pattern-based participant ID extraction\n\n```bash\nbv files export-csv /path/to/data --ext .txt --pattern {parent} -o genotype-files.csv\n```\n\nOutput:\n```\n📊 Found 323 files\n✓ Exported 323 files to genotype-files.csv\n```\n\n#### 5. Detect file types and update CSV\n\n```bash\nbv files detect-csv genotype-files.csv -o genotype-files.csv\n```\n\nOutput:\n```\n🔍 Detecting file types from genotype-files.csv\n📋 Processing 323 files\n🔍 Detecting... 323/323\n✓ Updated CSV written to genotype-files.csv\n```\n\n#### 6. 
Import files using the CSV\n\n```bash\nbv files import-csv genotype-files.csv\n```\n\nOutput:\n```\n📋 CSV Import Preview: genotype-files.csv\n  Files to import: 323\n```\n\n### Pattern Examples\n\n- `{parent}` - Use parent directory name as participant ID\n- `{filename}` - Use filename as participant ID\n- Custom patterns can extract from any part of the file path\n\n### Available Commands\n\n- `bv files scan \u003cpath\u003e` - Scan directory and show file type statistics\n- `bv files suggest-patterns \u003cpath\u003e --ext \u003cextension\u003e` - Analyze files and suggest participant ID extraction patterns\n- `bv files import \u003cpath\u003e --ext \u003cext\u003e --pattern \u003cpattern\u003e [--dry-run]` - Preview or import files with pattern\n- `bv files export-csv \u003cpath\u003e --ext \u003cext\u003e --pattern \u003cpattern\u003e -o \u003coutput.csv\u003e` - Export file list with participant IDs to CSV\n- `bv files detect-csv \u003cinput.csv\u003e -o \u003coutput.csv\u003e` - Detect file types and update CSV\n- `bv files import-csv \u003cfile.csv\u003e` - Import files from CSV into BioVault database\n\n## SyftBox VirtualEnv\nIf you need to run multiple SyftBox instances, check out `sbenv`, which helps you isolate them on your machine:\nhttps://github.com/openmined/sbenv\n\nBioVault can auto-detect when it's in an `sbenv activate` environment and will target that isolated SyftBox instance for all its operations.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmined%2Fbiovault","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenmined%2Fbiovault","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmined%2Fbiovault/lists"}