{"id":19629517,"url":"https://github.com/vjcitn/biocpyinterop","last_synced_at":"2026-05-07T16:14:12.652Z","repository":{"id":173351398,"uuid":"650630246","full_name":"vjcitn/BiocPyInterop","owner":"vjcitn","description":"Material for Bioconductor 2023 workshop on interoperation with python","archived":false,"fork":false,"pushed_at":"2023-07-29T11:22:44.000Z","size":303,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"devel","last_synced_at":"2025-02-26T20:31:09.567Z","etag":null,"topics":["basilisk","bioconductor","cite-seq","genetics","hail","reticulate","scvi-tools","single-cell-omics","spark"],"latest_commit_sha":null,"homepage":"https://vjcitn.github.io/BiocPyInterop/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vjcitn.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-07T13:23:48.000Z","updated_at":"2023-07-26T11:44:15.000Z","dependencies_parsed_at":null,"dependency_job_id":"c158915f-9cc4-4d12-b708-0955bbe40255","html_url":"https://github.com/vjcitn/BiocPyInterop","commit_stats":null,"previous_names":["vjcitn/biocpyinterop"],"tags_count":0,"template":false,"template_full_name":"Bioconductor/BuildABiocWorkshop","purl":"pkg:github/vjcitn/BiocPyInterop","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vjcitn%2FBiocPyInterop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vjcitn%2FBiocPyInterop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vjcitn%2FBiocPyInterop/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vjcitn%2FBiocPyInterop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vjcitn","download_url":"https://codeload.github.com/vjcitn/BiocPyInterop/tar.gz/refs/heads/devel","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vjcitn%2FBiocPyInterop/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32745362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-07T02:14:30.463Z","status":"ssl_error","status_checked_at":"2026-05-07T02:14:29.405Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["basilisk","bioconductor","cite-seq","genetics","hail","reticulate","scvi-tools","single-cell-omics","spark"],"created_at":"2024-11-11T11:59:10.121Z","updated_at":"2026-05-07T16:14:12.637Z","avatar_url":"https://github.com/vjcitn.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BiocPyInterop\n\nThis package is the basis of a workshop that will address use of python\nsoftware in Bioconductor.  
We focus on use of basilisk in packages BiocHail and scviR.\n\n### Description of workshop\n\nAbstract: Multilingual data science strategies can increase efficiency of discovery by taking advantage of diverse data management and analysis strategies.  \n\nIn this workshop we will examine interplay between R, Python, and Apache Spark in genetic and single-cell applications.  CITE-seq studies simultaneously quantify surface protein and mRNA abundance in single cells.  We will use scviR to compare interpretations based on deep learning and sequential component-specific methods.  \n\nThe UK Biobank is the foundation of thousands of genome-wide association studies.  The Telomere-to-Telomere project produced the first gapless human reference genome.  Both of these resources will be explored using BiocHail.  Workshop attendees will acquire an understanding of Aaron Lun's [basilisk](https://bioconductor.org/packages/basilisk) package and its use in isolating specific collections of python modules, the anndata representations and scvi-tools analyses of CITE-seq data, and the hail.is approach to structuring and analyzing massive genetics data resources using Spark Resilient Distributed Datasets.  
All programming will be carried out in R; quarto documents that mix R and python will also be illustrated.\n\n### Pre-requisites\n\n* Basic knowledge of R syntax\n* Interest in single-cell genomics, human genetics, deep learning\n\nIt will be helpful to have an acquaintance with \n\n* [a chapter of the OSCA book](http://bioconductor.org/books/3.17/OSCA.advanced/integrating-with-protein-abundance.html)\n* [an scviR vignette](https://bioconductor.org/packages/release/bioc/vignettes/scviR/inst/doc/citeseq_tut.html)\n* [a BiocHail vignette](https://bioconductor.org/packages/release/bioc/vignettes/BiocHail/inst/doc/gwas_tut.html)\n* [a look at BiocT2T](https://vjcitn.github.io/BiocT2T/) -- note that we will not work with the T2T 1KG extract in detail, as it involves a 40+ GB download, but the mechanics of working with it on your own will be explained\n\n### Participation\n\nThis is a 90 minute workshop that will cover\n\n- programming with basilisk to establish predictable python infrastructure and interoperation\n- exploration of torch-based tooling for single-cell analysis of a CITE-seq experiment\n- exploration of spark-based tooling for interaction with 1000 genomes genotypes (and, if time permits, UK Biobank phenotypes)\n\n### _R_ / _Bioconductor_ packages used\n\n- basilisk\n- OSCA.advanced\n- scviR\n- BiocHail\n\n### Time outline\n\n| Activity               | Time |\n|------------------------|------|\n|Verify setup            | 5m  |\n|Motivations/discussion  | 5m  |\n|basilisk and basilisk.utils | 10m |\n|  (concerns with bloat)  |  |\n|OSCA advanced: CITE-seq | 15m  |\n|break\t\t\t| 10 m |\n|scviR -- AnnData and tutorial VAE | 20m |\n|Hail: exploring 1KG and UKBB | 20m |\n|Review | 5m |\n\n### Workshop goals and objectives\n\n- Learning goals:\n    - understand basic issues with connecting R/Bioconductor to Python software tools for genomics\n    - relate aspects of the anndata class to aspects of SummarizedExperiment\n    - compare findings in a stepwise 
analysis of PBMC data in Bioconductor to those obtained\nfrom fitting an autoencoder to the same data in scvi-tools\n    - explore Hail's structures and methods for working with genotypes and phenotypes at scale\n\n\n\n- Learning objectives:\n    - Understand the user situation when basilisk manages python resources and usage\n        - review functions in basilisk.utils\n        - assess resource consumption (disk space used per version of basilisk/client package)\n        - process management: check ?getBasiliskFork\n    - Review an analysis of CITE-seq data on 6800 cells with Bioconductor in [OSCA advanced](http://bioconductor.org/books/3.17/OSCA.advanced/integrating-with-protein-abundance.html)\n        - review [ADT-based clustering and interpretation](http://bioconductor.org/books/3.17/OSCA.advanced/integrating-with-protein-abundance.html#clustering-and-interpretation)\n        - review [correlations between abundances of mRNA and surface proteins](http://bioconductor.org/books/3.17/OSCA.advanced/integrating-with-protein-abundance.html#finding-correlations-between-features)\n        - Examine the totalVI-based quantifications for similar findings\n            - use plotUMAP and adtProfiles\n            - understand graduated relationships between surface protein and mRNA abundance\n        - Assess the sensitivity of totalVI-based interpretations to details of autoencoder training\n    - Use BiocHail and spark\n        - examine an artificial GWAS with 1000 genomes genotypes and a fabricated phenotype\n        - understand how to use telomere-to-telomere variant calls with Hail\n\n\n\u003c!--\nAn example for a 45-minute workshop:\n\n| Activity                     | Time |\n|------------------------------|------|\n| Packages                     | 15m  |\n| Package Development          | 15m  |\n| Contributing to Bioconductor | 5m   |\n| Best Practices               | 10m  |\n\n### Workshop goals and objectives\n\nList \"big picture\" 
student-centered workshop goals and learning\nobjectives. Learning goals and objectives are related, but not the\nsame thing. These goals and objectives will help some people to decide\nwhether to attend the conference for training purposes, so please make\nthese as precise and accurate as possible.\n\n*Learning goals* are high-level descriptions of what\nparticipants will learn and be able to do after the workshop is\nover. *Learning objectives*, on the other hand, describe in very\nspecific and measurable terms specific skills or knowledge\nattained. The [Bloom's Taxonomy](#bloom) may be a useful framework\nfor defining and describing your goals and objectives, although there\nare others.\n\n### Learning goals\n\nSome examples:\n\n* describe how to...\n* identify methods for...\n* understand the difference between...\n\n### Learning objectives\n\n* analyze xyz data to produce...\n* create xyz plots\n* evaluate xyz data for artifacts\n\n## Workshop\n\nDivide the workshop into sections (`## A Section`). Include\nfully-evaluated _R_ code chunks. Develop exercises and solutions, and\nanticipate that your audience will walk through the code with you, or\nwork on the code independently -- do not be too ambitious in the\nmaterial that you present.\n\n--\u003e\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvjcitn%2Fbiocpyinterop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvjcitn%2Fbiocpyinterop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvjcitn%2Fbiocpyinterop/lists"}