{"id":20752817,"url":"https://github.com/kpj/rwrap","last_synced_at":"2025-06-18T16:37:57.182Z","repository":{"id":40946842,"uuid":"209049360","full_name":"kpj/rwrap","owner":"kpj","description":"Seamlessly integrate R packages into Python.","archived":false,"fork":false,"pushed_at":"2023-04-12T06:09:47.000Z","size":84,"stargazers_count":21,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-30T20:21:35.210Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kpj.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-09-17T12:34:54.000Z","updated_at":"2023-12-13T17:45:24.000Z","dependencies_parsed_at":"2024-11-17T23:15:26.280Z","dependency_job_id":null,"html_url":"https://github.com/kpj/rwrap","commit_stats":{"total_commits":107,"total_committers":3,"mean_commits":"35.666666666666664","dds":0.2429906542056075,"last_synced_commit":"e40a1c90d478cb4eeb8c9e5b1596f11988e45af0"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/kpj/rwrap","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpj%2Frwrap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpj%2Frwrap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpj%2Frwrap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpj%2Frwrap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kpj","download_url":"https://codeload.github.com/kpj/rwrap/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kpj%2Frwrap/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260590439,"owners_count":23033060,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-17T08:42:50.452Z","updated_at":"2025-06-18T16:37:52.165Z","avatar_url":"https://github.com/kpj.png","language":"Python","readme":"# rwrap\n\n[![PyPI](https://img.shields.io/pypi/v/rwrap.svg?style=flat)](https://pypi.python.org/pypi/rwrap)\n[![Tests](https://github.com/kpj/rwrap/actions/workflows/main.yml/badge.svg)](https://github.com/kpj/rwrap/actions/workflows/main.yml)\n\nA thin wrapper around [rpy2](https://rpy2.github.io/doc/latest/html/index.html) with strong opinions on how data types should be converted. This enables easy usage of R packages from Python with no boilerplate code.\n\n\u003e Warning: still work-in-progress, issues and PRs welcome\n\n\n## Installation\n\n```bash\npip install rwrap\n```\n\n\n## Usage\n\n### Genomic Annotations\n\nAccessing Bioconductor's [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html) package can be as simple as follows:\n```python\nfrom rwrap import biomaRt\n\nbiomaRt\n## \u003cmodule 'biomaRt' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/biomaRt'\u003e\n\nsnp_list = [\"rs7329174\", \"rs4948523\", \"rs479445\"]\nensembl = biomaRt.useMart(\"ENSEMBL_MART_SNP\", dataset=\"hsapiens_snp\")\n\ndf = biomaRt.getBM(\n    attributes=[\"refsnp_id\", \"chr_name\", \"chrom_start\", \"consequence_type_tv\"],\n    filters=\"snp_filter\", values=snp_list, mart=ensembl\n)\n\nprint(df)  # pandas.DataFrame\n##    refsnp_id  chr_name  chrom_start     consequence_type_tv\n## 1   rs479445         1     60875960          intron_variant\n## 2   rs479445         1     60875960  NMD_transcript_variant\n## 3  rs4948523        10     58579338          intron_variant\n## 4  rs7329174        13     40983974          intron_variant\n```\n\n### Differential Gene Expression analysis workflow\n\nDifferentially expressed genes between conditions can be determined using [DESeq2](https://bioconductor.org/packages/release/bioc/html/DESeq2.html) and annotated with [biomaRt](https://bioconductor.org/packages/release/bioc/html/biomaRt.html):\n\n```python\nimport pandas as pd\nfrom rwrap import DESeq2, biomaRt, base, stats\n\nDESeq2\n## \u003cmodule 'DESeq2' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/DESeq2'\u003e\nbiomaRt\n## \u003cmodule 'biomaRt' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/biomaRt'\u003e\n\n# retrieve count data (https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP009615)\ndf_counts = pd.read_csv(\n    \"http://duffel.rail.bio/recount/v2/SRP009615/counts_gene.tsv.gz\", sep=\"\\t\"\n).set_index(\"gene_id\")\ndf_design = pd.DataFrame(\n    {\"condition\": [\"1\", \"2\", \"1\", \"2\", \"3\", \"4\", \"3\", \"4\", \"5\", \"6\", \"5\", \"6\"]},\n    index=df_counts.columns\n)\n\n# run differential gene expression analysis\ndds = DESeq2.DESeqDataSetFromMatrix(\n    countData=df_counts, colData=df_design, design=stats.as_formula(\"~ condition\")\n)\ndds = DESeq2.DESeq(dds)\n\nres = DESeq2.results(dds, contrast=(\"condition\", \"1\", \"2\"))\ndf_res = base.as_data_frame(res)\n\n# annotate result\nensembl = biomaRt.useEnsembl(biomart=\"genes\", dataset=\"hsapiens_gene_ensembl\")\ndf_anno = biomaRt.getBM(\n    attributes=[\"ensembl_gene_id_version\", \"gene_biotype\"],\n    filters=\"ensembl_gene_id_version\",\n    values=df_res.index,\n    mart=ensembl,\n).set_index(\"ensembl_gene_id_version\")\n\ndf_res = df_res.merge(df_anno, left_index=True, right_index=True).sort_values(\"padj\")\nprint(df_res.head())  # pd.DataFrame\n##                      baseMean  log2FoldChange     lfcSE      stat        pvalue          padj          gene_biotype\n## ENSG00000222806.1  158.010377       22.137400  2.745822  8.062214  7.492501e-16  2.853744e-11       rRNA_pseudogene\n## ENSG00000255099.1   65.879611       21.835651  2.915452  7.489627  6.906949e-14  1.315359e-09  processed_pseudogene\n## ENSG00000261065.1   92.351998       22.273400  3.144991  7.082182  1.419019e-12  1.351190e-08                lncRNA\n## ENSG00000249923.1  154.037908       18.364027  2.636083  6.966407  3.251381e-12  2.476772e-08                lncRNA\n## ENSG00000267658.1   64.371181      -19.545702  3.041247 -6.426871  1.302573e-10  8.268736e-07                lncRNA\n```\n\n### Geneset Enrichment Analysis\n\nGeneset enrichment analyses can be conducted using [clusterProfiler](https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html):\n\n```python\nfrom rwrap import clusterProfiler, base\n\nclusterProfiler\n## \u003cmodule 'clusterProfiler' from '/Library/Frameworks/R.framework/Versions/4.1/Resources/library/clusterProfiler'\u003e\n\ngenelist = [8318, 991, 9133, 890, 983, 4085, 7272, 1111, 891, 4174, 9232]\n\nres = clusterProfiler.enrichKEGG(gene=genelist, organism=\"hsa\", pvalueCutoff=0.05)\ndf = base.as_data_frame(res)\n\nprint(df.head())  # pd.DataFrame\n##                 ID                              Description GeneRatio   BgRatio        pvalue      p.adjust        qvalue                                             geneID  Count\n## hsa04110  hsa04110                               Cell cycle     11/11  126/8115  8.124144e-21  1.462346e-19  6.841384e-20  8318/991/9133/890/983/4085/7272/1111/891/4174/...     11\n## hsa04114  hsa04114                           Oocyte meiosis      6/11  131/8115  6.823856e-09  6.141470e-08  2.873202e-08                         991/9133/983/4085/891/9232      6\n## hsa04914  hsa04914  Progesterone-mediated oocyte maturation      5/11  102/8115  1.237164e-07  7.266746e-07  3.399647e-07                              9133/890/983/4085/891      5\n## hsa05166  hsa05166  Human T-cell leukemia virus 1 infection      6/11  222/8115  1.614832e-07  7.266746e-07  3.399647e-07                        991/9133/890/4085/1111/9232      6\n## hsa04218  hsa04218                      Cellular senescence      5/11  156/8115  1.036418e-06  3.731103e-06  1.745545e-06                              9133/890/983/1111/891      5\n```\n\n### More examples\n\nCheck the `tests/scripts` directory for more examples showing how to rewrite R scripts in Python.\n\n\n## Tests\n\nA comprehensive test suite aims at providing stability and avoiding regressions.\nThe examples in `tests/` are validated using `pytest`.\n\nRun tests as follows:\n\n```bash\n$ pytest tests/\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkpj%2Frwrap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkpj%2Frwrap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkpj%2Frwrap/lists"}