{"id":23757642,"url":"https://github.com/broadinstitute/celligner","last_synced_at":"2025-09-05T04:32:33.662Z","repository":{"id":62561313,"uuid":"310705223","full_name":"broadinstitute/celligner","owner":"broadinstitute","description":"tumor - cancer cell line alignment. Use it on the depmap portal or install it with pip.","archived":false,"fork":false,"pushed_at":"2023-12-05T19:30:59.000Z","size":31631,"stargazers_count":11,"open_issues_count":6,"forks_count":8,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-08-18T22:25:04.473Z","etag":null,"topics":["alignement","cancer","celligner","genomics","rnaseq","tumor"],"latest_commit_sha":null,"homepage":"https://cds.team/depmap/celligner/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/broadinstitute.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null},"funding":{"github":["broadinstitute"],"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-11-06T21:03:42.000Z","updated_at":"2025-03-01T12:01:50.000Z","dependencies_parsed_at":"2023-12-05T19:51:15.727Z","dependency_job_id":null,"html_url":"https://github.com/broadinstitute/celligner","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/broadinstitute/celligner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/broadinstitute%2Fcelligner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/broadinstitute%2
Fcelligner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/broadinstitute%2Fcelligner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/broadinstitute%2Fcelligner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/broadinstitute","download_url":"https://codeload.github.com/broadinstitute/celligner/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/broadinstitute%2Fcelligner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273713308,"owners_count":25154607,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignement","cancer","celligner","genomics","rnaseq","tumor"],"created_at":"2024-12-31T19:49:50.363Z","updated_at":"2025-09-05T04:32:28.648Z","avatar_url":"https://github.com/broadinstitute.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/broadinstitute"],"categories":[],"sub_categories":[],"readme":"# Celligner\n\n![](docs/celligner_public22q2.png)\n\n__Celligner__ is a computational approach for aligning tumor and cell line transcriptional profiles.\n\nTo learn more, see the [paper](https://www.nature.com/articles/s41467-020-20294-x).\n\n## Remark\n\n__Celligner__ was originally written as an 
R project; its code is in the `R/` folder.\n\nA Python version was later written that performs the same computations as the R version, but its results may differ slightly because of small implementation differences in the Louvain clustering and contrastive PCA steps.\n\n## Overview\n\nA **reference** expression dataset (e.g. CCLE cell lines) should be fit using the `fit()` function, and a **target** expression dataset (e.g. TCGA+ tumor samples) can then be aligned to this reference using the `transform()` function. See the `run_celligner.py` script for example usage. Celligner is unsupervised and does not require sample annotations to run; annotations are therefore not used by this version of the model, but they can be added post hoc to help interpret the output. See the `celligner_output.ipynb` notebook for an example of how to draw a UMAP of the output.\n\nThe Celligner output can be explored at: [https://depmap.org/portal/celligner/](https://depmap.org/portal/celligner/)\n\n## Install\n\n\u003e For the old R package installation instructions, see the `R/` folder.\n\nBefore running pip, make sure that you have R installed.\n\nTo install the latest version of Celligner in dev mode, run the following (note that Celligner requires the specific version of mnnpy that is included in this repository as a git submodule):\n\n```bash\ngit clone https://github.com/broadinstitute/celligner.git\ncd celligner\ngit checkout new_dev\n# populate the mnnpy submodule, which a plain clone leaves empty\ngit submodule update --init\npip install -e .\ncd mnnpy\npip install .\n```\n\nA Dockerfile and build script are also provided.\n\n\n## Using Celligner\n\nCelligner has `fit()` and `transform()` methods in the style of scikit-learn models.\n\nA reference expression dataset (e.g. CCLE cell line TPM expression) should first be fit:\n\n```python\nfrom celligner import Celligner\n\n# CCLE_expression is assumed to be a samples x genes expression matrix (e.g. a pandas DataFrame)\nmy_celligner = Celligner()\nmy_celligner.fit(CCLE_expression)\n```\n\nA target expression dataset (e.g. 
TCGA+ tumor samples) can then be aligned to this reference using the `transform()` function:\n\n```python\nmy_celligner.transform(TCGA_expression)\n```\n\nThe combined transformed expression matrix can then be accessed via `my_celligner.combined_output`. Clusters, UMAP coordinates and tumor-model distances for all samples can be computed with `my_celligner.computeMetricsForOutput()`. There are also functions to save a fitted Celligner model to, and load it from, a .pkl file.\n\n### Aligning the target dataset to a new reference dataset\nThis use case covers aligning the same target dataset to a new reference dataset (for example, the previous reference dataset with some new samples added). In this case you can call `transform()` without a target dataset to re-use the previous target dataset and skip re-doing some of the computation (see the diagram below).\n\n```python\nmy_celligner.fit(new_reference_expression)\nmy_celligner.transform()\n```\n\n### Aligning a third dataset to the previous combined output\nThis use case covers aligning a third dataset (e.g. Met500 tumor samples) to the previously aligned (e.g. CCLE+TCGA) dataset. This is the approach to multi-dataset alignment currently taken by the Celligner app.\n\n```python\n# use the previous combined output (e.g. CCLE+TCGA) as the new reference\nmy_celligner.makeNewReference()\n# The value of k1 should be selected based on the size of the new dataset.\n# We use k1=20 for Met500 (n=~850), and k1=10 for the PDX datasets (n=~250-450).\nmy_celligner.mnn_kwargs.update({\"k1\":20, \"k2\":50})\nmy_celligner.transform(met500_TPM, compute_cPCs=False)\n```\n\n### Diagram\nThis diagram provides an overview of how Celligner works, including for the different use cases described above.\n\n![](docs/celligner_diagram.png)\n\n### Computational complexity\n\nDepending on the dataset, Celligner can be quite memory-hungry.\nFor TCGA, expect at least _50-60 GB_ of memory to be used. 
You might need a powerful computer and plenty of _swap_ space, and you may need to increase R's default _maximum allowed memory_.\n\nYou can also use the `low_memory=True` option to reduce the memory used by Celligner in the memory-intensive `PCA` \u0026 `cPCA` methods.\n\n\n# R Celligner\n\nFor the original R version of Celligner, please see the R/README.md file here: [https://github.com/broadinstitute/celligner/tree/master/R/README.md](https://github.com/broadinstitute/celligner/tree/master/R/README.md)\n\n---\n\n__Initial project:__\n\nAllie Warren @awarren\n\n__Initial Python version:__\n\nJérémie Kalfon @jkobject\n\n__Current maintainer:__\n\nBarbara De Kegel @bdekegel\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbroadinstitute%2Fcelligner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbroadinstitute%2Fcelligner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbroadinstitute%2Fcelligner/lists"}