{"id":29625254,"url":"https://github.com/novartis/mober","last_synced_at":"2025-07-21T06:07:46.338Z","repository":{"id":59769954,"uuid":"537207161","full_name":"Novartis/MOBER","owner":"Novartis","description":"Multi-omics batch effect remover method","archived":false,"fork":false,"pushed_at":"2023-01-31T23:29:03.000Z","size":360,"stargazers_count":7,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-04-16T11:10:09.772Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Novartis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-15T21:09:08.000Z","updated_at":"2023-07-19T18:56:05.000Z","dependencies_parsed_at":"2023-02-16T23:55:15.175Z","dependency_job_id":null,"html_url":"https://github.com/Novartis/MOBER","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Novartis/MOBER","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2FMOBER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2FMOBER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2FMOBER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2FMOBER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Novartis","download_url":"https://codeload.github.com/Novartis/MOBER/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Novartis%2FMOBER/sbom","host":{
"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266248501,"owners_count":23899056,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-07-21T06:07:45.800Z","updated_at":"2025-07-21T06:07:46.313Z","avatar_url":"https://github.com/Novartis.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![img](asset/MOBER_logo.png)\n\n**MOBER** (\u003cu\u003eM\u003c/u\u003eulti \u003cu\u003eO\u003c/u\u003erigin \u003cu\u003eB\u003c/u\u003eatch \u003cu\u003eE\u003c/u\u003effect \u003cu\u003eR\u003c/u\u003eemover) is a deep learning-based method that performs biologically relevant integration of transcriptional profiles from pre-clinical models and clinical tumors. MOBER can be used to guide the selection of cell lines and patient-derived xenografts and identify models that more closely resemble clinical tumors. We applied MOBER on transcriptional profiles from 932 cancer cell lines, 442 patient-derived xenografts and 11205 clinical tumors and identified pre-clinical models with greatest transcriptional fidelity to clinical tumors, and models that are transcriptionally unrepresentative of their respective clinical tumors. MOBER is interpretable by design, therefore allowing drug hunters to better understand the underlying biological differences between models and patients that are responsible for the observed lack of clinical translatability. 
\nMOBER can remove batch effects between any transcriptomics datasets of different origin while conserving relevant biological signals.\n\n \n![img](asset/MOBER_model.png)\n\nSee our latest [manuscript](https://doi.org/10.1101/2022.09.07.506964) and check our [web app](https://mober.pythonanywhere.com/), where the aligned data on cancer cell lines, patient-derived xenografts and clinical tumors can be explored interactively. \n  \n  \n### Installing MOBER\n1. CUDA and PyTorch\nFind the available CUDA version with `module avail cuda`. Install [PyTorch](https://pytorch.org/) according to the latest CUDA version you found. \n\n2. Install mober\n```linux\ngit clone https://github.com/Novartis/mober.git\ncd mober\npip install -e .\n```\n\nCheck that it is successfully installed: run `mober --help` in the terminal from any directory. \n  \n  \n### 1. Preparing input h5ad file for training\nThe input file should be in [anndata](https://anndata.readthedocs.io/en/latest/) format and saved as h5ad. The sample annotation `.obs` **must** contain a column named \"**data_source**\" that specifies the batch ID of each sample. The h5ad file can be generated in two ways:\n\n##### 1.1 For R users:\n```R\n# Save a Seurat object to h5ad, with 'data_source' as a column in the metadata\n```\n\n##### 1.2 For Python users:\n```python\nimport scanpy as sc\nfrom scipy.sparse import csr_matrix\n# X, expression matrix, samples x genes\n# sampInfo, pd.DataFrame, with 'data_source' as one of the columns, and sample IDs as index\n# geneInfo, pd.DataFrame, with gene ids as index\n# X, sampInfo, geneInfo should be matched, in terms of sample order and gene order.\nadata = sc.AnnData(csr_matrix(X), obs=sampInfo, var=geneInfo)\nadata.write('name.h5ad')\n```\n  \n  \n### 2. 
Train MOBER\n```linux\nmober train \\\n--train_file input.h5ad \\\n--output_dir ../tmp_data/test\n```\nIn this case, the trained model is saved in `../tmp_data/test/models`, and the training metrics and the parameters used for training are saved in `../tmp_data/test/metrics`, in TSV format.\n\n  \n### 3. Do projection\nOnce the model is trained, the projection can be done in two different ways:\n#### 3.1 Through the command line\n```linux\n# --onto should be one of the batch IDs used in training.\nmober projection \\\n--model_dir path_to_where_models_and_metrics_folders_are/models \\\n--onto TCGA \\\n--projection_file input.h5ad \\\n--output_file outname.h5ad \\\n--decimals 4\n```\n\n#### 3.2 Within Python scripts, as the projection step is fast and does not need a GPU\n```python\nfrom mober.core.projection import load_model, do_projection\nimport scanpy as sc\nimport torch\n\nmodel_dir = 'path_to_where_models_and_metrics_folders_are/models'\nonto = 'TCGA'  # should be one of the batch IDs used in training\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\nadata = sc.read('projection_file.h5ad')\nmodel, features, label_encode = load_model(model_dir, device)\n# keep only the features the model was trained on, in the same order\nadata = adata[:, features]\n\nproj_adata, z_adata = do_projection(model, adata, onto, label_encode, device, batch_size=1600)\nproj_adata.write('outname.h5ad')\n\n# proj_adata contains the projected values.\n# z_adata contains the sample embeddings in the latent space\n\n```\n  \n  \n### Get help about input arguments\n1. Train\n```linux\nmober train --help\n```\n\n2. Projection\n```linux\nmober projection --help\n```\n  \n  \n### Use GPU on HPC, minimal script\nCopy the following content into a text file and modify it as needed, e.g. 
`sub.sh`, then run `qsub sub.sh` to submit the job to the HPC.\n```linux\n#!/bin/bash\n#$ -cwd\n#$ -S /bin/bash\n#$ -l m_mem_free=32G\n#$ -l h_rt=24:00:00\n#$ -l gpu_card=4\n#$ -m e\n#$ -M your@email.com\n#$ -N mober\n#$ -o running.log\n#$ -e error.log\n#$ -V\n#$ -b n\n\n\nconda activate yourENV\nmodule load cuda10.2/fft/10.2.89  # Found by module avail cuda\n\nmober train \\\n--train_file path/to/your/input.h5ad \\\n--output_dir output_path\n\n```\n\n## License\n\nThis project is licensed under the terms of the MIT License.  \nCopyright 2022 Novartis International AG.\n\n    \n## Reference\n\nIf you use MOBER in your research, please consider citing our [manuscript](https://doi.org/10.1101/2022.09.07.506964):\n\n```\n@article {Dimitrieva2022.09.07.506964,\n\tauthor = {Dimitrieva, Slavica and Janssens, Rens and Li, Gang and Szalata, Artur and Gopal, Raja and Parmar, Chintan and Kauffmann, Audrey and Durand, Eric Y.},\n\ttitle = {Biologically relevant integration of transcriptomics profiles from cancer cell lines, patient-derived xenografts and clinical tumors using deep learning},\n\telocation-id = {2022.09.07.506964},\n\tyear = {2022},\n\tdoi = {10.1101/2022.09.07.506964},\n\tpublisher = {Cold Spring Harbor Laboratory},\n\tURL = {https://www.biorxiv.org/content/10.1101/2022.09.07.506964v2},\n\teprint = {https://www.biorxiv.org/content/10.1101/2022.09.07.506964v2.full.pdf},\n\tjournal = {bioRxiv}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnovartis%2Fmober","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnovartis%2Fmober","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnovartis%2Fmober/lists"}