# cytoself

[![Python 3.9](https://img.shields.io/badge/python-3.9-blue.svg)](https://www.python.org/downloads/release/python-397/)
[![DOI](https://img.shields.io/badge/DOI-10.1038%2Fs41592--022--01541--z-%23403075)](https://doi.org/10.1038/s41592-022-01541-z)
[![DOI](https://zenodo.org/badge/361991448.svg)](https://zenodo.org/doi/10.5281/zenodo.13761237)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg)](https://opensource.org/licenses/BSD-3-Clause)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
[![codecov](https://codecov.io/gh/royerlab/cytoself_pytorch/branch/main/graph/badge.svg?token=2SMIDRRC5L)](https://codecov.io/gh/royerlab/cytoself)
[![Tests](https://github.com/royerlab/cytoself/actions/workflows/pytest-codecov-conda.yml/badge.svg)](https://github.com/royerlab/cytoself/actions/workflows/pytest-codecov-conda.yml)

A PyTorch implementation of cytoself.
The original TensorFlow implementation is archived in the branch [cytoself-tensorflow](https://github.com/royerlab/cytoself/tree/cytoself-tensorflow).

**Note: Branch names have been changed.** `cytoself-pytorch` -> `main`, and the previous `main` -> `cytoself-tensorflow`.


![Rotating_3DUMAP](images/3DUMAP.gif)

cytoself is a self-supervised platform for learning features of protein subcellular localization from microscopy images [[1]](https://www.nature.com/articles/s41592-022-01541-z).
The representations derived from cytoself encapsulate highly specific features that can yield functional insights for proteins based solely on their localization.

Applying cytoself to images of endogenously labeled proteins from the recently released [OpenCell](https://opencell.czbiohub.org) database creates a highly resolved protein localization atlas [[2]](https://www.science.org/doi/10.1126/science.abi6983).

[1] Kobayashi, Hirofumi, _et al._ "Self-Supervised Deep-Learning Encodes High-Resolution Features of Protein Subcellular Localization." _Nature Methods_ (2022).
https://www.nature.com/articles/s41592-022-01541-z <br />
[2] Cho, Nathan H., _et al._ "OpenCell: Endogenous tagging for the cartography of human cellular organization." _Science_ 375.6585 (2022): eabi6983.
https://www.science.org/doi/10.1126/science.abi6983


## How cytoself works
cytoself uses images (cell images in which only a single protein species is fluorescently labeled) together with their identity information (protein ID) as labels to learn the localization patterns of proteins.


![Workflow_diagram](images/workflow.jpg)


## Installation
Recommended: create a new environment and install cytoself into it from PyPI.

(Optional) To run cytoself on GPUs, install the GPU version of PyTorch before installing cytoself, following the [official instructions](https://pytorch.org/get-started/locally/). The PyTorch installation procedure varies depending on your OS and CUDA version.
```shell script
conda create -y -n cytoself python=3.9
conda activate cytoself
# (Optional: Install pytorch GPU following the official instruction)
pip install cytoself
```

### (For the developers) Install from this repository
Install the development dependencies:

```bash
pip install -r requirements/development.txt
pre-commit install
```


## How to use cytoself on the sample data
Download one set of the image and label data from [Data Availability](#data-availability).
An example notebook
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/royerlab/cytoself/blob/main/example_scripts/simple_example.ipynb)
is available.


### 1. Prepare Data

```python
from cytoself.datamanager.opencell import DataManagerOpenCell

data_ch = ['pro', 'nuc']
datapath = 'sample_data'  # path to download sample data
DataManagerOpenCell.download_sample_data(datapath)  # download data
datamanager = DataManagerOpenCell(datapath, data_ch, fov_col=None)
datamanager.const_dataloader(batch_size=32, label_name_position=1)
```
A folder named `sample_data` will be created and the sample data will be downloaded into it.
The `sample_data` folder is created in the current working directory, i.e. wherever you run the code.
Use `os.getcwd()` to check the current working directory.

Nine sets of data with 4 files for each protein (36 files in total) will be downloaded.
File names take the form `<protein_name>_<channel or label>.npy`.

* **`*_label.npy` file**:
Contains label information in 3 columns: Ensembl ID, protein name and localization.
* **`*_pro.npy` file**:
Image data of the protein channel, size 100x100. Images were cropped with the nucleus centered
(see details in the [paper](https://doi.org/10.1038/s41592-022-01541-z)).
* **`*_nuc.npy` file**:
Image data of the nucleus channel, size 100x100. Images were cropped with the nucleus centered
(see details in the [paper](https://doi.org/10.1038/s41592-022-01541-z)).
* **`*_nucdist.npy` file**:
Nucleus distance map, size 100x100. Images were cropped with the nucleus centered
(see details in the [paper](https://doi.org/10.1038/s41592-022-01541-z)).


### 2. Create and train a cytoself model

```python
from cytoself.trainer.cytoselflite_trainer import CytoselfFullTrainer

model_args = {
    'input_shape': (2, 100, 100),
    'emb_shapes': ((25, 25), (4, 4)),
    'output_shape': (2, 100, 100),
    'fc_output_idx': [2],
    'vq_args': {'num_embeddings': 512, 'embedding_dim': 64},
    'num_class': len(datamanager.unique_labels),
    'fc_input_type': 'vqvec',
}
train_args = {
    'lr': 1e-3,
    'max_epoch': 1,
    'reducelr_patience': 3,
    'reducelr_increment': 0.1,
    'earlystop_patience': 6,
}
trainer = CytoselfFullTrainer(train_args, homepath='demo_output', model_args=model_args)
trainer.fit(datamanager, tensorboard_path='tb_logs')
```

### 3. Plot UMAP
```python
from cytoself.analysis.analysis_opencell import AnalysisOpenCell

analysis = AnalysisOpenCell(datamanager, trainer)
umap_data = analysis.plot_umap_of_embedding_vector(
    data_loader=datamanager.test_loader,
    group_col=2,
    output_layer=f'{model_args["fc_input_type"]}2',
    title=f'UMAP {model_args["fc_input_type"]}2',
    xlabel='UMAP1',
    ylabel='UMAP2',
    s=0.3,
    alpha=0.5,
    show_legend=True,
)
```
The output UMAP plot will be saved at `demo_output/analysis/umap_figures/UMAP_vqvec2.png` by default.

![Result_UMAP](images/UMAP_vqvec2.png)


### 4. Plot feature spectrum
```python
# Compute bi-clustering heatmap
analysis.plot_clustermap(num_workers=4)

# Prepare image data
img = next(iter(datamanager.test_loader))['image'].detach().cpu().numpy()[:1]

# Compute index histogram
vqindhist1 = trainer.infer_embeddings(img, 'vqindhist1')

# Reorder the index histogram according to the bi-clustering heatmap
ft_spectrum = analysis.compute_feature_spectrum(vqindhist1)

# Generate a plot
import numpy as np
import matplotlib.pyplot as plt

x_max = ft_spectrum.shape[1] + 1
x_ticks = np.arange(0, x_max, 50)
fig, ax = plt.subplots(figsize=(10, 3))
ax.stairs(ft_spectrum[0], np.arange(x_max), fill=True)
ax.spines[['right', 'top']].set_visible(False)
ax.set_xlabel('Feature index')
ax.set_ylabel('Counts')
ax.set_xlim([0, x_max])
ax.set_xticks(x_ticks, analysis.feature_spectrum_indices[x_ticks])
fig.tight_layout()
fig.show()
```

## Tested Environments

Rocky Linux 8.6, NVIDIA A100, CUDA 11.7 (GPU)<br/>
Ubuntu 20.04.3 LTS, NVIDIA 3090, CUDA 11.4 (GPU)<br/>
Ubuntu 22.04.3 LTS, NVIDIA 4090, CUDA 12.2 (GPU)


## Known Issues
There appear to be compatibility issues with Python multiprocessing on Windows that prevent the DataLoader from loading data ([issue](https://github.com/royerlab/cytoself/issues/32), [issue](https://github.com/royerlab/cytoself/issues/33)).
Please try [the temporary workaround](https://github.com/royerlab/cytoself/issues/32#issuecomment-1815910434).


## Data Availability
The full dataset used in this work can be found below.
The image data have the shape `[batch, 100, 100, 4]`, in which the last (channel) dimension corresponds to `[target protein, nucleus, nuclear distance, nuclear segmentation]`.

Due to its large size, the dataset is split into 10 files.
The files are intended to be concatenated to form one large numpy array and one large csv.

[Image_data00.npy](https://drive.google.com/file/d/15_CHBPT-p5JG44acP6D2hKd8jAacZatp/view?usp=sharing)  
[Image_data01.npy](https://drive.google.com/file/d/1m7Cj2OALiZTIiHpvb9zFPG_I3j1wRnzK/view?usp=sharing)  
[Image_data02.npy](https://drive.google.com/file/d/17nknzqlcYO3n9bAe4FwGVPkU-mJAhQ4j/view?usp=sharing)  
[Image_data03.npy](https://drive.google.com/file/d/1vEsddF68dyOda-hwI-ptAL4vShBGl98Y/view?usp=sharing)  
[Image_data04.npy](https://drive.google.com/file/d/1aB7WaRuhobG_IDl0l_PPeSJAxCYy-Pye/view?usp=sharing)  
[Image_data05.npy](https://drive.google.com/file/d/1qb0waKcLprDtuFAdCec3WegWkmd-U45A/view?usp=sharing)  
[Image_data06.npy](https://drive.google.com/file/d/1y-1vlfZ4eNhvTvpuqTZVL8DvSwYX3CH_/view?usp=sharing)  
[Image_data07.npy](https://drive.google.com/file/d/1ejcPdh-d5lB1OcZ6x8SJx61pEUioZvB2/view?usp=sharing)  
[Image_data08.npy](https://drive.google.com/file/d/1DOicAkruNsU5F4DWLzO2QrV6xU4kuVxs/view?usp=sharing)  
[Image_data09.npy](https://drive.google.com/file/d/1a5YyHeRSRdJStG3KnFe2vsNjrsit9zbf/view?usp=sharing)  
[Label_data00.csv](https://drive.google.com/file/d/1CVwvXW2KhVBbTBixwRXIIiMhrlGDXz-4/view?usp=sharing)  
[Label_data01.csv](https://drive.google.com/file/d/1mTYe5icvWXNfY5wEsuQUhSwgtefBJpjg/view?usp=sharing)  
[Label_data02.csv](https://drive.google.com/file/d/1HckmktklyPo6qbakrwtERsCT34mRdn7l/view?usp=sharing)  
[Label_data03.csv](https://drive.google.com/file/d/1GBxDmWcl_o49i4lGujA8EgIn5G4htkBr/view?usp=sharing)  
[Label_data04.csv](https://drive.google.com/file/d/1G4FpJnlqB3ejmdw3SF2w3DFYt8Wnq0fT/view?usp=sharing)  
[Label_data05.csv](https://drive.google.com/file/d/1Vo1J09qP2TAoXwltCF84socz2TPV92JU/view?usp=sharing)  
[Label_data06.csv](https://drive.google.com/file/d/1d7gJjLTQhOw-e9KZJY9pr6KOCIN8NBvp/view?usp=sharing)  
[Label_data07.csv](https://drive.google.com/file/d/1kr5EF0RA3ZwSXmoaBFwFDVnrokh2EaOE/view?usp=sharing)  
[Label_data08.csv](https://drive.google.com/file/d/1mXyedmLezzty2LSSH3asw0LQeu-ie9mz/view?usp=sharing)  
[Label_data09.csv](https://drive.google.com/file/d/1Vdv1cD75VhvC3FdKTen-5rqLJnWpHvmb/view?usp=sharing)  
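As a sketch of how the downloaded image chunks can be reassembled, the snippet below concatenates arrays of the documented `[batch, 100, 100, 4]` shape along the batch axis. It uses small synthetic stand-in arrays rather than the real files (which would be loaded with `np.load('Image_data00.npy')` etc.); the channel order follows the description above.

```python
import numpy as np

# Synthetic stand-ins for the 10 downloaded chunks; each real
# Image_dataXX.npy file has shape [batch, 100, 100, 4].
chunks = [np.zeros((5, 100, 100, 4), dtype=np.float32) for _ in range(10)]

# Concatenate along the batch axis to form one large array, mirroring
# np.concatenate([np.load(f'Image_data{i:02d}.npy') for i in range(10)], axis=0)
images = np.concatenate(chunks, axis=0)
print(images.shape)  # (50, 100, 100, 4)

# Channel order in the last dimension:
# 0: target protein, 1: nucleus, 2: nuclear distance, 3: nuclear segmentation
protein_channel = images[..., 0]
print(protein_channel.shape)  # (50, 100, 100)
```

The label csv files can be concatenated row-wise in the same file order so that label rows stay aligned with the image batch axis.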