<div align="center">

<!-- <picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/AllenCellModeling/cyto-dl/blob/b73e6f357727e3b42adea8540c86f2475ea60379/docs/CytoDL-logo-1C-onDark.png">
  <source media="(prefers-color-scheme: light)" srcset="https://github.com/AllenCellModeling/cyto-dl/blob/b73e6f357727e3b42adea8540c86f2475ea60379/docs/CytoDL-logo-1C-onLight.png">
  <img src="https://github.com/AllenCellModeling/cyto-dl/blob/b73e6f357727e3b42adea8540c86f2475ea60379/docs/CytoDL-logo-1C-onLight.png">
</picture> -->

<h1>CytoDL</h1>

<a href="https://pytorch.org/get-started/locally/"><img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-ee4c2c?logo=pytorch&logoColor=white"></a>
<a href="https://pytorchlightning.ai/"><img alt="Lightning" src="https://img.shields.io/badge/-Lightning-792ee5?logo=pytorchlightning&logoColor=white"></a>
<a href="https://hydra.cc/"><img alt="Config: Hydra" src="https://img.shields.io/badge/Config-Hydra-89b8cd"></a>
<a href="https://github.com/ashleve/lightning-hydra-template"><img alt="Template" src="https://img.shields.io/badge/-Lightning--Hydra--Template-017F2F?style=flat&logo=github&labelColor=gray"></a><br>

</div>

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://github.com/AllenCellModeling/cyto-dl/blob/acf7dad69f492c417b0e486f8f08c19f25575927/docs/CytoDL-overview_dark_1.png">
    <source media="(prefers-color-scheme: light)" srcset="https://github.com/AllenCellModeling/cyto-dl/blob/acf7dad69f492c417b0e486f8f08c19f25575927/docs/CytoDL-overview_light_1.png">
    <img src="https://github.com/AllenCellModeling/cyto-dl/blob/acf7dad69f492c417b0e486f8f08c19f25575927/docs/CytoDL-overview_light_1.png">
  </picture>
</p>

## Description

As part of the [Allen Institute for Cell Science's](https://allencell.org) mission to understand the principles by which human induced pluripotent stem cells establish and maintain robust dynamic localization of cellular structure, `CytoDL` aims to unify deep learning approaches for understanding 2D and 3D biological data as images, point clouds, and tabular data.

The bulk of `CytoDL`'s underlying structure is based on the [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template) organization; we highly recommend that you familiarize yourself with its (short) docs for detailed instructions on running training, overrides, etc.

Our currently available code is roughly split into two domains: image-to-image transformations and representation learning. The image-to-image code (denoted im2im) contains configuration files detailing how to train and predict with models for resolution enhancement using conditional GANs (e.g. predicting 100x images from 20x images), semantic and instance segmentation, and label-free prediction. We also provide configs for Masked Autoencoder (MAE) and Joint Embedding Predictive Architecture ([JEPA](https://github.com/facebookresearch/jepa)) pretraining on 2D and 3D images using a Vision Transformer (ViT) backbone, and for training segmentation decoders from these pretrained features. Representation learning code includes a wide variety of Variational Autoencoder (VAE) architectures and contrastive learning methods such as [VICReg](https://github.com/facebookresearch/vicreg). Due to dependency issues, equivariant autoencoders are not currently supported on Windows.

As we rely on recent versions of PyTorch, users wishing to train and run models on GPU hardware will need up-to-date NVIDIA drivers. Users with older GPUs should not expect the code to work out of the box. Similarly, we do not currently support training or prediction on Mac GPUs. In most cases, CPU-based training should work when GPU training fails.

For im2im models, we provide a handful of example 3D images for training the basic image-to-image transformation-type models, along with default model configuration files, so that users can become comfortable with the framework before training and applying these models on their own data.
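Because GPU support depends on the installed PyTorch build and NVIDIA drivers, it can be worth checking what PyTorch actually detects before choosing between the `trainer=gpu` and `trainer=cpu` overrides. The following is a minimal sketch, assuming PyTorch is installed; `choose_trainer_override` is a hypothetical helper, not part of the CytoDL API.

```python
def choose_trainer_override() -> str:
    """Return "trainer=gpu" if PyTorch reports a usable CUDA device, else "trainer=cpu".

    Hypothetical helper: it only inspects CUDA, so Mac (MPS) GPUs fall back to CPU.
    """
    try:
        import torch  # PyTorch is a CytoDL dependency
    except ImportError:
        # Without PyTorch nothing can be trained, so report the CPU trainer.
        return "trainer=cpu"
    if torch.cuda.is_available():
        # Print the detected device and CUDA build to help diagnose driver issues.
        print(f"CUDA device: {torch.cuda.get_device_name(0)} (CUDA {torch.version.cuda})")
        return "trainer=gpu"
    return "trainer=cpu"


override = choose_trainer_override()
print(override)
```

The returned string can be passed in `overrides=[...]` when loading an experiment through the API, or appended to a `cyto_dl/train.py` command.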
Note that the default im2im models are very small and train on heavily downsampled data so that tests run efficiently; for best performance, increase the model size and remove the downsampling from the data configuration.

## How to run

Install dependencies.

```bash
# clone project
git clone https://github.com/AllenCellModeling/cyto-dl
cd cyto-dl

# [OPTIONAL] create conda environment
conda create -n myenv -c conda-forge python=3.10 fortran-compiler blas-devel
conda activate myenv

# If you have a recent version of pip (e.g., 25.0.1), the --no-deps flag may be unnecessary
pip install --no-deps -r requirements/requirements.txt

# [OPTIONAL] install extra dependencies - equivariance related
pip install --no-deps -r requirements/equiv-requirements.txt

pip install -e .


# [OPTIONAL] if you want to use default experiments on example data
python scripts/download_test_data.py
```

### API

```python
from cyto_dl.api import CytoDLModel

model = CytoDLModel()
model.download_example_data()
model.load_default_experiment("segmentation", output_dir="./output", overrides=["trainer=cpu"])
model.print_config()
model.train()

# [OPTIONAL] async training; top-level `await` only works inside an async context
# such as a Jupyter notebook (in a plain script, call it from an async function)
await model.train(run_async=True)
```

Most models work by passing data paths in the data config. For training or predicting on datasets that are already in memory, you can pass the data directly to the model. Note that this use case is primarily for programmatic use (e.g. in a workflow or a Jupyter notebook), not through the normal CLI. The [im2im/segmentation_array](configs/experiment/im2im/segmentation_array.yaml) experiment demonstrates a possible config setup for this use case.
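In-memory samples often start out as a single list of per-sample dictionaries that still has to be divided into training and validation sets. The following is a minimal sketch, assuming the `train`/`val` layout used by the `segmentation_array` experiment; `split_train_val`, `val_fraction`, and `seed` are hypothetical names, and the `raw`/`seg` keys must match whatever keys your own data config uses.

```python
import random
from typing import Dict, List

import numpy as np


def split_train_val(
    samples: List[Dict[str, np.ndarray]], val_fraction: float = 0.2, seed: int = 0
) -> Dict[str, List[Dict[str, np.ndarray]]]:
    """Shuffle per-sample dicts and split them into {"train": [...], "val": [...]}."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to make a train/val split")
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)  # deterministic shuffle for reproducibility
    # Keep at least one sample in each split.
    n_val = min(max(1, round(len(samples) * val_fraction)), len(samples) - 1)
    val_idx = set(order[:n_val])
    return {
        "train": [samples[i] for i in order if i not in val_idx],
        "val": [samples[i] for i in order if i in val_idx],
    }


# Five dummy CZYX samples; the "raw"/"seg" keys mirror the segmentation_array example.
samples = [
    {"raw": np.random.randn(1, 8, 32, 32), "seg": np.ones((1, 8, 32, 32))}
    for _ in range(5)
]
data = split_train_val(samples, val_fraction=0.2)
print(len(data["train"]), len(data["val"]))  # 4 1
```

The resulting dictionary has the shape that `model.train(data=data)` receives in the `segmentation_array` example.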
For training, data must be passed as a dictionary with keys "train" and "val", each containing a list of dictionaries whose keys correspond to the data config.

```python
from cyto_dl.api import CytoDLModel
import numpy as np

model = CytoDLModel()
model.load_default_experiment("segmentation_array", output_dir="./output")
model.print_config()

# create CZYX dummy data
data = {
    "train": [{"raw": np.random.randn(1, 40, 256, 256), "seg": np.ones((1, 40, 256, 256))}],
    "val": [{"raw": np.random.randn(1, 40, 256, 256), "seg": np.ones((1, 40, 256, 256))}],
}
model.train(data=data)
```

For prediction, data must be passed as a list of NumPy arrays. After processing with `extract_array_predictions`, the predictions are returned as a dictionary with one key for each task head in the model config and values in BC(Z)YX order.

```python
from cyto_dl.api import CytoDLModel
import numpy as np
from cyto_dl.utils import extract_array_predictions

model = CytoDLModel()
model.load_default_experiment(
    "segmentation_array", output_dir="./output", overrides=["data=im2im/numpy_dataloader_predict"]
)
model.print_config()

# create CZYX dummy data
data = [np.random.rand(1, 32, 64, 64), np.random.rand(1, 32, 64, 64)]

_, _, output = model.predict(data=data)
preds = extract_array_predictions(output)
```

Train a model with a chosen experiment configuration from [configs/experiment/](configs/experiment/):

```bash
# GPU
python cyto_dl/train.py experiment=im2im/experiment_name.yaml trainer=gpu

# CPU
python cyto_dl/train.py experiment=im2im/experiment_name.yaml trainer=cpu
```

You can override any parameter from the command line like this:

```bash
python cyto_dl/train.py trainer.max_epochs=20 data.batch_size=64
```
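The prediction dictionary for in-memory data keeps each head's batch and channel axes, so downstream code often needs one array per input image. The following is a minimal sketch, assuming each value produced by `extract_array_predictions` is a single NumPy array whose leading (B) axis indexes the input images in order; `unbatch_predictions` and the head name `seg` are hypothetical.

```python
from typing import Dict, List

import numpy as np


def unbatch_predictions(preds: Dict[str, np.ndarray]) -> Dict[str, List[np.ndarray]]:
    """Split each head's BC(Z)YX array into one C(Z)YX array per input image.

    A singleton channel axis is dropped so single-channel outputs come back as (Z)YX.
    """
    per_image: Dict[str, List[np.ndarray]] = {}
    for head, batch in preds.items():
        batch = np.asarray(batch)
        # Index the leading B axis to get one array per input image.
        images = [batch[i] for i in range(batch.shape[0])]
        # Drop the channel axis only when it has length 1.
        per_image[head] = [img[0] if img.shape[0] == 1 else img for img in images]
    return per_image


# Stand-in for real predictions: one hypothetical head "seg" for two 1x32x64x64 inputs.
fake_preds = {"seg": np.random.rand(2, 1, 32, 64, 64)}
per_image = unbatch_predictions(fake_preds)
print(len(per_image["seg"]), per_image["seg"][0].shape)  # 2 (32, 64, 64)
```

Multi-channel heads keep their channel axis, so each entry is then C(Z)YX rather than (Z)YX.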