{"id":43894378,"url":"https://github.com/recursionpharma/rxrx1-utils","last_synced_at":"2026-02-06T17:14:43.795Z","repository":{"id":39737144,"uuid":"193984046","full_name":"recursionpharma/rxrx1-utils","owner":"recursionpharma","description":"Starter code for the CellSignal NeurIPS 2019 competition.","archived":false,"fork":false,"pushed_at":"2025-06-06T12:34:27.000Z","size":1627,"stargazers_count":45,"open_issues_count":3,"forks_count":26,"subscribers_count":4,"default_branch":"trunk","last_synced_at":"2025-06-06T13:31:44.608Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/recursionpharma.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-06-26T22:09:30.000Z","updated_at":"2025-06-06T12:34:30.000Z","dependencies_parsed_at":"2022-08-29T00:21:10.509Z","dependency_job_id":"c2b134ce-9f1f-4e49-b668-2a7545447674","html_url":"https://github.com/recursionpharma/rxrx1-utils","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/recursionpharma/rxrx1-utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recursionpharma%2Frxrx1-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recursionpharma%2Frxrx1-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recursionpharma%2Frxrx1-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recursionpharma%2Frxrx1-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/recursionpharma","download_url":"https://codeload.github.com/recursionpharma/rxrx1-utils/tar.gz/refs/heads/trunk","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/recursionpharma%2Frxrx1-utils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29169401,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-06T16:33:35.550Z","status":"ssl_error","status_checked_at":"2026-02-06T16:33:30.716Z","response_time":59,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-06T17:14:43.289Z","updated_at":"2026-02-06T17:14:43.787Z","avatar_url":"https://github.com/recursionpharma.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![scorecard-score](https://github.com/recursionpharma/octo-guard-badges/blob/trunk/badges/repo/rxrx1-utils/maturity_score.svg?raw=true)](https://infosec-docs.prod.rxrx.io/octoguard/scorecards/rxrx1-utils)\n[![scorecard-status](https://github.com/recursionpharma/octo-guard-badges/blob/trunk/badges/repo/rxrx1-utils/scorecard_status.svg?raw=true)](https://infosec-docs.prod.rxrx.io/octoguard/scorecards/rxrx1-utils)\n# rxrx1-utils\n\nStarter code for the CellSignal NeurIPS 2019 competition [hosted on Kaggle](https://www.kaggle.com/c/recursion-cellular-image-classification).\n\nTo 
learn more about the dataset, please visit [RxRx.ai](http://rxrx.ai).\n\n## Notebooks\n\nHere are some notebooks that illustrate how this code can be used:\n\n * [Image visualization][vis-notebook]\n * [Model training on TPUs][training-notebook]\n\n [vis-notebook]: https://colab.research.google.com/github/recursionpharma/rxrx1-utils/blob/trunk/notebooks/visualization.ipynb\n [training-notebook]: https://colab.research.google.com/github/recursionpharma/rxrx1-utils/blob/trunk/notebooks/training.ipynb\n\n## Setup\n\nThis starter code works with Python 2.7 and above. To install the dependencies needed for training and visualization, run:\n\n```\npip install -r requirements.txt\n```\n\nIf you plan on using the preprocessing functionality, you also need to install its additional dependencies:\n\n```\npip install -r preprocessing_requirements.txt\n```\n\n## Preprocessing\n\nReading individual image files can become an IO bottleneck during training. This is likely to be a common problem for people who use this dataset, so we are also releasing an example script that packs the images into TFRecords and `zarr` files. We are also making some pre-created TFRecords available in Google Cloud Storage. Read more about the [provided TFRecords below](#provided-tfrecords).\n\n\n### images2tfrecords\n\nScript that packs raw images from the `rxrx1` dataset into `TFRecord`s. This script runs either locally or on Google Cloud Dataflow.\n\nRun `python -m rxrx.preprocess.images2tfrecords --help` for usage instructions.\n\n\n### images2zarr\n\nScript that packs raw images from the `rxrx1` dataset into `zarr`s. This script only runs locally, but it could easily be extended to run on Google Cloud Dataflow.\n\nThis script packs each site image into a single `zarr`. Instead of loading 6 separate per-channel `png`s for a single image, you load one `zarr` file that holds all of those channels together.\nYou could extend the script to pack more images into a single `zarr` file, similar to what is done for `TFRecord`s. 
This is left as an exercise to the IO-bound reader. :) You can read more about the Zarr format and library in the [Zarr documentation](https://zarr.readthedocs.io/en/stable/).\n\nRun `python -m rxrx.preprocess.images2zarr --help` for usage instructions.\n\n## Training on TPUs\n\nThis repo contains bare-bones starter code showing how to train a model on the RxRx1 dataset using Google Cloud TPUs.\n\nThe easiest way to see this in action is to look at the [training notebook][training-notebook].\n\nYou can also spin up a VM to launch jobs from. To learn about TPUs, the best place to start is the [TPU quickstart guide][tpu-quickstart]. The `ctpu` command is helpful; see [its documentation][ctpu-docs] for details. Note that you can also [download and install ctpu][download-ctpu] on your local machine.\n\n[tpu-quickstart]: https://cloud.google.com/tpu/docs/quickstart\n[ctpu-docs]: https://cloud.google.com/tpu/docs/ctpu-reference\n[download-ctpu]: https://github.com/tensorflow/tpu/tree/master/tools/ctpu#download\n\n### Example TPU workflow\n\nFirst, spin up a VM:\n```\nctpu up -vm-only -forward-agent -forward-ports -name my-tpu-vm\n```\n\nThis command creates the VM and `ssh`es you into it. Note the use of the `-vm-only` flag. 
This lets you spin up the VM separately from the TPU, which helps you avoid paying for idle TPUs.\n\nNext, set up the repo and install the dependencies:\n```\ngit clone git@github.com:recursionpharma/rxrx1-utils.git\ncd rxrx1-utils\npip install -r requirements.txt # optional if just training!\n```\n\nNote that if you only want to train, you can skip the `pip install`, since the VM already has all the needed dependencies.\n\nNext, spin up a TPU for training:\n```\nexport TPU_NAME=my-tpu-v3-8\nctpu up -name \"$TPU_NAME\" -preemptible -tpu-only -tpu-size v3-8\n```\n\nOnce that is complete, you can start a training job:\n```\npython -m rxrx.main --model-dir \"gs://path-to-bucket/trial-id/\"\n```\nYou'll also want to launch `tensorboard` to monitor the results:\n\n```\ntensorboard --logdir=gs://path-to-bucket/\n```\nSince we passed the `-forward-ports` flag to `ctpu` when starting the VM, you will be able to view `tensorboard` on your localhost.\n\nOnce you are done with the TPU, be sure to delete it!\n```\nctpu delete -name \"$TPU_NAME\" -tpu-only\n```\n\nYou can then iterate on the code and spin up a new TPU when you are ready for another run.\n\nWhen you are done with your VM, you can either stop it or delete it with the `ctpu` command, for example:\n```\nctpu delete -name my-tpu-vm\n```\n\n## Provided TFRecords\n\nAs noted above, we provide pre-created TFRecords. They live in the following buckets:\n\n```\ngs://rxrx1-us-central1/tfrecords\ngs://rxrx1-europe-west4/tfrecords\n```\n\nThe data is stored in these two regional buckets because, when training on TPUs, you want to read from a bucket in the same region as your TPU. 
Remember to use the appropriate bucket that is in the same region as your TPU!\n\nThe directory structure of the TFRecords is as follows:\n\n```\ntfrecords\n├── by_exp_plate_site-42\n│   ├── HEPG2-10_p1_s1.tfrecord\n│   ├── HEPG2-10_p1_s2.tfrecord\n│   ├── …\n│   ├── U2OS-03_p3_s2.tfrecord\n│   ├── U2OS-03_p4_s1.tfrecord\n│   └── U2OS-03_p4_s2.tfrecord\n└── random-42\n    ├── train\n    │   ├── 001.tfrecord\n    │   ├── 002.tfrecord\n…\n```\nThe `random-42` directory holds the data split randomly across TFRecord files, each file holding ~1000 examples. The `42` is the random seed used to generate this partition. The example code in this repository uses this version of the data.\n\nIn `by_exp_plate_site-42`, each TFRecord contains all of the images for a particular experiment, plate, and site combination. Within each TFRecord, the wells are stored in random order. The advantage of this grouping is that you can choose which experiments to train on. Because of this grouping, each TFRecord here holds only about 277 examples.\n\nFor good batch diversity during training, we recommend using the TF Dataset API (`tf.data`) to interleave examples from these files. The provided `input_fn` in this repository already does this.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frecursionpharma%2Frxrx1-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frecursionpharma%2Frxrx1-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frecursionpharma%2Frxrx1-utils/lists"}