{"id":51305008,"url":"https://github.com/distributedscience/distributed-cellprofiler","last_synced_at":"2026-06-30T23:01:25.733Z","repository":{"id":51673957,"uuid":"63957917","full_name":"DistributedScience/Distributed-CellProfiler","owner":"DistributedScience","description":"Run encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.","archived":false,"fork":false,"pushed_at":"2026-04-15T18:15:19.000Z","size":10197,"stargazers_count":45,"open_issues_count":15,"forks_count":26,"subscribers_count":7,"default_branch":"master","last_synced_at":"2026-04-15T20:19:47.272Z","etag":null,"topics":["aws","cellprofiler","docker"],"latest_commit_sha":null,"homepage":"https://distributedscience.github.io/Distributed-CellProfiler/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DistributedScience.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2016-07-22T14:04:30.000Z","updated_at":"2026-04-15T18:14:34.000Z","dependencies_parsed_at":"2024-05-21T21:42:42.918Z","dependency_job_id":"a290b9c6-40ed-4783-9932-a38e9b71c606","html_url":"https://github.com/DistributedScience/Distributed-CellProfiler","commit_stats":null,"previous_names":["cellprofiler/distributed-cellprofiler"],"tags_count":9,"template":false,"template_full_name":null,"purl":"pkg:github/DistributedScience/Distributed-CellProfiler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedSci
ence%2FDistributed-CellProfiler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedScience%2FDistributed-CellProfiler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedScience%2FDistributed-CellProfiler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedScience%2FDistributed-CellProfiler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DistributedScience","download_url":"https://codeload.github.com/DistributedScience/Distributed-CellProfiler/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DistributedScience%2FDistributed-CellProfiler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34986248,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-30T02:00:05.919Z","response_time":92,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","cellprofiler","docker"],"created_at":"2026-06-30T23:01:22.604Z","updated_at":"2026-06-30T23:01:25.725Z","avatar_url":"https://github.com/DistributedScience.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Distributed-CellProfiler\nRun encapsulated docker containers with CellProfiler in the Amazon Web Services infrastructure.\n\nThis code is an example 
of how to use AWS distributed infrastructure for running CellProfiler.\nThe configuration of the AWS resources is done using boto3 and the AWS CLI.\nThe worker is written in Python and is encapsulated in a docker container.\nThere are four AWS components that are minimally needed to run distributed jobs:\n\n1. An SQS queue\n2. An ECS cluster\n3. An S3 bucket\n4. A spot fleet of EC2 instances\n\nAll of them can be managed through the AWS Management Console.\nHowever, this code helps to get started quickly and run a job autonomously if all the configuration is correct.\nThe code prepares the infrastructure to run a distributed job.\nWhen the job is completed, the code is also able to stop resources and clean up components.\nIt also adds logging and alarms via CloudWatch, helping the user troubleshoot runs and destroy stuck machines.\n\n## Documentation\nComprehensive documentation, including troubleshooting, is available at [Distributed CellProfiler Documentation](https://distributedscience.github.io/Distributed-CellProfiler).\n\n## Running the code\n\n### Step 1\nEdit the config.py file with all the relevant information for your job.\nThen, start creating the basic AWS resources by running the following script:\n\n $ python run.py setup\n\nThis script initializes the resources in AWS.\nNotice that the docker registry is built separately, and you can modify the worker code to build your own.\nAny time you modify the worker code, you need to update the docker registry using the Makefile script inside the worker directory.\n\n### Step 2\nAfter the first script runs successfully, the job can now be submitted to AWS using EITHER of the following commands:\n\n $ python run.py submitJob files/exampleJob.json\n\n OR\n\n $ python run_batch_general.py\n\nRunning either script uploads the tasks that are configured in the json file.\nThis assumes that your data is stored in S3, and the json file has the paths to find input and output directories.\nYou have to customize the 
`exampleJob.json` file or the `run_batch_general.py` file with paths that make sense for your project.\nThe tasks that compose your job are CP groups, and each one will be run in parallel.\nYou need to define each task in your input file to guide the parallelization.\n\n### Step 3\nAfter submitting the job to the queue, we can add computing power to process all tasks in AWS.\nThis code starts a fleet of spot EC2 instances which will run the worker code.\nThe worker code is encapsulated in docker containers, and the code uses ECS services to inject them in EC2.\nAll of this is automated with the following command:\n\n $ python run.py startCluster files/exampleFleet.json\n\nAfter the cluster is ready, the code informs you that everything is set up, and saves the spot fleet identifier\nin a file for further reference.\n\n### Step 4\nWhen the cluster is up and running, you can monitor progress using the following command:\n\n $ python run.py monitor files/APP_NAMESpotFleetRequestId.json\n\nThe file APP_NAMESpotFleetRequestId.json is created after the cluster is set up in step 3.\nIt is important to keep this monitor running if you want to automatically shut down computing resources when there are no more tasks in the queue (recommended).\n\n![Distributed-CellProfiler-Workflow](documentation/DCP-documentation/images/DCP-chronological_schematic.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdistributedscience%2Fdistributed-cellprofiler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdistributedscience%2Fdistributed-cellprofiler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdistributedscience%2Fdistributed-cellprofiler/lists"}