{"id":13738646,"url":"https://github.com/AllenCellModeling/cookiecutter-stepworkflow","last_synced_at":"2025-05-08T16:35:03.140Z","repository":{"id":56151243,"uuid":"236907209","full_name":"AllenCellModeling/cookiecutter-stepworkflow","owner":"AllenCellModeling","description":"AICS Cookiecutter Template for a simple data + code workflow","archived":false,"fork":false,"pushed_at":"2021-01-04T18:32:56.000Z","size":77,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-22T07:31:39.583Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AllenCellModeling.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-29T05:06:19.000Z","updated_at":"2020-11-24T02:40:06.000Z","dependencies_parsed_at":"2022-08-15T13:40:33.216Z","dependency_job_id":null,"html_url":"https://github.com/AllenCellModeling/cookiecutter-stepworkflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AllenCellModeling%2Fcookiecutter-stepworkflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AllenCellModeling%2Fcookiecutter-stepworkflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AllenCellModeling%2Fcookiecutter-stepworkflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AllenCellModeling%2Fcookiecutter-stepworkflow/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/AllenCellModeling","download_url":"https://codeload.github.com/AllenCellModeling/cookiecutter-stepworkflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253105713,"owners_count":21855085,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T03:02:30.728Z","updated_at":"2025-05-08T16:35:02.789Z","avatar_url":"https://github.com/AllenCellModeling.png","language":"Python","readme":"# Cookiecutter StepWorkflow\n\n[![Example Repo Status](https://github.com/AllenCellModeling/cookiecutter-stepworkflow/workflows/Build%20Example%20Repo/badge.svg)](https://github.com/AllenCellModeling/cookiecutter-stepworkflow/tree/example-build)\n\n![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)\n\nAICS Cookiecutter template for a simple data + code workflow:\n\n  - git(hub) for code\n  - quilt for data\n  - prefect to combine\n\n[An Example Workflow Produced with this Template](https://github.com/AllenCellModeling/example_step_workflow)\n\n## Getting started with this template\nTo use this template for a new workflow, use the following commands and then follow the\nprompts from the terminal.\n\n```\npip install cookiecutter\ncookiecutter gh:AllenCellModeling/cookiecutter-stepworkflow\n```\n\n## Configuring your new project\nOnce you've followed the prompts, you should have a template repository that we need to\n\n  - install as a Python package\n  - connect to GitHub\n  - connect to Quilt\n\n### Install as a Python package\nFirst, we'll make a `conda` environment to 
house this project's Python dependencies.\nIf you don't have `conda` installed, install it with\n[miniconda](https://docs.conda.io/en/latest/miniconda.html).\n\nWhatever you named your project, make a conda environment of the same name\n\n```bash\nconda create --name \u003cproject_name\u003e python=3.7\n```\n\nand activate it with\n\n```bash\nconda activate \u003cproject_name\u003e\n```\n\nTo install the project as a Python package, `cd` into the project directory, and then\n\n```bash\ncd \u003cproject_name\u003e\npip install -e .[dev]\n```\n\nThis will install your package in editable mode with all the required development\ndependencies.\n\n### Connect to GitHub\n\nCreate an empty repository on GitHub that has the same name as your project (you need\nto do this via the GitHub website). Don't initialize it with a README or any other files.\n\nOnce the GitHub repo is created, push your project up to GitHub with\n\n```bash\ngit remote add origin git@github.com:AllenCellModeling/\u003cproject_name\u003e.git\ngit push -u origin master\n```\n\nIf you get permission errors, make sure you have [SSH keys installed](https://help.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent),\nor use `https://github.com/` instead of `git@github.com:` in the origin address above.\n\nYour initial commit will show a broken build badge. To fix this, configure codecov and\na documentation generation access token following the instructions\n[here](https://github.com/AllenCellModeling/cookiecutter-pypackage).\n\n### Connect to Quilt\n\nAccess to Quilt data in S3 requires two files:\n\n`~/.aws/credentials`:\n\n```\n[default]\naws_access_key_id=\u003cyour_secret_access_key_id\u003e\naws_secret_access_key=\u003cyour_secret_access_key\u003e\n```\n\n`~/.aws/config`:\n\n```\n[default]\nregion=us-west-2\n```\n\n## Running your workflow\n\n### Example Step\nThis template comes with an example first workflow step `Raw`.  
\n\n#### Run\nYou should be able to run this with the command\n\n```bash\n\u003cproject_name\u003e raw run\n```\n\nThis will write out some \"raw data\" (some randomly generated images) to\n`local_staging/raw`.\n\nYou should edit the `run` function of the `Raw` class in\n`\u003cproject_name\u003e/steps/raw/raw.py` to do something relevant to your workflow, e.g.\naggregating raw data and getting it ready to push to Quilt.\n\n#### Push\nTo push the data in `local_staging` to Quilt, use\n\n```bash\n\u003cproject_name\u003e raw push\n```\n\nIf your git branch is `master`, this will save your data in Quilt to\n`aics/\u003cproject_name\u003e/master/raw`.\n\n#### Checkout\nTo download the remote data and overwrite your local data, use\n\n```bash\n\u003cproject_name\u003e raw checkout\n```\n\n#### Pull\nTo download the remote data needed as input to run a step, use\n\n```bash\n\u003cproject_name\u003e raw pull\n```\n\nSince `Raw` is the first step and doesn't need any inputs, this does nothing\nhere.\n\n### Add a new step\nTo make a new step in your workflow, in the main project directory use\n\n```bash\nmake_new_step \u003cStepName\u003e\n```\n\nThis will create a `StepName` class in `\u003cproject_name\u003e/steps/step_name/step_name.py`,\nwith a `run` method that is ready for you to edit.\n\nIf your step takes the output of another step as its input data, set the\n`direct_upstream_tasks` kwarg in the class `__init__` method to be a list\nof the steps this one depends on. The list should be of step _classes_, e.g.\n`direct_upstream_tasks = [Raw]`.\n\nFor your step to run successfully, you need to save a dataframe manifest of the files\nyou're writing out to `self.manifest`, and then save that as `manifest.csv`.  
See the\n`Raw` step for an example.\n\n### Run everything at once\nTo run all of your steps at once, use\n\n```bash\n\u003cproject_name\u003e all run\n```\n\n`push` and `checkout` also work with `all` this way, to push or check out all of your\ndata at once.\n\nIf you add a new step to your workflow, you should also edit\n`\u003cproject_name\u003e/bin/all.py` and in the `All` class, change `self.step_list` to include\nyour new steps, in the order in which you want to run them.\n\n### Branches\n\nYou won't be able to push data to Quilt unless your git status is clean. This is\nintended to maintain parity between the data we save and the code that generated it.\nTo keep an alternate version of the workflow data, switch to a new git branch\n\n```bash\ngit checkout -b \u003cnew_branch_name\u003e\n```\n\nPushing data to Quilt with e.g. `\u003cproject_name\u003e raw push` will then save your data to\n`aics/\u003cproject_name\u003e/\u003cnew_branch_name\u003e/raw`.\n\n## Optional configuration:\nSee the README [here](https://github.com/AllenCellModeling/cookiecutter-pypackage) for\nall of the optional infrastructure you can (and should) add, e.g. docs, testing, etc.\n","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAllenCellModeling%2Fcookiecutter-stepworkflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAllenCellModeling%2Fcookiecutter-stepworkflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAllenCellModeling%2Fcookiecutter-stepworkflow/lists"}