{"id":20253544,"url":"https://github.com/chanzuckerberg/swipe","last_synced_at":"2025-04-10T23:43:28.254Z","repository":{"id":39879704,"uuid":"354043997","full_name":"chanzuckerberg/swipe","owner":"chanzuckerberg","description":"SFN-WDL infrastructure for pipeline execution - a template repository and Terraform module for SFN-WDL based projects","archived":false,"fork":false,"pushed_at":"2025-04-02T23:51:06.000Z","size":65676,"stargazers_count":8,"open_issues_count":7,"forks_count":5,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-04-10T23:43:13.316Z","etag":null,"topics":["step-functions","wdl"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chanzuckerberg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-04-02T14:32:14.000Z","updated_at":"2025-04-07T20:49:51.000Z","dependencies_parsed_at":"2023-02-12T05:02:30.386Z","dependency_job_id":"a34e888f-3168-4a78-bf95-1f8a2fec6622","html_url":"https://github.com/chanzuckerberg/swipe","commit_stats":null,"previous_names":[],"tags_count":65,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanzuckerberg%2Fswipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanzuckerberg%2Fswipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanzuckerberg%2Fswipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chanzuckerberg%2Fswipe/manifests","owner_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chanzuckerberg","download_url":"https://codeload.github.com/chanzuckerberg/swipe/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248317726,"owners_count":21083527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["step-functions","wdl"],"created_at":"2024-11-14T10:25:30.996Z","updated_at":"2025-04-10T23:43:28.223Z","avatar_url":"https://github.com/chanzuckerberg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SWIPE: SFN-WDL infrastructure for pipeline execution\n\nSwipe is a Terraform module for creating AWS infrastructure to run WDL workflows. Swipe uses Step Functions, Batch, S3, and Lambda to run WDL workflows in a scalable, performant, reliable, and observable way.\n\nWith swipe you can run a WDL workflow on S3 inputs with a single call to the AWS Step Functions API. Your results are written to S3, and the paths to those results are returned in your Step Function output.\n\n## Why use swipe?\n\n- **Minimal infrastructure setup**: Swipe is an infrastructure module first, so you won't need to do as much infrastructure configuration as you might with other tools that are primarily software. Once you set the minimal swipe configuration variables in the Terraform module and apply it, you can start using swipe at high scale right away.\n- **Highly optimized for working with large files**: Many bioinformatics tools are local tools designed to take files as input and produce files as output. 
Often, these files can get very large, though many tools either don't support distributed approaches or would be made much slower by a distributed approach. Swipe is highly optimized for this use case. By default swipe:\n    - Configures AWS Batch to work with NVMe drives for very fast file I/O operations\n    - Has a built-in multi-threaded S3 uploader and downloader that can saturate a 10 GB/sec network connection so your input and output files can be transferred quickly\n    - Has a built-in input cache so inputs common to all of your pipeline runs can be safely reused across jobs. This is particularly useful if your pipeline uses large reference databases that don't change from run to run, which is typical of many bioinformatics workloads.\n- **Cost savings while preserving pipeline throughput and latency**: Swipe tries each workflow first on a Spot instance for cost savings, then retries the workflow on-demand after the first failure. This results in high cost savings with a minimal sacrifice to both throughput and latency. If swipe retried on Spot, throughput might still be high, but by retrying on-demand swipe also keeps latency (the time for a single pipeline to complete) relatively low. This is useful if you have users waiting on results.\n- **Built-in monitoring**: Swipe automatically monitors key workflow metrics, and you can analyze failures in the AWS console\n- **Easy integration**: Using AWS EventBridge, you can easily route SNS notifications to notify other services of workflow status\n\n## Why not use swipe?\n\n- **You are not using AWS**: Swipe is highly opinionated about the infrastructure it runs on. If you are not using AWS, you can't use swipe.\n- **You are running distributed big data jobs**: At the time of writing, swipe is optimized for workflows with local files. 
If you intend to run distributed big data jobs, like MapReduce jobs, swipe is probably not the right choice.\n\n## Usage\n\n### Basic Usage\n\n\n#### Create swipe infrastructure\n\nTo use swipe you first need to create the infrastructure with Terraform. You will need an S3 bucket, called a `workspace`, to store your inputs and outputs, and an S3 bucket to store your WDL files. They can be the same bucket, but for clarity I will use two different buckets, and I recommend you do the same.\n\n\n```terraform\nresource \"aws_s3_bucket\" \"workspace\" {\n  bucket = \"my-test-app-swipe-workspace\"\n}\n\nresource \"aws_s3_bucket\" \"wdls\" {\n  bucket = \"my-test-app-swipe-wdls\"\n}\n\nmodule \"swipe\" {\n    source = \"github.com/chanzuckerberg/swipe?ref=v0.7.0-beta\"\n\n    app_name               = \"my-test-app\"\n    workspace_s3_prefixes  = [aws_s3_bucket.workspace.bucket]\n    wdl_workflow_s3_prefix = aws_s3_bucket.wdls.bucket\n}\n```\n\nThis will produce an output called `sfn_arns`, a map of step function names to their ARNs. By default swipe creates a single step function called `default`.\n\n#### Upload your WDL workflow to S3\n\nNow we need to define a workflow to run. 
Here is a basic WDL workflow that leverages some files:\n\n```WDL\nversion 1.0\nworkflow hello_swipe {\n  input {\n    File hello\n    String docker_image_id\n  }\n\n  call add_world {\n    input:\n      hello = hello,\n      docker_image_id = docker_image_id\n  }\n\n  output {\n    File out = add_world.out\n  }\n}\n\ntask add_world {\n  input {\n    File hello\n    String docker_image_id\n  }\n\n  command \u003c\u003c\u003c\n    cat ~{hello} \u003e out.txt\n    echo world \u003e\u003e out.txt\n  \u003e\u003e\u003e\n\n  output {\n    File out = \"out.txt\"\n  }\n\n  runtime {\n      docker: docker_image_id\n  }\n}\n```\n\nLet's save this one as `hello.wdl` and upload it:\n\n```bash\naws s3 cp hello.wdl s3://my-test-app-swipe-wdls/hello.wdl\n```\n\nLet's also make a test input file for it and upload that:\n\n```bash\necho hello \u003e input.txt\naws s3 cp input.txt s3://my-test-app-swipe-workspace/input.txt\n```\n\n#### Run your WDL\n\nYou can run your WDL with inputs and an output path using the AWS API. Here I will use Python and boto3 for readability:\n\n```python\nimport boto3\nimport json\n\nclient = boto3.client('stepfunctions')\n\nresponse = client.start_execution(\n    stateMachineArn='DEFAULT_STEP_FUNCTION_ARN',\n    name='my-swipe-run',\n    input=json.dumps({\n      \"RUN_WDL_URI\": \"s3://my-test-app-swipe-wdls/hello.wdl\",\n      \"OutputPrefix\": \"s3://my-test-app-swipe-workspace/outputs/\",\n      \"Input\": {\n          \"Run\": {\n              \"hello\": \"s3://my-test-app-swipe-workspace/input.txt\",\n          }\n      }\n    }),\n)\n```\n\nOnce your step function is complete, your output should be at `s3://my-test-app-swipe-workspace/outputs/hello/out.txt`. 
Note that the `out.txt` file name comes from the WDL task's output declaration.\n\n## Development\n\n### Requirements\n\n- Terraform\n- Python 3.9\n- Docker + Docker Compose\n\n### Running tests\n\nSet up Python dependencies:\n\n```bash\nvirtualenv .venv\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n\nBring up mock AWS infrastructure:\n\n```bash\nmake up\n```\n\nRun the tests:\n\n```bash\nmake test\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchanzuckerberg%2Fswipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchanzuckerberg%2Fswipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchanzuckerberg%2Fswipe/lists"}