{"id":25505281,"url":"https://github.com/aldbr/dirac-cwl-proto","last_synced_at":"2025-07-18T12:33:21.901Z","repository":{"id":207153147,"uuid":"718555994","full_name":"aldbr/dirac-cwl-proto","owner":"aldbr","description":"Proof of Concept: integrating CWL into Dirac","archived":false,"fork":false,"pushed_at":"2025-02-04T07:44:26.000Z","size":706,"stargazers_count":0,"open_issues_count":7,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-04T08:27:27.079Z","etag":null,"topics":["cwl","dirac","reproducibility","science","workflow-management","workload-management"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aldbr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-14T10:27:39.000Z","updated_at":"2025-02-04T07:44:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"25961bec-f1fa-4158-b7c7-1d2ac413f3c3","html_url":"https://github.com/aldbr/dirac-cwl-proto","commit_stats":null,"previous_names":["aldbr/dirac-cwl-proto"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aldbr%2Fdirac-cwl-proto","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aldbr%2Fdirac-cwl-proto/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aldbr%2Fdirac-cwl-proto/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aldbr%2Fdirac-cwl-proto/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/aldbr","download_url":"https://codeload.github.com/aldbr/dirac-cwl-proto/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239605065,"owners_count":19667001,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cwl","dirac","reproducibility","science","workflow-management","workload-management"],"created_at":"2025-02-19T06:00:00.111Z","updated_at":"2025-02-19T06:00:01.003Z","avatar_url":"https://github.com/aldbr.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\"\u003e\n  \u003cimg alt=\"Dirac CWL Logo\" src=\"public/CWLDiracX.png\" width=\"300\" \u003e\n\u003c/p\u003e\n\n# Dirac CWL Prototype\n![Workflow tests](https://github.com/aldbr/dirac-cwl-proto/actions/workflows/main.yml/badge.svg?branch=main)\n\n\nThis Python prototype introduces a command-line interface (CLI) designed for the end-to-end execution of Common Workflow Language (CWL) workflows at different scales. It enables users to locally test CWL workflows, and then run them as jobs, transformations and/or productions.\n\n## Prototype Workflow\n\n### Local testing\n\nInitially, the user tests the CWL workflow locally using `cwltool`. 
This step involves validating the workflow's structure and ensuring that it executes correctly with the provided inputs.\n\n  \u003e - CWL task: workflow structure\n  \u003e - inputs of the task\n\nOnce the workflow passes local testing, the user can choose from 3 options for submission depending on the requirements.\n\n### Submission methods\n\n1. **Submission as Dirac Jobs**: For simple workflows with a limited number of inputs, CWL tasks can be submitted as individual jobs. In this prototype, they are run locally as if they were run on distributed computing resources. Additionally, users can submit the same workflow with different sets of inputs in a single request, generating multiple jobs at once.\n\n  \u003e - CWL task\n  \u003e - [inputs1, inputs2, ...]\n  \u003e - Dirac description (site, priority): Dirac-specific attributes related to scheduling\n  \u003e - Metadata (job type): Dirac-specific attributes related to scheduling + execution\n\n2. **Submission as Dirac Transformation**: For workflows requiring continuous, real-time input data or large-scale execution, CWL tasks can be submitted as transformations. As new input data becomes available, jobs are automatically generated and executed. This method is ideal for ongoing data processing and scalable operations.\n\n  \u003e - CWL task (inputs already described within it)\n  \u003e - Dirac description (site, priority)\n  \u003e - Metadata (job type, group size, query parameters)\n\n3. **Submission as Dirac Productions**: For complex workflows that require multiple steps with different requirements, CWL tasks can be submitted as productions. This method allows the workflow to be split into multiple transformations, with each transformation handling a distinct step in the process. 
Each transformation can manage one or more jobs, enabling large-scale, multi-step execution.\n\n  \u003e - CWL task (inputs already described within it)\n  \u003e - Step Metadata (per step):\n  \u003e   - Dirac description (site, priority)\n  \u003e   - Metadata (job type, group size, query parameters)\n\n## Installation\n\nTo use this package, you first need to create a conda environment:\n\n```bash\nmamba env create -f environment.yaml\nconda activate dirac-cwl\n```\n\nThen, install the package:\n\n```bash\npip install -e .\n```\n\n## Usage\n\n```bash\ndirac-cwl job submit \u003cworkflow_path\u003e [--parameter-path \u003cinput_path\u003e] [--metadata-path \u003cmetadata_path\u003e]\n\ndirac-cwl transformation submit \u003cworkflow_path\u003e [--metadata-path \u003cmetadata_path\u003e]\n\ndirac-cwl production submit \u003cworkflow_path\u003e [--steps-metadata-path \u003csteps_metadata_path\u003e]\n```\n\nThis package contains modules and tools to manage CWL workflows:\n\n- `src/dirac_cwl_proto/modules`: Python scripts for individual steps in workflows.\n- `src/cli`: Utility scripts for managing and executing CWL workflows.\n- `test/workflows`: CWL workflow definitions.\n\nTo use the workflows and inputs directly with `cwltool`, you need to add the `modules` directory to the `$PATH`:\n\n```bash\nexport PATH=$PATH:\u003c/path/to/dirac-cwl-proto/src/dirac_cwl_proto/modules\u003e\ncwltool \u003cworkflow_path\u003e \u003cinputs\u003e\n```\n\n## Contribute\n\n### Add a workflow\n\nTo add a new workflow to the project, follow these steps:\n\n- Create a new directory under `workflows` (e.g. `workflows/helloworld`)\n- Add one or more variants of the workflow, each in its own directory (e.g. 
`helloworld/helloworld_basic/description.cwl` and `helloworld/helloworld_with_inputs/description.cwl`)\n- In a `type_dependencies` subdirectory, add the required files to submit a job/transformation/production from a given variant.\n\nDirectory Structure Example:\n\n```\nworkflows/\n└── my_new_workflow/\n    |\n    ├── my_new_workflow_complete/\n    |   └── description.cwl\n    ├── my_new_workflow_step1/\n    |   └── description.cwl\n    ├── my_new_workflow_step2/\n    |   └── description.cwl\n    |\n    └── type_dependencies/\n        ├── production/\n        |   └── steps_metadata.yaml\n        ├── transformation/\n        |   └── metadata.yaml\n        └── job/\n            ├── inputs1.yaml\n            └── inputs2.yaml\n```\n\n### Add a module\n\nIf your workflow requires calling a script, you can add this script as a module. Follow these steps to properly integrate the module:\n\n- Add the script: Place your script in the `src/dirac_cwl_proto/modules` directory.\n- Update `pyproject.toml`: Add the script to the `pyproject.toml` file to create a command-line interface (CLI) command.\n- Reinstall the package: Run `pip install .` to reinstall the package and make the new script available as a command.\n- Usage in CWL Workflow: Reference the command in your `description.cwl` file.\n\n**Example**\n\nLet’s say you have a script named `generic_command.py` located at `src/dirac_cwl_proto/modules/generic_command.py`. 
Here's how you can integrate it:\n\n- Example script `generic_command.py`:\n\n```python\n#!/usr/bin/env python3\nimport typer\nfrom rich.console import Console\n\napp = typer.Typer()\nconsole = Console()\n\n@app.command()\ndef run_example():\n    console.print(\"This is an example command.\")\n\nif __name__ == \"__main__\":\n    app()\n```\n\n- Update `pyproject.toml`:\n\n```toml\n[project.scripts]\ngeneric-command = \"dirac_cwl_proto.modules.generic_command:app\"\n```\n\n- Reinstall the package with `pip install .`.\n- Reference the command in `description.cwl`:\n\n```yaml\nbaseCommand: [generic-command]\n```\n\n### Test your changes\n\n- Add your test in `test/test_workflows.py`.\n- Run `pytest`:\n\n```bash\npytest test/test_workflows.py\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faldbr%2Fdirac-cwl-proto","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faldbr%2Fdirac-cwl-proto","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faldbr%2Fdirac-cwl-proto/lists"}