{"id":18989749,"url":"https://github.com/cdcgov/cfa_azure","last_synced_at":"2025-04-22T11:09:25.137Z","repository":{"id":226499196,"uuid":"768086274","full_name":"CDCgov/cfa_azure","owner":"CDCgov","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-11T16:46:36.000Z","size":1059,"stargazers_count":9,"open_issues_count":2,"forks_count":6,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-04-11T17:44:09.834Z","etag":null,"topics":["azure","azure-batch","azure-blob-storage","azure-container-registry","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CDCgov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-03-06T12:52:39.000Z","updated_at":"2025-04-03T18:00:42.000Z","dependencies_parsed_at":"2024-04-22T15:36:26.514Z","dependency_job_id":"d8b356b7-6a4f-4613-a732-1a4ab9ac5cdd","html_url":"https://github.com/CDCgov/cfa_azure","commit_stats":null,"previous_names":["cdcgov/cfa_azure"],"tags_count":19,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CDCgov%2Fcfa_azure","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CDCgov%2Fcfa_azure/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CDCgov%2Fcfa_azure/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CDCgov%2Fcfa_azure/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/CDCgov","download_url":"https://codeload.github.com/CDCgov/cfa_azure/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250228220,"owners_count":21395956,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","azure-batch","azure-blob-storage","azure-container-registry","python"],"created_at":"2024-11-08T17:07:46.458Z","updated_at":"2025-04-22T11:09:25.102Z","avatar_url":"https://github.com/CDCgov.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Version](https://img.shields.io/badge/dynamic/toml?url=https%3A%2F%2Fraw.githubusercontent.com%2FCDCgov%2Fcfa_azure%2Frefs%2Fheads%2Fmaster%2Fpyproject.toml\u0026query=%24.tool.poetry.version\u0026style=plastic\u0026label=version\u0026color=lightgray)\r\n![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit\u0026style=plastic\u0026link=https://raw.githubusercontent.com/CDCgov/cfa_azure/refs/heads/master/.pre-commit-config.yaml)\r\n![pre-commit](https://github.com/CDCgov/cfa_azure/workflows/pre-commit/badge.svg?style=plastic\u0026link=https://github.com/CDCgov/cfa_azure/actions/workflows/pre-commit.yaml)\r\n![CI](https://github.com/CDCgov/cfa_azure/workflows/Python%20Unit%20Tests%20with%20Coverage/badge.svg?style=plastic\u0026link=https://github.com/CDCgov/cfa_azure/actions/workflows/pre-commit.yaml\u0026link=https://github.com/CDCgov/cfa_azure/actions/workflows/ci.yaml)\r\n![GitHub 
License](https://img.shields.io/github/license/cdcgov/cfa_azure?style=plastic\u0026link=https://github.com/CDCgov/cfa_azure/blob/master/LICENSE)\r\n![Python](https://img.shields.io/badge/python-3670A0?logo=python\u0026logoColor=ffdd54\u0026style=plastic)\r\n![Azure](https://img.shields.io/badge/Microsoft-Azure-blue?logo=microsoftazure\u0026logoColor=white\u0026style=plastic)\r\n![GitHub commit activity](https://img.shields.io/github/commit-activity/m/cdcgov/cfa_azure?style=plastic)\r\n\r\n\r\n# cfa_azure Python Package\r\n## Created by Ryan Raasch (Peraton) for CFA\r\n\r\n# Outline\r\n- [Warnings](#warnings)\r\n- [Description](#description)\r\n- [Getting Started](#getting-started)\r\n- [Components](#components)\r\n  - [clients](#clients)\r\n    - [Logging](#logging)\r\n    - [Using Various Credential Methods](#using-various-credential-methods)\r\n    - [Persisting stdout and stderr to Blob Storage](#persisting-stdout-and-stderr-to-blob-storage)\r\n    - [Availability Zones](#availability-zones)\r\n    - [Updated Base Container Image](#updated-base-container-image)\r\n    - [Configuration](#configuration)\r\n    - [AzureClient Methods](#azureclient-methods)\r\n    - [Running Jobs and Tasks](#running-jobs-and-tasks)\r\n    - [Running Tasks from Yaml](#running-tasks-from-yaml)\r\n    - [Download Blob Files After Job Completes](#download-blob-files-after-job-completes)\r\n    - [Run DAGs](#run-dags)\r\n  - [automation](#automation)\r\n  - [local](#local)\r\n  - [batch_helpers](#batch_helpers)\r\n    - [Batch Helpers Functions](#batch-helpers-functions)\r\n  - [blob_helpers](#blob_helpers)\r\n    - [Blob Helpers Functions](#blob-helpers-functions)\r\n  - [helpers](#helpers)\r\n    - [Helpers Functions](#helpers-functions)\r\n  - [Common Use Case Scenarios](#common-use-case-scenarios)\r\n- [Public Domain Standard Notice](#public-domain-standard-notice)\r\n- [License Standard Notice](#license-standard-notice)\r\n- [Privacy Standard Notice](#privacy-standard-notice)\r\n- 
[Contributing Standard Notice](#contributing-standard-notice)\r\n- [Records Management Standard Notice](#records-management-standard-notice)\r\n- [Additional Standard Notices](#additional-standard-notices)\r\n\r\n# Warnings\r\n## ***Version 1.x.x WARNING***\r\nSeveral keys in the expected configuration.toml have been renamed to make it easier for users to find the right information in the Azure Management Console. The following keys have changed:\r\n- `client_id` is now `batch_application_id`\r\n- `principal_id` is now `batch_object_id`\r\n- `application_id` is now `sp_application_id`\r\n\r\nRefer to the example_config.toml in the examples folder, found [here](examples/example_config.toml), to view the keys/values required in the configuration file.\r\n\r\n## ***Version 1.3.x WARNING***\r\nThe method `add_task()` no longer accepts the parameters `use_uploaded_files` or `input_files`. Any files will need to be accounted for when specifying the docker command to run the task.\r\n\r\n# Description\r\nThe `cfa_azure` python module is intended to ease the challenge of working with Azure through multiple Azure python modules, which require many lines of code executed in the correct order. `cfa_azure` simplifies many repeated workflows when interacting with Azure, Blob Storage, Batch, and more. For example, creating a pool in Azure may take different credentials and several clients to complete, but with `cfa_azure`, creating a pool is reduced to a single function with only a few parameters.\r\n\r\n# Getting Started\r\nIn order to use the `cfa_azure` library, you need [Python 3.10 or higher](https://www.python.org/downloads/), [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/), and any python package manager.\r\n\r\nTo install using pip:\r\n```bash\r\npip install git+https://github.com/CDCgov/cfa_azure.git\r\n```\r\n\r\n# Components\r\nThe `cfa_azure` module is composed of six submodules: `clients`, `automation`, `batch_helpers`, `blob_helpers`, `helpers`, and `local`. 
The module `clients` contains what we call the AzureClient, which consolidates the multiple Azure clients needed to interact with Azure into a single client. The `batch_helpers`, `blob_helpers` and `helpers` modules contain more fine-grained functions which are used within the `clients` module or independently for more control when working with Azure. The `automation` module introduces a simplified way to upload files and submit jobs/tasks to Batch via another configuration toml file. For help getting started with the `automation` module, please see [this overview](docs/automation_README.md).\r\n\r\nThe `local` submodule is meant to mimic the `cfa_azure` package but in a local environment, and contains submodules also called `clients`, `automation` and `helpers`. This framework allows users to easily switch between running code in Azure and locally. For example, someone with a working script importing the `AzureClient` by running `from cfa_azure.clients import AzureClient` could switch to running it locally by importing it through the `local` submodule like `from cfa_azure.local.clients import AzureClient`. The same holds for `local.automation` and `local.helpers`.\r\n\r\n**Note:** At this moment, not all functionality in `cfa_azure` is available in the `local` submodule, but there is enough for a standard workflow to be run locally.\r\n\r\n# Module Tree\r\n```\r\n|cfa_azure\r\n    | clients\r\n        | AzureClient\r\n    | automation\r\n    | batch_helpers\r\n    | blob_helpers\r\n    | helpers\r\n    | local\r\n        | clients\r\n            | AzureClient\r\n        | automation\r\n        | helpers\r\n```\r\n\r\n## clients\r\nClasses:\r\n- AzureClient: a client object used for interacting with Azure. It initializes based on a supplied configuration file and creates various Azure clients under the hood. 
It can be used to upload containers, upload files, run jobs, and more.\r\n\r\n### Logging\r\nTo customize the logging capabilities of cfa_azure, two environment variables can be set. These are LOG_LEVEL and LOG_OUTPUT.\r\n\r\nLOG_LEVEL: sets the logging level. Choices are:\r\n- debug\r\n- info\r\n- warning\r\n- error\r\n\r\nLOG_OUTPUT: sets the output of the logs. Choices are:\r\n- file: saves log output to a file, nested within a ./logs/ folder\r\n- stdout: writes log output to stdout\r\n- both: writes log output to both a file and stdout\r\n\r\n**Example**:\r\nRun the following in the terminal in which `cfa_azure` will be run.\r\n```bash\r\nexport LOG_LEVEL=\"info\"\r\nexport LOG_OUTPUT=\"stdout\"\r\n```\r\n\r\n\r\n### Using Various Credential Methods\r\n\r\nWhen instantiating an AzureClient object, there is an option to set which `credential_method` to use. Previously, only a service principal could be used. Now there are three options to choose from: `identity`, `sp`, or `env`.\r\n- `identity`: Uses the managed identity associated with the VM where the code is running.\r\n- `sp`: Uses a service principal for the credential. The following values must be set in the configuration file: tenant_id, sp_application_id, and the corresponding secret fetched from Azure Key Vault.\r\n- `env`: Uses environment variables to create the credential. When choosing `env`, the following environment variables will need to be set: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, and `AZURE_CLIENT_SECRET`.\r\n\r\nYou can also use `use_env_vars=True` to allow the configuration to be loaded directly from environment variables, which may be helpful in containerized environments.\r\n\r\nBy default, the managed identity option will be used. 
Whichever credential method is used, a secret is pulled from the key vault using the credential to create a secret client credential for interaction with various Azure services.\r\n\r\n**Example:**\r\n```python\r\nfrom cfa_azure.clients import AzureClient\r\n\r\n# Using Managed Identity\r\nclient = AzureClient(config_path=\"./configuration.toml\", credential_method=\"identity\")\r\n\r\n# Using Service Principal (credentials from the config file)\r\nclient = AzureClient(config_path=\"./configuration.toml\", credential_method=\"sp\")\r\n\r\n# Using Environment Variables\r\nimport os\r\nos.environ[\"AZURE_TENANT_ID\"] = \"your-tenant-id\"\r\nos.environ[\"AZURE_CLIENT_ID\"] = \"your-client-id\"\r\nos.environ[\"AZURE_CLIENT_SECRET\"] = \"your-client-secret\" #pragma: allowlist secret\r\nclient = AzureClient(credential_method=\"env\", use_env_vars=True)\r\n```\r\n\r\n### Persisting stdout and stderr to Blob Storage\r\n\r\nIn certain situations, it is beneficial to save the stdout and stderr from each task to Blob Storage (like when using autoscale pools). It is possible to persist these to Blob Storage by specifying the blob container name in the `save_logs_to_blob` parameter when using `client.add_job()`. **Note:** the blob container specified must be mounted to the pool being used for the job.\r\n\r\nFor example, if we would like to persist stdout and stderr to the blob container \"input-test\" for a job named \"persisting_test\", we would use the following code:\r\n```python\r\nclient.add_job(\"persisting_test\", save_logs_to_blob = \"input-test\")\r\n```\r\n\r\n### Availability Zones\r\n\r\nTo make use of Azure's availability zone functionality, the `set_pool_info()` method provides a parameter called `availability_zones`. To use availability zones when building a pool, set this parameter to True. If you want to stick with the default Regional configuration, this parameter can be left out or set to False. 
Turn availability zone on like the following:\r\n```python\r\nclient.set_pool_info(\r\n  ...\r\n  availability_zones = True,\r\n  ...\r\n)\r\n```\r\n\r\n### Updated Base Container Image\r\n\r\nThe original base Ubuntu image used for Azure Batch nodes was Ubuntu 20.04, which is deprecated effective April 2025. There is a new image provided by default from `microsoft-dsvm`, which runs Ubuntu 22.04 for container workloads. This new image supports high performance compute (HPC) VMs as well as a limited number of non-HPC VMs. Going forward, `cfa_azure` will only support the creation of pools with the new `microsoft-dsvm` image.\r\nThe following non-HPC VMs can be used with the updated image:\r\n- d2s_v3\r\n- d4s_v3\r\n- d4d_v5\r\n- d4ds_v5\r\n- d8s_v3\r\n- d16s_v3\r\n- d32s_v3\r\n- e8s_v3\r\n- e16s_v3\r\n\r\nThere may be other compatible VMs as well, but note that the A-series VMs are no longer compatible.\r\n\r\n**Note:** all pools will need to be updated to the newer image by mid-April 2025.\r\n\r\n\r\n### Configuration\r\nAn AzureClient object can be instantiated and initialized with pool, mounted containers and container registries using a configuration file. This is especially useful if the same pool will be used for running multiple batch jobs and experiments. Use the following example to create a configuration file:\r\n\r\n[Configuration File](examples/client_configuration.toml)\r\n\r\n\r\nAfter creating the configuration file (e.g. 
client_configuration.toml), then use the following snippet to initialize the AzureClient object:\r\n```python\r\n  client = AzureClient(\"./client_configuration.toml\")\r\n```\r\n\r\n### AzureClient Methods\r\n- `create_pool`: creates a new Azure batch pool using default autoscale mode\r\n  **Example:**\r\n  ```python\r\n  client = AzureClient(\"./configuration.toml\")\r\n  client.create_pool(\"my-test-pool\")\r\n  ```\r\n- `download_job_stats`: downloads a csv of job statistics for the specified job in its current state, to the specified file_name if provided (without the .csv extension). If no file_name is provided, the csv is downloaded to {job_id}-stats.csv. There is also a parameter in the `monitor_job()` method with the same name that, when set to True, will save the job statistics when the job completes. Examples:\r\n```python\r\nclient.download_job_stats(job_id = \"example-job-name\", file_name = \"test-job-stats\")\r\n\r\nclient.monitor_job(job_id = \"example-job-name\", download_job_stats = True)\r\n```\r\n- `update_containers`: modifies the containers mounted on an existing Azure batch pool. It essentially recreates the pool with new mounts. Use force_update=True to recreate the pool without waiting for running tasks to complete.\r\n- `upload_files_to_container`: uploads files from a specified folder to an Azure Blob container. 
It also includes options like `force_upload` to allow or deny large file uploads without confirmation.\r\n  **Example:**\r\n```python\r\nclient.upload_files_to_container(\r\n    folder_names=[\"/path/to/folder\"],\r\n    input_container_name=\"my-input-container\",\r\n    blob_service_client=client.blob_service_client,\r\n    force_upload=True\r\n)\r\n```\r\n- `update_scale_settings`: modifies the scaling mode (fixed or autoscale) for an existing pool\r\n **Example:**\r\n  ```python\r\n  # Specify new autoscale formula that will be evaluated every 30 minutes\r\n  client.scaling = \"autoscale\"\r\n  client.update_scale_settings(\r\n      pool_name=\"my-test-pool\",\r\n      autoscale_formula_path=\"./new_autoscale_formula.txt\",\r\n      evaluation_interval=\"PT30M\"\r\n  )\r\n\r\n  # Set the pool name property to avoid sending pool_name parameter on every call to update_scale_settings\r\n  client.pool_name = \"my-test-pool\"\r\n\r\n  # Use default 15 minute evaluation interval\r\n  client.update_scale_settings(autoscale_formula_path=\"./new_autoscale_formula.txt\")\r\n\r\n  # Switch to fixed scaling mode with 10 dedicated nodes and requeuing of current jobs\r\n  client.scaling = \"fixed\"\r\n  client.update_scale_settings(dedicated_nodes=10, node_deallocation_option='Requeue')\r\n\r\n  # Switch to fixed scaling mode with 15 low-priority (Spot) nodes and forced termination of current jobs\r\n  client.update_scale_settings(low_priority_nodes=15, node_deallocation_option='Terminate')\r\n  ```\r\n- `update_containers`: modifies the containers mounted on an existing Azure batch pool. 
It essentially recreates the pool with new mounts.\r\n **Example:**\r\n  ```python\r\n  # First create a pool\r\n  client = AzureClient(\"./configuration.toml\")\r\n  client.set_input_container(\"some-input-container\")\r\n  client.set_output_container(\"some-output-container\")\r\n  client.create_pool(pool_name=\"my-test-pool\")\r\n\r\n  # Now change the containers mounted on this pool\r\n  client.update_containers(\r\n      pool_name=\"my-test-pool\",\r\n      input_container_name=\"another-input-container\",\r\n      output_container_name=\"another-output-container\",\r\n      force_update=False\r\n  )\r\n  ```\r\n  If all the nodes in the pool are idle when `update_containers()` is invoked, the Azure Batch service will recreate the pool with the new containers mounted to the /input and /output paths respectively. However, if any nodes in the pool are in the Running state, the following error is displayed:\r\n\r\n  *There are N compute nodes actively running tasks in pool. Please wait for jobs to complete or retry with `force_update=True`.*\r\n\r\n  As the message suggests, you can either wait for existing jobs in the pool to complete and retry the `update_containers()` operation, or set the `force_update` parameter to `True` and re-run the `update_containers()` operation to immediately recreate the pool with new containers.\r\n\r\n### Running Jobs and Tasks\r\n - `add_task`: adds a task to an existing job in the pool. You can also specify which task it depends on.  By default, dependent tasks will only run if the parent task succeeds. However, this behavior can be overridden by specifying `run_dependent_tasks_on_fail=True` on the parent task. When this property is set to True, any runtime failures in the parent task will be ignored. 
However, execution of dependent tasks will only begin after completion (regardless of success or failure) of the parent task.\r\n\r\n **Example:** Run tasks in parallel without any dependencies.\r\n  ```python\r\n  task_1 = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"docker\", \"command\"],  # replace with actual command\r\n  )\r\n  task_2 = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"other\", \"docker\", \"command\"], # replace with actual command\r\n  )\r\n  ```\r\n**Example:** Run tasks sequentially and terminate the job if parent task fails\r\n  ```python\r\n  parent_task = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"docker\", \"command\"],  # replace with actual command\r\n  )\r\n  child_task = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"other\", \"docker\", \"command\"], # replace with actual command\r\n      depends_on=parent_task,\r\n  )\r\n  ```\r\n**Example:** Run tasks sequentially with 1-to-many dependency. Run the child tasks even if parent task fails.\r\n  ```python\r\n  parent_task = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"docker\", \"command\"],  # replace with actual command\r\n      run_dependent_tasks_on_fail=True,\r\n  )\r\n  child_task_1 = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"other\", \"docker\", \"command\"], # replace with actual command\r\n      depends_on=parent_task,\r\n  )\r\n  child_task_2 = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"another\", \"docker\", \"command\"], # replace with actual command\r\n      depends_on=parent_task,\r\n  )\r\n  ```\r\n**Example:** Create many-to-1 dependency with 2 parent tasks that run before child task. 
Second parent task is optional: job should not terminate if it fails.\r\n  ```python\r\n  parent_task_1 = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"docker\", \"command\"],  # replace with actual command\r\n  )\r\n  parent_task_2 = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"some\", \"other\", \"docker\", \"command\"],  # replace with actual command\r\n      run_dependent_tasks_on_fail=True,\r\n  )\r\n  child_task = client.add_task(\r\n      \"test_job_id\",\r\n      docker_cmd=[\"another\", \"docker\", \"command\"], # replace with actual command\r\n      depends_on=[parent_task_1, parent_task_2]\r\n  )\r\n  ```\r\n\r\n  **Example**: Use integer values for task IDs and specify dependent tasks as a range.\r\n  ```python\r\n  # create a job that uses integer task IDs\r\n  client.add_job(job_id = \"task_dep_range\", task_id_ints = True)\r\n  # submit 20 tasks to the job\r\n  for item in range(20):\r\n      client.add_task(\"task_dep_range\", docker_cmd=[\"some\", \"docker\", \"command\"])  # replace with actual command\r\n\r\n  # add a dependent task which depends on tasks 1 to 20\r\n  client.add_task(\"task_dep_range\", docker_cmd=\"python3 some_cmd.py\", depends_on_range = (1, 20))\r\n  ```\r\n\r\n### Running Tasks from Yaml\r\nTasks can also be added to a job based on a yaml file containing various parameters and flags. The yaml is parsed into command line arguments and appended to a base command to be used as the docker command in Azure Batch. The yaml/argument parsing utilizes [pygriddler](https://github.com/CDCgov/pygriddler). 
The basic structure for this method is `client.add_tasks_from_yaml(job_id, base_cmd, file_path)`.\r\nFor example, a yaml called params.yaml that has the following structure\r\n```yaml\r\nbaseline_parameters:\r\n  p_infected_initial: 0.001\r\n\r\ngrid_parameters:\r\n  scenario: [pessimistic, optimistic]\r\n  run: [1, 2, 3]\r\n\r\nnested_parameters:\r\n  - scenario: pessimistic\r\n    R0: 4.0\r\n    p_infected_initial: 66\r\n    infectious_period: 2.0\r\n    infer(flag): x\r\n    run_checks(flag): x\r\n  - scenario: optimistic\r\n    R0: 2.0\r\n    infectious_period: 0.5\r\n```\r\nrun with the following method\r\n```python\r\nclient.add_tasks_from_yaml(job_id = \"args_example\",\r\n  base_cmd = \"python3 main.py\",\r\n  file_path = \"params.yaml\"\r\n  )\r\n```\r\nwill produce 6 tasks with the following docker_cmds passed to Batch:\r\n```bash\r\n'python3 main.py  --scenario pessimistic --run 1 --p_infected_initial 66 --R0 4.0 --infectious_period 2.0 --infer --run_checks'\r\n'python3 main.py  --scenario pessimistic --run 2 --p_infected_initial 66 --R0 4.0 --infectious_period 2.0 --infer --run_checks'\r\n'python3 main.py  --scenario pessimistic --run 3 --p_infected_initial 66 --R0 4.0 --infectious_period 2.0 --infer --run_checks'\r\n'python3 main.py  --scenario optimistic --run 1 --p_infected_initial 0.001 --R0 2.0 --infectious_period 0.5'\r\n'python3 main.py  --scenario optimistic --run 2 --p_infected_initial 0.001 --R0 2.0 --infectious_period 0.5'\r\n'python3 main.py  --scenario optimistic --run 3 --p_infected_initial 0.001 --R0 2.0 --infectious_period 0.5'\r\n```\r\n\r\n### Download Blob Files After Job Completes\r\nSometimes there will be outputs from a job that you know will need to be downloaded locally. This can be accomplished by using the `download_after_job()` method. It accepts `job_id`, `blob_paths`, `target`, and `container_name` as parameters. 
This method should be placed at the end of your script after submitting tasks so that it monitors the job and downloads the specified output when the tasks finish running. Blob paths can be directories or specific file paths. The contents of a directory will be downloaded keeping the structure of the directory that exists in Blob Storage.\r\nExample:\r\n```python\r\nclient.download_after_job(\r\n  job_id = \"sample_job\",\r\n  blob_paths = [\"folder1\", \"folder2/subfolder\", \"file.txt\"],\r\n  target = \"dload\",\r\n  container_name = \"output-test\"\r\n)\r\n```\r\n\r\n### Run DAGs\r\nAn instance of the AzureClient can run DAGs in a user-specified job. It takes in Task objects from the `batch` module, along with a job_id and other `add_task()` parameters. It determines the order in which to submit the tasks and sets the appropriate dependencies in Azure Batch. See [the DAGs documentation](/examples/DAGs/README.md) for more information.\r\n\r\n## automation\r\nPlease view [this documentation](docs/automation_README.md) on getting started with the `automation` module.\r\n\r\n## local\r\nPlease view [this documentation](docs/local_README.md) for more information regarding the `local` module.\r\n\r\n## Helper functions\r\nThe CFA Azure library provides a collection of functions that help manage Azure Batch, Blob Storage, Identity Management and Configuration. These functions have been grouped into 3 different modules: `batch_helpers`, `blob_helpers` and `helpers`. In the following sections, each module and its functions are described.\r\n\r\n### batch_helpers\r\nThe `batch_helpers` module provides a collection of functions that help manage Azure Batch resources and perform key tasks. 
Below is an expanded overview of each function.\r\n\r\n#### Batch Helpers Functions\r\n- `check_pool_exists`: checks if a specified pool exists in Azure Batch\r\n```python\r\ncheck_pool_exists(\"resource_group_name\", \"account_name\", \"pool_name\", batch_mgmt_client)\r\n```\r\n- `create_batch_pool`: creates an Azure Batch pool using the provided configuration details\r\n```python\r\ncreate_batch_pool(batch_mgmt_client, pool_config)\r\n```\r\n- `delete_pool`: deletes the specified pool from Azure Batch\r\n```python\r\ndelete_pool(\"resource_group_name\", \"account_name\", \"pool_name\", batch_mgmt_client)\r\n```\r\n- `generate_autoscale_formula`: generates a generic autoscale formula based on a specified maximum number of nodes\r\n```python\r\ngenerate_autoscale_formula(max_nodes=8)\r\n```\r\n- `get_autoscale_formula`: finds and reads `autoscale_formula.txt` from working directory or subdirectory\r\n```python\r\nget_autoscale_formula(filepath=\"/path/to/formula.txt\")\r\n```\r\n- `get_batch_mgmt_client`: creates a Batch Management Client for interacting with Azure Batch resources, such as pools and jobs\r\n```python\r\nbatch_mgmt_client = get_batch_mgmt_client(config, DefaultAzureCredential())\r\n```\r\n- `get_batch_pool_json`: creates a dict based on config for configuring an Azure Batch pool\r\n```python\r\npool_config = get_batch_pool_json(\"input-container\", \"output-container\", config)\r\n```\r\n- `get_deployment_config`: retrieves deployment configuration for Azure Batch pool, including container registry settings and optional HPC image\r\n```python\r\nget_deployment_config(\"container_image_name\", \"container_registry_url\", \"container_registry_server\", config, DefaultAzureCredential())\r\n```\r\n- `get_network_config`: gets the network configuration based on the config information\r\n```python\r\nget_network_config(config)\r\n```\r\n- `get_pool_full_info`: retrieves the full information of a specified Azure Batch 
pool\r\n```python\r\nget_pool_full_info(\"resource_group_name\", \"account_name\", \"pool_name\", batch_mgmt_client)\r\n```\r\n- `get_pool_info`: gets the basic information for a specified Azure Batch pool\r\n```python\r\nget_pool_info(\"resource_group_name\", \"account_name\", \"pool_name\", batch_mgmt_client)\r\n```\r\n- `get_pool_mounts`: lists all mounted Blob containers for a given Azure Batch pool\r\n```python\r\nget_pool_mounts(\"pool_name\", \"resource_group_name\", \"account_name\", batch_mgmt_client)\r\n```\r\n- `get_rel_mnt_path`: retrieves the relative mount path for a specified Blob container in an Azure Batch pool\r\n```python\r\nget_rel_mnt_path(\"blob_name\", \"pool_name\", \"resource_group_name\", \"account_name\", batch_mgmt_client)\r\n```\r\n- `get_user_identity`: retrieves the user identity based on the provided config information\r\n```python\r\nget_user_identity(config)\r\n```\r\n\r\n### blob_helpers\r\nThe `blob_helpers` module provides a collection of functions that helps manage Azure Blob Storage resources and perform key tasks. Below is an expanded overview of each function.\r\n\r\n#### Blob Helpers Functions\r\n- `blob_glob`: provides an iterator over all files within specified Azure Blob Storage location that match the specified prefix.\r\n```python\r\nblob_glob(\"blob_url\", \"account_name\", \"container_name\", \"container_client\")\r\n```\r\n- `blob_search`: provides an iterator over all files within specified Azure Blob Storage location that match the specified prefix and file pattern. 
It can optionally take a sort key.\r\n```python\r\nblob_search(\"blob_url\", \"account_name\", \"container_name\", \"container_client\")\r\nblob_search(\"blob_url\", \"account_name\", \"container_name\", \"container_client\", \"sort_key\")\r\n```\r\n**Example: List Azure blob files from a folder**\r\n```python\r\nfrom cfa_azure.blob_helpers import blob_glob\r\nfor blob in blob_glob(\"src/dynode/mechanistic*.py\", account_name='cfaazurebatchprd', container_name='input'):\r\n    print(blob)\r\n\r\n# sort all files within input/ folder by last_modified date and display name\r\nfor blob in blob_glob('input/', account_name='cfaazurebatchprd', container_name='input-test', sort_key='last_modified'):\r\n    print(blob['name'])\r\n\r\n# sort all markdown files by last_modified date and display name\r\nfor blob in blob_glob('*.md', account_name='cfaazurebatchprd', container_name='input-test', sort_key='last_modified'):\r\n    print(blob['name'])\r\n```\r\n- `read_blob_stream`: reads a file from the specified path in Azure Storage and returns its contents as bytes without mounting the container to a local filesystem\r\n```python\r\nread_blob_stream(\"blob_url\", \"account_name\", \"container_name\", \"container_client\")\r\n```\r\n**Example: Read Azure blob file into Polars or Pandas data frames**\r\n```python\r\nfrom cfa_azure.blob_helpers import read_blob_stream\r\ndata_stream = read_blob_stream(\"input/AZ.csv\", account_name='cfaazurebatchprd', container_name='input-test')\r\n\r\n# Read into Polars dataframe\r\nimport polars\r\ndf = polars.read_csv(data_stream.readall())\r\nprint(df)\r\n\r\n# Read into Pandas dataframe\r\nimport pandas\r\ndf = pandas.read_csv(data_stream)\r\nprint(df)\r\n\r\n# Read large file into Pandas dataframe with chunking\r\nimport pandas\r\nchunk_size=1000      # 1000 rows at a time\r\nfor chunk in pandas.read_csv(data_stream, chunksize=chunk_size):\r\n    print(chunk)\r\n```\r\n- `write_blob_stream`: writes bytes to a file in the specified 
path\r\n```python\r\nwrite_blob_stream(\"data\", \"blob_url\", \"account_name\", \"container_name\", \"container_client\")\r\n```\r\n**Example: Write Polars or Pandas dataframe into Azure blob storage**\r\n```python\r\nfrom cfa_azure.blob_helpers import write_blob_stream\r\n\r\n# Write Polars dataframe\r\nimport polars\r\ndf = ...  # Read some data into Polars dataframe\r\nblob_url = \"input/AZ_03072025_a.csv\"\r\nwrite_blob_stream(df.write_csv().encode('utf-8'), blob_url=blob_url, account_name='cfaazurebatchprd', container_name='input-test')\r\n\r\n# Write Pandas dataframe\r\nimport pandas\r\ndf = ...  # Read some data into Pandas dataframe\r\ndata = df.to_csv(index=False).encode('utf-8')\r\nblob_url = \"input/AZ_03072025_a.csv\"\r\nwrite_blob_stream(data, blob_url=blob_url, account_name='cfaazurebatchprd', container_name='input-test')\r\n```\r\n- `check_blob_existence`: checks whether a blob exists in the specified container\r\n```python\r\ncheck_blob_existence(c_client, \"blob_name\")\r\n```\r\n- `check_virtual_directory_existence`: checks whether any blobs exist with the specified virtual directory path\r\n```python\r\ncheck_virtual_directory_existence(c_client, \"vdir_path\")\r\n```\r\n- `create_blob_containers`: uses create_container() to create input and output containers in Azure Blob\r\n```python\r\ncreate_blob_containers(blob_service_client, \"input-container\", \"output-container\")\r\n```\r\n- `delete_blob_snapshots`: deletes a blob and all its snapshots in a container\r\n```python\r\ndelete_blob_snapshots(\"blob_name\", \"container_name\", blob_service_client)\r\n```\r\n- `delete_blob_folder`: deletes all blobs in a specified folder in a container\r\n```python\r\ndelete_blob_folder(\"folder_path\", \"container_name\", blob_service_client)\r\n```\r\n- `download_file`: downloads a file from Azure Blob storage to a specified location\r\n```python\r\ndownload_file(c_client, \"src_path\", \"dest_path\")\r\n```\r\n- `download_directory`: downloads a directory 
using prefix matching from Azure Blob storage\r\n```python\r\ndownload_directory(\"container_name\", \"src_path\", \"dest_path\", blob_service_client, include_extensions=\".txt\", verbose=True)\r\n```\r\n- `format_extensions`: formats file extensions into a standard format for use\r\n```python\r\nformat_extensions([\".txt\", \"jpg\"])\r\n```\r\n- `get_blob_service_client`: creates a Blob Service Client for interacting with Azure Blob\r\n```python\r\nblob_service_client = get_blob_service_client(config, DefaultAzureCredential())\r\n```\r\n- `list_blobs_flat`: lists all blobs in a specified container\r\n```python\r\nlist_blobs_flat(\"container_name\", blob_service_client)\r\n```\r\n- `list_containers`: lists the containers in the Azure Blob Storage account\r\n```python\r\nlist_containers(blob_service_client)\r\n```\r\n- `upload_blob_file`: uploads a specified file to Azure Blob storage\r\n```python\r\nupload_blob_file(\"file_path\", location=\"folder/subfolder\", container_client=container_client, verbose=True)\r\n```\r\n- `upload_files_in_folder`: uploads all files in the specified folder to the specified container\r\n```python\r\nupload_files_in_folder(\"/path/to/folder\", \"container-name\", blob_service_client)\r\n```\r\n\r\n### helpers\r\nThe `helpers` module provides a collection of functions that help manage Azure resources and perform key tasks, such as configuration management and data transformations. 
Below is an expanded overview of each function.\r\n\r\n#### Helpers Functions\r\n- `read_config`: reads in a configuration TOML file and returns it as a Python dictionary\r\n```python\r\nread_config(\"/path/to/config.toml\")\r\n```\r\n- `create_container`: creates an Azure Blob container if it doesn't already exist\r\n```python\r\ncreate_container(\"my-container\", blob_service_client)\r\n```\r\n- `get_sp_secret`: retrieves the user's service principal secret from the key vault based on the provided config file\r\n```python\r\nget_sp_secret(config, DefaultAzureCredential())\r\n```\r\n- `get_sp_credential`: retrieves the service principal credential\r\n```python\r\nget_sp_credential(config)\r\n```\r\n- `get_batch_service_client`: creates a Batch Service Client object for interacting with Batch jobs\r\n```python\r\nbatch_client = get_batch_service_client(config, DefaultAzureCredential())\r\n```\r\n- `add_job`: creates a new job in the specified Azure Batch pool. By default, a job remains active after completion of enclosed tasks. 
You can optionally set the *mark_complete_after_tasks_run* argument to *True* if you want the job to auto-complete after completion of enclosed tasks.\r\n```python\r\nadd_job(\"job-id\", \"pool-id\", True, batch_client)\r\n```\r\n- `add_task_to_job`: adds a task to the specified job based on user-input Docker command\r\n```python\r\nadd_task_to_job(\"job-id\", \"task-id\", \"docker-command\", batch_client)\r\n```\r\n- `monitor_tasks`: monitors the tasks running in a job\r\n```python\r\nmonitor_tasks(\"example-job-id\", batch_client)\r\n```\r\n- `list_files_in_container`: lists all files stored in the specified Azure container\r\n```python\r\nlist_files_in_container(container_client)\r\n```\r\n- `df_to_yaml`: converts a pandas dataframe to a YAML file, which is helpful for configuration and metadata storage\r\n```python\r\ndf_to_yaml(dataframe, \"output.yaml\")\r\n```\r\n- `yaml_to_df`: converts a YAML file to a pandas dataframe\r\n```python\r\nyaml_to_df(\"input.yaml\")\r\n```\r\n- `edit_yaml_r0`: takes in a YAML file and produces replicate YAML files with the `r0` changed based on the specified range (i.e. 
start, stop, and step)\r\n```python\r\nedit_yaml_r0(\"input.yaml\", start=1, stop=5, step=1)\r\n```\r\n- `get_log_level`: retrieves the logging level from environment variables or defaults to debug\r\n```python\r\nget_log_level()\r\n```\r\n- `check_autoscale_parameters`: checks which arguments are incompatible with the provided scaling mode\r\n```python\r\ncheck_autoscale_parameters(\"autoscale\", dedicated_nodes=5)\r\n```\r\n- `get_rel_mnt_path`: retrieves the relative mount path for a specified Blob container in an Azure Batch pool\r\n```python\r\nget_rel_mnt_path(\"blob_name\", \"pool_name\", \"resource_group_name\", \"account_name\", batch_mgmt_client)\r\n```\r\n- `check_env_req`: checks if all necessary environment variables exist for the Azure client\r\n```python\r\ncheck_env_req()\r\n```\r\n- `check_config_req`: checks if the provided configuration file contains all necessary components for the Azure client\r\n```python\r\ncheck_config_req(config)\r\n```\r\n- `get_container_registry_client`: retrieves a Container Registry client for Azure\r\n```python\r\nget_container_registry_client(\"endpoint\", DefaultAzureCredential(), \"audience\")\r\n```\r\n- `check_azure_container_exists`: checks if a container with the specified name, repository, and tag exists in Azure Container Registry\r\n```python\r\ncheck_azure_container_exists(\"registry_name\", \"repo_name\", \"tag_name\", DefaultAzureCredential())\r\n```\r\n- `format_rel_path`: formats a given relative path by removing the leading slash if present\r\n```python\r\nformat_rel_path(\"/path/to/resource\")\r\n```\r\n- `get_timeout`: converts a given duration string (in ISO 8601 format) to minutes\r\n```python\r\nget_timeout(\"PT1H30M\")\r\n```\r\n- `check_job_exists`: checks whether a job with the specified ID exists in Azure Batch\r\n```python\r\ncheck_job_exists(\"job_id\", batch_client)\r\n```\r\n- `get_completed_tasks`: returns the number of completed tasks for the specified 
job\r\n```python\r\nget_completed_tasks(\"job_id\", batch_client)\r\n```\r\n- `check_job_complete`: checks if the specified job is complete\r\n```python\r\ncheck_job_complete(\"job_id\", batch_client)\r\n```\r\n- `get_job_state`: returns the state of the specified job, such as 'completed' or 'active'\r\n```python\r\nget_job_state(\"job_id\", batch_client)\r\n```\r\n- `package_and_upload_dockerfile`: packages a Dockerfile and uploads it to the specified registry and repo with the designated tag\r\n```python\r\npackage_and_upload_dockerfile(\"registry_name\", \"repo_name\", \"tag\", use_device_code=True)\r\n```\r\n- `upload_docker_image`: uploads a Docker image to a specified Azure Container Registry repo with an optional tag\r\n```python\r\nupload_docker_image(\"image_name\", \"registry_name\", \"repo_name\", tag=\"latest\", use_device_code=False)\r\n```\r\n\r\n## Common Use Case Scenarios\r\n\r\n**Example Workflow**: Uploading files to Blob Storage, creating an Azure Batch Pool, adding jobs, and monitoring tasks.\r\n\r\n```python\r\n# Step 1: Read configuration\r\nconfig = read_config(\"config.toml\")\r\n\r\n# Step 2: Create Blob containers\r\nblob_service_client = get_blob_service_client(config, DefaultAzureCredential())\r\ncreate_blob_containers(blob_service_client, \"input-container\", \"output-container\")\r\n\r\n# Step 3: Upload files to the container\r\nupload_files_in_folder(\"/path/to/folder\", \"input-container\", blob_service_client)\r\n\r\n# Step 4: Create an Azure Batch Pool\r\nbatch_mgmt_client = get_batch_mgmt_client(config, DefaultAzureCredential())\r\npool_config = get_batch_pool_json(\"input-container\", \"output-container\", config)\r\ncreate_batch_pool(batch_mgmt_client, pool_config)\r\n\r\n# Step 5: Create a job and add tasks\r\nbatch_client = get_batch_service_client(config, DefaultAzureCredential())\r\nadd_job(\"job-id\", \"pool-id\", True, batch_client)\r\nadd_task_to_job(\"job-id\", \"task-id\", \"docker command\", batch_client)\r\n\r\n# 
Step 6: Monitor the tasks\r\nmonitor_tasks(\"job-id\", batch_client)\r\n```\r\n\r\n## Public Domain Standard Notice\r\nThis repository constitutes a work of the United States Government and is not\r\nsubject to domestic copyright protection under 17 USC § 105. This repository is in\r\nthe public domain within the United States, and copyright and related rights in\r\nthe work worldwide are waived through the [CC0 1.0 Universal public domain dedication](https://creativecommons.org/publicdomain/zero/1.0/).\r\nAll contributions to this repository will be released under the CC0 dedication. By\r\nsubmitting a pull request you are agreeing to comply with this waiver of\r\ncopyright interest.\r\n\r\n## License Standard Notice\r\nThe repository utilizes code licensed under the terms of the Apache Software\r\nLicense and therefore is licensed under ASL v2 or later.\r\n\r\nThis source code in this repository is free: you can redistribute it and/or modify it under\r\nthe terms of the Apache Software License version 2, or (at your option) any\r\nlater version.\r\n\r\nThis source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY\r\nWARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A\r\nPARTICULAR PURPOSE. See the Apache Software License for more details.\r\n\r\nYou should have received a copy of the Apache Software License along with this\r\nprogram. If not, see http://www.apache.org/licenses/LICENSE-2.0.html\r\n\r\nThe source code forked from other open source projects will inherit its license.\r\n\r\n## Privacy Standard Notice\r\nThis repository contains only non-sensitive, publicly available data and\r\ninformation. 
All material and community participation is covered by the\r\n[Disclaimer](DISCLAIMER.md)\r\nand [Code of Conduct](code-of-conduct.md).\r\nFor more information about CDC's privacy policy, please visit [http://www.cdc.gov/other/privacy.html](https://www.cdc.gov/other/privacy.html).\r\n\r\n## Contributing Standard Notice\r\nAnyone is encouraged to contribute to the repository by [forking](https://help.github.com/articles/fork-a-repo) or creating a new branch\r\nand submitting a pull request. (If you are new to GitHub, you might start with a\r\n[basic tutorial](https://help.github.com/articles/set-up-git).) By contributing\r\nto this project, you grant a world-wide, royalty-free, perpetual, irrevocable,\r\nnon-exclusive, transferable license to all users under the terms of the\r\n[Apache Software License v2](http://www.apache.org/licenses/LICENSE-2.0.html) or\r\nlater.\r\n\r\nAll comments, messages, pull requests, and other submissions received through\r\nCDC including this GitHub page may be subject to applicable federal law, including but not limited to the Federal Records Act, and may be archived. Learn more at [http://www.cdc.gov/other/privacy.html](http://www.cdc.gov/other/privacy.html).\r\n\r\nHelp make this package/repo more robust and stable by creating issues as you see fit. Please use the following issues template as an outline for your issue: [issue template](.github/ISSUE_TEMPLATE/cfa_azure_issue_template.md)\r\n\r\n## Records Management Standard Notice\r\nThis repository is not a source of government records, but is a copy to increase\r\ncollaboration and collaborative potential. 
All government records will be\r\npublished through the [CDC web site](http://www.cdc.gov).\r\n\r\n## Additional Standard Notices\r\nPlease refer to [CDC's Template Repository](https://github.com/CDCgov/template) for more information about [contributing to this repository](https://github.com/CDCgov/template/blob/main/CONTRIBUTING.md), [public domain notices and disclaimers](https://github.com/CDCgov/template/blob/main/DISCLAIMER.md), and [code of conduct](https://github.com/CDCgov/template/blob/main/code-of-conduct.md).\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdcgov%2Fcfa_azure","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcdcgov%2Fcfa_azure","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcdcgov%2Fcfa_azure/lists"}