{"id":44979623,"url":"https://github.com/edanalytics/ea_airflow_util","last_synced_at":"2026-02-18T18:03:13.590Z","repository":{"id":64846088,"uuid":"465044203","full_name":"edanalytics/ea_airflow_util","owner":"edanalytics","description":"Utilities for Airflow projects","archived":false,"fork":false,"pushed_at":"2025-11-19T15:53:28.000Z","size":371,"stargazers_count":4,"open_issues_count":9,"forks_count":0,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-11-19T17:25:41.796Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/edanalytics.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-03-01T20:21:20.000Z","updated_at":"2025-06-13T19:12:23.000Z","dependencies_parsed_at":"2024-02-07T20:28:39.322Z","dependency_job_id":"0b057eb7-d5e2-435c-b463-7f41c838b0ad","html_url":"https://github.com/edanalytics/ea_airflow_util","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"purl":"pkg:github/edanalytics/ea_airflow_util","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edanalytics%2Fea_airflow_util","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edanalytics%2Fea_airflow_util/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edanalytics%2Fea_airflow_util/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/edanalytics%2Fea_airflow_util/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/edanalytics","download_url":"https://codeload.github.com/edanalytics/ea_airflow_util/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/edanalytics%2Fea_airflow_util/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29588777,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T16:55:40.614Z","status":"ssl_error","status_checked_at":"2026-02-18T16:55:37.558Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-02-18T18:03:12.882Z","updated_at":"2026-02-18T18:03:13.582Z","avatar_url":"https://github.com/edanalytics.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Overview\n`ea_airflow_util` contains additional Airflow functionality used within EDU that falls outside the scope of `edu_edfi_airflow`.\n\n\n# Callables\nVarious Airflow callables have been defined in bespoke submodules under `ea_airflow_util.callables`.\nThese have been outlined below.\n\n## airflow\nAirflow utility helpers that are used for argument-passing and parameter-checking in DAGs\n\n\u003cdetails\u003e\n\u003csummary\u003eSee 
more:\u003c/summary\u003e\n\n-----\n\nWhen importing this submodule, be careful not to overwrite `airflow` in your namespace!\n```python\n# Do not do this!  This will overwrite `import airflow`.\nfrom ea_airflow_util.callables import airflow\n\n# Use one of these instead!\nfrom ea_airflow_util.callables import airflow as airflow_util\nfrom ea_airflow_util.callables.airflow import xcom_pull_template\n```\n\n### xcom_pull_template(task_ids, key)\nBuild an `xcom_pull` string for passing arguments between tasks.\nEither a task-ID or task operator can be passed.\nThe default return key `return_value` is the final return value of the operator.\n\n### skip_if_not_in_params_list(param_name, value)\nVerify whether a value is defined in a passed parameter list, and raise an `AirflowSkipException` otherwise.\nRaise an error if the parameter is not defined.\n\n-----\n\n\u003c/details\u003e\n\n\n## casing\nString-casing callables (e.g., snake_casing or CamelCasing)\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### snake_case(string)\nConvert a string to `snake_case`.\n\n### record_to_string_case(record)\nConvert the keys of a JSON record into snake_case.\nRaise an error if a name-collision occurs after formatting.\n\n-----\n\n\u003c/details\u003e\n\n\n## ftp\nFTP- and SFTP-utility helpers\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### download_all(ftp_conn_id, remote_dir, local_dir, endswith)\nDownload all files from an FTP to disk, optionally filtering on file-extension endings.\n\n-----\n\n\u003c/details\u003e\n\n\n## gsheets\nGoogle-sheets authentication and parsing helpers\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### get_google_client_from_airflow(gcp_conn_id, key_field)\nCreate a Google Sheets client populated with key data in an Airflow connection.\nThe key data can be saved in a separate, linked file, or as a JSON structure in the connection.\nThe 
Airflow connection key field can be specified; otherwise, both will be tried.\n\n### get_google_spreadsheet_by_url(google_cloud_client, google_sheets_url)\nCall the Google Sheets API and retrieve a Spreadsheet based on a given URL.\nIf API Rate Limit has been reached, use Truncated exponential backoff strategy to retry.\n\n### parse_google_worksheet(worksheet)\nParse a gspread worksheet and retrieve the relevant data.\n\n### get_worksheet_from_google_spreadsheet(spreadsheet, sheet_index, sheet_name)\nParse a Google spreadsheet and return a specific worksheet by index or name.\nIf neither is specified, retrieve the zeroth worksheet.\n\n### get_and_serialize_google_survey_url_to_jsonl(gcp_conn_id, survey_url, output_dir)\nUnified method for retrieving data from a Google survey and writing to disk as JSON lines.\n\n-----\n\n\u003c/details\u003e\n\n\n## jsonl\nJSON utility helpers. Most Airflow tasks write data to disk and database as JSON lines.\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### serialize_json_records_to_disk(json_records, output_path, **kwargs)\nWrite an iterator of dictionaries to an output path as JSON lines.\nOptional arguments customize the output.\n\n\n### translate_csv_file_to_jsonl(local_path, output_path, **kwargs)\nTransform a CSV file to JSON lines.\nIf output_path is not specified, rewrite the CSV file with a .jsonl file extension.\nOptional arguments customize the output.\n\n-----\n\n\u003c/details\u003e\n\n\n## s3\nHelpers for getting data to S3\n\nNote: There are some SQL/Snowflake callables that interface with S3.\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### disk_to_s3(s3_conn_id, local_path, base_dir, bucket, delete_local, **kwargs)\nUpload local files to S3.\nOptional arguments apply schema-checking and path-mutation.\n\n### list_s3_keys(s3_hook, s3_bucket, s3_key)\nInternal utility function for listing S3 keys.\nNote: this method uses a 
pre-instantiated S3 hook instead of a connection ID.\n\n-----\n\n\u003c/details\u003e\n\n\n## sharefile\nHelpers for interfacing with Sharefile\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### list_sharefile_objects(sharefile_conn_id, remote_dir)\nList object names in a specified Sharefile directory.\n\n### sharefile_to_disk(sharefile_conn_id, sharefile_path, local_path, ds_nodash, ts_nodash, delete_remote=False, file_pattern=None, recursive=True)\nTransfer all files from a ShareFile folder to a local date-stamped directory, optionally deleting the remote copy.\n\n### disk_to_sharefile(sf_conn_id, sf_folder_path, local_path)\nPost a file or the contents of a directory to the specified Sharefile folder.\n\n### s3_to_sharefile(s3_conn_id, s3_key, sf_conn_id, sf_folder_path)\nCopy a single file from S3 to Sharefile.\n\n-----\n\n\u003c/details\u003e\n\n\n## slack\nThis package contains several callback functions which can be used with Slack webhooks to alert at task failures or successes, or when SLAs are missed.\nEach function takes the Slack Airflow connection ID as its primary argument.\nThe contents of the callback messages are filled automatically via the DAG run context.\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\nAirflow callbacks only accept expected arguments, not kwargs.\nBecause these custom Slack callback functions expect the additional argument `http_conn_id`, this argument must be filled before applying the callbacks to the DAG.\nThis can be done using the `functools.partial()` function, as follows:\n```python\nfrom functools import partial\n\non_failure_callback = partial(slack_alert_failure , http_conn_id=HTTP_CONN_ID)\non_success_callback = partial(slack_alert_success , http_conn_id=HTTP_CONN_ID)\nsla_miss_callback   = partial(slack_alert_sla_miss, http_conn_id=HTTP_CONN_ID)\n```\n\n### slack_alert_failure()\n\u003e🔴 Task Failed.  
\n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}  \n\n### slack_alert_success()\n\u003e✔ Task Succeeded.  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}  \n\n### slack_alert_sla_miss()\n\u003e🆘 **SLA has been missed.**  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}\n\nNote, due to different definitions of task-failure/success callbacks and SLA callbacks, `Log Url` is unavailable in SLA callback messages.\nThis will be investigated further and patched in a future update.\n\n### slack_alert_download_failure(remote_path, local_path, error)\n\u003e🔴 File did not download  \n**Remote Path**: {remote_path}  \n**Local Path**: {local_path}  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}  \n**Error**: {error}\n\n### slack_alert_s3_upload_failure(local_path, file_key, error)\n\u003e🔴 File did not upload to S3  \n**File Path**: {local_path}  \n**File Key**: {file_key}  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}  \n**Error**: {error}  \n\n### slack_alert_insert_failure(file_key, table, error)\n\u003e🔴 File did not insert to database  \n**File Key**: {file_key}  \n**Dest Table**: {table}  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}  \n**Error**: {error}\n\n### slack_alert_file_format_failure(local_path, file_type, cols_expected, cols_found)\n\u003e🔴 File did not match expected spec  \n**File Path**: {local_path}  \n**File Type**: {file_type}  \n**Exp. 
Cols**: {cols_expected}  \n**Found Cols**: {cols_found}  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}\n\n### slack_alert_match_spec_failure(local_path, error)\n\u003e🔴 File did not match file spec  \n**File Path**: {local_path}  \n**Task**: {task_id}  \n**Dag**: {dag_id}  \n**Execution Time**: {logical_date}  \n**Log Url**: {log_url}  \n**Error**: {error}\n\n-----\n\n\u003c/details\u003e\n\n\n## snowflake\nHelpers for getting data out of and into Snowflake\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### snowflake_to_disk(snowflake_conn_id, query, local_path, **kwargs)\nCopy data from Snowflake to local disk using a passed query.\nOptional arguments alter formatting and chunking when writing to disk.\n\n-----\n\n\u003c/details\u003e\n\n\n## sql\nHelpers for getting data out of and into different (non-Snowflake) SQL dialects\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### mssql_to_disk(conn_string, tables, local_path)\nCopy data from Microsoft SQL Server (MSSQL) to local disk.\n\n### s3_to_postgres(pg_conn_id, s3_conn_id, dest_table, column_customization, options, s3_key, s3_region, **kwargs)\nCopy data from an S3 filepath into Postgres.\nOptional arguments alter table clean-up and import logic.\n\n### s3_dir_to_postgres(pg_conn_id, s3_conn_id, dest_table, column_customization, options, s3_key, s3_region, **kwargs)\nCopy all files from an S3 directory into Postgres.\nOptional arguments alter table clean-up and import logic.\n\n-----\n\n\u003c/details\u003e\n\n\n## ssm\nSSM ParameterStore helpers for extracting parameter strings from AWS.\nThis code is used exclusively in `AWSParamStoreToAirflowDAG`.\n\n\n## variable\nUtility methods for checking and updating Airflow variables\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### update_variable(var, value)\nUpdate an Airflow variable with the specified value.\nA 
callable can be passed in `value` to update the variable in-place.\n\n### check_variable(var, condition, force)\nCompare the current value of a variable against a passed boolean condition.\nRaise an `AirflowSkipException` if the result is False.\nAlways succeed if `force` is `True`.\n\n-----\n\n\u003c/details\u003e\n\n\n## zip\nCompressed-file utility methods\n\n\u003cdetails\u003e\n\u003csummary\u003eSee more:\u003c/summary\u003e\n\n-----\n\n### extract_zips(local_dir, extract_dir, filter_lambda, remove_zips)\nExtract zip files from a local_dir to an extract_dir, optionally filtering on filepath.\n\n-----\n\n\u003c/details\u003e\n\n\n\n# DAGs\nAll DAGs defined in this package utilize the `EACustomDAG` behind the scenes.\nThis means that unconventional DAG arguments like `slack_conn_id` can be passed to any DAG.\n\n## EACustomDAG\nThis is a DAG factory that pre-instantiates default arguments and UDMs used across our projects.\nBy default, `max_active_runs` is set to 1, and catchup arguments are turned off.\nAny non-standard DAG-kwargs are ignored.\n\nIf a Slack connection ID is passed through `slack_conn_id`, failure and SLA callbacks are automatically instantiated.\nThis argument can also be accessed in UDMs under the key `slack_conn_id`.\n\n\n## AirflowDBCleanDAG\nThe Airflow database backend does not remove historic records by default.\nThis DAG removes data older than a specified number of retention days.\nNote that the DAG errors when attempting to remove data newer than 30 days.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument       | Description                                                           |\n|----------------|-----------------------------------------------------------------------|\n| retention_days | number of days of log-data to preserve (default `90`)                 |\n| dry_run        | whether to complete a dry-run instead of a real run (default `False`) |\n| verbose        | whether to turn on 
verbose logging (default `False`)                  |\n\nAdditional `EACustomDAG` arguments (e.g. `slack_conn_id`) can be passed as kwargs.\n\n-----\n\n\u003c/details\u003e\n\n\n\n## RunDbtDag\n`RunDbtDag` is an Airflow DAG that completes a full DBT run with optional post-run behavior.\nSeed tables are fully refreshed, all models are run, and all tests are executed.\nThis emulates the behavior of a `dbt build` call, but with more control over parameters and failure states.\n\nIf all tests succeed, schemas are optionally swapped (e.g. from `rc` to `prod`).\nAdditionally, DBT artifacts are optionally uploaded using the [Brooklyn Data dbt_artifacts](https://github.com/brooklyn-data/dbt_artifacts) `upload_dbt_artifacts_v2` operation.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument                    | Description                                                                                            |\n|-----------------------------|--------------------------------------------------------------------------------------------------------|\n| environment                 | environment name for the DAG label                                                                     |\n| dbt_repo_path               | path to the project `/dbt` folder                                                                      |\n| dbt_target_name             | name of the DBT target to select                                                                       |\n| dbt_bin_path                | path to the environment `/dbt` folder                                                                  |\n| full_refresh                | boolean flag for whether to apply the `--full-refresh` flag to incremental models (default `False`)    |\n| full_refresh_schedule       | Cron schedule for when to automatically kick off a full refresh run                                    |\n| opt_swap                    | boolean flag for whether to swap target 
schema with `opt_dest_schema` after each run (default `False`) |\n| opt_dest_schema             | optional destination schema to swap target schema with if `opt_swap=True`                              |\n| opt_swap_target             | target used to rerun views if `opt_swap=True` (default `opt_dest_schema`)                              |\n| upload_artifacts            | boolean flag for whether to upload DBT artifacts at the end of the run (default `False`)               |\n| dbt_incrementer_var         | optional Airflow variable to increment after successful `dbt run`                                      |\n| trigger_dags_on_run_success | optional list of DAGs to be triggered by a successful `dbt run`                                        |\n\nAdditional `EACustomDAG` arguments (e.g. `slack_conn_id`) can be passed as kwargs.\n\n-----\n\n\u003c/details\u003e\n\n![RunDbtDag](./images/RunDbtDag.png)\n\n\n\n## UpdateDbtDocsDag\n`UpdateDbtDocsDag` is an Airflow DAG that generates the three [DBT docs](https://docs.getdbt.com/reference/commands/cmd-docs) metadata files and uploads them to a bucket on AWS S3.\nIf an AWS CloudFront instance is pointed to this S3 bucket, a static website is built that is identical to the one generated by `dbt docs generate`.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument            | Description                                                                                       |\n|---------------------|---------------------------------------------------------------------------------------------------|\n| dbt_repo_path       | path to the project `/dbt` folder                                                                 |\n| dbt_target_name     | name of the DBT target to select                                                                  |\n| dbt_bin_path        | path to the environment `/dbt` folder                                                             |\n| 
dbt_docs_s3_conn_id | S3 Airflow connection ID where S3 bucket to upload DBT documentation files is defined in `schema` |\n\nAdditional `EACustomDAG` arguments (e.g. `slack_conn_id`) can be passed as kwargs.\n\n-----\n\n\u003c/details\u003e\n\n![UpdateDbtDocsDag](./images/UpdateDbtDocsDag.png)\n\n\n## DbtSnapshotDag\nDAG to run `dbt snapshot`\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument            | Description                                                                                       |\n|---------------------|---------------------------------------------------------------------------------------------------|\n| dbt_repo_path       | path to the project `/dbt` folder                                                                 |\n| dbt_target_name     | name of the DBT target to select                                                                  |\n| dbt_bin_path        | path to the environment `/dbt` folder                                                             |\n\nAdditional `EACustomDAG` arguments (e.g. 
`slack_conn_id`) can be passed as kwargs.\n\n-----\n\n\u003c/details\u003e\n\n\n\n## S3ToSnowflakeDag\nThis DAG transfers data from an S3 bucket location into the Snowflake raw data lake.\nIt should be used when data sources are not available from an Ed-Fi ODS but need to be brought into the data warehouse.\n\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument               | Description                                                                                      |\n|------------------------|--------------------------------------------------------------------------------------------------|\n| tenant_code            | ODS-tenant representation to be saved in Snowflake tables                                        |\n| api_year               | ODS API-year to be saved in Snowflake tables                                                     |\n| snowflake_conn_id      | Airflow connection with Snowflake credentials                                                    |\n| database               | database in which tables are found                                                               |\n| schema                 | schema in which tables are found                                                                 |\n| data_source            | table data source to copy data into (`{data_source}__{resource_name}`)                           |\n| resource_names         | array of table resource names to copy data into (`{data_source}__{resource_name}`)                   |\n| transform_script       | additional transformations to complete on data before transfer to Snowflake                      |\n| s3_source_conn_id      | Airflow connection with S3 source credentials                                                    |\n| s3_dest_conn_id        | Airflow connection with S3 destination credentials                                               |\n| s3_dest_file_extension | new file extension under which to save 
transformed data                                          |\n| pool                   | Airflow pool to use for copying tasks                                                            |\n| full_replace           | boolean flag for whether to delete all data from the table before copying over (default `False`) |\n| do_delete_from_source  | boolean flag for whether to delete the data after copying over (default `True`)                  |\n\nAdditional `EACustomDAG` arguments (e.g. `slack_conn_id`) can be passed as kwargs.\n\n-----\n\n\u003c/details\u003e\n\n\n\n## SFTPToSnowflakeDag\nThis DAG transfers data from an SFTP source into the Snowflake raw data lake.\nIt should be used when data sources are not available from an Ed-Fi ODS but need to be brought into the data warehouse.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument               | Description                                                                                   |\n|------------------------|-----------------------------------------------------------------------------------------------|\n| s3_conn_id             | Airflow connection with S3 credentials                                                        |\n| snowflake_conn_id      | Airflow connection with Snowflake credentials                                                 |\n| database               | database in which tables are found                                                            |\n| schema                 | schema in which tables are found                                                              |\n| pool                   | Airflow pool to use for copying tasks                                                         |\n| do_delete_from_local   | boolean flag for whether to delete the data from the SFTP after copying over (default `True`) |\n\nAdditional `EACustomDAG` arguments (e.g. 
`slack_conn_id`) can be passed as kwargs.\n\n-----\n\n\u003c/details\u003e\n\n\n\n## AWSParamStoreToAirflowDAG\nThe Cloud Engineering and Integration team saves Ed-Fi ODS credentials as parameters in AWS Systems Manager Parameter Store.\nEach Stadium implementation has a shared SSM-prefix, which is further delineated by tenant-code and/or API year.\nThere are three parameters associated with each ODS-connection:\n```text\n{SSM_PREFIX}/{TENANT_CODE}/key\n{SSM_PREFIX}/{TENANT_CODE}/secret\n{SSM_PREFIX}/{TENANT_CODE}/url\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument            | Description                                                                                                     |\n|---------------------|-----------------------------------------------------------------------------------------------------------------|\n| region_name         | AWS region where parameters are stored                                                                          |\n| connection_mapping  | Optional one-to-one mapping between Parameter Store prefixes and ODS credentials                                |\n| prefix_year_mapping | Optional mapping between a shared SSM-prefix and a given Ed-Fi year for dynamic connections                     |\n| tenant_mapping      | Optional mapping between tenant-code name in Parameter Store and its identity in Stadium in dynamic connections |\n| join_numbers        | Optional boolean flag to strip underscores between district and number in dynamic connections (default `True`)  |\n\nAdditional `EACustomDAG` arguments (e.g. 
`slack_conn_id`) can be passed as kwargs.\n\n-----\n\nThere are three types of mappings that can be defined in the Parameter Store DAG.\nArguments `connection_mapping` and `prefix_year_mapping` are mutually-exclusive.\nArgument `tenant_mapping` is optional, and is only applied if `prefix_year_mapping` is defined.\n\nIn Stadium implementations with fewer tenants, it is suggested to manually map the `{SSM_PREFIX}/{TENANT_CODE}` strings to their Ed-Fi connection name in Airflow using `connection_mapping`.\nFor example:\n```python\nconnection_mapping = {\n    '/startingblocks/api/2122/sc-state': 'edfi_scde_2022',\n    '/startingblocks/api/2223/sc-state': 'edfi_scde_2023',\n    '/startingblocks/api/sc/state-2324': 'edfi_scde_2024',\n}\n```\n\nIn Stadium implementations with many tenants, an explicit one-to-one mapping between prefixes and connections may be untenable.\nIn cases like these, the `prefix_year_mapping` argument maps shared SSM-prefixes to API years and dynamically builds Airflow credentials.\nFor example:\n```python\nprefix_year_mapping = {\n    '/startingblocks/api/districts-2122': 2022,\n    '/startingblocks/api/sc/districts-2223': 2023,\n}\n```\n\nConnection pieces between the prefixes and `url`, `key`, and `secret` are assumed to be tenant-codes, and connections are built dynamically.\nSome standardization is always applied to inferred tenant-codes: spaces and dashes are converted to underscores.\n\nHowever, in the case that the dynamically-inferred tenant-code does not match its identity in Stadium, the `tenant_mapping` can be used to force a match.\nFor example:\n```python\ntenant_mapping = {\n    'fortmill': 'fort_mill',\n    'york-4'  : 'fort_mill',\n}\n```\n\nUsing the example `prefix_year_mapping` and `tenant_mapping` defined above on the following Parameter Store keys will create a single Airflow connection: 
`edfi_fort_mill_2023`.\n```text\n/startingblocks/api/sc/districts-2223/fortmill/url\n/startingblocks/api/sc/districts-2223/fortmill/key\n/startingblocks/api/sc/districts-2223/fortmill/secret\n```\n\nFinally, there is an optional boolean argument `join_numbers` that is turned on by default.\nWhen true, dynamically-inferred tenant-codes are standardized further to remove underscores between district name and code.\nFor example, `york_1` becomes `york1`.\n\nWhen tenant-identification is not the penultimate element of the path, use the string `{tenant_code}` to automatically infer it for the mapping.\nFor example, `/ed-fi/apiClients/districts-2425-ds5/{tenant_code}/prod/Stadium` will find parameters that match the path shape, but will label paths based on the inferred `tenant_code`.\n\n\u003c/details\u003e\n\n\n\n# Providers\nFinally, this package contains a handful of custom DBT operators to be used as an alternative to PythonOperators.\n\n## LoopS3FileTransformOperator\nThis operator extends Airflow's built-in `S3FileTransformOperator` to iterate over multiple files.\nIn addition, the new `dest_s3_file_extension` argument provides greater transparency in output type.\nSee [parent documentation](https://airflow.apache.org/docs/apache-airflow/1.10.13/_api/airflow/operators/s3_file_transform_operator/index.html) for more information.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument               | Description                                                                                   |\n|------------------------|-----------------------------------------------------------------------------------------------|\n| source_s3_keys         | array of S3 filepaths to transform                                                            |\n| dest_s3_prefix         | destination S3 filepath in which to save transformed files (default: original filepath)       |\n| dest_s3_file_extension | new file extension to give transformed 
files (default: original extension)                    |\n| select_expression      | S3 select expression                                                                          |\n| transform_script       | location of the executable transformation script                                              |\n| script_args            | optional arguments to pass to the transformation script                                       |\n| source_aws_conn_id     | source s3 connection                                                                          |\n| source_verify          | whether to verify SSL certificates for S3 connection (default: SSL certificates are verified) |\n| dest_aws_conn_id       | destination s3 connection                                                                     |\n| dest_verify            | whether to verify SSL certificates for S3 connection (default: SSL certificates are verified) |\n| replace                | replace destination S3 key if it already exists (default `True`)                              |\n\nAdditional Airflow operator args and kwargs can be passed during initialization.\n\n-----\n\n\u003c/details\u003e\n\n\n\n## DbtRunOperationOperator\nThis operator overrides `DbtBaseOperator` to allow us to pass the `--args` flag to `run-operation`.\n\nThis operation is the equivalent of `dbt run-operation {op_name} --args '{json.dumps(arguments)}'`\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument  | Description                                   |\n|-----------|-----------------------------------------------|\n| op_name   | name of the DBT macro to run in the operation |\n| arguments | argument dictionary to pass to the macro      |\n\nAdditional Airflow operator args and kwargs can be passed during initialization.\n\n-----\n\n\u003c/details\u003e\n\n\n\n## SFTPHook\nThis hook overrides `SSHHook` to interact with FTPs and SFTPs.\nSee [parent 
documentation](https://airflow.apache.org/docs/apache-airflow-providers-ssh/stable/_api/airflow/providers/ssh/hooks/ssh/index.html) for input arguments and usage.\n\n## SharefileHook\nThis hook is built for interacting with ShareFile servers.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument          | Description                                            |\n|-------------------|--------------------------------------------------------|\n| sharefile_conn_id | name of the Airflow connection with ShareFile metadata |\n\nNote that the connection in Airflow must be configured in an unusual way:\n- Host should be the API endpoint\n- Schema should be the authentication URL\n- Login/Password are filled out as normal\n- Extra should be a dictionary structured as follows:\n    ```{\"grant_type\": \"password\", \"client_id\": client_id, \"client_secret\": client_secret}```\n\n-----\n\nMethods:\n- get_conn()\n- download(item_id, local_path)\n- upload_file(folder_id, local_file)\n- folder_id_from_path(folder_path)\n- delete(item_id)\n- get_path_id(path)\n- item_info(id)\n- find_files(folder_id)\n- find_folders(folder_id)\n- get_access_controls(item_id)\n- get_user(user_id)\n- get_children(item_id)\n- file_to_memory(item_id)\n- download_to_disk(item_id, local_path)\n\n\u003c/details\u003e\n\n\n\n## SharefileToDiskOperator\nThis operator transfers all files from a ShareFile folder to a local date-stamped directory, optionally deleting the remote copy.\n\n\u003cdetails\u003e\n\u003csummary\u003eArguments:\u003c/summary\u003e\n\n-----\n\n| Argument          | Description                                                          |\n|-------------------|----------------------------------------------------------------------|\n| sharefile_conn_id | name of the Airflow connection with ShareFile metadata               |\n| sharefile_path    | the root directory to transfer                                       |\n| local_path        | local 
path to stream ShareFile files into                            |\n| delete_remote     | boolean flag to delete original files on ShareFile (default `False`) |\n\nAdditional Airflow operator args and kwargs can be passed during initialization.\n\n-----\n\n\u003c/details\u003e\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedanalytics%2Fea_airflow_util","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fedanalytics%2Fea_airflow_util","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fedanalytics%2Fea_airflow_util/lists"}