{"id":19977790,"url":"https://github.com/uipath/processmining-pm-utils","last_synced_at":"2025-08-18T03:38:01.217Z","repository":{"id":37861405,"uuid":"458252349","full_name":"UiPath/ProcessMining-pm-utils","owner":"UiPath","description":"Utility functions for process mining related dbt projects.","archived":false,"fork":false,"pushed_at":"2025-05-21T14:26:14.000Z","size":237,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-07-27T18:15:56.499Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UiPath.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-11T16:03:03.000Z","updated_at":"2025-05-21T14:14:37.000Z","dependencies_parsed_at":"2025-07-27T18:04:20.210Z","dependency_job_id":null,"html_url":"https://github.com/UiPath/ProcessMining-pm-utils","commit_stats":null,"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"purl":"pkg:github/UiPath/ProcessMining-pm-utils","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UiPath%2FProcessMining-pm-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UiPath%2FProcessMining-pm-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UiPath%2FProcessMining-pm-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UiPath%2FProcessMining-pm-utils/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UiPath","download_url":"https://codeload.github.com/UiPath/ProcessMining-pm-utils/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UiPath%2FProcessMining-pm-utils/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270940594,"owners_count":24671676,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-18T02:00:08.743Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T03:29:17.228Z","updated_at":"2025-08-18T03:38:01.183Z","avatar_url":"https://github.com/UiPath.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# pm-utils\nUtility functions for process mining related dbt projects.\n\n## Installation instructions\nSee the instructions *How do I add a package to my project?* in the [dbt documentation](https://docs.getdbt.com/docs/building-a-dbt-project/package-management). The pm-utils is a public git repository, so the packages can be installed using the git syntax:\n\n```\npackages:\n  - git: \"https://github.com/UiPath/ProcessMining-pm-utils.git\"\n    revision: [tag name of the release]\n```\n\nThis package contains some date/datetime conversion macros. You can override the default format that is used in the macros by defining variables in your `dbt_project.yml`. 
The following shows an example configuration of the possible date and time formatting variables and their default values:\n\n```\nvars:\n  # Date and time formats.\n  # For SQL Server defined by integers and for Snowflake defined by strings.\n  date_format: 23     # default: SQL Server: 23, Snowflake: 'YYYY-MM-DD'\n  time_format: 14     # default: SQL Server: 14, Snowflake: 'hh24:mi:ss.ff3'\n  datetime_format: 21 # default: SQL Server: 21, Snowflake: 'YYYY-MM-DD hh24:mi:ss.ff3'\n```\n\n## Contents\nThis dbt package contains macros for SQL functions to run the dbt project on multiple databases. The databases that are currently supported are Snowflake and SQL Server.\n\n- [SQL generators](#SQL-generators)\n  - [id](#id-source)\n  - [mandatory](#mandatory-source)\n  - [optional](#optional-source)\n  - [optional_table](#optional_table-source)\n  - [star](#star-source)\n- [Data type cast functions](#Data-type-cast-functions)\n  - [as_varchar](#as_varchar-source)\n  - [to_boolean](#to_boolean-source)\n  - [to_date](#to_date-source)\n  - [to_double](#to_double-source)\n  - [to_integer](#to_integer-source)\n  - [to_timestamp](#to_timestamp-source)\n  - [to_varchar](#to_varchar-source)\n- [Date time functions](#Date-time-functions)\n  - [date_from_timestamp](#date_from_timestamp-source)\n  - [dateadd](#dateadd-source)\n  - [datediff](#datediff-source)\n  - [diff_weekdays](#diff_weekdays-source)\n  - [timestamp_from_date](#timestamp_from_date-source)\n  - [timestamp_from_parts](#timestamp_from_parts-source)\n- [String functions](#String-functions)\n  - [charindex](#charindex-source)\n  - [concat](#concat-source)\n  - [json](#json-source)\n- [Aggregate functions](#Aggregate-functions)\n  - [stddev](#stddev-source)\n  - [string_agg](#string_agg-source)\n- [Tests](#Tests)\n  - [test_equal_rowcount](#test_equal_rowcount-source)\n  - [test_exists](#test_exists-source)\n  - [test_not_negative](#test_not_negative-source)\n  - [test_not_null](#test_not_null-source)\n  - 
[test_one_column_not_null](#test_one_column_not_null-source)\n  - [test_unique_combination_of_columns](#test_unique_combination_of_columns-source)\n  - [test_unique](#test_unique-source)\n- [Post hooks](#Post-hooks)\n  - [create_index](#create_index-source)\n  - [record_count](#record_count-source)\n\n### SQL generators\n\n#### id ([source](macros/SQL_generators/id.sql))\nThis macro generates an id field that can be used as a column for the current model.\n\nUsage:\n`{{ pm_utils.id() }}`\n\n#### mandatory ([source](macros/SQL_generators/mandatory.sql))\nUse this macro for the mandatory columns in your source tables. Use the optional argument `data_type` to indicate the data type of the column. Possible values are: `boolean`, `date`, `double`, `integer`, `datetime`, and `text`. When no data type is set, the column is considered to be text.\n\nYou can also set `id` as data type, which expects the values to be integers.\n\nUsage:\n`{{ pm_utils.mandatory(source('source_name', 'table_name'), '\"Column_A\"', 'data_type') }}`\n\nTo keep the SQL in the model more readable, you can define a Jinja variable for the reference to the source table:\n\n`{% set source_table = source('source_name', 'table_name') %}`\n\nVariables:\n- date_format\n- datetime_format\n\nThese variables are only required when the `data_type` is used with the values `date` or `datetime`.\n\n#### optional ([source](macros/SQL_generators/optional.sql))\nThis macro checks in a table whether a column is present. If the column is not present, it creates the column with `null` values. If the column is present, it selects the column from the table. Use this macro to allow for missing columns in your source tables when that data is optional. Use the optional argument `data_type` to indicate the data type of the column. Possible values are: `boolean`, `date`, `double`, `integer`, `datetime`, and `text`. 
When no data type is set, the column is considered to be text.\n\nYou can also set `id` as the data type, which creates the column with unique integer values if the column is not present or when it only contains null values. If the column is present, the values are expected to be integers.\n\nUsage:\n`{{ pm_utils.optional(source('source_name', 'table_name'), '\"Column_A\"', 'data_type') }}`\n\nAlternatively, you can use this macro for non-source data. To do so, use the ref function instead of the source function: `ref(table_name)`. In that case, data type casting is not applied.\n\nTo keep the SQL in the model more readable, you can define a Jinja variable for the reference to the source table:\n\n`{% set source_table = source('source_name', 'table_name') %}`\n\nVariables:\n- date_format\n- datetime_format\n\nThese variables are only required when `data_type` is set to `date` or `datetime`.\n\n#### optional_table ([source](macros/SQL_generators/optional_table.sql))\nThis macro checks whether the source table is present. If the table is not present, it creates a table without records in your target schema. If the table is present, it selects the table from the source schema. Use this macro to allow for missing source tables when that data is optional.\n\nUsage:\n`{{ pm_utils.optional_table(source('source_name', 'table_name')) }}`\n\nNote: you can only apply this macro to a source table if the `optional()` macro is applied to all of its fields.\n\n#### star ([source](macros/SQL_generators/star.sql))\nThis macro generates a select statement of all fields that are available on the given relation. This relation can be a source or a model in the dbt project. 
Optionally, you can pass a list of fields to exclude from the select statement as the second argument.\n\nYou can choose to exclude fields from the select statement, for example:\n- When you don't want to expose a field in subsequent transformation steps.\n- When you apply logic to a field and don't want to keep the original.\n- When you join tables and a field with the same name is available on multiple tables.\n\nMake sure to also put the relation in the from clause. Otherwise, the table from which you select can't be found.\n\nUsage:\n\nSelect all fields from model `Table_A`.\n```\nselect\n    {{ pm_utils.star(ref('Table_A')) }}\nfrom {{ ref('Table_A') }}\n```\n\nSelect all fields from source `Table_A`.\n```\nselect\n    {{ pm_utils.star(source('sources', 'Table_A')) }}\nfrom {{ source('sources', 'Table_A') }}\n```\n\nSelect all fields from `Table_A`, except for the field `Creation_date`. More fields can be added to the except list. Additional select expressions can be written before and after the `star()` macro by separating them with a comma.\n```\nselect\n    {{ pm_utils.star(ref('Table_A'), except=['Creation_date']) }},\n    {{ pm_utils.to_date('\"Creation_date\"') }} as \"Creation_date\"\nfrom {{ ref('Table_A') }}\n```\n\n### Data type cast functions\n\n#### as_varchar ([source](macros/data_type_cast_functions/as_varchar.sql))\nThis macro converts a string to the data type `nvarchar(2000)` for SQL Server. Use the macro `to_varchar()` to convert a field to this data type.\n\nUsage: \n`{{ pm_utils.as_varchar('[expression]') }}`\n\n#### to_boolean ([source](macros/data_type_cast_functions/to_boolean.sql))\nThis macro converts a field to a boolean field.\n\nUsage: \n`{{ pm_utils.to_boolean('[expression]') }}`\n\n#### to_date ([source](macros/data_type_cast_functions/to_date.sql))\nThis macro converts a field to a date field. 
The expression can be in a date or a datetime format.\n\nUsage: \n`{{ pm_utils.to_date('[expression]') }}`\n\nVariables:\n- date_format\n\n#### to_double ([source](macros/data_type_cast_functions/to_double.sql))\nThis macro converts a field to a double field.\n\nUsage: \n`{{ pm_utils.to_double('[expression]') }}`\n\n#### to_integer ([source](macros/data_type_cast_functions/to_integer.sql))\nThis macro converts a field to an integer field.\n\nUsage: \n`{{ pm_utils.to_integer('[expression]') }}`\n\n#### to_timestamp ([source](macros/data_type_cast_functions/to_timestamp.sql))\nThis macro converts a field to a timestamp field. \n\nUsage: \n`{{ pm_utils.to_timestamp('[expression]') }}`\n\nVariables:\n- datetime_format\n\n#### to_varchar ([source](macros/data_type_cast_functions/to_varchar.sql))\nThis macro converts a field to the data type `nvarchar(2000)` for SQL Server. Use the macro `as_varchar()` to convert a string to this data type.\n\nUsage: \n`{{ pm_utils.to_varchar('[expression]') }}`\n\n### Date time functions\n\n#### date_from_timestamp ([source](macros/date_time_functions/date_from_timestamp.sql))\nThis macro extracts the date part from a datetime field. \n\nUsage: \n`{{ pm_utils.date_from_timestamp('[expression]') }}`\n\n#### dateadd ([source](macros/date_time_functions/dateadd.sql))\nThis macro adds the specified number of units for the `datepart` to a date or datetime expression. The `datepart` can be any of the following values: year, quarter, month, week, day, hour, minute, second, millisecond. The number of units will be interpreted as an integer value.\n\nUsage: \n`{{ pm_utils.dateadd('[datepart]', '[number]', '[date_expression]') }}`\n\n#### datediff ([source](macros/date_time_functions/datediff.sql))\nThis macro computes the difference between two date or datetime expressions based on the specified `datepart` and returns an integer value. 
The datepart can be any of the following values: year, quarter, month, week, day, hour, minute, second, millisecond. Weeks are defined from Sunday to Saturday.\n\nUsage: \n`{{ pm_utils.datediff('[datepart]', '[start_date_expression]', '[end_date_expression]') }}`\n\n#### diff_weekdays ([source](macros/date_time_functions/diff_weekdays.sql))\nThis macro computes the number of days between a start and end date. It returns one day when the start and end date are on the same date. The Saturdays and Sundays are excluded from the number of days.\n\nUsage: \n`{{ pm_utils.diff_weekdays('[start_date_expression]', '[end_date_expression]') }}`\n\n#### timestamp_from_date ([source](macros/date_time_functions/timestamp_from_date.sql))\nThis macro creates a timestamp based on only a date field. The time part of the timestamp is set to 00:00:00. \n\nUsage:\n`{{ pm_utils.timestamp_from_date('[expression]') }}`\n\n#### timestamp_from_parts ([source](macros/date_time_functions/timestamp_from_parts.sql))\nThis macro creates a timestamp based on a date field and string containing the time field.\n\nVariables:\n- time_format\n\nUsage: \n`{{ pm_utils.timestamp_from_parts('[date_expression]', '[time_expression]') }}`\n\n### String functions\n\n#### charindex ([source](macros/string_functions/charindex.sql))\nThis macro returns the starting position of the first occurrence of a string in another string. The search is not case-sensitive. If the string is not found, the function returns 0. This macro can be used to check whether a string contains another string.\n\nUsage: \n`{{ pm_utils.charindex('[expression_to_find]', '[field]', '[start_location]') }}`\n\n#### concat ([source](macros/string_functions/concat.sql))\nThis macro concatenates two or more strings together. In case a value is `null` it is concatenated as the empty string `''`. 
\n\nUsage: \n`{{ pm_utils.concat('\"Field_A\"', '\"Field_B\"') }}`\n\nTo pass a string as argument, make sure to use double quotes:\n`{{ pm_utils.concat('\"Field_A\"', \"' - '\", '\"Field_B\"') }}`\n\n#### json ([source](macros/string_functions/json.sql))\nThis macro returns the value defined in the path of a JSON string. The first argument indicates the field that stores the JSON string and the second argument is the path for which the value should be returned.\n\nUsage:\n`{{ pm_utils.json('[field]', '[path]') }}`\n\n### Aggregate functions\n\n#### stddev ([source](macros/aggregate_functions/stddev.sql))\nThis macro computes the standard deviation of a set of values, `null` values are ignored in the calculation. This macro can only be used as an aggregate function. For SQL Server, at least one of the values provided should not be `null`.\n\nUsage: \n`{{ pm_utils.stddev('[expression]') }}`\n\n#### string_agg ([source](macros/aggregate_functions/string_agg.sql))\nThis macro aggregates string fields separated by the given delimiter. If no delimiter is specified, strings are separated by a comma followed by a space. This macro can only be used as an aggregate function. For SQL Server, the maximum supported length is 2000. \n\nUsage:\n`{{ pm_utils.string_agg('[expression]', '[delimiter]') }}`\n\n### Tests\n\n#### test_equal_rowcount ([source](macros/tests/test_equal_rowcount.sql))\nThis generic test evaluates whether two models have the same number of rows.\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    tests:\n      - pm_utils.equal_rowcount:\n          compare_model: 'Model_B'\n```\n\n#### test_exists ([source](macros/tests/test_exists.sql))\nThis generic test evaluates whether a model is available or if a column is available in the model. When used to check the existence of a column, the check is only executed when the model exists to prevent the same error occurring multiple times. 
You should add this test at the model level whenever the existence of the model is uncertain (e.g. source tests).\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    tests:\n      - pm_utils.exists\n    columns:\n      - name: '\"Column_A\"'\n        tests:\n          - pm_utils.exists\n```\n\n#### test_not_negative ([source](macros/tests/test_not_negative.sql))\nThis generic test evaluates whether the values of the column are not negative.\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    columns:\n      - name: '\"Column_A\"'\n        tests:\n          - pm_utils.not_negative\n```\n\n#### test_not_null ([source](macros/tests/test_not_null.sql))\nThis generic test evaluates whether the values of the column are not null or empty. The test is only executed when the column exists on the table.\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    columns:\n      - name: '\"Column_A\"'\n        tests:\n          - pm_utils.not_null\n```\n\n#### test_one_column_not_null ([source](macros/tests/test_one_column_not_null.sql))\nThis generic test evaluates whether exactly one of the specified columns contains a value. This test can be defined by two or more columns. The test is only executed when all columns exist on the table.\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    tests:\n      - pm_utils.one_column_not_null:\n          columns:\n            - 'Column_A'\n            - 'Column_B'\n```\n\n#### test_unique_combination_of_columns ([source](macros/tests/test_unique_combination_of_columns.sql))\nThis generic test evaluates whether the combination of columns is unique. This test can be defined by two or more columns. 
The test is only executed when all columns exist on the table.\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    tests:\n      - pm_utils.unique_combination_of_columns:\n          combination_of_columns:\n            - 'Column_A'\n            - 'Column_B'\n```\n\n#### test_unique ([source](macros/tests/test_unique.sql))\nThis generic test evaluates whether the values of the column are unique. The test is only executed when the column exists on the table.\n\nUsage:\n```\nmodels:\n  - name: Model_A\n    columns:\n      - name: '\"Column_A\"'\n        tests:\n          - pm_utils.unique\n```\n\n### Post hooks\n\n#### create_index ([source](macros/post_hooks/create_index.sql))\nThis macro creates a clustered columnstore index on the current model for SQL Server. This macro should be used in a dbt post-hook.\n\nUsage:\n```\n{{ config(\n    post_hook=\"{{ pm_utils.create_index() }}\"\n) }}\n```\n\nIn case you want to create the index on a source table, refer to the table using the source function in the argument. Use the macro in a pre-hook of the model where you use the source table.\n\n```\n{{ config(\n    pre_hook=\"{{ pm_utils.create_index(source('[source_name]', '[source_table]')) }}\"\n) }}\n```\n\n#### record_count ([source](macros/post_hooks/record_count.sql))\nThis macro counts the number of records in the current relation using `{{ this }}`. 
This macro should be used in a dbt post-hook.\nA warning or error message is logged based on whether the record count of the table exceeds the value in the variables that indicate the max record count.\n\nUsage:\n```\n{{ config(\n    post_hook=\"{{ pm_utils.record_count() }}\"\n) }}\n```\n\nVariables:\n- max_records_error\n- max_records_warning\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuipath%2Fprocessmining-pm-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fuipath%2Fprocessmining-pm-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fuipath%2Fprocessmining-pm-utils/lists"}