{"id":13595620,"url":"https://github.com/GoogleCloudPlatform/professional-services-data-validator","last_synced_at":"2025-04-09T13:32:56.711Z","repository":{"id":37066736,"uuid":"254738197","full_name":"GoogleCloudPlatform/professional-services-data-validator","owner":"GoogleCloudPlatform","description":"Utility to compare data between homogeneous or heterogeneous environments to ensure source and target tables match","archived":false,"fork":false,"pushed_at":"2024-10-28T12:20:55.000Z","size":2329,"stargazers_count":404,"open_issues_count":91,"forks_count":117,"subscribers_count":37,"default_branch":"develop","last_synced_at":"2024-10-29T15:48:10.551Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoogleCloudPlatform.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-04-10T21:18:19.000Z","updated_at":"2024-10-25T01:49:47.000Z","dependencies_parsed_at":"2022-07-12T02:49:59.759Z","dependency_job_id":"3f8e6467-232b-4c3b-8585-e4232bfecf7b","html_url":"https://github.com/GoogleCloudPlatform/professional-services-data-validator","commit_stats":{"total_commits":749,"total_committers":52,"mean_commits":"14.403846153846153","dds":0.5473965287049399,"last_synced_commit":"cc0f60a4921f2a37a9b376c7646978674c7a1dd2"},"previous_names":[],"tags_count":37,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fprofessional-services-data-vali
dator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fprofessional-services-data-validator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fprofessional-services-data-validator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fprofessional-services-data-validator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoogleCloudPlatform","download_url":"https://codeload.github.com/GoogleCloudPlatform/professional-services-data-validator/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223394656,"owners_count":17138591,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:53.821Z","updated_at":"2025-04-09T13:32:56.691Z","avatar_url":"https://github.com/GoogleCloudPlatform.png","language":"Python","readme":"# Data Validation Tool\n\nThe Data Validation Tool is an open sourced Python CLI tool based on the\n[Ibis framework](https://ibis-project.org/docs/)\nthat compares heterogeneous data source tables with multi-leveled validation\nfunctions.\n\nData validation is a critical step in a data warehouse, database, or data lake\nmigration project where data from both the source and the target tables are\ncompared to ensure they are matched and correct after each migration step\n(e.g. 
data and schema migration, SQL script translation, ETL migration, etc.).\nThe Data Validation Tool (DVT) provides an automated and repeatable solution to\nperform this task.\n\nDVT supports the following validations:\n* Column validation (count, sum, avg, min, max, stddev, group by)\n* Row validation (Not supported for FileSystem connections)\n* Schema validation\n* Custom Query validation\n* Ad hoc SQL exploration\n\nDVT supports the following connection types:\n\n*   [AlloyDB](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#alloydb)\n*   [BigQuery](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#google-bigquery)\n*   [DB2](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#db2)\n*   [FileSystem](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#filesystem)\n*   [Hive](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#hive)\n*   [Impala](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#impala)\n*   [MSSQL](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#mssql-server)\n*   [MySQL](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#mysql)\n*   [Oracle](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#oracle)\n*   [Postgres](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#postgres)\n*   [Redshift](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#redshift)\n*   
[Spanner](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#google-spanner)\n*   [Teradata](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#teradata)\n*   [Snowflake](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md#snowflake)\n\nThe [Connections](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md) page provides details about how to create\nand list connections for the validation tool.\n\n### Disclaimer\nThis is not an officially supported Google product. Please be aware that bugs may lurk, and that we reserve the right to make small backwards-incompatible changes. Feel free to open bugs or feature requests, or contribute directly\n(see [CONTRIBUTING.md](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/CONTRIBUTING.md) for details).\n\n## Installation\n\nThe [Installation](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/installation.md) page describes the prerequisites and\nsetup steps needed to install and use the Data Validation Tool.\n\n## Usage\n\nBefore using this tool, you will need to create connections to the source and\ntarget tables. Once the connections are created, you can run validations on\nthose tables. Validation results can be printed to stdout (default) or outputted\nto BigQuery (recommended). DVT also allows you to save and edit validation\nconfigurations in a YAML or JSON file. This is useful for running common validations or\nupdating the configuration.\n\n### Managing Connections\n\nBefore running validations, DVT requires setting up a source and target connection.\nThese connections can be stored locally or in a GCS directory. 
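To illustrate, a hedged sketch (the connection names, host, credentials, and project below are placeholder values, not taken from this README) of creating and listing a pair of connections:\n\n```\ndata-validation connections add --connection-name MY_BQ_CONN BigQuery --project-id MY_GCP_PROJECT\ndata-validation connections add --connection-name MY_PG_CONN Postgres --host 10.0.0.1 --port 5432 --user my_user --password my_password --database my_db\ndata-validation connections list\n```\n\n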
To create connections,\nplease review the [Connections](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/connections.md) page.\n\n### Running Validations\n\nThe CLI is the main interface to use this tool and it has several different\ncommands which can be used to create and run validations. Below are the command\nsyntax and options for running validations.\n\nAlternatives to running DVT in the CLI include deploying DVT to Cloud Run, Cloud Functions, or Airflow\n([Examples Here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/samples)). See the [Validation Logic](https://github.com/GoogleCloudPlatform/professional-services-data-validator#validation-logic) section\nto learn more about how DVT uses the CLI to generate SQL queries.\n\nNote that we do not support nested or complex columns for column or row validations.\n\n#### Column Validations\n\nBelow is the command syntax for column validations. To run a grouped column\nvalidation, simply specify the `--grouped-columns` flag.\n\nYou can specify a list of string columns for aggregations in order to calculate\nan aggregation over the `length(string_col)`. Similarly, you can specify timestamp/date\ncolumns for aggregation over the `unix_seconds(timestamp_col)`. Running an aggregation\nover all columns ('*') will only run over numeric columns, unless the\n`--wildcard-include-string-len` or `--wildcard-include-timestamp` flags are present.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). 
Defaults to INFO.\n  validate column\n  --source-conn or -sc SOURCE_CONN\n                        Source connection details\n                        See: *Data Source Configurations* section for each data source\n  --target-conn or -tc TARGET_CONN\n                        Target connection details\n                        See: *Connections* section for each data source\n  --tables-list or -tbls SOURCE_SCHEMA.SOURCE_TABLE=TARGET_SCHEMA.TARGET_TABLE\n                        Comma separated list of tables in the form schema.table=target_schema.target_table. Or shorthand schema.* for all tables.\n                        Target schema name and table name are optional.\n                        i.e 'bigquery-public-data.new_york_citibike.citibike_trips'\n  [--grouped-columns or -gc GROUPED_COLUMNS]\n                        Comma separated list of columns for Group By i.e col_a,col_b\n  [--count COLUMNS]     Comma separated list of columns for count or * for all columns\n  [--sum COLUMNS]       Comma separated list of columns for sum or * for all numeric\n  [--min COLUMNS]       Comma separated list of columns for min or * for all numeric\n  [--max COLUMNS]       Comma separated list of columns for max or * for all numeric\n  [--avg COLUMNS]       Comma separated list of columns for avg or * for all numeric\n  [--std COLUMNS]       Comma separated list of columns for stddev_samp or * for all numeric\n  [--exclude-columns or -ec]\n                        Flag to indicate the list of columns provided should be excluded and not included.\n  [--result-handler or -rh CONNECTION_NAME.SCHEMA.TABLE or BQ_PROJECT_ID.DATASET.TABLE]\n                        Specify a BigQuery or PostgreSQL connection name as destination for validation results.\n                        Also supports legacy BigQuery format BQ_PROJECT_ID.DATASET.TABLE.\n                        See: *Validation Reports* section\n  [--bq-result-handler or -bqrh PROJECT_ID.DATASET.TABLE or CONNECTION_NAME.DATASET.TABLE]\n  
                      This option has been deprecated and will be removed in a future release.\n  [--service-account or -sa PATH_TO_SA_KEY]\n                        Service account to use for BigQuery result handler output.\n  [--wildcard-include-string-len or -wis]\n                        If flag is present, include string columns in aggregation as len(string_col)\n  [--wildcard-include-timestamp or -wit]\n                        If flag is present, include timestamp/date columns in aggregation as unix_seconds(ts_col)\n  [--cast-to-bigint or -ctb]\n                        If flag is present, cast all int32 columns to int64 before aggregation\n  [--filters SOURCE_FILTER:TARGET_FILTER]\n                        Colon separated string values of source and target filters.\n                        If target filter is not provided, the source filter will run on source and target tables.\n                        See: *Filters* section\n  [--config-file or -c CONFIG_FILE]\n                        YAML Config File Path to be used for storing validations and other features. Supports GCS and local paths.\n                        See: *Running DVT with YAML Configuration Files* section\n  [--config-file-json or -cj CONFIG_FILE_JSON]\n                        JSON Config File Path to be used for storing validations only for application purposes.\n  [--threshold or -th THRESHOLD]\n                        Float value. Maximum pct_difference allowed for validation to be considered a success. Defaults to 0.0\n  [--labels or -l KEY1=VALUE1,KEY2=VALUE2]\n                        Comma-separated key value pair labels for the run.\n  [--format or -fmt FORMAT]\n                        Format for stdout output. Supported formats are (text, csv, json, table). Defaults to table.\n  [--filter-status or -fs STATUSES_LIST]\n                        Comma separated list of statuses to filter the validation results. Supported statuses are (success, fail). 
If no list is provided, all statuses are returned.\n\n```\n\nThe default aggregation type is a 'COUNT *', which will run in addition to the validations you specify. To remove this default,\nuse [YAML configs](https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop#running-dvt-with-yaml-configuration-files).\n\nThe [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md) page provides many examples of how the tool can be used to run powerful validations without writing any queries.\n\n#### Row Validations\n\n(Note: Row hash validation is not supported for FileSystem connections.\nIn addition, please note that SHA256 is not a supported function on Teradata systems.\nIf you wish to perform this comparison on Teradata you will need to\n[deploy a UDF to perform the conversion](https://github.com/akuroda/teradata-udf-sha2/blob/master/src/sha256.c).)\n\nBelow is the command syntax for row validations. To run row-level validations, DVT requires\nunique columns to join row sets, which are either inferred from the source/target table or provided\nvia the `--primary-keys` flag, and one of the `--hash`, `--concat`, or `--comparison-fields` flags.\nSee the *Primary Keys* section.\n\nThe `--comparison-fields` flag specifies the fields (e.g. columns) whose raw values will be compared\nbased on the primary key join. The `--hash` flag will run a checksum across the specified columns in\nthe table. This includes casting to string, sanitizing the data (ifnull, rtrim, upper), concatenating,\nand finally hashing the row.\n\nUnder the hood, row validation uses\n[Calculated Fields](https://github.com/GoogleCloudPlatform/professional-services-data-validator#calculated-fields) to\napply functions such as IFNULL() or RTRIM().\n
These can be edited in the YAML or JSON config file to customize your row validation.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). Defaults to INFO.\n  validate row\n  --source-conn or -sc SOURCE_CONN\n                        Source connection details\n                        See: *Data Source Configurations* section for each data source\n  --target-conn or -tc TARGET_CONN\n                        Target connection details\n                        See: *Connections* section for each data source\n  --tables-list or -tbls SOURCE_SCHEMA.SOURCE_TABLE=TARGET_SCHEMA.TARGET_TABLE\n                        Comma separated list of tables in the form schema.table=target_schema.target_table\n                        Target schema name and table name are optional.\n                        i.e 'bigquery-public-data.new_york_citibike.citibike_trips'\n  --comparison-fields or -comp-fields FIELDS\n                        Comma separated list of columns to compare. Can either be a physical column or an alias\n                        See: *Calculated Fields* section for details\n  --hash COLUMNS        Comma separated list of columns to hash or * for all columns\n  --concat COLUMNS      Comma separated list of columns to concatenate or * for all columns (use if a common hash function is not available between databases)\n  [--primary-keys PRIMARY_KEYS, -pk PRIMARY_KEYS]\n                        Comma separated list of primary key columns, when not specified the value will be inferred\n                        from the source or target table if available.  
See *Primary Keys* section\n  [--exclude-columns or -ec]\n                        Flag to indicate the list of columns provided should be excluded from hash or concat instead of included.\n  [--result-handler or -rh CONNECTION_NAME.SCHEMA.TABLE or BQ_PROJECT_ID.DATASET.TABLE]\n                        Specify a BigQuery or PostgreSQL connection name as destination for validation results.\n                        Also supports legacy BigQuery format BQ_PROJECT_ID.DATASET.TABLE.\n                        See: *Validation Reports* section\n  [--bq-result-handler or -bqrh PROJECT_ID.DATASET.TABLE or CONNECTION_NAME.DATASET.TABLE]\n                        This option has been deprecated and will be removed in a future release.\n  [--service-account or -sa PATH_TO_SA_KEY]\n                        Service account to use for BigQuery result handler output.\n  [--filters SOURCE_FILTER:TARGET_FILTER]\n                        Colon separated string values of source and target filters.\n                        If target filter is not provided, the source filter will run on source and target tables.\n                        See: *Filters* section\n  [--config-file or -c CONFIG_FILE]\n                        YAML Config File Path to be used for storing validations and other features. Supports GCS and local paths.\n                        See: *Running DVT with YAML Configuration Files* section\n  [--config-file-json or -cj CONFIG_FILE_JSON]\n                        JSON Config File Path to be used for storing validations only for application purposes.\n  [--labels or -l KEY1=VALUE1,KEY2=VALUE2]\n                        Comma-separated key value pair labels for the run.\n  [--format or -fmt FORMAT]\n                        Format for stdout output. Supported formats are (text, csv, json, table). 
Defaults to table.\n  [--use-random-row or -rr]\n                        Finds a set of random rows of the first primary key supplied.\n  [--random-row-batch-size or -rbs]\n                        Row batch size used for random row filters (default 10,000).\n  [--filter-status or -fs STATUSES_LIST]\n                        Comma separated list of statuses to filter the validation results. Supported statuses are (success, fail). If no list is provided, all statuses are returned.\n  [--trim-string-pks, -tsp]\n                        Trims string based primary key values, intended for use when one engine uses padded string semantics (e.g. CHAR(n)) and the other does not (e.g. VARCHAR(n)).\n  [--case-insensitive-match, -cim]\n                        Performs a case insensitive match by adding an UPPER() before comparison.\n```\n#### Generate Partitions for Large Row Validations\n\nWhen performing row validations, the Data Validation Tool brings each row into memory and can run into a MemoryError. Below is the command syntax for generating partitions in order to perform row validations on a large dataset (table or custom query) and avoid this error. Each partition contains a distinct range of primary key(s), and the partitions have a nearly equal number of rows. See the *Primary Keys* section.\n\nThe command generates and stores multiple YAML validations, each representing a chunk of the large dataset selected via filters (`WHERE primary_key(s) \u003e= X AND primary_key(s) \u003c Y`). The parts-per-file parameter specifies the number of validations in one YAML file. Each YAML file will contain parts-per-file validations, except the last one, which will contain the remaining partitions (i.e. parts-per-file may not divide partition-num evenly).\n
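For illustration, a hedged sketch (the connection, table, column, and bucket names below are hypothetical, not taken from this README) of generating 50 partitions written as 5 YAML files of 10 validations each:\n\n```\ndata-validation generate-table-partitions -sc my_source_conn -tc my_target_conn -tbls my_schema.my_table=my_schema.my_table --hash '*' -pk id -cdir gs://my-bucket/partitions_dir -pn 50 -ppf 10\n```\n\n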
You can then run the validations in the directory serially (or in parallel across multiple containers or VMs) with the `data-validation configs run --config-dir PATH` command as described [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator#yaml-configuration-files).\n\nThe command takes the same parameters as `Row Validation`, *plus* a few parameters to support partitioning. Single and multiple primary keys are supported, and keys can be of any indexable type except date and timestamp. You can specify either the tables being validated or the source and target custom queries. The ```partition-key``` parameter used in earlier versions is no longer supported.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). Defaults to INFO.\n  generate-table-partitions\n  --source-conn or -sc SOURCE_CONN\n                        Source connection details\n                        See: *Data Source Configurations* section for each data source\n  --target-conn or -tc TARGET_CONN\n                        Target connection details\n                        See: *Connections* section for each data source\n  --tables-list or -tbls SOURCE_SCHEMA.SOURCE_TABLE=TARGET_SCHEMA.TARGET_TABLE\n                        Comma separated list of tables in the form schema.table=target_schema.target_table\n                        Target schema name and table name are optional.\n                        e.g. 'bigquery-public-data.new_york_citibike.citibike_trips'\n                        Either --tables-list or --source-query (or file) and --target-query (or file) must be provided\n  --source-query SOURCE_QUERY, -sq SOURCE_QUERY\n                        Source sql query\n                        Either --tables-list or --source-query (or file) and --target-query (or file) must be provided\n  
--source-query-file  SOURCE_QUERY_FILE, -sqf SOURCE_QUERY_FILE\n                        File containing the source sql command. Supports GCS and local paths.\n  --target-query TARGET_QUERY, -tq TARGET_QUERY\n                        Target sql query\n                        Either --tables-list or --source-query (or file) and --target-query (or file) must be provided\n  --target-query-file TARGET_QUERY_FILE, -tqf TARGET_QUERY_FILE\n                        File containing the target sql command. Supports GCS and local paths.\n  --comparison-fields or -comp-fields FIELDS\n                        Comma separated list of columns to compare. Can either be a physical column or an alias\n                        See: *Calculated Fields* section for details\n  --hash COLUMNS        Comma separated list of columns to hash or * for all columns\n  --concat COLUMNS      Comma separated list of columns to concatenate or * for all columns (use if a common hash function is not available between databases)\n  --config-dir CONFIG_DIR, -cdir CONFIG_DIR\n                        Directory Path to store YAML Config Files\n                        GCS: Provide a full gs:// path of the target directory. Eg: `gs://\u003cBUCKET\u003e/partitions_dir`\n                        Local: Provide a relative path of the target directory. Eg: `partitions_dir`\n                        If invoked with -tbls parameter, the validations are stored in a directory named \u003cschema\u003e.\u003ctable\u003e, otherwise the directory is named `custom.\u003crandom_string\u003e`\n  --partition-num INT, -pn INT\n                        Number of partitions into which the table should be split, e.g. 
1000 or 10000\n                        In case this value exceeds the row count of the source/target table, it will be decreased to max(source_row_count, target_row_count)\n  [--primary-keys PRIMARY_KEYS, -pk PRIMARY_KEYS]\n                        Comma separated list of primary key columns, when not specified the value will be inferred\n                        from the source or target table if available.  See *Primary Keys* section\n  [--result-handler or -rh CONNECTION_NAME.SCHEMA.TABLE or BQ_PROJECT_ID.DATASET.TABLE]\n                        Specify a BigQuery or PostgreSQL connection name as destination for validation results.\n                        Also supports legacy BigQuery format BQ_PROJECT_ID.DATASET.TABLE.\n                        See: *Validation Reports* section\n  [--bq-result-handler or -bqrh PROJECT_ID.DATASET.TABLE or CONNECTION_NAME.DATASET.TABLE]\n                        This option has been deprecated and will be removed in a future release.\n  [--service-account or -sa PATH_TO_SA_KEY]\n                        Service account to use for BigQuery result handler output.\n  [--parts-per-file INT], [-ppf INT]\n                        Number of partitions in a yaml file, default value 1.\n  [--filters SOURCE_FILTER:TARGET_FILTER]\n                        Colon separated string values of source and target filters.\n                        If target filter is not provided, the source filter will run on source and target tables.\n                        See: *Filters* section\n  [--labels or -l KEY1=VALUE1,KEY2=VALUE2]\n                        Comma-separated key value pair labels for the run.\n  [--format or -fmt FORMAT]\n                        Format for stdout output. Supported formats are (text, csv, json, table). Defaults to table.\n  [--filter-status or -fs STATUSES_LIST]\n                        Comma separated list of statuses to filter the validation results. Supported statuses are (success, fail). 
If no list is provided, all statuses are returned.\n  [--trim-string-pks, -tsp]\n                        Trims string based primary key values, intended for use when one engine uses padded string semantics (e.g. CHAR(n)) and the other does not (e.g. VARCHAR(n)).\n  [--case-insensitive-match, -cim]\n                        Performs a case insensitive match by adding an UPPER() before comparison.\n```\n#### Schema Validations\n\nBelow is the syntax for schema validations. These can be used to compare case insensitive column names and\ntypes between source and target.\n\nNote: An exclamation point before a data type (`!string`) signifies the column is non-nullable or required.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). Defaults to INFO.\n  validate schema\n  --source-conn or -sc SOURCE_CONN\n                        Source connection details\n                        See: *Data Source Configurations* section for each data source\n  --target-conn or -tc TARGET_CONN\n                        Target connection details\n                        See: *Connections* section for each data source\n  --tables-list or -tbls SOURCE_SCHEMA.SOURCE_TABLE=TARGET_SCHEMA.TARGET_TABLE\n                        Comma separated list of tables in the form schema.table=target_schema.target_table. 
Or shorthand schema.* for all tables.\n                        Target schema name and table name are optional.\n                        e.g.: 'bigquery-public-data.new_york_citibike.citibike_trips'\n  [--result-handler or -rh CONNECTION_NAME.SCHEMA.TABLE or BQ_PROJECT_ID.DATASET.TABLE]\n                        Specify a BigQuery or PostgreSQL connection name as destination for validation results.\n                        Also supports legacy BigQuery format BQ_PROJECT_ID.DATASET.TABLE.\n                        See: *Validation Reports* section\n  [--bq-result-handler or -bqrh PROJECT_ID.DATASET.TABLE or CONNECTION_NAME.DATASET.TABLE]\n                        This option has been deprecated and will be removed in a future release.\n  [--service-account or -sa PATH_TO_SA_KEY]\n                        Service account to use for BigQuery result handler output.\n  [--config-file or -c CONFIG_FILE]\n                        YAML Config File Path to be used for storing validations and other features. Supports GCS and local paths.\n                        See: *Running DVT with YAML Configuration Files* section\n  [--config-file-json or -cj CONFIG_FILE_JSON]\n                        JSON Config File Path to be used for storing validations only for application purposes.\n  [--format or -fmt]    Format for stdout output. Supported formats are (text, csv, json, table).\n                        Defaults  to table.\n  [--filter-status or -fs STATUSES_LIST]\n                        Comma separated list of statuses to filter the validation results. 
Supported statuses are (success, fail).\n                        If no list is provided, all statuses are returned.\n  [--exclusion-columns or -ec EXCLUSION_COLUMNS]\n                        Comma separated list of columns to be excluded from the schema validation, e.g.: col_a,col_b.\n  [--allow-list or -al ALLOW_LIST]\n                        Comma separated list of data-type mappings of source and destination data sources which will be validated in case of missing data types in destination data source. e.g: \"decimal(4,2):decimal(5,4),!string:string\"\n  [--allow-list-file ALLOW_LIST_FILE, -alf ALLOW_LIST_FILE]\n                        YAML file containing default --allow-list mappings. Can be used in conjunction with --allow-list.\n                        e.g.: samples/allow_list/oracle_to_bigquery.yaml or gs://dvt-allow-list-files/oracle_to_bigquery.yaml\n                        See example files in samples/allow_list/.\n```\n\n#### Custom Query Column Validations\n\nBelow is the command syntax for custom query column validations.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). Defaults to INFO.\n  validate custom-query column\n  --source-conn or -sc SOURCE_CONN\n                        Source connection details\n                        See: *Data Source Configurations* section for each data source\n  --target-conn or -tc TARGET_CONN\n                        Target connection details\n                        See: *Connections* section for each data source\n  --source-query SOURCE_QUERY, -sq SOURCE_QUERY\n                        Source sql query\n                        Either --source-query or --source-query-file must be provided\n  --source-query-file  SOURCE_QUERY_FILE, -sqf SOURCE_QUERY_FILE\n                        File containing the source sql command. 
Supports GCS and local paths.\n  --target-query TARGET_QUERY, -tq TARGET_QUERY\n                        Target sql query\n                        Either --target-query or --target-query-file must be provided\n  --target-query-file TARGET_QUERY_FILE, -tqf TARGET_QUERY_FILE\n                        File containing the target sql command. Supports GCS and local paths.\n  [--count COLUMNS]     Comma separated list of columns for count or * for all columns\n  [--sum COLUMNS]       Comma separated list of columns for sum or * for all numeric\n  [--min COLUMNS]       Comma separated list of columns for min or * for all numeric\n  [--max COLUMNS]       Comma separated list of columns for max or * for all numeric\n  [--avg COLUMNS]       Comma separated list of columns for avg or * for all numeric\n  [--std COLUMNS]       Comma separated list of columns for stddev_samp or * for all numeric\n  [--exclude-columns or -ec]\n                        Flag to indicate the list of columns provided should be excluded and not included.\n  [--result-handler or -rh CONNECTION_NAME.SCHEMA.TABLE or BQ_PROJECT_ID.DATASET.TABLE]\n                        Specify a BigQuery or PostgreSQL connection name as destination for validation results.\n                        Also supports legacy BigQuery format BQ_PROJECT_ID.DATASET.TABLE.\n                        See: *Validation Reports* section\n  [--bq-result-handler or -bqrh PROJECT_ID.DATASET.TABLE or CONNECTION_NAME.DATASET.TABLE]\n                        This option has been deprecated and will be removed in a future release.\n  [--service-account or -sa PATH_TO_SA_KEY]\n                        Service account to use for BigQuery result handler output.\n  [--config-file or -c CONFIG_FILE]\n                        YAML Config File Path to be used for storing validations and other features. 
Supports GCS and local paths.\n                        See: *Running DVT with YAML Configuration Files* section\n  [--config-file-json or -cj CONFIG_FILE_JSON]\n                        JSON Config File Path to be used for storing validations only for application purposes.\n  [--labels or -l KEY1=VALUE1,KEY2=VALUE2]\n                        Comma-separated key value pair labels for the run.\n  [--format or -fmt FORMAT]\n                        Format for stdout output. Supported formats are (text, csv, json, table). Defaults to table.\n  [--filter-status or -fs STATUSES_LIST]\n                        Comma separated list of statuses to filter the validation results. Supported statuses are (success, fail). If no list is provided, all statuses are returned.\n```\n\nThe default aggregation type is a `COUNT(*)`. If no aggregation flag (e.g. count,\nsum, min, etc.) is provided, the default aggregation will run.\n\nThe [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md)\npage provides a few examples of how this tool can be used to run custom query validations.\n\n\n#### Custom Query Row Validations\n\n(Note: Custom query row validation is not supported for FileSystem connections. Struct and array data types are not currently supported.)\n\nTo run row-level validations you need to pass the `--hash` flag, which specifies the fields\nof the custom query result that will be concatenated and hashed. The primary key should be included\nin the SELECT statement of both source_query.sql and target_query.sql. See the *Primary Keys* section.\n\nBelow is the command syntax for custom query row validations.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). 
Defaults to INFO.\n  validate custom-query row\n  --source-conn or -sc SOURCE_CONN\n                        Source connection details\n                        See: *Data Source Configurations* section for each data source\n  --target-conn or -tc TARGET_CONN\n                        Target connection details\n                        See: *Connections* section for each data source\n  --source-query SOURCE_QUERY, -sq SOURCE_QUERY\n                        Source sql query\n                        Either --source-query or --source-query-file must be provided\n  --source-query-file SOURCE_QUERY_FILE, -sqf SOURCE_QUERY_FILE\n                        File containing the source sql command. Supports GCS and local paths.\n  --target-query TARGET_QUERY, -tq TARGET_QUERY\n                        Target sql query\n                        Either --target-query or --target-query-file must be provided\n  --target-query-file TARGET_QUERY_FILE, -tqf TARGET_QUERY_FILE\n                        File containing the target sql command. Supports GCS and local paths.\n  --comparison-fields or -comp-fields FIELDS\n                        Comma separated list of columns to compare. 
Can either be a physical column or an alias\n                        See: *Calculated Fields* section for details\n  --hash '*'            '*' to hash all columns.\n  --concat COLUMNS      Comma separated list of columns to concatenate or * for all columns\n                        (use if a common hash function is not available between databases)\n  [--primary-keys PRIMARY_KEYS, -pk PRIMARY_KEYS]\n                       Common column between source and target queries for join\n  [--exclude-columns or -ec]\n                        Flag to indicate the list of columns provided should be excluded from hash or concat instead of included.\n  [--result-handler or -rh CONNECTION_NAME.SCHEMA.TABLE or BQ_PROJECT_ID.DATASET.TABLE]\n                        Specify a BigQuery or PostgreSQL connection name as destination for validation results.\n                        Also supports legacy BigQuery format BQ_PROJECT_ID.DATASET.TABLE.\n                        See: *Validation Reports* section\n  [--bq-result-handler or -bqrh PROJECT_ID.DATASET.TABLE or CONNECTION_NAME.DATASET.TABLE]\n                        This option has been deprecated and will be removed in a future release.\n  [--service-account or -sa PATH_TO_SA_KEY]\n                        Service account to use for BigQuery result handler output.\n  [--config-file or -c CONFIG_FILE]\n                        YAML Config File Path to be used for storing validations and other features. Supports GCS and local paths.\n                        See: *Running DVT with YAML Configuration Files* section\n  [--config-file-json or -cj CONFIG_FILE_JSON]\n                        JSON Config File Path to be used for storing validations only for application purposes.\n  [--labels or -l KEY1=VALUE1,KEY2=VALUE2]\n                        Comma-separated key value pair labels for the run.\n  [--format or -fmt FORMAT]\n                        Format for stdout output. Supported formats are (text, csv, json, table). 
Defaults to table.\n  [--filter-status or -fs STATUSES_LIST]\n                        Comma separated list of statuses to filter the validation results. Supported statuses are (success, fail). If no list is provided, all statuses are returned.\n  [--trim-string-pks, -tsp]\n                        Trims string-based primary key values, intended for use when one engine uses padded string semantics (e.g. CHAR(n)) and the other does not (e.g. VARCHAR(n)).\n  [--case-insensitive-match, -cim]\n                        Performs a case-insensitive match by adding an UPPER() before comparison.\n```\n\nThe [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md)\npage provides a few examples of how this tool can be used to run custom query row validations.\n\n#### Dry Run Validation\n\nThe `validate` command takes a `--dry-run` command line flag that prints source\nand target SQL to stdout as JSON in lieu of performing a validation:\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). 
Defaults to INFO.\n  validate\n  [--dry-run or -dr]    Prints source and target SQL to stdout in lieu of performing a validation.\n```\n\nFor example, this flag can be used as follows:\n\n```shell\n\u003e data-validation validate --dry-run row \\\n  -sc my_bq_conn \\\n  -tc my_bq_conn \\\n  -tbls bigquery-public-data.new_york_citibike.citibike_stations \\\n  --primary-keys station_id \\\n  --hash '*'\n{\n    \"source_query\": \"SELECT `hash__all`, `station_id`\\nFROM ...\",\n    \"target_query\": \"SELECT `hash__all`, `station_id`\\nFROM ...\"\n}\n```\n\n### Running DVT with YAML Configuration Files\n\nRunning DVT with YAML configuration files is the recommended approach if:\n* you want to customize the configuration for any given validation OR\n* you want to run DVT at scale (i.e. run multiple validations sequentially or in parallel)\n\nWe recommend generating YAML configs with the `--config-file \u003cfile-name\u003e` flag when running a validation command, which supports\nGCS and local paths.\n\nYou can use the `data-validation configs` command to run and view YAMLs.\n\n```\ndata-validation\n  [--verbose or -v ]\n                        Verbose logging\n  [--log-level or -ll]\n                        Log Level to be assigned. Supported levels are (DEBUG,INFO,WARNING,ERROR,CRITICAL). Defaults to INFO.\n  configs run\n  [--config-file or -c CONFIG_FILE]\n                        Path to YAML config file to run. Supports local and GCS paths.\n  [--config-dir or -cdir CONFIG_DIR]\n                        Directory path containing YAML configs to be run sequentially. 
Supports local and GCS paths.\n  [--dry-run or -dr]    If this flag is present, prints the source and target SQL generated in lieu of running the validation.\n  [--kube-completions or -kc]\n                        Flag to indicate usage in Kubernetes index completion mode.\n                        See *Scaling DVT* section\n```\n\n```\ndata-validation configs list\n  [--config-dir or -cdir CONFIG_DIR]\n                        GCS or local directory from which to list validation YAML configs. Defaults to current local directory.\n```\n\n```\ndata-validation configs get\n  [--config-file or -c CONFIG_FILE] GCS or local path of validation YAML to print.\n```\n\nView the complete YAML file for a Grouped Column validation on the\n[Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-yaml-config-grouped-column-validation) page.\n\n\n### Scaling DVT\n\nYou can scale DVT for large validations by running the tool in a distributed manner. To optimize the validation speed for large tables, you can use GKE Jobs ([Google Kubernetes Jobs](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-workloads-overview#batch_jobs)) or [Cloud Run Jobs](https://cloud.google.com/run/docs/create-jobs). If you are not familiar with Kubernetes or Cloud Run Jobs, see [Scaling DVT with Distributed Jobs](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/internal/distributed_jobs.md) for a detailed overview.\n\n\nWe recommend first generating partitions with the `generate-table-partitions` command for your large datasets (tables or queries). Then, use Cloud Run or GKE to distribute the validation of each chunk in parallel. 
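In indexed completion mode, each task simply maps its own index to one partition file. Below is a minimal sketch of that lookup, assuming the zero-padded `NNNN.yaml` names produced by `generate-table-partitions` and the `CLOUD_RUN_TASK_INDEX` environment variable that Cloud Run Jobs sets; DVT's `--kube-completions` mode performs this mapping itself, and the bucket path shown is illustrative only:

```python
import os

def partition_config_path(config_dir: str, task_index: int) -> str:
    """Map a job task index to its partition YAML, e.g. index 3 -> 0003.yaml."""
    return f"{config_dir.rstrip('/')}/{task_index:04d}.yaml"

# Cloud Run Jobs sets CLOUD_RUN_TASK_INDEX for each task (0 .. task_count - 1).
task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
print(partition_config_path("gs://my_config_dir/source_schema.source_table", task_index))
```

This is why the number of tasks should equal the number of partitions: every index then resolves to exactly one YAML file.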
See the [Cloud Run Jobs Quickstart sample](https://github.com/GoogleCloudPlatform/professional-services-data-validator/tree/develop/samples/cloud_run_jobs) to get started.\n\nWhen running DVT in a distributed fashion, both the `--kube-completions` and `--config-dir` flags are required. The `--kube-completions` flag specifies that the validation is being run in indexed completion mode in Kubernetes or as multiple independent tasks in Cloud Run. If the `-kc` option is used and you are not running in indexed mode, you will receive a warning and the container will process all the validations sequentially. If the `-kc` option is used and a config directory is not provided (i.e. a `--config-file` is provided instead), a warning is issued.\n\nThe `--config-dir` flag specifies the directory with the YAML files to be executed in parallel. If you used `generate-table-partitions` to generate the YAMLs, this would be the directory where the partition files numbered `0000.yaml` to `\u003cpartition_num - 1\u003e.yaml` are stored (e.g. `gs://my_config_dir/source_schema.source_table/`). When creating your Cloud Run Job, set the number of tasks equal to the number of table partitions so the task index matches the YAML file to be validated. When executed, each Cloud Run task will validate a partition in parallel.\n\n\n### Validation Reports\n\nThe result handlers tell DVT where to store the results of each validation. The\ntool can write the results of a validation run to Google BigQuery, PostgreSQL\nor print to stdout (default). 
View the schema of the results table [here](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/terraform/results_schema.json).\n\nTo output to BigQuery or PostgreSQL, include the `-rh` flag during a validation run, specifying\nthe schema and table name for the results.\n\nBigQuery example by connection name:\n```shell\ndata-validation validate column \\\n  -sc bq_conn \\\n  -tc bq_conn \\\n  -tbls bigquery-public-data.new_york_citibike.citibike_trips \\\n  -rh bq_conn.dataset.results_table \\\n  -sa 'service-acct@project.iam.gserviceaccount.com'\n```\n\nBigQuery example by project name:\n```shell\ndata-validation validate column \\\n  -sc bq_conn \\\n  -tc bq_conn \\\n  -tbls bigquery-public-data.new_york_citibike.citibike_trips \\\n  -rh bq-project-id.dataset.results_table \\\n  -sa 'service-acct@project.iam.gserviceaccount.com'\n```\n\nPostgreSQL example:\n```shell\ndata-validation validate column \\\n  -sc ora_conn \\\n  -tc pg_conn1 \\\n  -tbls my_schema.some_table \\\n  -rh pg_conn2.dvt_schema.results_table\n```\n\n### Ad Hoc SQL Exploration\n\nThere are many occasions where you need to explore a data source while running\nvalidations. To avoid the need to install and open a new client, the CLI allows\nyou to run ad hoc queries.\n\n```\ndata-validation query\n  --conn or -c CONN\n          The connection name to be queried\n  --query or -q QUERY\n          The raw query to run against the supplied connection\n  [--format or -f {minimal,python}]\n          Format for query output (default: python)\n```\n\n### Building Matched Table Lists\n\nCreating the list of matched tables can be a hassle. We have added a feature\nwhich may help you match all of the tables between source and\ntarget. 
The `find-tables` command:\n\n-   Pulls all tables in the source (applying a supplied `allowed-schemas` filter)\n-   Pulls all tables from the target\n-   Uses the Jaro similarity algorithm to match tables\n-   Finally, it prints a JSON list of tables which can be a reference for the\n    validation run config.\n\nNote that the default value for the `score-cutoff` parameter is 1, which requires identical matches. If no matches occur, reduce this value as needed; smaller values such as 0.7 or 0.65 will yield more matches. For reference, we make use of [this jaro_similarity method](https://jamesturk.github.io/jellyfish/functions/#jaro-similarity) for the string comparison.\n\n```shell\ndata-validation find-tables --source-conn source --target-conn target \\\n    --allowed-schemas pso_data_validator \\\n    --score-cutoff 1\n```\n\n### Using Beta CLI Features\n\nThere may be occasions when we want to release a new CLI feature under a Beta flag.\nAny features under Beta may or may not make their way to production. However, if\nthere is a Beta feature you wish to use, it can be accessed as follows.\n\n```\ndata-validation beta --help\n```\n\n#### [Beta] Deploy Data Validation as a Local Service\n\nIf you wish to use Data Validation as a Flask service, the following command\nwill help. This same logic is also expected to be used for Cloud Run, Cloud\nFunctions, and other deployment services.\n\n`data-validation beta deploy`\n\n## Validation Logic\n### Aggregated Fields\n\nAggregate fields contain the SQL fields that you want to produce an aggregate\nfor. 
Currently the functions `COUNT()`, `AVG()`, `SUM()`, `MIN()`, `MAX()`,\nand `STDDEV_SAMP()` are supported.\n\nHere is a sample aggregate config:\n```yaml\nvalidations:\n- aggregates:\n  - field_alias: count\n    source_column: null\n    target_column: null\n    type: count\n  - field_alias: count__tripduration\n    source_column: tripduration\n    target_column: tripduration\n    type: count\n  - field_alias: sum__tripduration\n    source_column: tripduration\n    target_column: tripduration\n    type: sum\n```\n\nIf you are aggregating columns with large values, you can CAST() before aggregation\nwith calculated fields as shown in [this example](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-yaml-with-calc-fields-cast-to-numeric-before-aggregation).\n\n### Filters\n\nFilters let you apply a WHERE statement to your validation query (e.g. `SELECT *\nFROM table WHERE created_at \u003e 30 days ago AND region_id = 71;`). The filter is\nwritten in the syntax of the given source and must reference columns in the\nunderlying table, not projected DVT expressions.\n\nNote that you are writing the query to execute, which does not have to match\nbetween source and target as long as the results can be expected to align. If\nthe target filter is omitted, the source filter will run on both the source and\ntarget tables.\n\n### Primary Keys\n\nIn many cases, validations (e.g. count, min, max, etc.) produce one row per table. The comparison between the source\nand target tables compares the value of each column in the source with the value of that column in the target.\n`grouped-columns` validation and `validate row` produce multiple rows per table. The Data Validation Tool needs one or more columns to uniquely identify each row so the source and target can be compared. The Data Validation Tool refers to these columns as primary keys. These do not need to be primary keys in the table. 
The only requirement is that the keys uniquely identify the row in the results.\n\nThese columns are inferred, where possible, from the source/target table or can be provided via the `--primary-keys` flag.\n\n### Grouped Columns\n\nGrouped Columns contain the fields you want your aggregations to be broken out\nby, e.g. `SELECT last_updated::DATE, COUNT(*) FROM my.table` will produce a\nresultset that breaks down the count of rows per calendar date.\n\n### Hash, Concat, and Comparison Fields\n\nRow-level validations can involve either a hash/checksum, concat, or comparison fields.\nA hash validation (`--hash '*'`) will first sanitize the data with the following\noperations on all or selected columns: CAST to string, IFNULL to replace NULLs with a default\nreplacement string, and RSTRIP. Then, it will CONCAT() the results\nand run a SHA256() hash and compare the source and target results.\n\nWhen there are data type mismatches for columns, for example dates compared to timestamps and\nbooleans compared with numeric columns, you may see other expressions in SQL statements which\nensure that consistent values are used to build comparison values.\n\nSince each row will be returned in the result set, it is recommended to validate a\nsubset of the table. The `--filters` and `--use-random-row` options can be used for this purpose.\n\nPlease note that SHA256 is not a supported function on Teradata systems. If you wish to perform\nthis comparison on Teradata you will need to [deploy a UDF to perform the conversion](https://github.com/akuroda/teradata-udf-sha2/blob/master/src/sha256.c).\n\nThe concat validation (`--concat '*'`) will do everything up until the hash. It will sanitize\nand concatenate the specified columns, and then value compare the results.\n\nComparison field validations (`--comp-fields column`) involve a value comparison of the\ncolumn values. 
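The sanitize-concat-hash sequence described above can be sketched in Python. This is an illustration of the idea only, not DVT's implementation: DVT builds equivalent SQL expressions so the hashing runs inside each database, and the null replacement string used here is a made-up placeholder:

```python
import hashlib

NULL_REPLACEMENT = "<null>"  # placeholder only; DVT's actual default may differ

def row_hash(values):
    """Illustrate --hash: CAST to string, IFNULL, RSTRIP, then CONCAT and SHA256."""
    sanitized = [(NULL_REPLACEMENT if v is None else str(v)).rstrip() for v in values]
    return hashlib.sha256("".join(sanitized).encode("utf-8")).hexdigest()

# Rows that match after sanitization produce identical digests on source and target;
# the trailing space on "abc " is removed by RSTRIP, so these two rows compare equal.
print(row_hash([42, "abc ", None]) == row_hash([42, "abc", None]))
```

The same sanitization is what `--concat` performs, stopping just before the hash step.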
These values will be compared via a JOIN on their corresponding primary\nkey and will be evaluated for an exact match.\n\nSee hash and comparison field validations in the [Examples](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#run-a-row-hash-validation-for-all-rows) page.\n\n### Calculated Fields\n\nSometimes direct comparisons are not feasible between databases due to\ndifferences in how particular data types may be handled. These differences can\nbe resolved by applying functions to columns in the query itself.\nExamples might include trimming whitespace from a string, converting strings to\na single case to compare case insensitivity, or rounding numeric types to a\nsignificant figure.\n\nOnce a calculated field is defined, it can be referenced by other calculated\nfields at any higher \"depth\". Depth controls how many subqueries are executed\nin the resulting query. For example, with the following YAML config:\n\n```yaml\n- calculated_fields:\n    - field_alias: rtrim_col_a\n      source_calculated_columns: ['col_a']\n      target_calculated_columns: ['col_a']\n      type: rtrim\n      depth: 0 # generated off of a native column\n    - field_alias: ltrim_col_b\n      source_calculated_columns: ['col_b']\n      target_calculated_columns: ['col_b']\n      type: ltrim\n      depth: 0 # generated off of a native column\n    - field_alias: concat_col_a_col_b\n      source_calculated_columns: ['rtrim_col_a', 'ltrim_col_b']\n      target_calculated_columns: ['rtrim_col_a', 'ltrim_col_b']\n      type: concat\n      depth: 1 # calculated one query above\n```\n\nis equivalent to the following SQL query:\n\n```sql\nSELECT\n  CONCAT(rtrim_col_a, ltrim_col_b) AS concat_col_a_col_b\nFROM (\n  SELECT\n      RTRIM(col_a) AS rtrim_col_a\n    , LTRIM(col_b) AS ltrim_col_b\n  FROM my.table\n  ) as table_0\n```\n\nIf you generate the config file for a row validation, you can see that it uses\ncalculated fields to 
generate the query. You can also use calculated fields\nin column level validations to generate the length of a string, or cast\nan INT field to BIGINT for aggregations.\n\nSee the [Examples page](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-yaml-with-calc-fields-cast-to-numeric-before-aggregation) for a sample\ncast to NUMERIC.\n\n#### Custom Calculated Fields\n\nDVT supports certain functions required for row hash validation natively (e.g. CAST() and CONCAT()),\nwhich are listed in the CalculatedField() class methods in the [QueryBuilder](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/data_validation/query_builder/query_builder.py).\n\nYou can also specify custom functions (e.g. replace() or truncate()) from the Ibis expression\n[API reference](https://ibis-project.org/reference/expressions/generic/). Keep in mind these will run\non both source and target systems. You will need to specify the Ibis API expression and the parameters\nrequired, if any, with the 'params' block as shown below:\n\n```yaml\n- calculated_fields:\n  - depth: 0\n    field_alias: format_start_time\n    source_calculated_columns:\n    - start_time\n    target_calculated_columns:\n    - start_time\n    type: custom\n    ibis_expr: ibis.expr.types.TemporalValue.strftime\n    params:\n    - format_str: '%m%d%Y'\n```\n\nThe above block references the [TemporalValue.strftime](https://ibis-project.org/reference/expressions/timestamps/#ibis.expr.types.temporal.TemporalValue.strftime) Ibis API expression.\nSee the [Examples page](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/examples.md#sample-row-validation-yaml-with-custom-calc-field)\nfor a sample YAML with a custom calculated field.\n\n## Contributing\n\nContributions are welcome. 
See the [Contributing guide](https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/CONTRIBUTING.md) for details.\n