{"id":18524266,"url":"https://github.com/datakitchen/dataops-testgen","last_synced_at":"2026-02-25T03:13:07.564Z","repository":{"id":234517301,"uuid":"788112927","full_name":"DataKitchen/dataops-testgen","owner":"DataKitchen","description":"DataOps Data Quality TestGen is part of DataKitchen's Open Source Data Observability.   DataOps TestGen delivers simple, fast data quality test generation and execution by data profiling,  new dataset hygiene review, AI generation of data quality validation tests, ongoing testing of data refreshes, \u0026 continuous anomaly monitoring","archived":false,"fork":false,"pushed_at":"2025-03-25T05:45:00.000Z","size":5464,"stargazers_count":54,"open_issues_count":3,"forks_count":3,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-30T00:11:10.723Z","etag":null,"topics":["data","data-engineering","data-observability","data-quality","data-science","data-testing","datachecker","dataops","dataprofiling","dataquality","datavalidation","mssql","postgresql","python","redshift","self-hosted","snowflake"],"latest_commit_sha":null,"homepage":"https://datakitchen.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataKitchen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-17T19:54:23.000Z","updated_at":"2025-03-25T05:45:02.000Z","dependencies_parsed_at":"2024-04-30T20:48:50.417Z","dependency_job_id":"2f499289-6543-4617-8c9d-a2e41157ec9a","html_url":"https://github.com/DataKitchen/dataops-testgen","commit_stats":null,"previous_names":["datakitch
en/dataops-testgen"],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdataops-testgen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdataops-testgen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdataops-testgen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdataops-testgen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataKitchen","download_url":"https://codeload.github.com/DataKitchen/dataops-testgen/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247419860,"owners_count":20936012,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-engineering","data-observability","data-quality","data-science","data-testing","datachecker","dataops","dataprofiling","dataquality","datavalidation","mssql","postgresql","python","redshift","self-hosted","snowflake"],"created_at":"2024-11-06T17:40:19.597Z","updated_at":"2026-02-25T03:13:07.558Z","avatar_url":"https://github.com/DataKitchen.png","language":"Python","readme":"# DataOps Data Quality TestGen\n![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest 
Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F\u0026query=results%5B0%5D.name\u0026label=latest%20version\u0026color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F\u0026query=pull_count\u0026style=flat\u0026label=docker%20pulls\u0026color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat\u0026logo=slack)](https://data-observability-slack.datakitchen.io/join)\n\n*\u003cp style=\"text-align: center;\"\u003eDataOps Data Quality TestGen, or \"TestGen\" for short, can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.\u003c/p\u003e*\n\n## Documentation\n\n[DataOps TestGen Overview](https://datakitchen.io/dataops-testgen-product/)\n\n[DataOps TestGen Documentation](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)\n\n\n## Features\n\n[Interactive Product Tour](https://datakitchen.storylane.io/share/byag8vimd5tn)\n\nWhat does DataKitchen's DataOps Data Quality TestGen do? 
It helps you understand and \u003cb\u003efind data issues in new data\u003c/b\u003e.\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DatKitchen Open Source Data Quality TestGen Features - New Data\" src=\"https://datakitchen.io/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-2.22.57 PM.png\" width=\"70%\"\u003e\n\u003c/p\u003e\nIt constantly \u003cb\u003ewatches your data for data quality anomalies\u003c/b\u003e and lets you drill into problems.\n\u003cbr\u003e\u003c/br\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DataKitchen Open Source Data Quality TestGen Features - Data Ingestion and Quality Testing\" src=\"https://datakitchen.io/wp-content/uploads/2024/07/Screenshot-2024-07-23-at-2.23.07 PM.png\" width=\"70%\"\u003e\n\u003c/p\u003e\nA \u003cb\u003esingle place to manage Data Quality\u003c/b\u003e across data sets, locations, and teams.\n\u003cbr\u003e\u003c/br\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DataKitchen Open Source Data Quality TestGen Features - Single Place\" src=\"https://datakitchen.io/wp-content/uploads/2024/07/Screenshot-dataops-testgen-centralize.png\" width=\"70%\"\u003e\n\u003c/p\u003e\n\n## Installation with dk-installer (recommended)\n\nThe [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) program installs DataOps Data Quality TestGen as a [Docker Compose](https://docs.docker.com/compose/) application. This is the recommended mode of installation as Docker encapsulates and isolates the application from other software on your machine and does not require you to manage Python dependencies.\n\n### Install the prerequisite software\n\n| Software                | Tested Versions               | Command to check version                |\n|-------------------------|-------------------------|-------------------------------|\n| [Python](https://www.python.org/downloads/) \u003cbr/\u003e- Most Linux and macOS systems have Python pre-installed. 
\u003cbr/\u003e- On Windows machines, you will need to download and install it.  \u003cbr/\u003e Why Python?  To run the installer.       | 3.9, 3.10, 3.11, 3.12                | `python3 --version`                |\n| [Docker](https://docs.docker.com/get-docker/) \u003cbr/\u003e[Docker Compose](https://docs.docker.com/compose/install/)  \u003cbr/\u003e Why Docker?  Docker lets you try TestGen without affecting your local software environment.  All the dependencies TestGen needs are isolated in its own container, so installation is easy and insulated.  | 26.1, 27.5, 28.0 \u003cbr/\u003e 2.32, 2.33, 2.34        | `docker -v` \u003cbr/\u003e `docker compose version`         |\n\n### Download the installer\n\nOn Unix-based operating systems, use the following command to download it to the current directory. We recommend creating a new, empty directory.\n\n```shell\ncurl -o dk-installer.py 'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py'\n```\n\n* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from the [data-observability-installer](https://github.com/DataKitchen/data-observability-installer) repository.\n* All commands listed below should be run from the folder containing this file.\n* For usage help and command options, run `python3 dk-installer.py --help` or `python3 dk-installer.py \u003ccommand\u003e --help`.\n\n### Install the TestGen application\n\nThe installation downloads the latest Docker images for TestGen and deploys a new Docker Compose application. 
The process may take 5 to 10 minutes depending on your machine and network connection.\n\n```shell\npython3 dk-installer.py tg install\n```\n\nThe `--port` option may be used to set a custom localhost port for the application (default: 8501).\n\nTo enable SSL for HTTPS support, use the `--ssl-cert-file` and `--ssl-key-file` options to specify local file paths to your SSL certificate and key files.\n\nOnce the installation completes, verify that you can log in to the UI with the URL and credentials provided in the output.\n\n### Optional: Run the TestGen demo setup\n\nThe [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.\n\n```shell\npython3 dk-installer.py tg run-demo\n```\n\nIn the TestGen UI, you will see that new data profiling and test results have been generated.\n\n## Installation with pip\n\nAs an alternative to the Docker Compose [installation with dk-installer (recommended)](#installation-with-dk-installer-recommended), DataOps Data Quality TestGen can also be installed as a Python package via [pip](https://pip.pypa.io/en/stable/). 
This mode of installation uses the [dataops-testgen](https://pypi.org/project/dataops-testgen/) package published to PyPI, and it requires a PostgreSQL instance to be provisioned for the application database.\n\n### Install the prerequisite software\n\n| Software                                                                                                                                                                         | Tested Versions  | Command to check version               |\n|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|------------------------------|\n| [Python](https://www.python.org/downloads/) \u003cbr/\u003e- Most Linux and macOS systems have Python pre-installed. \u003cbr/\u003e- On Windows machines, you will need to download and install it. | 3.12 | `python3 --version`               |\n| [PostgreSQL](https://www.postgresql.org/download/)                                                                                                                                                                     | 14.1, 15.8, 16.4       | `psql --version`|\n\n### Install the TestGen package\n\nWe recommend using a Python virtual environment to avoid any dependency conflicts with other applications installed on your machine. The [venv](https://docs.python.org/3/library/venv.html#creating-virtual-environments) module, which is part of the Python standard library, or other third-party tools, like [virtualenv](https://virtualenv.pypa.io/en/latest/) or [conda](https://docs.conda.io/en/latest/), can be used.\n\nCreate and activate a virtual environment with a TestGen-compatible version of Python (`\u003e=3.12`). 
The steps may vary based on your operating system and Python installation - the [Python packaging user guide](https://packaging.python.org/en/latest/tutorials/installing-packages/) is a useful reference.\n\n_On Linux/Mac_\n```shell\npython3 -m venv venv\nsource venv/bin/activate\n```\n\n_On Windows_\n```powershell\npy -3.12 -m venv venv\nvenv\\Scripts\\activate\n```\n\nWithin the virtual environment, install the TestGen package with pip.\n```shell\npip install dataops-testgen\n```\n\nVerify that the [_testgen_ command line](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-commands-and-details) works.\n```shell\ntestgen --help\n```\n\n### Set up the application database in PostgreSQL\n\nCreate a `local.env` file with the following environment variables, replacing the `\u003cvalue\u003e` placeholders with appropriate values. Refer to the [TestGen Configuration](docs/configuration.md) document for more details, defaults, and other supported configuration.\n```shell\n# Connection parameters for the PostgreSQL server\nexport TG_METADATA_DB_HOST=\u003cpostgres_hostname\u003e\nexport TG_METADATA_DB_PORT=\u003cpostgres_port\u003e\n\n# Connection credentials for the PostgreSQL server\n# This role must have privileges to create roles, users, databases, and schemas so that the application database can be initialized\nexport TG_METADATA_DB_USER=\u003cpostgres_username\u003e\nexport TG_METADATA_DB_PASSWORD=\u003cpostgres_password\u003e\n\n# Set a password and arbitrary string (the \"salt\") to be used for encrypting secrets in the application database\nexport TG_DECRYPT_PASSWORD=\u003cencryption_password\u003e\nexport TG_DECRYPT_SALT=\u003cencryption_salt\u003e\n\n# Set credentials for the default admin user to be created for TestGen\nexport TESTGEN_USERNAME=\u003cusername\u003e\nexport TESTGEN_PASSWORD=\u003cpassword\u003e\n\n# Set an arbitrary base64-encoded string to be used for signing authentication tokens\nexport 
TG_JWT_HASHING_KEY=\u003cbase64_key\u003e\n\n# Set an accessible path for storing application logs\nexport TESTGEN_LOG_FILE_PATH=\u003cpath_for_logs\u003e\n```\n\nSource the file to apply the environment variables. For the Windows equivalent, refer to [this guide](https://bennett4.medium.com/windows-alternative-to-source-env-for-setting-environment-variables-606be2a6d3e1).\n```shell\nsource local.env\n```\n\nMake sure the PostgreSQL database server is up and running. Initialize the application database for TestGen.\n```shell\ntestgen setup-system-db --yes\n```\n\n### Run the application modules\n\nRun the following command to start TestGen. It will open the browser at [http://localhost:8501](http://localhost:8501).\n\n```shell\ntestgen run-app\n```\n\nVerify that you can log in to the UI with the `TESTGEN_USERNAME` and `TESTGEN_PASSWORD` values that you configured in the environment variables.\n\n### Optional: Run the TestGen demo setup\n\nThe [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through DataOps Data Quality TestGen capabilities to demonstrate how it covers critical use cases for data and analytic teams.\n\n```shell\ntestgen quick-start\n```\n\nIn the TestGen UI, you will see that new data profiling and test results have been generated.\n\n## Useful Commands\n\nThe [dk-installer](https://github.com/DataKitchen/data-observability-installer/?tab=readme-ov-file#install-the-testgen-application) and [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the TestGen application installed using dk-installer. 
All commands must be run in the same folder that contains the `dk-installer.py` and `docker-compose.yml` files used by the installation.\n\n### Remove demo data\n\nAfter completing the quickstart, you can remove the demo data from the application with the following command.\n\n```shell\npython3 dk-installer.py tg delete-demo\n```\n\n### Upgrade to latest version\n\nNew releases of TestGen are announced on the `#releases` channel on [Data Observability Slack](https://data-observability-slack.datakitchen.io/join), and release notes can be found on the [DataKitchen documentation portal](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-release-notes/a/h1_1691719522). Use the following command to upgrade to the latest released version.\n\n```shell\npython3 dk-installer.py tg upgrade\n```\n\n### Uninstall the application\n\nThe following command uninstalls the Docker Compose application and removes all data, containers, and images related to TestGen from your machine.\n\n```shell\npython3 dk-installer.py tg delete\n```\n\n### Access the _testgen_ CLI\n\nThe [_testgen_ command line](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container.\n\n```shell\ndocker compose exec engine bash\n```\n\nUse `exit` to return to the regular terminal.\n\n### Stop the application\n\n```shell\ndocker compose down\n```\n\n### Restart the application\n\n```shell\ndocker compose up -d\n```\n\n## What Next?\n\n### Getting started guide\nWe recommend you start by going through the [Data Observability Overview Demo](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview).\n\n### Support\nFor support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and post on the `#support` channel.\n\n### Connect to your database\nFollow [these 
instructions](https://docs.datakitchen.io/articles/dataops-testgen-help/connect-your-database) to improve the quality of data in your database.\n\n### Community\nTalk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.\n\nJoin our community here:\n\n* 👋 [Join us on Slack](https://data-observability-slack.datakitchen.io/join), this is also how you get support (see above)\n\n* 🌟 [Star us on GitHub](https://github.com/DataKitchen/data-observability-installer)\n\n* 🐦 [Follow us on Twitter](https://twitter.com/i/flow/login?redirect_after_login=%2Fdatakitchen_io)\n\n* 🕴️ [Follow us on LinkedIn](https://www.linkedin.com/company/datakitchen)\n\n* 📺 [Get Free DataOps Fundamentals Certification](https://info.datakitchen.io/training-certification-dataops-fundamentals)\n\n* 📚 [Read our blog posts](https://datakitchen.io/blog/)\n\n* 🗃 [Sign The DataOps Manifesto](https://DataOpsManifesto.org)\n\n* 🗃 [Sign The Data Journey Manifesto](https://DataJourneyManifesto.org)\n\n\n### Contributing\nFor details on contributing or running the project for development, check out our [contributing guide](CONTRIBUTING.md).\n\n### License\nDataKitchen's DataOps Data Quality TestGen is Apache 2.0 licensed.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatakitchen%2Fdataops-testgen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatakitchen%2Fdataops-testgen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatakitchen%2Fdataops-testgen/lists"}