{"id":13702979,"url":"https://github.com/DataKitchen/data-observability-installer","last_synced_at":"2025-05-05T07:30:30.339Z","repository":{"id":234518212,"uuid":"788686533","full_name":"DataKitchen/data-observability-installer","owner":"DataKitchen","description":"Installer for DataKitchen's Open Source Data Observability Products. Data breaks. Servers break. Your toolchain breaks. Ensure your team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.","archived":false,"fork":false,"pushed_at":"2025-04-30T14:25:08.000Z","size":274,"stargazers_count":114,"open_issues_count":6,"forks_count":10,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-30T15:41:30.502Z","etag":null,"topics":["data","data-engineering","data-observability","data-profiling","data-quality","data-reliability","data-science","datachecker","datacleaner","datacleaning","dataops","dataquality","datatesting","datavalidation","mssql","pipeline-tests","postgresql","redshift","self-hosted","snowflake"],"latest_commit_sha":null,"homepage":"https://datakitchen.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataKitchen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-18T22:21:12.000Z","updated_at":"2025-04-30T14:25:10.000Z","dependencies_parsed_at":"2024-04-22T17:43:23.747Z","dependency_job_id":"981e03af-de71-4ff0-b2a7-42f91ededc9c","html_url":
"https://github.com/DataKitchen/data-observability-installer","commit_stats":{"total_commits":58,"total_committers":5,"mean_commits":11.6,"dds":0.5689655172413793,"last_synced_commit":"e4c36505e35c63a737c9fd732312b9aac6a89145"},"previous_names":["datakitchen/data-observability-installer"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdata-observability-installer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdata-observability-installer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdata-observability-installer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataKitchen%2Fdata-observability-installer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataKitchen","download_url":"https://codeload.github.com/DataKitchen/data-observability-installer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252458316,"owners_count":21751014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","data-engineering","data-observability","data-profiling","data-quality","data-reliability","data-science","datachecker","datacleaner","datacleaning","dataops","dataquality","datatesting","datavalidation","mssql","pipeline-tests","postgresql","redshift","self-hosted","snowflake"],"created_at":"2024-08-02T21:00:47.904Z","updated_at":"2025-05-05T07:30:30.331Z","avatar_url":"https://github.
com/DataKitchen.png","language":"Python","readme":"# DataKitchen Data Observability Installer \n![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) \n![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) \n[![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/#!open-source-data-observability/data-observability-overview) \n[![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat\u0026logo=slack)](https://data-observability-slack.datakitchen.io/join)\n\n*\u003cp style=\"text-align: center;\"\u003eData breaks. Servers break. Your toolchain breaks. Ensure your data team is the first to know and the first to solve with visibility across and down your data estate. Save time with simple, fast data quality test generation and execution. Trust your data, tools, and systems end to end.\u003c/p\u003e*\n\nThis repo contains the installer and quickstart setup for the DataKitchen Open Source Data Observability product suite (released April 2024).\n* [**DataOps Data Quality TestGen**](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help) is a data quality verification tool that does five main tasks: (1) data profiling, (2) new dataset screening and hygiene review, (3) algorithmic generation of data quality validation tests, (4) ongoing production testing of new data refreshes and (5) continuous periodic monitoring of datasets for anomalies [(GitHub)](https://github.com/DataKitchen/dataops-testgen).\n* [**DataOps Observability**](https://docs.datakitchen.io/articles/dataops-observability-help/dataops-observability-help) monitors every tool used in the journey of data from data source to customer value, from any team development environment into production, across every tool, team, data set, environment, and project so that problems are detected, localized, and understood immediately 
[(GitHub)](https://github.com/DataKitchen/dataops-observability).\n\n![DataKitchen Open Source Data Observability](https://datakitchen.io/wp-content/uploads/2024/04/both-products.png)\n\nFor background on why we built this product, check out the articles on ['why we open sourced'](https://datakitchen.io/why-we-open-sourced-our-data-observability-products/), [manifesto](https://datajourneymanifesto.org/), [free book](https://datakitchen.io/the-dataops-cookbook/), and [top data observability and DataOps articles](https://datakitchen.io/datakitchen-resource-guide-to-data-journeys-data-observability-dataops/).\n\n## Features\n\nWhat does DataKitchen's Open Source Data Observability do? It helps you understand and \u003cb\u003efind data issues in new data\u003c/b\u003e. \n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DataKitchen Open Source Data Observability Features - New Data\" src=\"https://datakitchen.io/wp-content/uploads/2024/06/Quick-over-view.png\" width=\"70%\" \u003e\n\u003c/p\u003e\nIt constantly \u003cb\u003ewatches your data for data quality anomalies\u003c/b\u003e and alerts you to problems.\n\u003cbr\u003e\u003c/br\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DataKitchen Open Source Data Observability Features - Data Ingestion and Polling\" src=\"https://datakitchen.io/wp-content/uploads/2024/06/Quick-over-view-1.png\" width=\"70%\" \u003e\n\u003c/p\u003e\nIt monitors \u003cb\u003emulti-tool, multi-data set, multi-hop data analytic production\u003c/b\u003e processes.
\n\u003cbr\u003e\u003c/br\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DataKitchen Open Source Data Observability Features - Data Production\" src=\"https://datakitchen.io/wp-content/uploads/2024/06/Quick-over-view-2.png\" width=\"70%\" \u003e\n\u003c/p\u003e\nAnd it allows you to \u003cb\u003emake fast, safe development changes\u003c/b\u003e.\n\u003cbr\u003e\u003c/br\u003e\n\u003cp align=\"center\"\u003e\n\u003cimg alt=\"DataKitchen Open Source Data Observability Features - Development CI-CD\" src=\"https://datakitchen.io/wp-content/uploads/2024/06/Quick-over-view-4.png\" width=\"70%\" \u003e\n\u003c/p\u003e\n\n\n## Prerequisites\n\n### Minimum system requirements\n\n- 2 CPUs\n- 8 GB memory\n- 20 GB disk space\n\n### Install the required software\n\n#### Requirements for TestGen \u0026 Observability\n\n| Software                | Tested Versions               | Command to check version                |\n|-------------------------|-------------------------|-------------------------------|\n| [Python](https://www.python.org/downloads/) \u003cbr/\u003e- Most Linux and macOS systems have Python pre-installed. \u003cbr/\u003e- On Windows machines, you will need to download and install it. 
| 3.9, 3.10, 3.11, 3.12                | `python3 --version`                |\n| [Docker](https://docs.docker.com/get-docker/) \u003cbr/\u003e[Docker Compose](https://docs.docker.com/compose/install/)         | 26.1, 27.5, 28.0 \u003cbr/\u003e 2.32, 2.33, 2.34        | `docker -v` \u003cbr/\u003e `docker compose version`         |\n\n#### Additional Requirements for Observability only\n\n| Software                | Tested Versions               | Command to check version                |\n|-------------------------|-------------------------|-------------------------------|\n| [Minikube](https://minikube.sigs.k8s.io/docs/start/)         | 1.33, 1.34, 1.35                | `minikube version`                |\n| [Helm](https://helm.sh/docs/intro/install/)            | 3.15, 3.16, 3.17        | `helm version`         |\n\n### Download the installer\n\nOn Unix-based operating systems, use the following command to download it to the current directory. We recommend creating a new, empty directory.\n\n```shell\ncurl -o dk-installer.py 'https://raw.githubusercontent.com/DataKitchen/data-observability-installer/main/dk-installer.py'\n```\n\n* Alternatively, you can manually download the [`dk-installer.py`](https://github.com/DataKitchen/data-observability-installer/blob/main/dk-installer.py) file from this repo.\n* All commands listed below should be run from the folder containing this file.\n* For usage help and command options, run `python3 dk-installer.py --help` or `python3 dk-installer.py \u003ccommand\u003e --help`.\n\nOn Windows operating systems, you can also download the executable file [`dk-installer.exe`](https://github.com/DataKitchen/data-observability-installer/releases/download/latest/dk-installer.exe) and run it by double-clicking the file.\n\n## Quickstart Guide\n\nThe [Data Observability quickstart](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) walks you through Dataops Observability and TestGen 
capabilities to demonstrate how our products cover critical use cases for data and analytic teams.\n\nBefore going through the quickstart, complete the prerequisites above and then the following steps to install the two products and set up the demo data. For any of the commands, you can view additional options by appending `--help` at the end.\n\n### Install the TestGen application\n\nThe installation downloads the latest Docker images for TestGen and deploys a new Docker Compose application. The process may take 5 to 10 minutes depending on your machine and network connection.\n\n```shell\npython3 dk-installer.py tg install\n```\nThe `--port` option may be used to set a custom localhost port for the application (default: 8501).\n\nTo enable SSL for HTTPS support, use the `--ssl-cert-file` and `--ssl-key-file` options to specify local file paths to your SSL certificate and key files.\n\nOnce the installation completes, verify that you can log in to the UI with the URL and credentials provided in the output.\n\n### Install the Observability application\n\nThe installation downloads the latest Helm charts and Docker images for Observability and deploys the application on a new minikube cluster. The process may take 5 to 30 minutes depending on your machine and network connection. \n```shell\npython3 dk-installer.py obs install\n```\n#### Bind HTTP ports to host machine\n\nThis step is required to access the application when using the Docker driver on Mac or Windows. It may also be useful for installations on remote machines to access the UI from a local browser.\n\n```shell\npython3 dk-installer.py obs expose\n```\nThe `--port` option may be used to set a custom localhost port for the application (default: 8082).\n\nVerify that you can log in to the UI with the URL and credentials provided in the output. 
Leave this process running, and continue with the next steps in another terminal window.\n\n### Run the TestGen demo setup\n\nThe `demo-config.json` file generated by the Observability installation must be present in the folder.\n\n```shell\npython3 dk-installer.py tg run-demo --export\n```\nIn the TestGen UI, you will see that new data profiling and test results have been generated. Additionally, in the Observability UI, you will see that new test outcome events have been received.\n\n### Run the Observability demo setup\n\nThe `demo-config.json` file generated by the Observability installation must be present in the folder.\n\n```shell\npython3 dk-installer.py obs run-demo\n```\nIn the Observability UI, you will see that new journeys and events have been generated.\n\n### Run the Agent Heartbeat demo setup\n\nThe `demo-config.json` file generated by the Observability installation must be present in the folder.\n\n```shell\npython3 dk-installer.py obs run-heartbeat-demo\n```\nIn the Observability UI, you will see that new agents have been generated on the Integrations page.\n\nLeave this process running, and continue with the [quickstart guide](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview) to tour the applications.\n\n## Product Documentation\n\n[DataOps TestGen](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help)\n\n[DataOps Observability](https://docs.datakitchen.io/articles/dataops-observability-help/dataops-observability-help)\n\n## Useful Commands\n\n### DataOps TestGen\n\nThe [docker compose CLI](https://docs.docker.com/compose/reference/) can be used to operate the installed TestGen application. 
All commands must be run in the same folder that contains the `docker-compose.yaml` file generated by the installation.\n\nAccess the _testgen_ CLI: `docker compose exec engine bash` (use `exit` to return to the regular terminal)\n\nStop the app: `docker compose down`\n\nRestart the app: `docker compose up`\n\nUpgrade the app to latest version: `python3 dk-installer.py tg upgrade`\n\n### DataOps Observability\n\nThe [minikube](https://minikube.sigs.k8s.io/docs/commands/) and [kubectl](https://kubernetes.io/docs/reference/kubectl/) command line tools can be used to operate the Observability application.\n\nInspect the pods: `kubectl get pods`\n\nGet pod logs: `kubectl logs \u003cPOD ID\u003e`\n\nStop the app: `minikube stop`\n\nRestart the app: `minikube start`\n\n## Remove Demo Data \n\nAfter completing the quickstart, you can remove the demo data from the applications with the following steps.\n\n### Stop the Agent Heartbeat demo\n\nStop the process that is running the Agent Heartbeat demo using `Ctrl + C`.\n\n*Note*: Currently, the agents generated by the heartbeat demo are not cleaned up.\n\n### Remove TestGen \u0026 Observability demo data\n\nThe `demo-config.json` file generated by the Observability installation must be present in the folder.\n\n```shell\npython3 dk-installer.py tg delete-demo\npython3 dk-installer.py obs delete-demo\n```\n\n## Uninstall Applications\n\n### Uninstall TestGen\n```shell\npython3 dk-installer.py tg delete\n```\n\n### Uninstall Observability\n```shell\npython3 dk-installer.py obs delete\n```\n\n## Use Cases for Data Observability\n\n**Data Analytics Use Case**|**When Does it Happen**|**Data Observability Challenge**|**Key Data Observability Product Feature**|**Key Benefit**\n:-----:|:-----:|:-----:|:-----:|:-----:\n[**Patch (or pushback)**](https://datakitchen.io/the-five-use-cases-in-data-observability-part-1/): New data analysis and cleansing|Before New Data Sources Are Added To Production|Evaluate new data, find data hygiene 
issues, and communicate with your data providers.|DataOps TestGen's data profiling of 51 data characteristics, then 27 data hygiene detector suggestions; UI to review and disposition|Save time, lower errors, improve data quality\n[**Poll**](https://datakitchen.io/the-five-use-cases-in-data-observability-part-2): Updates to existing data sources; Data ingestion monitoring|Continually|Find anomalies in data updates and notify the proper party in the right place.|DataOps TestGen's auto-generation of data anomaly tests: freshness, schema, volume, and data drift checks. DataOps Observability Data Journeys, overview UI, and notification rules and limits|Find problem data quickly, save time, lower errors\n[**Production**](https://datakitchen.io/the-five-use-cases-in-data-observability-part-3): Monitoring of multi-tool, multi-data set, multi-hop data analytic production processes.|During The Production Cycle|Find data, SLA, and toolchain problems, localize them quickly, and notify quickly.|DataOps TestGen's auto-generation of 32 data quality validation tests based on data profiling. 2 custom test types. Fast in-database SQL execution (no data copies). 
DataOps Observability's end-to-end Data Journeys are digital twins that represent your entire process and allow you to find, alert, and fix quickly.|Stop embarrassing customer errors, gain customer data trust, lower errors, improve team productivity\n[**Push**](https://datakitchen.io/the-five-use-cases-in-data-observability-part-4): Development Unit, Regression Tests, and Impact Assessment.|During The Development Process|Find problems in data or tools in development to validate code/configuration changes.|The combination of DataOps Observability and DataOps TestGen can be run in your development environment against test data to provide functional, unit, and regression tests.|Improve the speed and lower the risk of changes to production, less wasted time, improve productivity\n[**Parallel**](https://datakitchen.io/the-five-use-cases-in-data-observability-part-5): Checking data accuracy during Data Migration projects: \"Does It Match?\"|During a Data Migration Process|Check that two similar data sets or processes produce the same results.|DataOps TestGen can find errors between migrated data sets by comparing source and target data quality tests. DataOps Observability can monitor legacy tools and migrated cloud tools at the same time.|Lower risk of data errors, improve project delivery time\n\n\n\n## Community\n\n### Getting Started Guide\nWe recommend you review the [Data Observability Overview Demo](https://docs.datakitchen.io/articles/open-source-data-observability/data-observability-overview).\n\n### Support\nFor support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) and post on the #support channel.\n\n### Connect\nTalk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project. 
\n\nJoin our community here:\n\n* 🌟 [Star us on GitHub](https://github.com/DataKitchen/data-observability-installer)\n\n* 🐦 [Follow us on Twitter](https://twitter.com/i/flow/login?redirect_after_login=%2Fdatakitchen_io)\n\n* 🕴️ [Follow us on LinkedIn](https://www.linkedin.com/company/datakitchen)\n\n* 📺 [Get Free Data Observability and Data Quality Testing Certification](https://info.datakitchen.io/webinar-2024-05-video-form-data-observability-and-data-quality-testing-certification-series)\n\n* 📺 [Get Free DataOps Fundamentals Certification](https://info.datakitchen.io/training-certification-dataops-fundamentals)\n\n* 📚 [Read our blog posts](https://datakitchen.io/blog/)\n\n* 👋 [Join us on Slack](https://data-observability-slack.datakitchen.io/join)\n\n* 🗃 [Sign The DataOps Manifesto](https://DataOpsManifesto.org)\n\n* 🗃 [Sign The Data Journey Manifesto](https://DataJourneyManifesto.org)\n\n\n### Contributing\nFor details on contributing or running the project for development, check out our [contributing guide](https://github.com/DataKitchen/data-observability-installer/blob/main/CONTRIBUTING.md).\n\n### License\nDataKitchen DataOps Observability is Apache 2.0 licensed.\n","funding_links":[],"categories":["Data Quality"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDataKitchen%2Fdata-observability-installer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FDataKitchen%2Fdata-observability-installer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FDataKitchen%2Fdata-observability-installer/lists"}