{"id":19271981,"url":"https://github.com/warfox/dqt","last_synced_at":"2025-02-23T20:25:46.106Z","repository":{"id":41184817,"uuid":"443883079","full_name":"WarFox/dqt","owner":"WarFox","description":"Data Quality Tool","archived":false,"fork":false,"pushed_at":"2023-07-07T20:25:22.000Z","size":123,"stargazers_count":1,"open_issues_count":3,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-01-05T13:31:10.368Z","etag":null,"topics":["clojure","data-quality","data-reliability"],"latest_commit_sha":null,"homepage":"","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WarFox.png","metadata":{"files":{"readme":"README.org","changelog":"CHANGELOG.org","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-02T22:25:31.000Z","updated_at":"2023-03-04T11:52:45.000Z","dependencies_parsed_at":"2024-11-09T20:45:12.621Z","dependency_job_id":null,"html_url":"https://github.com/WarFox/dqt","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarFox%2Fdqt","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarFox%2Fdqt/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarFox%2Fdqt/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WarFox%2Fdqt/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WarFox","download_url":"https://codeload.github.com/WarFox/dqt/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com"
,"kind":"github","repositories_count":240373898,"owners_count":19791298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clojure","data-quality","data-reliability"],"created_at":"2024-11-09T20:35:02.162Z","updated_at":"2025-02-23T20:25:46.050Z","avatar_url":"https://github.com/WarFox.png","language":"Clojure","readme":"#+title: dqt - Data Quality Tool\n\nA simple data quality tool. Collect and publish metrics about the quality of data anywhere.\n\n* Docs\n\n1. [[./docs/dimensions.org][Dimensions of data quality]]\n\n* Features [2/12]\n\n- [X] get metrics\n- [X] run tests\n- [ ] publish metrics to aws\n- [ ] publish metrics to prometheus\n- [ ] publish metadata to DataHub\n- [ ] build dashboards\n- [ ] alerts based on CloudWatch\n- [ ] other cloud providers?\n- [ ] multiple data sources\n- [ ] example dag with airflow\n- [ ] example with prefect\n- [ ] reconciliation between two data sources, % missing, matching columns vs mismatch\n\n* Installation\n\nDownload from https://github.com/warfox/dqt\n\n* Usage\n\n`dqt` is a command line tool that runs on the JVM.\n\nMake sure you have the JDBC drivers on the classpath.\n\n#+begin_src\n  java -jar dqt.jar run -d datasource.edn -t table.edn\n#+end_src\n\n#+begin_src\n   java -cp \"/path/to/jdbc/driver/jar/:./dqt.jar\" dqt.core run -d examples/postgres.edn -t examples/tables/employees.edn\n#+end_src\n\n** datasource.edn\n\n** table.edn\n\n* Development\n\nRun the project directly, via `:main-opts` (`-m dqt.core`):\n\n#+begin_src\n    $ clojure -M:run\n#+end_src\n\nRun the project, with parameters\n\n#+begin_src\n    $ clojure -M:run -d 
datasource.edn -t table.edn\n#+end_src\n\nRun the project's tests (they'll fail until you edit them):\n\n#+begin_src\n    $ clojure -T:build test\n#+end_src\n\n#+begin_src\n  $ ./bin/kaocha\n#+end_src\n\nBuild uberjar\n\n#+begin_src\n    $ clojure -T:build uberjar\n#+end_src\n\nThis will produce an updated =pom.xml= file with synchronized dependencies inside the =META-INF=\ndirectory inside =target/classes= and the uberjar in =target=. You can update the version (and SCM tag)\ninformation in generated =pom.xml= by updating =build.clj=.\n\nIf you don't want the =pom.xml= file in your project, you can remove it. The =ci= task will\nstill generate a minimal =pom.xml= as part of the =uber= task, unless you remove =version=\nfrom =build.clj=.\n\nRun that uberjar:\n\n#+begin_src\n    $ java -jar target/dqt-0.1.0-SNAPSHOT.jar\n#+end_src\n\nIf you remove =version= from =build.clj=, the uberjar will become =target/dqt-standalone.jar=.\n\n* Options\n\nFIXME: listing of options this app accepts.\n\n* Examples\n\n** datasource file datasource.edn\n#+begin_src clojure\n {:dbtype     \"postgresql\"\n :dbname     \"postgres\"\n :host       #or [#env DATABASE_HOSTNAME \"localhost\"]\n :user       \"postgres\"\n :password   \"postgres\"\n :ssl        false\n :classname  \"org.postgres.Driver\"\n :sslfactory \"org.postgresql.ssl.NonValidatingFactory\"}\n#+end_src\n\n** table file employees.edn\n\n#+begin_src clojure\n{:table-name :employees\n :metrics    [:row-count\n              :avg-length\n              :max-length\n              :min-length\n              :avg\n              :sum\n              :max\n              :min\n              :stddev\n              :variance]\n\n :tests      [[:row-count \u003e 10]\n              [:avg-length-phone-number \u003c 13]\n              [:stddev-salary \u003e 4500]\n              [:sum-salary \u003e 20000]\n              [:max-length-email \u003c 30]]}\n#+end_src\n\n** Run\n\n#+begin_src shell\n  clj -M:dev:run run -d datasource.edn -t 
tables/employees.edn\n#+end_src\n\n#+begin_src shell\n  bb run-example\n#+end_src\n\n* Development\n\n** Run development mode with babashka\n\n#+begin_src shell\n bb dev\n#+end_src\n\n** Test database\n\nRun =docker compose up= to have Postgres running\n\n** Run migration\n\n#+begin_src shell\n  bb migrate\n#+end_src\n\n** Run tests\n\n#+begin_src shell\n  clj -M:dev:test\n  clj -M:dev:test --watch\n  bb test\n  bb test:watch\n#+end_src\n\n#+begin_src\n  $ bin/kaocha\n#+end_src\n\n#+begin_src\n  $ bin/kaocha --watch\n#+end_src\n\n* References\n\n- https://www.sweettooth.dev/endpoint/dev/architecture/integrant-tutorial.html\n\n* License\n\nCopyright © 2021 Warfox\n\nDistributed under the MIT License.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarfox%2Fdqt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwarfox%2Fdqt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwarfox%2Fdqt/lists"}