https://github.com/warfox/dqt
Data Quality Tool
clojure data-quality data-reliability
Data Quality Tool
- Host: GitHub
- URL: https://github.com/warfox/dqt
- Owner: WarFox
- License: mit
- Created: 2022-01-02T22:25:31.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-07-07T20:25:22.000Z (over 1 year ago)
- Last Synced: 2024-04-08T16:08:23.304Z (9 months ago)
- Topics: clojure, data-quality, data-reliability
- Language: Clojure
- Homepage:
- Size: 120 KB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.org
- Changelog: CHANGELOG.org
- License: LICENSE
README
#+title: dqt - Data Quality Tool
A simple data quality tool. Collect and publish metrics about quality of data anywhere.
* Docs
1. [[./docs/dimensions.org][Dimensions of data quality]]
* Features [2/12]
- [X] get metrics
- [X] run tests
- [ ] publish metrics to aws
- [ ] publish metrics to prometheus
- [ ] publish metadata to DataHub
- [ ] build dashboards
- [ ] alerts based on CloudWatch
- [ ] other cloud providers?
- [ ] multiple data sources
- [ ] example dag with airflow
- [ ] example with prefect
- [ ] reconciliation between two data sources, % missing, matching columns vs mismatch
* Installation
Download from https://github.com/warfox/dqt
* Usage
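=dqt= is a command-line tool that runs on the JVM.
Make sure the JDBC driver for your database is on the classpath. Note that =java -jar= ignores =-cp=, so use the second form below when the driver is a separate jar.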
#+begin_src
java -jar dqt.jar run -d datasource.edn -t table.edn
#+end_src
#+begin_src
java -cp "/path/to/jdbc/driver/jar/:./dqt.jar" dqt.core run -d examples/postgres.edn -t examples/tables/employees.edn
#+end_src
** datasource.edn
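The datasource file in the Examples section uses the tagged literals =#or= and =#env= to fall back to a default when an environment variable is not set. These tags look like the ones provided by the aero configuration library, but whether dqt uses aero is an assumption. The sketch below uses only =clojure.edn= to show how such tags could resolve. The names =read-env=, =read-or=, and =read-datasource= are hypothetical and are not part of dqt.

#+begin_src clojure
(ns example.datasource-sketch
  (:require [clojure.edn :as edn]))

;; Hypothetical stand-ins for the #env and #or tags; not dqt's actual code.
(defn read-env
  "Returns the value of the named environment variable, or ::unset."
  [sym]
  (or (System/getenv (str sym)) ::unset))

(defn read-or
  "Returns the first candidate that is not ::unset."
  [candidates]
  (first (remove #{::unset} candidates)))

(defn read-datasource
  "Parses datasource EDN text, resolving #env and #or tags."
  [text]
  (edn/read-string {:readers {'env read-env 'or read-or}} text))

;; :host resolves to the value of DATABASE_HOSTNAME when it is set,
;; and to "localhost" otherwise.
(read-datasource
 "{:dbtype \"postgresql\"
   :host #or [#env DATABASE_HOSTNAME \"localhost\"]}")
#+end_src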
** table.edn
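A table file names the table, lists the metrics to collect, and lists the tests to run against those metrics, as in the =employees.edn= example under Examples. Each test reads as a =[metric operator threshold]= triple, and the test names in that example appear to combine a metric with a column (for instance =:stddev-salary=). The sketch below shows one way such triples could be evaluated against a map of already-computed metrics. It is not dqt's implementation: =ops=, =run-test=, and the metric values are hypothetical and chosen for illustration only.

#+begin_src clojure
(ns example.tests-sketch
  (:require [clojure.edn :as edn]))

;; Hypothetical: map the comparison symbols read from EDN to functions.
(def ops {'> > '< < '>= >= '<= <= '= =})

(defn run-test
  "Evaluates one [metric op threshold] triple against a metrics map."
  [metrics [metric op threshold]]
  (let [actual (get metrics metric)
        f      (get ops op)]
    {:test   [metric op threshold]
     :actual actual
     :pass?  (boolean (and f (some? actual) (f actual threshold)))}))

(def table
  (edn/read-string
   "{:table-name :employees
     :tests [[:row-count > 10]
             [:max-length-email < 30]]}"))

;; Made-up metric values, for illustration only.
(def metrics {:row-count 42 :max-length-email 35})

(map (partial run-test metrics) (:tests table))
#+end_src

With these made-up values the first test passes (42 is greater than 10) and the second fails (35 is not less than 30). A metric that is missing from the map, or an operator that is not in =ops=, is reported as a failure rather than throwing.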
* Development
Run the project directly, via =:main-opts= (=-m dqt.core=):
#+begin_src
$ clojure -M:run
#+end_src
Run the project with parameters:
#+begin_src
$ clojure -M:run -d datasource.edn -t table.edn
#+end_src
Run the project's tests (they'll fail until you edit them):
#+begin_src
$ clojure -T:build test
#+end_src
#+begin_src
$ ./bin/kaocha
#+end_src
Build the uberjar:
#+begin_src
$ clojure -T:build uberjar
#+end_src
This will produce an updated =pom.xml= file with synchronized dependencies inside the =META-INF=
directory inside =target/classes= and the uberjar in =target=. You can update the version (and SCM tag)
information in the generated =pom.xml= by updating =build.clj=.

If you don't want the =pom.xml= file in your project, you can remove it. The =ci= task will
still generate a minimal =pom.xml= as part of the =uber= task, unless you remove =version=
from =build.clj=.

Run that uberjar:
#+begin_src
$ java -jar target/dqt-0.1.0-SNAPSHOT.jar
#+end_src
If you remove =version= from =build.clj=, the uberjar will become =target/dqt-standalone.jar=.
* Options
FIXME: listing of options this app accepts.
* Examples
** datasource file datasource.edn
#+begin_src clojure
{:dbtype "postgresql"
 :dbname "postgres"
 :host #or [#env DATABASE_HOSTNAME "localhost"]
 :user "postgres"
 :password "postgres"
 :ssl false
 :classname "org.postgresql.Driver"
 :sslfactory "org.postgresql.ssl.NonValidatingFactory"}
#+end_src
** table file employees.edn
#+begin_src clojure
{:table-name :employees
 :metrics [:row-count
           :avg-length
           :max-length
           :min-length
           :avg
           :sum
           :max
           :min
           :stddev
           :variance]
 :tests [[:row-count > 10]
         [:avg-length-phone-number < 13]
         [:stddev-salary > 4500]
         [:sum-salary > 20000]
         [:max-length-email < 30]]}
#+end_src
** Run
#+begin_src shell
clj -M:dev:run run -d datasource.edn -t tables/employees.edn
#+end_src
#+begin_src shell
bb run-example
#+end_src
* Development
** Run development mode with babashka
#+begin_src shell
bb dev
#+end_src
** Test database
Run =docker compose up= to start a PostgreSQL instance.
** Run migration
#+begin_src shell
bb migrate
#+end_src
** Run test
#+begin_src shell
clj -M:dev:test
clj -M:dev:test --watch
bb test
bb test:watch
#+end_src
#+begin_src
$ bin/kaocha
#+end_src
#+begin_src
$ bin/kaocha --watch
#+end_src
* References
- https://www.sweettooth.dev/endpoint/dev/architecture/integrant-tutorial.html
* License
Copyright © 2021 Warfox
Distributed under the MIT License.