https://github.com/oscaro/ds-test-tools
Small library to help test Datasplash pipelines.
https://github.com/oscaro/ds-test-tools
clojure-library dataflow
Last synced: 9 months ago
JSON representation
Small library to help test Datasplash pipelines.
- Host: GitHub
- URL: https://github.com/oscaro/ds-test-tools
- Owner: oscaro
- License: epl-1.0
- Created: 2019-04-10T08:20:56.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2024-10-04T12:38:00.000Z (over 1 year ago)
- Last Synced: 2025-04-15T10:24:07.803Z (10 months ago)
- Topics: clojure-library, dataflow
- Language: Clojure
- Homepage: https://cljdoc.org/d/com.oscaro/ds-test-tools/0.1.1/doc/readme
- Size: 39.1 KB
- Stars: 1
- Watchers: 16
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# ds-test-tools [](https://github.com/oscaro/ds-test-tools/actions/workflows/clojure.yml) [](https://clojars.org/com.oscaro/ds-test-tools)
`ds-test-tools` is a small library to help test [Datasplash][] pipelines.
[Datasplash]: https://github.com/ngrunwald/datasplash
## Usage
```clojure
[com.oscaro/ds-test-tools "0.2.1"]
```
Then:
```clojure
(ns your.project
(:require [ds-test-tools.core :as dt]))
```
The only function you need is `dt/run-pipeline`.
It takes inputs as Clojure data, mapping of keys to result files, and a
function to call in order to build the pipeline. It dumps the input data in the
appropriate files; builds a configuration map; pass it to your function; run
the pipeline; collect the results; and return them to you.
### Simple Usage
```clojure
;; Your pipeline
(defn my-job [conf p]
(->> p
(ds/read-edn-file (:numbers conf))
(ds/map inc)
(ds/write-edn-file (str (:output conf) "/higher.edn"))))
(let [{:keys [result]} (dt/run-pipeline
{:numbers [1 2 3 4]}
{:result "higher"}
my-job)]
(println (sort result))) ; '(2 3 4 5)
```
#### Specifying inputs
The inputs config map uses the same format as your configuration map. Your
build function should take a map of keywords to file paths:
```clojure
(defn my-job [{:keys [people houses output]} p]
(let [people (ds/read-edn-file people p)
houses (ds/read-edn-file houses p)]
(->> (ds/join-by (fn [p h] [p :lives-in h])
[[people :house-id {:type :required}]
[houses :id {:type :required}]])
(ds/write-edn-file (tio/join-path output "housing.edn")))))
```
The pipeline above would use the following inputs config map:
```clojure
{:people [{:name "John" :house-id 1} {:name "Jane" :house-id 2} ...]
:houses [{:id 1 :name "Red House"} {:id 2 :name "Green House"} ...]}
```
#### Changing the input/output format
By default, it assumes you use EDN as inputs and outputs. You can change that
by setting the `:reader` (used to read outputs) and `:writer` (used to write
inputs) keys in the optional options map:
```clojure
(dt/run-pipeline
{:reader :jsons ; or :edns (the default)
:writer :jsons} ; or :edns (the default)
inputs-config
outputs-config)
```
If you have mixed formats or something else than EDN/JSONS, you can also
provide a function of two (readers) or three (writers) arguments:
```clojure
(dt/run-pipeline
{:reader (fn [k filename] ; k is the key in your inputs-config map
(case k
:my-jsons-output (tio/read-jsons-file filename)
:my-edns-output (tio/read-edns-file filename)))
:writer (fn [k filename data]
(case k
:my-jsons-input (tio/write-jsons-file filename data)
:my-csv-input (tio/write-csv-file filename data)
:my-text-input (tio/write-text-file filename data)))}
inputs-config
outputs-config)
```
## License
Copyright © 2018-2019 Oscaro
Distributed under the Eclipse Public License either version 1.0 or (at your
option) any later version.