An open API service indexing awesome lists of open source software.

https://github.com/oscaro/ds-test-tools

Small library to help test Datasplash pipelines.
https://github.com/oscaro/ds-test-tools

clojure-library dataflow

Last synced: 9 months ago
JSON representation

Small library to help test Datasplash pipelines.

Awesome Lists containing this project

README

          

# ds-test-tools [![Clojure CI](https://github.com/oscaro/ds-test-tools/actions/workflows/clojure.yml/badge.svg)](https://github.com/oscaro/ds-test-tools/actions/workflows/clojure.yml) [![Clojars Project](https://img.shields.io/clojars/v/com.oscaro/ds-test-tools.svg)](https://clojars.org/com.oscaro/ds-test-tools)

`ds-test-tools` is a small library to help test [Datasplash][] pipelines.

[Datasplash]: https://github.com/ngrunwald/datasplash

## Usage

```clojure
[com.oscaro/ds-test-tools "0.2.1"]
```

Then:

```clojure
(ns your.project
(:require [ds-test-tools.core :as dt]))
```

The only function you need is `dt/run-pipeline`.

It takes inputs as Clojure data, mapping of keys to result files, and a
function to call in order to build the pipeline. It dumps the input data in the
appropriate files; builds a configuration map; pass it to your function; run
the pipeline; collect the results; and return them to you.

### Simple Usage

```clojure
;; Your pipeline
(defn my-job [conf p]
(->> p
(ds/read-edn-file (:numbers conf))
(ds/map inc)
(ds/write-edn-file (str (:output conf) "/higher.edn"))))

(let [{:keys [result]} (dt/run-pipeline
{:numbers [1 2 3 4]}
{:result "higher"}
my-job)]
(println (sort result))) ; '(2 3 4 5)
```

#### Specifying inputs

The inputs config map uses the same format as your configuration map. Your
build function should take a map of keywords to file paths:

```clojure
(defn my-job [{:keys [people houses output]} p]
(let [people (ds/read-edn-file people p)
houses (ds/read-edn-file houses p)]
(->> (ds/join-by (fn [p h] [p :lives-in h])
[[people :house-id {:type :required}]
[houses :id {:type :required}]])
(ds/write-edn-file (tio/join-path output "housing.edn")))))
```

The pipeline above would use the following inputs config map:

```clojure
{:people [{:name "John" :house-id 1} {:name "Jane" :house-id 2} ...]
:houses [{:id 1 :name "Red House"} {:id 2 :name "Green House"} ...]}
```

#### Changing the input/output format

By default, it assumes you use EDN as inputs and outputs. You can change that
by setting the `:reader` (used to read outputs) and `:writer` (used to write
inputs) keys in the optional options map:

```clojure
(dt/run-pipeline
{:reader :jsons ; or :edns (the default)
:writer :jsons} ; or :edns (the default)
inputs-config
outputs-config)
```

If you have mixed formats or something else than EDN/JSONS, you can also
provide a function of two (readers) or three (writers) arguments:

```clojure
(dt/run-pipeline
{:reader (fn [k filename] ; k is the key in your inputs-config map
(case k
:my-jsons-output (tio/read-jsons-file filename)
:my-edns-output (tio/read-edns-file filename)))

:writer (fn [k filename data]
(case k
:my-jsons-input (tio/write-jsons-file filename data)
:my-csv-input (tio/write-csv-file filename data)
:my-text-input (tio/write-text-file filename data)))}
inputs-config
outputs-config)
```

## License

Copyright © 2018-2019 Oscaro

Distributed under the Eclipse Public License either version 1.0 or (at your
option) any later version.