https://github.com/oscaro/ds-test-tools

Small library to help test Datasplash pipelines.

clojure-library dataflow

Host: GitHub
URL: https://github.com/oscaro/ds-test-tools
Owner: oscaro
License: epl-1.0
Created: 2019-04-10T08:20:56.000Z (almost 7 years ago)
Default Branch: master
Last Pushed: 2024-10-04T12:38:00.000Z (over 1 year ago)
Last Synced: 2025-04-15T10:24:07.803Z (11 months ago)
Topics: clojure-library, dataflow
Language: Clojure
Homepage: https://cljdoc.org/d/com.oscaro/ds-test-tools/0.1.1/doc/readme
Size: 39.1 KB
Stars: 1
Watchers: 16
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# ds-test-tools [![Clojure CI](https://github.com/oscaro/ds-test-tools/actions/workflows/clojure.yml/badge.svg)](https://github.com/oscaro/ds-test-tools/actions/workflows/clojure.yml) [![Clojars Project](https://img.shields.io/clojars/v/com.oscaro/ds-test-tools.svg)](https://clojars.org/com.oscaro/ds-test-tools)

`ds-test-tools` is a small library to help test [Datasplash][] pipelines.

[Datasplash]: https://github.com/ngrunwald/datasplash

## Usage

```clojure
[com.oscaro/ds-test-tools "0.2.1"]
```

Then:

```clojure
(ns your.project
  (:require [ds-test-tools.core :as dt]
            ;; The examples below also use these two aliases:
            [datasplash.api :as ds]
            [tools.io :as tio]))
```

The only function you need is `dt/run-pipeline`.

It takes the inputs as Clojure data, a mapping of keys to result files, and a
function to call to build the pipeline. It dumps the input data into the
appropriate files, builds a configuration map, passes it to your function, runs
the pipeline, collects the results, and returns them to you.
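
To make that sequence concrete, here is a minimal sketch of the same steps in plain Clojure, with no Beam or Datasplash involved. It is not the library's implementation: `run-with-files`, `write-lines!`, `read-lines` and `my-plain-job` are hypothetical names, the one-value-per-line layout and the `.edn` suffix are assumptions, and the build function here receives only the configuration map rather than `conf` and a pipeline `p`.

```clojure
(ns example.run-sketch
  (:require [clojure.edn :as edn]
            [clojure.java.io :as io]
            [clojure.string :as str]))

(defn write-lines!
  "Writes each value of `coll` as one EDN form per line in `file`."
  [file coll]
  (spit file (str/join "\n" (map pr-str coll))))

(defn read-lines
  "Reads `file` back as a vector of EDN values, one per line."
  [file]
  (with-open [r (io/reader file)]
    (mapv edn/read-string (line-seq r))))

(defn run-with-files
  "Hypothetical stand-in for the workflow described above."
  [inputs outputs build-fn]
  (let [dir  (.toFile (java.nio.file.Files/createTempDirectory
                        "ds-sketch"
                        (make-array java.nio.file.attribute.FileAttribute 0)))
        out  (io/file dir "output")
        _    (.mkdirs out)
        ;; 1. Dump each input collection into its own file and record the
        ;;    path under the same key in the configuration map.
        conf (into {:output (str out)}
                   (for [[k data] inputs
                         :let [f (io/file dir (str (name k) ".edn"))]]
                     (do (write-lines! f data)
                         [k (str f)])))]
    ;; 2. Hand the configuration map to the build function and let it run.
    (build-fn conf)
    ;; 3. Collect each declared result file under its key.
    (into {}
          (for [[k prefix] outputs]
            [k (read-lines (io/file out (str prefix ".edn")))]))))

;; A plain-function counterpart of a job: read, transform, write.
(defn my-plain-job [conf]
  (->> (read-lines (:numbers conf))
       (map inc)
       (write-lines! (io/file (:output conf) "higher.edn"))))

(run-with-files {:numbers [1 2 3 4]} {:result "higher"} my-plain-job)
;; => {:result [2 3 4 5]}
```

The point of the sketch is the shape of the contract: the keys of the inputs map reappear as keys of the configuration map, and the keys of the outputs map reappear as keys of the returned results.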

### Simple Usage

```clojure
;; Your pipeline
(defn my-job [conf p]
  (->> p
       (ds/read-edn-file (:numbers conf))
       (ds/map inc)
       (ds/write-edn-file (str (:output conf) "/higher.edn"))))

(let [{:keys [result]} (dt/run-pipeline
                         {:numbers [1 2 3 4]}
                         {:result "higher"}
                         my-job)]
  (println (sort result))) ; prints (2 3 4 5)
```

#### Specifying inputs

The inputs config map uses the same keys as the configuration map your build
function receives. Your build function should take a map of keywords to file
paths:

```clojure
(defn my-job [{:keys [people houses output]} p]
  (let [people (ds/read-edn-file people p)
        houses (ds/read-edn-file houses p)]
    (->> (ds/join-by (fn [p h] [p :lives-in h])
                     [[people :house-id {:type :required}]
                      [houses :id {:type :required}]])
         (ds/write-edn-file (tio/join-path output "housing.edn")))))
```

The pipeline above would use the following inputs config map:

```clojure
{:people [{:name "John" :house-id 1} {:name "Jane" :house-id 2} ...]
 :houses [{:id 1 :name "Red House"} {:id 2 :name "Green House"} ...]}
```
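
When writing assertions for such a job, it helps to compute the expected result in memory first. The following is a sketch in plain Clojure, assuming the join pairs each person with the house whose `:id` equals that person's `:house-id` and drops unmatched records on both sides. `expected-housing` is a hypothetical helper, and the sample data is limited to the two records spelled out above.

```clojure
(def people [{:name "John" :house-id 1} {:name "Jane" :house-id 2}])
(def houses [{:id 1 :name "Red House"} {:id 2 :name "Green House"}])

(defn expected-housing
  "In-memory inner join: one [person :lives-in house] triple per match."
  [people houses]
  (let [by-id (group-by :id houses)]
    (for [p people
          h (get by-id (:house-id p))] ; no matching house => no triple
      [p :lives-in h])))

(expected-housing people houses)
;; => ([{:name "John", :house-id 1} :lives-in {:id 1, :name "Red House"}]
;;     [{:name "Jane", :house-id 2} :lives-in {:id 2, :name "Green House"}])
```

Since a pipeline gives no ordering guarantee on its output, comparing the two sides as sets (or after sorting) is the safer check.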

#### Changing the input/output format

By default, `run-pipeline` assumes that both inputs and outputs are EDN. You
can change that by setting the `:reader` (used to read outputs) and `:writer`
(used to write inputs) keys in the optional options map:

```clojure
(dt/run-pipeline
    {:reader :jsons  ; or :edns (the default)
     :writer :jsons} ; or :edns (the default)
    inputs-config
    outputs-config)
```

If you have mixed formats, or something other than EDN/JSONS, you can also
provide a function of two (readers) or three (writers) arguments:

```clojure
(dt/run-pipeline
    {:reader (fn [k filename] ; k is the key in your outputs-config map
               (case k
                 :my-jsons-output (tio/read-jsons-file filename)
                 :my-edns-output (tio/read-edns-file filename)))
     :writer (fn [k filename data] ; k is the key in your inputs-config map
               (case k
                 :my-jsons-input (tio/write-jsons-file filename data)
                 :my-csv-input (tio/write-csv-file filename data)
                 :my-text-input (tio/write-text-file filename data)))}
    inputs-config
    outputs-config)
```
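
Custom reader and writer functions can be checked on their own before they are handed to `run-pipeline`. Below is a small sketch of a pair with the two-argument and three-argument shapes described above, written with `clojure.core` only. The names `mixed-writer` and `mixed-reader`, the keys `:numbers` and `:names`, and the one-item-per-line layout are illustrative assumptions. The same keys appear on both sides only so that the pair can be verified with a round trip.

```clojure
(ns example.mixed-formats
  (:require [clojure.edn :as edn]
            [clojure.string :as str]))

(defn write-lines! [filename lines]
  (spit filename (str/join "\n" lines)))

(defn read-lines [filename]
  (str/split-lines (slurp filename)))

;; Writer shape: (fn [k filename data])
(defn mixed-writer [k filename data]
  (case k
    :numbers (write-lines! filename (map pr-str data)) ; one EDN value per line
    :names   (write-lines! filename data)))            ; raw text lines

;; Reader shape: (fn [k filename])
(defn mixed-reader [k filename]
  (case k
    :numbers (mapv edn/read-string (read-lines filename))
    :names   (read-lines filename)))

;; Round trip through a temporary file.
(let [f (str (java.io.File/createTempFile "mixed" ".txt"))]
  (mixed-writer :numbers f [1 2 3])
  (mixed-reader :numbers f))
;; => [1 2 3]
```

Because `case` has no default branch here, an unknown key throws an `IllegalArgumentException`, which surfaces typos in the config maps early.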

## License

Copyright © 2018-2019 Oscaro

Distributed under the Eclipse Public License either version 1.0 or (at your
option) any later version.
