https://github.com/andr83/io.parsek
Scala library for building ETL pipelines in functional way.
ast etl json scala
- Host: GitHub
- URL: https://github.com/andr83/io.parsek
- Owner: andr83
- Created: 2017-05-03T16:24:36.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-09-19T11:32:41.000Z (about 6 years ago)
- Last Synced: 2024-10-02T03:03:08.648Z (about 1 month ago)
- Topics: ast, etl, json, scala
- Language: Scala
- Homepage:
- Size: 201 KB
- Stars: 2
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# io.parsek
[![Build Status](https://travis-ci.org/andr83/io.parsek.svg?branch=master)](https://travis-ci.org/andr83/io.parsek)
[![codecov](https://codecov.io/gh/andr83/io.parsek/branch/master/graph/badge.svg)](https://codecov.io/gh/andr83/io.parsek)

Parsek is a Scala library for building ETL pipelines in a functional way.
## Overview
The main goal is to provide tools for working with data in a generic form, independent of source or target formats. The initial idea, a JSON AST and methods for working with it, was taken from libraries like [Circe](https://circe.github.io/circe/) and [Play Json](https://www.playframework.com/documentation/latest/ScalaJson).

So why not use Circe and JSON directly? The main problem with JSON is its limited type support. For example, it lacks types that are important for ETL, such as `Date`, `DateTime`, and `Byte Array`. Common ETL tasks also include data cleaning, validation, and transformation from one form to another. Circe, and especially Play, would require a lot of boilerplate code for these.
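To illustrate the type-support point, here is a minimal sketch of an AST that carries timestamps and byte arrays as first-class nodes. All names (`AstSketch`, `Value`, `TimeValue`, `BytesValue`, and so on) are hypothetical and chosen for illustration; they are not Parsek's actual definitions.

```scala
import java.time.Instant

// Hypothetical names for illustration only; these are not Parsek's definitions.
object AstSketch {
  sealed trait Value
  final case class IntValue(value: Int) extends Value
  final case class StringValue(value: String) extends Value
  final case class TimeValue(value: Instant) extends Value       // no native JSON counterpart
  final case class BytesValue(value: Vector[Byte]) extends Value // no native JSON counterpart
  final case class MapValue(fields: Map[String, Value]) extends Value

  // Timestamps and byte arrays are distinct node types, so a consumer can
  // tell them apart from plain strings without guessing at a text format.
  def typeName(v: Value): String = v match {
    case IntValue(_)    => "int"
    case StringValue(_) => "string"
    case TimeValue(_)   => "time"
    case BytesValue(_)  => "bytes"
    case MapValue(_)    => "map"
  }

  // A sample record mixing JSON-compatible and ETL-specific types.
  val row: Value = MapValue(Map(
    "id"      -> IntValue(1),
    "created" -> TimeValue(Instant.parse("2018-09-19T11:32:41Z")),
    "payload" -> BytesValue(Vector[Byte](1, 2, 3))
  ))
}
```

In plain JSON, `created` and `payload` would both have to be encoded as strings (or arrays of numbers), and every reader would need to know the convention in order to recover the original type.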
Parsek has a modular architecture with minimal external dependencies. Yes, we know what dependency hell is! That is why there is no dependency on [Scalaz](https://github.com/scalaz/scalaz)/[Cats](https://github.com/typelevel/cats) or [Monocle](http://julien-truffaut.github.io/Monocle/).

- `Core` focuses on the AST, data encoding/decoding, schema definition, and validation.
- `Jackson` supports JSON serialisation/deserialisation.
- `Shapeless` provides automatic derivation for generic types (case classes).
- `JDBC` provides utilities that simplify communication with JDBC sources.
## Quick start
```scala
val parsekVersion = "0.2.0"

// for Scala 2.10.6+, 2.11.x, 2.12.x
libraryDependencies ++= Seq(
  "io.parsek" %% "parsek-core",
  "io.parsek" %% "parsek-jackson",
  "io.parsek" %% "parsek-shapeless",
  "io.parsek" %% "parsek-jdbc"
).map(_ % parsekVersion)
```

In the Scala REPL console:
```scala
import io.parsek._, io.parsek.implicits._
import io.parsek.shapeless.implicits._

case class Foo(x: Int, y: String)
// defined class Foo

val foo = Foo(42, "hello")
// foo: Foo = Foo(42,hello)

val pv = foo.toPValue
// converting case class to AST PValue representation
// io.parsek.PValue = PMap(Map('y -> PString(hello), 'x -> PInt(42)))

root.x.as[Int].modify(_ * 100)(pv)
// use lens with Dynamics support to modify PValue.
// res: io.parsek.PResult[io.parsek.PValue] = PSuccess(PMap(Map('y -> PString(hello), 'x -> PInt(4200))),List())

import io.parsek.optics.Projection
// import AST projection

val p = Projection(
  'x -> root.y.as[String],
  'z -> root.x.as[Int],
  's -> Projection(
    'x -> root.x.as[Int],
    'y -> root.y.map[String, String](_.toUpperCase).as[String]
  )
)
// create projection

val pv2 = p.get(pv).unsafe
// apply projection
// pv2: io.parsek.PValue = PMap(Map('x -> PString(hello), 'z -> PInt(42), 's -> PMap(Map('x -> PInt(42), 'y -> PString(HELLO)))))

case class Bar(x: String, z: Int, s: Foo)
// defined class Bar

pv2.as[Bar]
// io.parsek.PResult[Bar] = PSuccess(Bar(hello,42,Foo(42,HELLO)),List())

import io.parsek.jackson._
// import JSON module

val serde = JsonSerDe()
// serde:io.parsek.jackson.JsonSerDe = JsonSerDe(com.fasterxml.jackson.databind.ObjectMapper@ace16b)

serde.write(pv2)
// res: String = {"x":"hello","z":42,"s":{"x":42,"y":"HELLO"}}
```
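
For readers unfamiliar with lenses, the `modify` step in the session above can be approximated with a small self-contained sketch. It is not Parsek's implementation: the types `Value`, `MapValue`, and `IntField` are hypothetical, and a plain `Either[String, _]` stands in for `PResult`.

```scala
// Hypothetical, simplified types; Parsek's real optics and PResult differ.
object LensSketch {
  sealed trait Value
  final case class IntValue(value: Int) extends Value
  final case class StringValue(value: String) extends Value
  final case class MapValue(fields: Map[String, Value]) extends Value

  // A lens focused on an Int stored under `key` of a map node.
  final case class IntField(key: String) {

    // Reads the focused Int, or explains why it cannot be read.
    def get(v: Value): Either[String, Int] = v match {
      case MapValue(fields) =>
        fields.get(key) match {
          case Some(IntValue(i)) => Right(i)
          case Some(other)       => Left(s"field '$key' is not an Int: $other")
          case None              => Left(s"field '$key' is missing")
        }
      case other => Left(s"expected a map, got $other")
    }

    // Returns an updated copy; the input value is never mutated.
    def modify(f: Int => Int)(v: Value): Either[String, Value] =
      (v, get(v)) match {
        case (MapValue(fields), Right(i)) =>
          Right(MapValue(fields.updated(key, IntValue(f(i)))))
        case (_, Left(err)) => Left(err)
        case (other, _)     => Left(s"expected a map, got $other")
      }
  }

  val pv: Value = MapValue(Map("x" -> IntValue(42), "y" -> StringValue("hello")))

  // Multiply the Int under "x" by 100, leaving the other field untouched.
  val updated: Either[String, Value] = IntField("x").modify(_ * 100)(pv)
}
```

The design point this sketch tries to capture is that a failed access (a missing field or a type mismatch) is returned as a value rather than thrown, so several lens operations can be chained and their failures handled in one place.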
## License
MIT License
Copyright (c) 2018 Andrei Tupitcyn