https://github.com/music-of-the-ainur/quenya-dsl
Quenya-DSL (Domain Specific Language) is a language that simplifies the task of parsing complex semi-structured data.
- Host: GitHub
- URL: https://github.com/music-of-the-ainur/quenya-dsl
- Owner: music-of-the-ainur
- License: apache-2.0
- Created: 2019-11-09T01:30:15.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-09-22T09:58:55.000Z (over 1 year ago)
- Last Synced: 2023-09-23T13:03:46.372Z (over 1 year ago)
- Language: Scala
- Homepage:
- Size: 119 KB
- Stars: 9
- Watchers: 4
- Forks: 14
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# Quenya-DSL
[![Build-Status](https://github.com/music-of-the-ainur/quenya-dsl/actions/workflows/quenya-dsl-githubactions.yml/badge.svg)](https://github.com/music-of-the-ainur/quenya-dsl/actions/workflows/quenya-dsl-githubactions.yml)
Add the Quenya-DSL dependency to your sbt build:
```
libraryDependencies += "com.github.music-of-the-ainur" %% "quenya-dsl" % "1.2.2-$SPARK_VERSION"
```

To run in spark-shell:
```
spark-shell --packages "com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-$SPARK_VERSION"
```

Quenya-DSL is available in the [Maven Central](https://mvnrepository.com/artifact/com.github.music-of-the-ainur) repository.

| Versions                   | Connector Artifact                                        |
|----------------------------|-----------------------------------------------------------|
| Spark 3.5.x and Scala 2.13 | `com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.5` |
| Spark 3.5.x and Scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.5` |
| Spark 3.4.x and Scala 2.13 | `com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.4` |
| Spark 3.4.x and Scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.4` |
| Spark 3.3.x and Scala 2.13 | `com.github.music-of-the-ainur:quenya-dsl_2.13:1.2.2-3.3` |
| Spark 3.3.x and Scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.3` |
| Spark 3.2.x and Scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.2` |
| Spark 3.1.x and Scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-3.1` |
| Spark 2.4.x and Scala 2.12 | `com.github.music-of-the-ainur:quenya-dsl_2.12:1.2.2-2.4` |
| Spark 2.4.x and Scala 2.11 | `com.github.music-of-the-ainur:quenya-dsl_2.11:1.2.2-2.4` |

## Introduction
Quenya-DSL (Domain Specific Language) is a language that simplifies the task of parsing complex semi-structured data.

```scala
import org.apache.spark.sql.DataFrame
import com.github.music.of.the.ainur.quenya.QuenyaDSL

val inputDf: DataFrame = ...
val quenyaDsl = QuenyaDSL
val dsl = quenyaDsl.compile("""
|uuid$id:StringType
|id$id:LongType
|code$area_code:LongType
|names@name
| name.firstName$first_name:StringType
| name.secondName$second_name:StringType
| name.lastName$last_name:StringType
|source_id$source_id:LongType
|address[3]$zipcode:StringType""".stripMargin)
val df:DataFrame = quenyaDsl.execute(dsl,inputDf)
df.show(false)
```

## Operators
### $ i.e DOLLAR
The **$** (**dollar**) operator selects a column, giving it an alias and a data type.

Example:

DSL:
```
name.nameOne$firstName:StringType
name.nickNames[0]$firstNickName:StringType
```

JSON:
```json
{
"name":{
"nameOne":"Mithrandir",
"LastName":"Olórin",
"nickNames":[
"Gandalf the Grey",
"Gandalf the White"
]
},
"race":"Maiar",
"age":"immortal",
"weapons":[
"Glamdring",
"Narya",
"Wizard Staff"
]
}
```

Output:
```
+----------+----------------+
|firstName |firstNickName   |
+----------+----------------+
|Mithrandir|Gandalf the Grey|
+----------+----------------+
```
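For readers who know the DataFrame API, the two DSL lines above can be read as roughly the following plain Spark code. This is a sketch of my reading of the operator, not Quenya-DSL's implementation: it assumes `$` amounts to select, cast and alias, and that `[0]` indexes into the array. It is written script-style for `spark-shell` or a Scala worksheet with Spark on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder().master("local[1]").appName("dollar-sketch").getOrCreate()

// The sample document from this section, loaded as a one-row DataFrame.
val json = """{"name":{"nameOne":"Mithrandir","LastName":"Olórin","nickNames":["Gandalf the Grey","Gandalf the White"]},"race":"Maiar","age":"immortal","weapons":["Glamdring","Narya","Wizard Staff"]}"""
val inputDf: DataFrame = spark.read.json(spark.createDataset(Seq(json))(Encoders.STRING))

// Assumed equivalents (not taken from the library's source):
//   name.nameOne$firstName:StringType          -> select the nested field, cast, alias
//   name.nickNames[0]$firstNickName:StringType -> take array element 0, cast, alias
val selected: DataFrame = inputDf.select(
  col("name.nameOne").cast(StringType).as("firstName"),
  col("name.nickNames").getItem(0).cast(StringType).as("firstNickName")
)
selected.show(false)
```

The DSL version keeps the path, alias and type on one line per column, which is the main convenience over spelling out each `select` expression by hand.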
### @ i.e AT
The **@** (**at**) operator explodes arrays. Indentation with a space or tab defines precedence: lines nested under an **@** line operate on the exploded element.

Example:

DSL:
```
weapons@weapon
	weapon$weapon:StringType
```

JSON:
```json
{
"name":{
"nameOne":"Mithrandir",
"LastName":"Olórin",
"nickNames":[
"Gandalf the Grey",
"Gandalf the White"
]
},
"race":"Maiar",
"age":"immortal",
"weapons":[
"Glamdring",
"Narya",
"Wizard Staff"
]
}
```

Output:
```
+------------+
|weapon      |
+------------+
|Glamdring   |
|Narya       |
|Wizard Staff|
+------------+
```
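In plain Spark terms, the `@` example above behaves roughly like an `explode` followed by a select on the exploded column. The sketch below rests on that assumption. The source does not say whether the library uses `explode` or `explode_outer`, so rows with empty or null arrays may be handled differently from what is shown here. It is written script-style for `spark-shell` or a Scala worksheet with Spark on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, Encoders, SparkSession}
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.StringType

val spark = SparkSession.builder().master("local[1]").appName("at-sketch").getOrCreate()

// The sample document from this section, loaded as a one-row DataFrame.
val json = """{"name":{"nameOne":"Mithrandir","LastName":"Olórin","nickNames":["Gandalf the Grey","Gandalf the White"]},"race":"Maiar","age":"immortal","weapons":["Glamdring","Narya","Wizard Staff"]}"""
val inputDf: DataFrame = spark.read.json(spark.createDataset(Seq(json))(Encoders.STRING))

// weapons@weapon (assumed equivalent): one output row per array element, named "weapon".
val exploded: DataFrame = inputDf.select(explode(col("weapons")).as("weapon"))

// The indented line weapon$weapon:StringType (assumed equivalent): select the exploded element.
val weapons: DataFrame = exploded.select(col("weapon").cast(StringType).as("weapon"))
weapons.show(false)
```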
## Supported Types

* FloatType
* BinaryType
* ByteType
* BooleanType
* StringType
* TimestampType
* DecimalType
* DoubleType
* IntegerType
* LongType
* ShortType

## DSL Generator
You can generate the DSL from an existing DataFrame:
```scala
import com.github.music.of.the.ainur.quenya.QuenyaDSL

val df:DataFrame = ...
val quenyaDsl = QuenyaDSL
quenyaDsl.printDsl(df)
```

### getDsl
You can generate a DSL from a DataFrame and assign it to a variable:

```scala
import com.github.music.of.the.ainur.quenya.QuenyaDSL

val df:DataFrame = ...
val quenyaDsl = QuenyaDSL
val dsl = quenyaDsl.getDsl(df)
```

JSON:
```
{
"name":{
"nameOne":"Mithrandir",
"LastName":"Olórin",
"nickNames":[
"Gandalf the Grey",
"Gandalf the White"
]
},
"race":"Maiar",
"age":"immortal",
"weapon":[
"Glamdring",
"Narya",
"Wizard Staff"
]
}
```

Output:
```
age$age:StringType
name.LastName$name_LastName:StringType
name.nameOne$name_nameOne:StringType
name.nickNames@name_nickNames
	name_nickNames$name_nickNames:StringType
race$race:StringType
weapon@weapon
	weapon$weapon:StringType
```

You can _alias_ columns by their fully qualified names with `printDsl(df,true)`; turn this on when column names conflict.
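A generated DSL can be fed straight back into `compile` and `execute` to flatten a DataFrame without writing the DSL by hand. The sketch below uses only the three calls shown in this README. It assumes `getDsl` returns the DSL as a `String` that `compile` accepts unchanged, which the README implies but does not state. `flatten` is a hypothetical helper name, not part of the library.

```scala
import org.apache.spark.sql.DataFrame
import com.github.music.of.the.ainur.quenya.QuenyaDSL

// Hypothetical helper: generate the DSL for a DataFrame, then apply it to the same DataFrame.
// Assumption: getDsl returns a String that compile accepts as-is.
def flatten(df: DataFrame): DataFrame = {
  val quenyaDsl = QuenyaDSL
  val generated = quenyaDsl.getDsl(df)        // DSL text derived from the DataFrame's schema
  val compiled  = quenyaDsl.compile(generated)
  quenyaDsl.execute(compiled, df)             // flattened result, one column per alias
}
```

In practice you would usually print the generated DSL once, trim it to the columns you need, and keep the edited text in your job rather than regenerating it on every run.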
## How to Handle Special Characters
Wrap names in backticks (**``**) to handle special characters such as spaces, semicolons, hyphens and colons.

Example:

JSON:
```
{
"name":{
"name One":"Mithrandir",
"Last-Name":"Olórin",
"nick:Names":[
"Gandalf the Grey",
"Gandalf the White"
]
},
"race":"Maiar",
"age":"immortal",
"weapon;name":[
"Glamdring",
"Narya",
"Wizard Staff"
]
}
```

DSL:
```
age$age:StringType
`name.Last-Name`$`Last-Name`:StringType
`name.name One`$`name-One`:StringType
`name.nick:Names`@`nick:Names`
	`nick:Names`$`nick:Names`:StringType
race$race:StringType
`weapon;name`@`weapon;name`
	`weapon;name`$`weapon_name`:StringType
```

## Backus–Naur form
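As an unofficial companion to the grammar in this section, here is a minimal sketch that splits a single DSL line into its parts: indentation, column path, optional array index, operator, alias and data type. It is not Quenya-DSL's parser. The names `Statement`, `Select`, `Explode` and `DslLineSketch` are hypothetical, backtick-quoted names are not handled, and the index accepts several digits, which is slightly more permissive than the grammar.

```scala
// Hypothetical model of one DSL line; none of these names come from Quenya-DSL.
sealed trait Statement
final case class Select(indent: Int, column: String, index: Option[Int], alias: String, dataType: String) extends Statement
final case class Explode(indent: Int, column: String, index: Option[Int], alias: String) extends Statement

object DslLineSketch {
  // precedence, column, optional [index], operator, alias, optional :datatype
  private val Line =
    """^([ \t]*)([a-zA-Z0-9_.]+)(?:\[(\d+)\])?([@$])([0-9a-zA-Z_]+)(?::([A-Za-z]+))?$""".r

  def parse(line: String): Option[Statement] = line match {
    // "$" requires a data type after the colon.
    case Line(ws, column, idx, "$", alias, dataType) if dataType != null =>
      Some(Select(ws.length, column, Option(idx).map(_.toInt), alias, dataType))
    // "@" takes an alias only; unmatched optional groups come back as null.
    case Line(ws, column, idx, "@", alias, null) =>
      Some(Explode(ws.length, column, Option(idx).map(_.toInt), alias))
    case _ => None
  }
}
```

For example, `DslLineSketch.parse("weapons@weapon")` yields an `Explode` at indent 0, and a line nested under it with one leading tab yields a `Select` at indent 1. The indent is what a fuller parser would use to attach nested lines to their enclosing `@`.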
```
<dsl> ::= \{"[\r\n]*".r <precedence> <col> <operator> \}
<precedence> ::= "[\s\t]*".r
<col> ::= "a-zA-Z0-9_.".r [ element ]
<element> ::= "[" "\d".r "]"
<operator> ::= <@> | <$>
<@> ::= @ <alias>
<$> ::= $ <alias> : <datatype>
<alias> ::= "0-9a-zA-Z_".r
<datatype> ::= BinaryType | BooleanType | StringType | TimestampType | DecimalType
| DoubleType | FloatType | ByteType | IntegerType | LongType | ShortType
```

## Author
Daniel Mantovani

## Sponsor
[![Modak Analytics](/docs/img/modak_analytics.png)](http://www.modak.com)