https://github.com/absaoss/spark-data-standardization
A library for Spark that helps to standardize any input data (DataFrame) so that it adheres to the provided schema.
- Host: GitHub
- URL: https://github.com/absaoss/spark-data-standardization
- Owner: AbsaOSS
- License: apache-2.0
- Created: 2021-11-15T18:56:23.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2024-10-24T09:34:22.000Z (2 months ago)
- Last Synced: 2024-10-25T07:14:31.882Z (2 months ago)
- Topics: data-quality, data-structures, scala, schema, spark
- Language: Scala
- Homepage:
- Size: 393 KB
- Stars: 4
- Watchers: 13
- Forks: 1
- Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE.md
- Codeowners: .github/CODEOWNERS
# Spark Data Standardization Library
[![License](http://img.shields.io/:license-apache-blue.svg)](http://www.apache.org/licenses/LICENSE-2.0.html)
[![Release](https://github.com/AbsaOSS/spark-data-standardization/actions/workflows/release.yml/badge.svg)](https://github.com/AbsaOSS/spark-data-standardization/actions/workflows/release.yml)

- Dataframe in
- Standardized Dataframe out

## Usage
### Needed Provided Dependencies
The library needs the following dependencies to be included in your project:
```sbt
"org.apache.spark" %% "spark-core" % SPARK_VERSION,
"org.apache.spark" %% "spark-sql" % SPARK_VERSION,
"za.co.absa" %% s"spark-commons-spark${SPARK_MAJOR}.${SPARK_MINOR}" % "0.6.1",
```

### Usage in SBT:
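As a rough illustration, the provided dependencies listed above and the library dependency might sit together in a `build.sbt` like the sketch below. The value names (`sparkVersion`, `sparkMajorMinor`, `standardizationVersion`), the Scala patch version, and the `Provided` scope on the Spark modules are assumptions made for this example. The Spark 3.2.1 and Scala 2.12 pairing is taken from the compatibility table in this README, and `VERSION` is left as a placeholder for the release you want to use.

```sbt
// build.sbt: a minimal sketch, not the project's official build definition.

// Assumption: any Scala 2.12.x patch release; 2.12 pairs with Spark 3.2.1
// in the compatibility table of this README.
ThisBuild / scalaVersion := "2.12.15"

// Assumption: illustrative value names, not defined by the library.
val sparkVersion = "3.2.1"

// The spark-commons artifact name embeds the Spark major.minor version, e.g. "3.2".
val sparkMajorMinor = sparkVersion.split('.').take(2).mkString(".")

// Placeholder: replace VERSION with an actual published release.
val standardizationVersion = "VERSION"

libraryDependencies ++= Seq(
  // Assumption: Spark is supplied by the runtime cluster, hence the Provided scope.
  "org.apache.spark" %% "spark-core" % sparkVersion % Provided,
  "org.apache.spark" %% "spark-sql"  % sparkVersion % Provided,
  "za.co.absa" %% s"spark-commons-spark$sparkMajorMinor" % "0.6.1",
  "za.co.absa" %% "spark-data-standardization" % standardizationVersion
)
```

The library dependency on its own is the single line shown next.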
```sbt
"za.co.absa" %% "spark-data-standardization" % VERSION
```

### Usage in Maven
### Scala 2.11 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/za.co.absa/spark-data-standardization_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/za.co.absa/spark-data-standardization_2.11)
```xml
<dependency>
    <groupId>za.co.absa</groupId>
    <artifactId>spark-data-standardization_2.11</artifactId>
    <version>${latest_version}</version>
</dependency>
```
### Scala 2.12 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/za.co.absa/spark-data-standardization_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/za.co.absa/spark-data-standardization_2.12)
```xml
<dependency>
    <groupId>za.co.absa</groupId>
    <artifactId>spark-data-standardization_2.12</artifactId>
    <version>${latest_version}</version>
</dependency>
```
### Scala 2.13 [![Maven Central](https://maven-badges.herokuapp.com/maven-central/za.co.absa/spark-data-standardization_2.13/badge.svg)](https://maven-badges.herokuapp.com/maven-central/za.co.absa/spark-data-standardization_2.13)
```xml
<dependency>
    <groupId>za.co.absa</groupId>
    <artifactId>spark-data-standardization_2.13</artifactId>
    <version>${latest_version}</version>
</dependency>
```
Spark and Scala compatibility
>| | Scala 2.11 | Scala 2.12 | Scala 2.13 |
>|---|---|---|---|
>|Spark| 2.4.7 | 3.2.1 | 3.2.1 |

## How to Release
Please see [this file](RELEASE.md) for more details.
## How to generate a code coverage report
```sbt
sbt ++{scala_version} jacoco
```
The code coverage report will be generated at:
```
{project-root}/target/scala-{scala_version}/jacoco/report/html
```