https://github.com/tupol/spark-tools

Executable Apache Spark Tools: Format Converter & SQL Processor

# Spark Tools #

[![Maven Central](https://img.shields.io/maven-central/v/org.tupol/spark-tools_2.11.svg)][maven-central]  
[![GitHub](https://img.shields.io/github/license/tupol/spark-tools.svg)][license]  
[![Travis (.org)](https://img.shields.io/travis/tupol/spark-tools.svg)][travis.org]  
[![Codecov](https://img.shields.io/codecov/c/github/tupol/spark-tools.svg)][codecov]  
[![Javadocs](https://www.javadoc.io/badge/org.tupol/spark-tools_2.11.svg)][javadocs]  
[![Gitter](https://badges.gitter.im/spark-tools/community.svg)][gitter]  
[![Twitter](https://img.shields.io/twitter/url/https/_tupol.svg?color=%2317A2F2)][twitter]  

## Description ##
This project contains basic runnable tools that help with common tasks in a Spark-based project.

The main tools available:
- [FormatConverter](docs/format-converter.md) Converts any supported file format into a different
file format, with partitioning support.
- [SimpleSqlProcessor](docs/sql-processor.md) Applies a given SQL query to the input files, which are
mapped into tables.
- [StreamingFormatConverter](docs/streaming-format-converter.md) Converts any supported data
stream format into a different data stream format, with partitioning support.
- [SimpleFileStreamingSqlProcessor](docs/file-streaming-sql-processor.md) Applies a given SQL query
to the input file streams and maps the results into file output streams.
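
To illustrate the idea behind the batch tools, here is a minimal sketch using only plain Apache Spark APIs: an input file is loaded, mapped to a table, queried with SQL, and written out in a different, partitioned format. This is not the Spark Tools implementation or its configuration format; the object name, paths, column names, and query are all hypothetical, and the linked documentation describes how the tools themselves are configured.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch; not part of Spark Tools.
object SqlOverFilesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-over-files-sketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()

    // Load an input file (assumed path and format) and map it to a table.
    val people = spark.read.format("json").load("/tmp/input/people.json")
    people.createOrReplaceTempView("people")

    // Apply a SQL query to the table (assumed columns: name, country, age).
    val adults = spark.sql("SELECT name, country FROM people WHERE age >= 18")

    // Write the result in a different format, partitioned by a column.
    adults.write
      .format("parquet")
      .partitionBy("country")
      .mode("overwrite")
      .save("/tmp/output/adults")

    spark.stop()
  }
}
```

The tools in this project package this kind of flow as runnable applications, so it does not have to be rewritten for every job.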

This project also aims to foster a friendly yet professional environment where developers
help each other, so please do not be shy and join through [gitter], [twitter],
[issue reports](https://github.com/tupol/spark-tools/issues/new/choose) or pull requests.

## Prerequisites ##

* Java 8 or higher
* Scala 2.11 or 2.12
* Apache Spark 2.4.X

## Getting Spark Tools ##

Spark Tools is published to [Maven Central][maven-central] and [Spark Packages][spark-packages], where the latest artifacts can be found.

- Group id / organization: `org.tupol`
- Artifact id / name: `spark-tools`
- Latest version is `0.4.1`

To use Spark Tools with SBT, add a dependency on the latest version to your sbt build definition file:

```scala
libraryDependencies += "org.tupol" %% "spark-tools" % "0.4.1"
```
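
For context, a fuller build definition might look like the sketch below. The `%%` operator makes sbt append the Scala binary version to the artifact name, so the dependency resolves to `spark-tools_2.11` or `spark-tools_2.12` depending on `scalaVersion`. The project name, the `spark-sql` dependency, and the `Provided` scope are assumptions about a typical Spark application build, not requirements stated by this project; the version numbers are taken from the release notes in this document.

```scala
// build.sbt (illustrative sketch; adjust to your own project)
name := "my-spark-app" // hypothetical project name

scalaVersion := "2.11.12"
crossScalaVersions := Seq("2.11.12", "2.12.12")

libraryDependencies ++= Seq(
  // Assumed: Spark is supplied by the cluster at runtime, hence Provided.
  "org.apache.spark" %% "spark-sql" % "2.4.6" % Provided,
  "org.tupol" %% "spark-tools" % "0.4.1"
)
```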

Include this package in your Spark applications using `spark-shell` or `spark-submit`,
with Scala 2.11:
```bash
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.4.1
```
or with Scala 2.12:
```bash
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.12:0.4.1
```

## What's new? ##

**0.4.1**

- Added `StreamingFormatConverter`
- Added `FileStreamingSqlProcessor`, `SimpleFileStreamingSqlProcessor`
- Bumped `spark-utils` dependency to `0.4.2`
- The project compiles with both Scala `2.11.12` and `2.12.12`
- Updated Apache Spark to `2.4.6`
- Updated `delta.io` to `0.6.1`
- Updated the `spark-xml` library to `0.10.0`
- Removed the `com.databricks:spark-avro` dependency, as Avro support is now built into Apache Spark
- Updated the `spark-utils` dependency to the latest available snapshot

For previous versions please consult the [release notes](RELEASE-NOTES.md).

## License ##

This code is open source software licensed under the [MIT License](LICENSE).

[scala]: https://scala-lang.org/
[spark]: https://spark.apache.org/
[maven-central]: https://mvnrepository.com/artifact/org.tupol/spark-tools
[spark-packages]: https://spark-packages.org/package/tupol/spark-tools
[license]: https://github.com/tupol/spark-tools/blob/master/LICENSE
[travis.org]: https://travis-ci.com/tupol/spark-tools
[codecov]: https://codecov.io/gh/tupol/spark-tools
[javadocs]: https://www.javadoc.io/doc/org.tupol/spark-tools_2.11
[gitter]: https://gitter.im/spark-tools/community
[twitter]: https://twitter.com/_tupol