https://github.com/tupol/spark-tools
Executable Apache Spark Tools: Format Converter & SQL Processor
- Host: GitHub
- URL: https://github.com/tupol/spark-tools
- Owner: tupol
- License: mit
- Created: 2018-12-24T15:04:34.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-07-21T15:48:15.000Z (over 1 year ago)
- Last Synced: 2023-07-21T20:37:08.822Z (over 1 year ago)
- Topics: apache-spark, converts, format-converter, scala, spark, sql, tools
- Language: Scala
- Homepage:
- Size: 125 KB
- Stars: 11
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
# Spark Tools #
[![Maven Central](https://img.shields.io/maven-central/v/org.tupol/spark-tools_2.11.svg)][maven-central]
[![GitHub](https://img.shields.io/github/license/tupol/spark-tools.svg)][license]
[![Travis (.org)](https://img.shields.io/travis/tupol/spark-tools.svg)][travis.org]
[![Codecov](https://img.shields.io/codecov/c/github/tupol/spark-tools.svg)][codecov]
[![Javadocs](https://www.javadoc.io/badge/org.tupol/spark-tools_2.11.svg)][javadocs]
[![Gitter](https://badges.gitter.im/spark-tools/community.svg)][gitter]
[![Twitter](https://img.shields.io/twitter/url/https/_tupol.svg?color=%2317A2F2)][twitter]

## Description ##
This project contains some basic runnable tools that can help with various tasks around a Spark-based project.

The main tools available are:
- [FormatConverter](docs/format-converter.md) Converts any acceptable file format into a different
  file format, with partitioning support.
- [SimpleSqlProcessor](docs/sql-processor.md) Applies a given SQL query to the input files, which are
  mapped into tables.
- [StreamingFormatConverter](docs/streaming-format-converter.md) Converts any acceptable data
  stream format into a different data stream format, with partitioning support.
- [SimpleFileStreamingSqlProcessor](docs/file-streaming-sql-processor.md) Applies a given SQL query to the input file streams, which are mapped into file output streams.

This project is also trying to create and encourage a friendly yet professional environment
for developers to help each other, so please do not be shy and join through [gitter], [twitter],
[issue reports](https://github.com/tupol/spark-tools/issues/new/choose) or pull requests.

## Prerequisites ##
* Java 8 or higher
* Scala 2.11 or 2.12
* Apache Spark 2.4.X

## Getting Spark Tools ##
Spark Tools is published to [Maven Central][maven-central] and [Spark Packages][spark-packages],
where the latest artifacts can be found.
- Group id / organization: `org.tupol`
- Artifact id / name: `spark-tools`
- Latest version is `0.4.1`

Usage with SBT: add a dependency on the latest version of the tools to your sbt build definition file:
```scala
libraryDependencies += "org.tupol" %% "spark-tools" % "0.4.1"
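// Note: `%%` appends the Scala binary version to the artifact name, so with a 2.12.x
// `scalaVersion` the line above should resolve like: "org.tupol" % "spark-tools_2.12" % "0.4.1"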
```

Include this package in your Spark applications using `spark-shell` or `spark-submit`
with Scala 2.11
```bash
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.4.1
```
or with Scala 2.12
```bash
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.12:0.4.1
```

## What's new? ##
**0.4.1**
- Added `StreamingFormatConverter`
- Added `FileStreamingSqlProcessor`, `SimpleFileStreamingSqlProcessor`
- Bumped `spark-utils` dependency to `0.4.2`
- The project compiles with both Scala `2.11.12` and `2.12.12`
- Updated Apache Spark to `2.4.6`
- Updated `delta.io` to `0.6.1`
- Updated the `spark-xml` library to `0.10.0`
- Removed the `com.databricks:spark-avro` dependency, as avro support is now built into Apache Spark
- Updated the `spark-utils` dependency to the latest available snapshot

For previous versions please consult the [release notes](RELEASE-NOTES.md).
## License ##
This code is open source software licensed under the [MIT License](LICENSE).
[scala]: https://scala-lang.org/
[spark]: https://spark.apache.org/
[maven-central]: https://mvnrepository.com/artifact/org.tupol/spark-tools
[spark-packages]: https://spark-packages.org/package/tupol/spark-tools
[license]: https://github.com/tupol/spark-tools/blob/master/LICENSE
[travis.org]: https://travis-ci.com/tupol/spark-tools
[codecov]: https://codecov.io/gh/tupol/spark-tools
[javadocs]: https://www.javadoc.io/doc/org.tupol/spark-tools_2.11
[gitter]: https://gitter.im/spark-tools/community
[twitter]: https://twitter.com/_tupol