https://github.com/spektom/data-formats-samples
Spark-based different data formats samples generator
avro json orc parquet spark
- Host: GitHub
- URL: https://github.com/spektom/data-formats-samples
- Owner: spektom
- Created: 2020-02-15T19:56:52.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-02-24T07:20:03.000Z (almost 5 years ago)
- Last Synced: 2024-11-19T13:54:18.893Z (3 months ago)
- Topics: avro, json, orc, parquet, spark
- Language: Scala
- Homepage:
- Size: 33.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
data-formats-samples
====================

Apache Spark-based generator of sample data files in different formats and with different compression codecs.
This work was derived from Apache Spark's unit test suite.

Supported formats:
- orc
- parquet
- avro
- json
- csv
- tsv
- psv

Each format supports every compression codec that Spark supports for that format.
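
As a rough illustration of how the formats and compressions above could map onto Spark's writer API, here is a minimal sketch. It is not taken from this project: `WriterSettings` and `forFormat` are hypothetical names, and the assumption is that `csv`, `tsv` and `psv` all go through Spark's `csv` data source and differ only in the `sep` option. The `compression` option is the one Spark's built-in writers accept; which codec names are valid depends on the format.

```scala
// Hypothetical helper (not part of this project): maps a format name from the
// list above to a Spark data source name plus the writer options to pass to it.
object WriterSettings {
  // Assumption: csv, tsv and psv share Spark's "csv" source and differ only in "sep".
  private val separators = Map("csv" -> ",", "tsv" -> "\t", "psv" -> "|")

  def forFormat(format: String, codec: String): (String, Map[String, String]) = {
    val compression = Map("compression" -> codec)
    separators.get(format) match {
      case Some(sep) => ("csv", compression + ("sep" -> sep))
      case None      => (format, compression) // orc, parquet, avro, json
    }
  }
}

// Usage with an existing DataFrame `df` (needs Spark on the classpath; avro
// additionally needs the spark-avro module). The output path is illustrative only.
//   val (source, opts) = WriterSettings.forFormat("tsv", "gzip")
//   df.write.format(source).options(opts).save("output/tsv-gzip")
```

Keeping the mapping separate from the write call makes it easy to loop over every format and codec pair, which is presumably what a sample generator like this one needs to do.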

## Building

    mvn package

## Running

    spark-submit target/data-foramts-samples_2.11-0.1.0-uberjar.jar

After running, a folder named `output` will contain the data files.

## Configuration

The utility accepts various command line arguments that tune some of its parameters.
To see the list of available options, run:

    spark-submit target/data-foramts-samples_2.11-0.1.0-uberjar.jar --help