https://github.com/gcdev373/example-spark-datasourcev2
A very simple Java implementation of the Spark DataSourceV2 API.
- Host: GitHub
- URL: https://github.com/gcdev373/example-spark-datasourcev2
- Owner: gcdev373
- License: apache-2.0
- Created: 2019-07-18T11:12:01.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2021-05-24T08:42:22.000Z (almost 5 years ago)
- Last Synced: 2025-07-06T16:41:06.678Z (10 months ago)
- Topics: datasource, example, spark, spark-sql, sparksql
- Language: Java
- Homepage:
- Size: 9.77 KB
- Stars: 4
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# example-spark-datasourcev2
A very simple Java implementation of the Apache Spark DataSourceV2 API.
This example is compatible with **Spark 2.4.3**.
# Building
Build the jar file containing the DataSource with the following command:
```text
$ mvn package
```
# Testing
The DataSource can be demonstrated from the pyspark shell.
Pyspark should be launched with the following command:
```text
$ pyspark --jars ./target/example-datasource-1.0.jar
```
You should see something like:
```text
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
Using Python version 3.7.3 (default, Jun 24 2019 04:54:02)
SparkSession available as 'spark'.
```
Then from within the pyspark shell, type the commands below:
```text
>>> df = spark.read.format('example.ExampleDataSource').load()
>>> df.show()
```
This displays the data provided by the DataSource:
```text
+-------+---+
|   name|age|
+-------+---+
|  Alfie| 24|
| Bertie| 36|
|Charlie| 48|
| Debbie| 60|
|  Ernie| 72|
|Frankie| 84|
| Gettie| 96|
+-------+---+
```
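In Spark 2.4, a DataSourceV2 source hands rows to Spark through a partition reader (`InputPartitionReader`) that exposes `next()`, `get()` and `close()`. The block below is a minimal, Spark-free sketch of that pattern and is not the code in this repository. The class name `ExampleRowReader` is hypothetical, the rows are copied from the output above, and it returns `Object[]` where the real interface returns an `InternalRow`.

```java
import java.io.Closeable;

// Hypothetical, Spark-free stand-in for a DataSourceV2 partition reader.
// It only illustrates the next()/get()/close() contract.
class ExampleRowReader implements Closeable {
    // Rows copied from the df.show() output above.
    private static final String[] NAMES =
        {"Alfie", "Bertie", "Charlie", "Debbie", "Ernie", "Frankie", "Gettie"};
    private static final int[] AGES = {24, 36, 48, 60, 72, 84, 96};

    private int index = -1; // positioned before the first row

    // Advance to the next row; returns false once the data is exhausted.
    public boolean next() {
        if (index + 1 >= NAMES.length) {
            return false;
        }
        index++;
        return true;
    }

    // Return the current row as {name, age}; valid only after next() returned true.
    public Object[] get() {
        return new Object[] {NAMES[index], AGES[index]};
    }

    @Override
    public void close() {
        // Nothing to release for in-memory data.
    }

    public static void main(String[] args) {
        // Drain the reader the way a caller would: next(), then get().
        try (ExampleRowReader reader = new ExampleRowReader()) {
            while (reader.next()) {
                Object[] row = reader.get();
                System.out.println(row[0] + " " + row[1]);
            }
        }
    }
}
```

Running `main` prints one name and age per line. A real reader would implement Spark's interface and be created by the source's partition objects; that wiring is left out here to keep the sketch dependency-free.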