https://github.com/gcdev373/example-spark-datasourcev2

# example-spark-datasourcev2
A very simple Java implementation of the Apache Spark DataSourceV2 API.

This example is compatible with **Spark 2.4.3**.

# Building
The jar file containing the DataSource is built with Maven using the following command:
```text
$ mvn package
```

# Testing
The DataSource can be demonstrated from the pyspark shell.

Launch pyspark with the DataSource jar on its classpath:
```text
$ pyspark --jars ./target/example-datasource-1.0.jar
```
You should see something like:
```text
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 3.7.3 (default, Jun 24 2019 04:54:02)
SparkSession available as 'spark'.
```
Then from within the pyspark shell, type the commands below:
```text
>>> df = spark.read.format('example.ExampleDataSource').load()
>>> df.show()
```
This displays the data provided by the DataSource:
```text
+-------+---+
|   name|age|
+-------+---+
|  Alfie| 24|
| Bertie| 36|
|Charlie| 48|
| Debbie| 60|
|  Ernie| 72|
|Frankie| 84|
| Gettie| 96|
+-------+---+
```
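
The same steps can also be run outside the interactive shell. The block below is a minimal sketch, not part of this repository: the helper names (`build_session`, `load_example`, `main`) and the application name are hypothetical, and it assumes the jar can be supplied through Spark's `spark.jars` setting instead of the `--jars` flag.

```python
# Sketch: the interactive session above as a standalone script.
# Assumptions (not from this README): the helper names, the app name,
# and passing the jar via the "spark.jars" setting rather than --jars.

JAR_PATH = "./target/example-datasource-1.0.jar"  # jar produced by `mvn package`
SOURCE_FORMAT = "example.ExampleDataSource"       # format name used in the shell session


def build_session(jar_path=JAR_PATH):
    """Create a SparkSession with the DataSource jar on its classpath."""
    # Imported lazily so this module can be imported without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("example-datasource-demo")  # hypothetical application name
        .config("spark.jars", jar_path)
        .getOrCreate()
    )


def load_example(spark, source_format=SOURCE_FORMAT):
    """Return the DataFrame served by the example DataSource."""
    return spark.read.format(source_format).load()


def main():
    """Load the example data and print it, then shut the session down."""
    spark = build_session()
    try:
        load_example(spark).show()
    finally:
        spark.stop()
```

Calling `main()` from a script started in the project directory (with pyspark installed and the jar already built) should print the same table as the shell session, provided the relative jar path resolves from the working directory.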