https://github.com/yaooqinn/spark-postgres
PostgreSQL and GreenPlum Data Source for Apache Spark
greenplum postgres postgresql spark sparksql transactional
- Host: GitHub
- URL: https://github.com/yaooqinn/spark-postgres
- Owner: yaooqinn
- License: apache-2.0
- Created: 2019-03-14T07:08:34.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2024-02-21T00:23:04.000Z (9 months ago)
- Last Synced: 2024-10-02T10:06:25.470Z (about 1 month ago)
- Topics: greenplum, postgres, postgresql, spark, sparksql, transactional
- Language: Scala
- Homepage: https://yaooqinn.github.io/spark-postgres/
- Size: 78.1 KB
- Stars: 35
- Watchers: 4
- Forks: 14
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# PostgreSQL & GreenPlum Data Source for Apache Spark [![License](https://img.shields.io/badge/license-Apache%202-4EB1BA.svg)](https://www.apache.org/licenses/LICENSE-2.0.html) [![](https://tokei.rs/b1/github/yaooqinn/spark-greenplum)](https://github.com/yaooqinn/spark-greenplum) [![GitHub release](https://img.shields.io/github/release/yaooqinn/spark-greenplum.svg)](https://github.com/yaooqinn/spark-greenplum/releases) [![codecov](https://codecov.io/gh/yaooqinn/spark-greenplum/branch/master/graph/badge.svg)](https://codecov.io/gh/yaooqinn/spark-greenplum) [![Build Status](https://travis-ci.com/yaooqinn/spark-greenplum.svg?branch=master)](https://travis-ci.com/yaooqinn/spark-greenplum)[![HitCount](http://hits.dwyl.io/yaooqinn/spark-greenplum.svg)](http://hits.dwyl.io/yaooqinn/spark-greenplum)
A library for reading data from and transferring data to Greenplum databases with Apache Spark, for Spark SQL and DataFrames.
This library is **100x faster** than Apache Spark's JDBC DataSource when transferring data from Spark to Greenplum databases.
Also, this library is fully **transactional**.
## Try it now!
### CTAS
```genericsql
CREATE TABLE tbl
USING greenplum
options (
url "jdbc:postgresql://greenplum:5432/",
delimiter "\t",
dbschema "gptest",
dbtable "store_sales",
user 'gptest',
password 'test')
AS
SELECT * FROM tpcds_100g.store_sales WHERE ss_sold_date_sk<=2451537 AND ss_sold_date_sk> 2451520;
```
### View & Insert
```genericsql
CREATE TEMPORARY TABLE tbl
USING greenplum
options (
url "jdbc:postgresql://greenplum:5432/",
delimiter "\t",
dbschema "gptest",
dbtable "store_sales",
user 'gptest',
password 'test');
INSERT INTO TABLE tbl SELECT * FROM tpcds_100g.store_sales WHERE ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520;
```
Please refer to [Spark SQL Guide - JDBC To Other Databases](http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html) to learn more about similar usage.
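
The SQL examples above pass the same set of options each time. As a rough sketch, the same options could be collected once and handed to the PySpark DataFrame writer. The helper names `greenplum_options` and `append_to_greenplum` are hypothetical and not part of this library. It is also an assumption that the `greenplum` source accepts the standard `DataFrameWriter` path in the same way it accepts the SQL statements.

```python
def greenplum_options(url, dbschema, dbtable, user, password, delimiter="\t"):
    """Collect the options used in the SQL examples into one dict (hypothetical helper)."""
    return {
        "url": url,
        "delimiter": delimiter,
        "dbschema": dbschema,
        "dbtable": dbtable,
        "user": user,
        "password": password,
    }


def append_to_greenplum(df, options):
    """Append a Spark DataFrame to the target table (hypothetical helper)."""
    # format/options/mode/save are standard PySpark DataFrameWriter calls;
    # that the "greenplum" source honors them like the SQL path is an assumption.
    df.write.format("greenplum").options(**options).mode("append").save()


# Same values as in the SQL examples above.
opts = greenplum_options(
    url="jdbc:postgresql://greenplum:5432/",
    dbschema="gptest",
    dbtable="store_sales",
    user="gptest",
    password="test",
)

# With a live SparkSession and the library on the classpath, usage might be:
# df = spark.table("tpcds_100g.store_sales").where(
#     "ss_sold_date_sk <= 2451537 AND ss_sold_date_sk > 2451520")
# append_to_greenplum(df, opts)
```

Keeping the options in one place avoids repeating credentials across statements. The commented-out lines need a running Spark session and a reachable Greenplum instance.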