https://github.com/wittline/livyc
Apache Spark as a Service with Apache Livy Client
apache-livy apache-spark big-data data-engineering dataengineering docker livy-client livy-docker pyhton spark
- Host: GitHub
- URL: https://github.com/wittline/livyc
- Owner: Wittline
- License: mit
- Created: 2022-06-10T20:03:17.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-09-19T23:30:25.000Z (over 3 years ago)
- Last Synced: 2024-10-14T18:40:28.881Z (over 1 year ago)
- Topics: apache-livy, apache-spark, big-data, data-engineering, dataengineering, docker, livy-client, livy-docker, pyhton, spark
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
# livyc
## Apache Livy Client
## Install library
```shell
pip install livyc
```
## Import library
```python
from livyc import livyc
```
## Set the Livy configuration
```python
data_livy = {
    "livy_server_url": "localhost",
    "port": "8998",
    "jars": ["org.postgresql:postgresql:42.3.1"]
}
```
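Before creating a session, it can help to confirm that the Livy server is actually listening at the configured address. The sketch below builds a base URL from `data_livy` and queries Livy's `GET /sessions` REST endpoint using only the standard library. The `http://` scheme and the helper names `livy_base_url` and `livy_is_reachable` are assumptions made for illustration; they are not part of livyc.

```python
import json
import urllib.error
import urllib.request

data_livy = {
    "livy_server_url": "localhost",
    "port": "8998",
    "jars": ["org.postgresql:postgresql:42.3.1"],
}


def livy_base_url(config):
    """Build the REST base URL (hypothetical helper; assumes plain HTTP)."""
    return "http://{}:{}".format(config["livy_server_url"], config["port"])


def livy_is_reachable(config, timeout=5):
    """Return True if GET /sessions answers with HTTP 200 and a JSON body."""
    url = livy_base_url(config) + "/sessions"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.load(response)  # Livy describes its sessions as JSON
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply
        return False


print(livy_base_url(data_livy))
```

Calling `livy_is_reachable(data_livy)` returns `False` rather than raising when nothing is listening, so it can be used as a guard before `create_session()`.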
## Submit a PySpark script to the Apache Livy server
```python
params = {"host": "localhost", "port": "5432", "database": "db", "table": "staging", "user": "postgres", "password": "pg12345"}
```
```python
pyspark_script = """
from pyspark.sql.functions import udf, col, explode
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, ArrayType
from pyspark.sql import Row
from pyspark.sql import SparkSession

# Read the PostgreSQL table over JDBC; the placeholders are filled in with str.format
df = spark.read.format("jdbc") \
    .option("url", "jdbc:postgresql://{host}:{port}/{database}") \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", "{table}") \
    .option("user", "{user}") \
    .option("password", "{password}") \
    .load()

# Keep the row count in a session variable so it can be read back later
n_rows = df.count()
spark.stop()
"""
```
#### Creating a livyc object
```python
lvy = livyc.LivyC(data_livy)
```
#### Creating a new session on the Apache Livy server
```python
session = lvy.create_session()
```
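For orientation, this is roughly what creating a session looks like at the level of Livy's REST API: a `POST /sessions` request whose JSON body names the session kind and any Spark configuration. Whether livyc builds exactly this request is an assumption; the helpers `build_session_payload` and `create_session_raw` are hypothetical and shown only to illustrate the protocol. Passing the Maven coordinates through `spark.jars.packages` is likewise one plausible mapping of the `jars` setting, not a confirmed detail of the library.

```python
import json
import urllib.request


def build_session_payload(maven_packages):
    """Body for Livy's POST /sessions (hypothetical helper, not livyc's API)."""
    payload = {"kind": "pyspark"}
    if maven_packages:
        # Spark resolves Maven coordinates listed in spark.jars.packages
        payload["conf"] = {"spark.jars.packages": ",".join(maven_packages)}
    return payload


def create_session_raw(base_url, payload):
    """POST the payload to /sessions and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_session_payload(["org.postgresql:postgresql:42.3.1"])
print(json.dumps(payload, indent=2))
```

A newly created Livy session reports the state `starting` and becomes usable once it reaches `idle`, so a raw client would poll `GET /sessions/{id}` before submitting code.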
#### Sending and executing the script on the Apache Livy server
```python
lvy.run_script(session, pyspark_script.format(**params))
```
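Because the script is filled in with `str.format`, every literal curly brace in the PySpark code (for example in a dict literal) has to be doubled; a single brace is read as a placeholder and typically raises a `KeyError`. The short sketch below shows the escaping with a made-up one-line template; `template` and its contents are illustrative and not part of the script above.

```python
params = {"table": "staging"}

# Single braces mark placeholders; doubled braces become literal braces.
template = """
options = {{"dbtable": "{table}"}}
"""

rendered = template.format(**params)
print(rendered)

# An unescaped dict literal is mistaken for a placeholder and fails.
try:
    'options = {"dbtable": "x"}'.format(**params)
    failed = False
except (KeyError, IndexError, ValueError):
    failed = True
print("unescaped braces failed:", failed)
```

If a script contains many literal braces, `string.Template` with `$name` placeholders is a common alternative that avoids the escaping altogether.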
#### Accessing the variable `n_rows` available in the session
```python
lvy.read_variable(session, "n_rows")
```
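At the REST level, Livy runs code as statements (`POST /sessions/{id}/statements`) and reports each result as JSON with a `state` and an `output` object. The sketch below extracts the plain-text result from such a statement document. The helper `statement_text` is hypothetical, the sample dictionary is hand-written in the shape Livy returns (its value is illustrative, not a real result), and how livyc's `read_variable` processes the response internally is not shown here.

```python
def statement_text(statement):
    """Return the text/plain output of a finished Livy statement, else None."""
    if statement.get("state") != "available":
        return None  # the statement is still waiting or running
    output = statement.get("output") or {}
    if output.get("status") != "ok":
        # Failed statements carry the error message in "evalue"
        raise RuntimeError(output.get("evalue", "statement failed"))
    return output.get("data", {}).get("text/plain")


# Hand-written sample; the value "42" is illustrative only.
sample = {
    "id": 1,
    "state": "available",
    "output": {
        "status": "ok",
        "execution_count": 1,
        "data": {"text/plain": "42"},
    },
}

print(statement_text(sample))
```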
## Contributing and Feedback
Have ideas or feedback about this repository? Help me improve it.
## Authors
- Created by Ramses Alexander Coraspe Valdez
- Created in 2022
## License
This project is licensed under the terms of the MIT License.