Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wittline/livyc
Apache Spark as a Service with Apache Livy Client
JSON representation
Apache Spark as a Service with Apache Livy Client
- Host: GitHub
- URL: https://github.com/wittline/livyc
- Owner: Wittline
- License: mit
- Created: 2022-06-10T20:03:17.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-09-19T23:30:25.000Z (over 2 years ago)
- Last Synced: 2024-10-14T18:40:28.881Z (4 months ago)
- Topics: apache-livy, apache-spark, big-data, data-engineering, dataengineering, docker, livy-client, livy-docker, pyhton, spark
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# livyc
## Apache Livy Client
## Install library
```bash
pip install livyc
```

## Import library
```python
from livyc import livyc
```

## Setting the Livy configuration
```python
data_livy = {
    "livy_server_url": "localhost",
    "port": "8998",
    "jars": ["org.postgresql:postgresql:42.3.1"]
}
```

## Let's launch a PySpark script to the Apache Livy Server
```python
params = {"host": "localhost", "port":"5432", "database": "db", "table":"staging", "user": "postgres", "password": "pg12345"}
```

```python
pyspark_script = """from pyspark.sql.functions import udf, col, explode
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, ArrayType
from pyspark.sql import Row
from pyspark.sql import SparkSessiondf = spark.read.format("jdbc") \
.option("url", "jdbc:postgresql://{host}:{port}/{database}") \
.option("driver", "org.postgresql.Driver") \
.option("dbtable", "{table}") \
.option("user", "{user}") \
.option("password", "{password}") \
.load()
n_rows = df.count()spark.stop()
"""
```

#### Creating a livyc object
```python
lvy = livyc.LivyC(data_livy)
```

#### Creating a new session on the Apache Livy Server
```python
session = lvy.create_session()
```

#### Send and execute the script on the Apache Livy Server
```python
lvy.run_script(session, pyspark_script.format(**params))
```

#### Accessing the variable "n_rows" available in the session
```python
lvy.read_variable(session, "n_rows")
```

## Contributing and Feedback
Any ideas or feedback about this repository? Help me improve it.

## Authors
- Created by Ramses Alexander Coraspe Valdez
- Created in 2022

## License
This project is licensed under the terms of the MIT License.
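A note on the parameterization used above: the PySpark script is a plain Python string filled in with `str.format`, so the substitution can be sanity-checked locally before anything is sent to Livy. A minimal sketch reusing the same `params` dict from the README (no Livy server or Spark installation needed):

```python
# Sketch: check the str.format substitution locally before submitting to Livy.
params = {"host": "localhost", "port": "5432", "database": "db",
          "table": "staging", "user": "postgres", "password": "pg12345"}

# A trimmed version of the README's script; parenthesized chaining avoids
# backslash line continuations inside the string.
pyspark_script = """
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://{host}:{port}/{database}")
      .option("dbtable", "{table}")
      .option("user", "{user}")
      .option("password", "{password}")
      .load())
n_rows = df.count()
"""

rendered = pyspark_script.format(**params)
print("jdbc:postgresql://localhost:5432/db" in rendered)  # True
```

One caveat of this approach: any literal `{` or `}` in the script (dict literals, f-strings on the remote side) must be doubled as `{{` and `}}`, or `str.format` will raise an error.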