Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/wittline/livyc
Apache Spark as a Service with Apache Livy Client
JSON representation
Apache Spark as a Service with Apache Livy Client
- Host: GitHub
- URL: https://github.com/wittline/livyc
- Owner: Wittline
- License: mit
- Created: 2022-06-10T20:03:17.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-09-19T23:30:25.000Z (over 2 years ago)
- Last Synced: 2024-10-14T18:40:28.881Z (4 months ago)
- Topics: apache-livy, apache-spark, big-data, data-engineering, dataengineering, docker, livy-client, livy-docker, pyhton, spark
- Language: Python
- Homepage:
- Size: 24.4 KB
- Stars: 3
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# livyc
## Apache Livy Client
## Install library
```bash
pip install livyc
```

## Import library
```python
from livyc import livyc
```

## Setting the Livy configuration
```python
data_livy = {
    "livy_server_url": "localhost",
    "port": "8998",
    "jars": ["org.postgresql:postgresql:42.3.1"]
}
```

## Let's launch a PySpark script to the Apache Livy Server
```python
params = {"host": "localhost", "port":"5432", "database": "db", "table":"staging", "user": "postgres", "password": "pg12345"}
```

```python
pyspark_script = """from pyspark.sql.functions import udf, col, explode
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, ArrayType
from pyspark.sql import Row
from pyspark.sql import SparkSessiondf = spark.read.format("jdbc") \
.option("url", "jdbc:postgresql://{host}:{port}/{database}") \
.option("driver", "org.postgresql.Driver") \
.option("dbtable", "{table}") \
.option("user", "{user}") \
.option("password", "{password}") \
.load()
n_rows = df.count()spark.stop()
"""
```

#### Creating a livyc object
```python
lvy = livyc.LivyC(data_livy)
```

#### Creating a new session on the Apache Livy Server
```python
session = lvy.create_session()
```

#### Send and execute the script on the Apache Livy Server
```python
lvy.run_script(session, pyspark_script.format(**params))
```

#### Accessing the variable "n_rows" available in the session
```python
lvy.read_variable(session, "n_rows")
```

## Contributing and Feedback
Any ideas or feedback about this repository? Help me improve it.

## Authors
- Created by Ramses Alexander Coraspe Valdez
- Created in 2022

## License
This project is licensed under the terms of the MIT License.
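A note on the parameterization used above: the PySpark script is a plain Python string filled in with `str.format`, so the substitution can be sanity-checked locally before anything is sent to Livy. A minimal sketch reusing the same `params` dict from the README (no Livy server or Spark installation needed):

```python
# Sketch: check the str.format substitution locally before submitting to Livy.
params = {"host": "localhost", "port": "5432", "database": "db",
          "table": "staging", "user": "postgres", "password": "pg12345"}

# A trimmed version of the README's script; parenthesized chaining avoids
# backslash line continuations inside the string.
pyspark_script = """
df = (spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://{host}:{port}/{database}")
      .option("dbtable", "{table}")
      .option("user", "{user}")
      .option("password", "{password}")
      .load())
n_rows = df.count()
"""

rendered = pyspark_script.format(**params)
print("jdbc:postgresql://localhost:5432/db" in rendered)  # True
```

One caveat of this approach: any literal `{` or `}` in the script (dict literals, f-strings on the remote side) must be doubled as `{{` and `}}`, or `str.format` will raise an error.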