Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Tubular/sparkly
Helpers & syntactic sugar for PySpark.
https://github.com/Tubular/sparkly
pyspark python spark
Last synced: 3 months ago
JSON representation
Helpers & syntactic sugar for PySpark.
- Host: GitHub
- URL: https://github.com/Tubular/sparkly
- Owner: tubular
- License: apache-2.0
- Created: 2016-10-07T15:50:02.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-06-26T21:09:23.000Z (over 1 year ago)
- Last Synced: 2024-07-26T07:42:02.988Z (4 months ago)
- Topics: pyspark, python, spark
- Language: Python
- Homepage:
- Size: 1.54 MB
- Stars: 60
- Watchers: 37
- Forks: 8
- Open Issues: 5
-
Metadata Files:
- Readme: README.rst
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
Sparkly
=======|Sparkly PyPi Version| |Documentation Status|
Helpers & syntax sugar for PySpark. There are several features to make your life easier:
- Definition of spark packages, external jars, UDFs and spark options within your code;
- Simplified reader/writer api for Cassandra, Elastic, MySQL, Kafka;
- Testing framework for spark applications.More details could be found in `the official
documentation `__.Installation
------------Sparkly itself is easy to install::
pip install pyspark # pick your version
pip install sparkly (compatible with spark >= 2.4)Getting Started
---------------Here is a small code snippet to show how to easily read Cassandra table
and write its content to ElasticSearch index::from sparkly import SparklySession
class MySession(SparklySession):
packages = [
'datastax:spark-cassandra-connector:2.0.0-M2-s_2.11',
'org.elasticsearch:elasticsearch-spark-20_2.11:6.5.4',
]if __name__ == '__main__':
spark = MySession()
df = spark.read_ext.cassandra('localhost', 'my_keyspace', 'my_table')
df.write_ext.elastic('localhost', 'my_index', 'my_type')See `the online documentation `__ for
more details.Testing
-------To run tests you have to have `docker `__ and
`docker-compose `__ installed on your
system. If you are working on MacOS we highly recommend you to use
`docker-machine `__. As soon as the
tools mentioned above have been installed, all you need is to run::make test
Supported Spark Versions
------------------------At the moment we support:
+---------------------------------------------------------------------------+
| sparkly >= 2.7 | Spark 2.4.x |
+---------------------------------------------------------------------------+
| sparkly 2.x | Spark 2.0.x and Spark 2.1.x and Spark 2.2.x |
+---------------------------------------------------------------------------+
| sparkly 1.x | Spark 1.6.x |
+---------------------------------------------------------------------------+.. |Sparkly PyPi Version| image:: http://img.shields.io/pypi/v/sparkly.svg
:target: https://pypi.python.org/pypi/sparkly
.. |Sparkly Build Status| image:: https://app.travis-ci.com/tubular/sparkly.svg?branch=master
:target: https://app.travis-ci.com/github/tubular/sparkly
.. |Documentation Status| image:: https://readthedocs.org/projects/sparkly/badge/?version=latest
:target: http://sparkly.readthedocs.io/en/latest/?badge=latest