https://github.com/e2fyi/databricks-utils
`databricks-utils` is a Python package that provides several utility classes and functions to improve ease of use in Databricks notebooks.
aws databricks jupyter-notebooks notebook pyspark s3 spark vega vega-lite
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/e2fyi/databricks-utils
- Owner: e2fyi
- License: apache-2.0
- Created: 2018-06-27T02:31:51.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2018-07-03T07:26:15.000Z (over 7 years ago)
- Last Synced: 2025-02-17T05:35:58.233Z (8 months ago)
- Topics: aws, databricks, jupyter-notebooks, notebook, pyspark, s3, spark, vega, vega-lite
- Language: Python
- Homepage: https://e2fyi.github.io/databricks-utils/
- Size: 5.28 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# databricks-utils
`databricks-utils` is a Python package that provides several utility classes and functions
that improve ease of use in Databricks notebooks.

### Installation
```bash
pip install databricks-utils
```

### Features
- `S3Bucket` class to easily interact with an [S3 bucket](https://aws.amazon.com/s3/) via [`dbfs`](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) and Databricks Spark.
- `vega_embed` to render charts from [Vega](https://vega.github.io/vega/) and [Vega-Lite](https://vega.github.io/vega-lite/) specifications.
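
Because a Vega-Lite specification is plain JSON, a spec for `vega_embed` can be assembled as an ordinary Python dict. The following is a minimal sketch under that assumption; `bar_chart_spec` is a hypothetical helper written for illustration and is not part of `databricks-utils`.

```python
import json


def bar_chart_spec(rows, x_field, y_field):
    """Build a Vega-Lite bar chart spec from a list of row dicts.

    Hypothetical helper for illustration; not part of databricks-utils.
    """
    return {
        "data": {"values": list(rows)},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


rows = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]
spec = bar_chart_spec(rows, x_field="a", y_field="b")

# The spec is plain data, so it survives a JSON round trip unchanged.
assert json.loads(json.dumps(spec)) == spec
```

In a Databricks notebook, a spec built this way could be passed to `vega_embed` in the same way as the hand-written spec in the Quick start.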
### Documentation
API documentation can be found at [https://e2fyi.github.io/databricks-utils/](https://e2fyi.github.io/databricks-utils/).

### Quick start
**S3Bucket**
```python
import json
from databricks_utils.aws import S3Bucket

# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)

# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark(sc)  # local spark context
          .mount("somebucketname"))  # mount location name (resolves as `/mnt/somebucketname`)

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
# (the "r" mode belongs to open(), not to bucket.local())
with open(bucket.local("resource/somefile.json"), "r") as f:
    data = json.load(f)

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# unmount the bucket
bucket.umount()
```

**Vega**
[Vega](https://vega.github.io/vega/) and [Vega-Lite](https://vega.github.io/vega-lite/)
are high-level grammars of interactive graphics. They provide concise JSON
syntax for rapidly generating visualizations to support analysis.

```python
from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
    "data": {
        "values": [
            {"a": "A","b": 28}, {"a": "B","b": 55}, {"a": "C","b": 43},
            {"a": "D","b": 91}, {"a": "E","b": 81}, {"a": "F","b": 53},
            {"a": "G","b": 19}, {"a": "H","b": 87}, {"a": "I","b": 52}
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
    }
}

# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))
```

### Developer
```bash
# add a version to git tag and publish to pypi
. add_tag.sh
```