Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/qri-io/startf
Starlark transformation syntax for qri datasets
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/qri-io/startf
- Owner: qri-io
- License: bsd-3-clause
- Created: 2018-05-23T20:25:02.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-06-06T10:38:39.000Z (over 5 years ago)
- Last Synced: 2024-04-25T00:19:21.608Z (7 months ago)
- Language: Go
- Homepage:
- Size: 222 KB
- Stars: 5
- Watchers: 6
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
[![Qri](https://img.shields.io/badge/made%20by-qri-magenta.svg?style=flat-square)](https://qri.io)
[![GoDoc](https://godoc.org/github.com/qri-io/startf?status.svg)](http://godoc.org/github.com/qri-io/startf)
[![License](https://img.shields.io/github/license/qri-io/startf.svg?style=flat-square)](./LICENSE)
[![Codecov](https://img.shields.io/codecov/c/github/qri-io/startf.svg?style=flat-square)](https://codecov.io/gh/qri-io/startf)
[![CI](https://img.shields.io/circleci/project/github/qri-io/startf.svg?style=flat-square)](https://circleci.com/gh/qri-io/startf)
[![Go Report Card](https://goreportcard.com/badge/github.com/qri-io/startf)](https://goreportcard.com/report/github.com/qri-io/startf)

# Qri Starlark Transformation Syntax
Qri ("query") is about datasets. Transformations are repeatable scripts for generating a dataset. [Starlark](https://github.com/google/starlark-go/blob/master/doc/spec.md) is a scripting language from Google that feels a lot like python. This package implements starlark as a _transformation syntax_. Starlark transformations are about as close as one can get to the full power of a programming language as a transformation syntax. Often you need this degree of control to generate a dataset.
Typical examples of a starlark transformation include:
* combining paginated calls to an API into a single dataset
* downloading unstructured data from the internet to extract structured data
* pulling raw data off the web & turning it into a dataset

We're excited about starlark for a few reasons:
* **python syntax** - _many_ people working in data science these days write python, we like that, starlark likes that. dope.
* **deterministic subset of python** - unlike python, starlark removes properties that reduce introspection into code behaviour. things like `while` loops and recursive functions are omitted, making it possible for qri to infer how a given transformation will behave.
* **parallel execution** - thanks to this deterministic requirement (and lack of global interpreter lock) starlark functions can be executed in parallel. Combined with peer-to-peer networking, we're hoping to advance transformations toward peer-driven distributed computing. More on that in the coming months.

## Getting started
If you're mainly interested in learning how to write starlark transformations, our [documentation](https://qri.io/docs) is a better place to start. If you're interested in contributing to the way starlark transformations work, this is the place!

The easiest way to see starlark transformations in action is to use [qri](https://github.com/qri-io/qri). This `startf` package powers all the starlark stuff in qri. Assuming you have the [go programming language](https://golang.org/) the following should work from a terminal:
```shell
# get this package
$ go get github.com/qri-io/startf

# navigate to package
$ cd $GOPATH/src/github.com/qri-io/startf

# run tests
$ go test ./...
```

Often the next steps are to install [qri](https://github.com/qri-io/qri), mess with this `startf` package, then rebuild qri with your changes to see them in action within qri itself.
## Starlark Special Functions
_Special Functions_ are the core of a starlark transform script. Here's an example of a simple special function that sets the body of a dataset to a constant:
```python
def transform(ds, ctx):
    ds.set_body(["hello", "world"])
```

Here's something slightly more complicated (but still very contrived) that modifies a dataset by adding up the length of all of the elements in a dataset body:
```python
def transform(ds, ctx):
    body = ds.get_body()
    # start at zero so the total is defined even when there is no body
    count = 0
    if body != None:
        for entry in body:
            count += len(entry)
    ds.set_body([{"total": count}])
```

Starlark special functions have a few rules on top of starlark itself:
* special functions *always* accept a _transformation context_ (the `ctx` arg)
* When you define a special function, qri calls it for you
* All special functions are optional (you don't _need_ to define them), except `transform`. transform is required.
* Special functions are always called in the same order

Another important special function is `download`, which allows access to the `http` package:
```python
load("http.star", "http")

def download(ctx):
    data = http.get("http://example.com/data.json")
    return data
```

The result of this special function can be accessed using `ctx.download`:
```python
def transform(ds, ctx):
    ds.set_body(ctx.download)
```

More docs on the provided API are coming soon.
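To make the calling rules concrete, here is a minimal plain-python sketch of how a runner might invoke the special functions: `download` first, with its return value exposed as `ctx.download`, then `transform`. This is not startf's actual implementation. The `Dataset` and `Context` classes and the `run_special_functions` helper are hypothetical stand-ins, and the `download` stub returns canned data instead of calling `http.get`.

```python
# Hypothetical stand-ins for illustration only; these are not startf's real types.
class Dataset:
    """Minimal dataset stub exposing get_body/set_body like the examples above."""

    def __init__(self, body=None):
        self._body = body

    def get_body(self):
        return self._body

    def set_body(self, body):
        self._body = body


class Context:
    """Minimal context stub; `download` holds the result of the download step."""

    def __init__(self):
        self.download = None


def run_special_functions(funcs, ds):
    """Call special functions in a fixed order: download first, then transform.

    `funcs` maps function names to callables. Only `transform` is required.
    """
    if "transform" not in funcs:
        raise ValueError("transform is required")
    ctx = Context()
    if "download" in funcs:
        # The return value of download becomes available as ctx.download.
        ctx.download = funcs["download"](ctx)
    funcs["transform"](ds, ctx)
    return ds


# Script functions mirroring the snippets above, without network access.
def download(ctx):
    return [{"name": "a"}, {"name": "b"}]  # stand-in for an http.get(...) result


def transform(ds, ctx):
    ds.set_body(ctx.download)


result = run_special_functions(
    {"download": download, "transform": transform}, Dataset()
)
print(result.get_body())
```

Because the script only defines functions and the runner decides when to call them, the same script can skip `download` entirely and still work, as long as `transform` is present.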
## Running a transform
Let's say the above function is saved as `transform.star`. You can run it to create a new dataset by using:
```
qri save --file=transform.star me/dataset_name
```

Or, you can add more details by creating a dataset file (saved as `dataset.yaml`, for example) with additional structure:
```
name: dataset_name
transform:
  scriptpath: transform.star
meta:
  title: My awesome dataset
```

Then invoke qri:
```
qri save --file=dataset.yaml
```

Fun! More info over on our [docs site](https://qri.io/docs).