Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/FeatureBaseDB/pdk
Pilosa Dev Kit - implementation tooling and use case examples are here!
https://github.com/FeatureBaseDB/pdk
Last synced: about 1 month ago
JSON representation
Pilosa Dev Kit - implementation tooling and use case examples are here!
- Host: GitHub
- URL: https://github.com/FeatureBaseDB/pdk
- Owner: FeatureBaseDB
- License: bsd-3-clause
- Archived: true
- Created: 2017-02-08T22:20:37.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2022-09-27T21:01:34.000Z (about 2 years ago)
- Last Synced: 2024-06-20T00:54:29.947Z (5 months ago)
- Language: Go
- Size: 5.54 MB
- Stars: 31
- Watchers: 25
- Forks: 21
- Open Issues: 20
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# pdk
This repo archived Sept 2022 as part of the transition from Pilosa to FeatureBase.
Please contact community[at]featurebase[dot]com with any questions.[![CircleCI](https://circleci.com/gh/pilosa/pdk.svg?style=shield)](https://circleci.com/gh/pilosa/pdk)
[![Go Report Card](https://goreportcard.com/badge/github.com/pilosa/pdk)](https://goreportcard.com/report/github.com/pilosa/pdk)
[![GoDoc](https://godoc.org/github.com/pilosa/pdk?status.svg)](https://godoc.org/github.com/pilosa/pdk)Pilosa Dev Kit - implementation tooling and use case examples are here!
Documentation is here: https://www.pilosa.com/docs/pdk/
## Requirements
* A running instance of Pilosa. See: https://www.pilosa.com/docs/getting-started/
* A recent version of [Go](https://golang.org/doc/install).
## Installing
We assume you are on a UNIX-like operating system. Otherwise adapt the following instructions for your platform. We further assume that you have a [Go development environment](https://golang.org/doc/install) set up. You should have $GOPATH/bin on your $PATH for access to installed binaries.
* `go get github.com/pilosa/pdk`
* `cd $GOPATH/src/github.com/pilosa/pdk`
* `make install`## Running Tests
To run unit tests
`make test`To run unit and integration tests, first, install and start the Confluent stack:
1. Download tarball here: https://www.confluent.io/download
2. Decompress, enter directory, then,
3. Run `./bin/confluent start kafka-rest`
Now that's running, you can do
`make test TESTFLAGS="-tags=integration"`## Taxi usecase
To get started immediately, run this:
`pdk taxi`
This will create and fill an index called `taxi`, using the short url list in usecase/taxi/urls-short.txt.
If you want to try out the full data set, run this:
`pdk taxi -i taxi-big -f usecase/taxi/greenAndYellowUrls.txt`
There are a number of other options you can tweak to affect the speed and memory usage of the import (or point it to a remote pilosa instance). Use `pdk taxi --help` to see all the options.
Note that this url file represents 1+ billion columns of data - depending on your hardware this will probably take well over 3 hours, and consume quite a bit of memory (and CPU). You can make a file with fewer URLs if you just want to get a sample.
After importing, you can try a few example queries at https://github.com/alanbernstein/pilosa-notebooks/blob/master/taxi-use-case.ipynb .
## Net usecase
To get started immediately, run this:
`pdk net -i en0`
which will capture traffic on the interface `en0` (see available interfaces with `ifconfig`).
## SSB
The Star Schema Benchmark is a benchmark based on [TPC-H](www.tpc.org/tpch/) but tweaked for a somewhat difference use case. It has been implemented by some big data projects such as https://hortonworks.com/blog/sub-second-analytics-hive-druid/ .
To execute the star schema benchmark with Pilosa, you must.
1. Generate the SSB data at a particular scale factor.
2. Import the data into Pilosa.
3. Run the `demo-ssb` application for convenience which has all of the SSB queries pre-written.### Generating SSB data
Use https://github.com/electrum/ssb-dbgen.git to generate the raw SSB data. This can be a bit finicky to work with - hit up @tgruben for tips (or maybe he'll update this section :wink:.When generating the data, you have to select a particular scale factor - the size of the generated data will be about 600MB * SF(scale factor), so SF=100 will generate about 60GB of data.
### Import data into Pilosa.
Use `pdk ssb` to import the data into Pilosa. You must specify the directory containing the `.tbl` files generated in the first step as well as the location of your pilosa cluster. There are a few other options which you can tweak which may help import performance. See `pdk ssb -h` for more information.### Run demo-ssb
This repo https://github.com/pilosa/demo-ssb.git contains a small Go program which packages up the different queries which comprise the benchmark. Running demo-ssb starts a web server which executes queries against pilosa on your behalf. You can simply run (e.g.) `curl localhost:8000/query/1.1` to run an SSB query.