https://github.com/heardacat/dave
A simple featurestore using jsonlines
- Host: GitHub
- URL: https://github.com/heardacat/dave
- Owner: HeardACat
- Created: 2016-05-28T12:01:25.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2016-06-23T04:58:38.000Z (almost 9 years ago)
- Last Synced: 2025-03-26T16:40:39.130Z (2 months ago)
- Language: Python
- Homepage:
- Size: 225 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
Dave: a framework for a feature store
=====================================

[Build status](https://travis-ci.org/Jules-and-Dave/Dave)
[Coverage](http://codecov.io/github/Jules-and-Dave/Dave?branch=master)

Dave is a feature store (sort of). Simple as that. It stores facts and extracts features.
This version simply demonstrates how this concept works in environments like
`python` and `R`.

Dave is inspired by Ambiata's implementation of Ivory, though it treats facts
in a slightly different way. It is also influenced by the `LIBSVM` format, aiming for a sparse data format in a human-readable form.

Concepts
========

Fact sets
---------

Facts in Dave are represented in [`jsonlines`](http://jsonlines.org/) format. The
representation is also inspired by the [entity, attribute, value](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model) model,
in order to create a sparse data set.

The attributes can be spread across any number of rows; however, each observation
must have at least the entity information (this can be thought of as a key), with
an optional longitudinal component, namely time (if applicable).

For example, the following two fact sets are equivalent:
```
{"id": "cust_001", "as_at": "2016-05-28T11:39:+00:00", "gender": "male", "zipcode": "123456"}
```

```
{"id": "cust_001", "as_at": "2016-05-28T11:39:+00:00", "gender": "male"}
{"id": "cust_001", "as_at": "2016-05-28T11:39:+00:00", "zipcode": "123456"}
```

Feature sets
------------

Feature sets can be thought of as two parts of the same picture:
* Feature engineering
* Feature extraction

**Feature engineering** may be of interest when you take data over a time range; for
example, if we are interested in the duration between events, simply looking at
snapshot-related information may not be sufficient.

**Feature extraction** outputs the ingested information in a
(hopefully) [tidy data format](http://vita.had.co.nz/papers/tidy-data.html) for
machine learning.

License and Copyrights
======================

This library is released under the MIT License, 2016 Chapman Siu.
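
Illustrative sketch
===================

As a rough illustration of the fact set and feature extraction concepts described earlier, the sketch below merges `jsonlines` facts that share the same entity and time key, then flattens them into one row per observation. It is only a minimal sketch under stated assumptions, not Dave's actual implementation: the function names `merge_facts` and `extract_features` are hypothetical, the key is assumed to be the pair (`id`, `as_at`), and `as_at` is treated as an opaque string.

```python
import json
from collections import defaultdict


def merge_facts(lines):
    """Merge facts sharing the same (id, as_at) key into one record each.

    Hypothetical helper: assumes every fact carries an "id" field and
    that "as_at" is optional, as the fact set description suggests.
    """
    merged = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between facts
        fact = json.loads(line)
        key = (fact["id"], fact.get("as_at"))
        merged[key].update(fact)
    return dict(merged)


def extract_features(merged, attributes):
    """Flatten merged facts into rows with a fixed set of columns.

    Attributes missing for an observation are filled with None, so the
    sparse facts become a dense, tidy-style table.
    """
    rows = []
    for (entity, as_at), fact in merged.items():
        row = {"id": entity, "as_at": as_at}
        for name in attributes:
            row[name] = fact.get(name)
        rows.append(row)
    return rows


# The two equivalent fact sets from the "Fact sets" section.
single_row = [
    '{"id": "cust_001", "as_at": "2016-05-28T11:39:+00:00", "gender": "male", "zipcode": "123456"}',
]
split_rows = [
    '{"id": "cust_001", "as_at": "2016-05-28T11:39:+00:00", "gender": "male"}',
    '{"id": "cust_001", "as_at": "2016-05-28T11:39:+00:00", "zipcode": "123456"}',
]

# Both representations collapse to the same merged record.
assert merge_facts(single_row) == merge_facts(split_rows)

rows = extract_features(merge_facts(split_rows), ["gender", "zipcode"])
```

Keying on the (`id`, `as_at`) pair is one possible design choice; it lets attributes arrive in any number of rows while still producing a single observation per entity and time.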