https://github.com/fal-ai/dbt_feature_store
Build your feature store with macros right within your dbt repository
- Host: GitHub
- URL: https://github.com/fal-ai/dbt_feature_store
- Owner: fal-ai
- Created: 2022-02-11T14:07:30.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-12-16T14:08:38.000Z (almost 2 years ago)
- Last Synced: 2024-05-14T10:34:53.541Z (6 months ago)
- Topics: dbt, dbt-packages, feature-store
- Language: Python
- Homepage: https://hub.getdbt.com/fal-ai/feature_store
- Size: 38.1 KB
- Stars: 36
- Watchers: 4
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-dbt - dbt-feature-store - Macros to build a feature store right within your dbt project. (Packages)
README
# dbt_feature_store
* [dbt_feature_store](#dbt_feature_store)
* [About](#about)
* [Usage](#usage)
  * [Inside of dbt Models](#inside-of-dbt-models)
  * [From fal client (coming soon)](#from-fal-client-coming-soon)
* [Macros](#macros)
  * [create_dataset (source)](#create_dataset-source)
  * [latest_timestamp (source)](#latest_timestamp-source)
  * [Building block Macros](#building-block-macros)
    * [next_timestamp (source)](#next_timestamp-source)
    * [label_feature_join (source)](#label_feature_join-source)
* [feature_table object](#feature_table-object)

# About
This package contains dbt macros to help you build a feature store right within your dbt repository.
# Usage
## Inside of dbt Models
> **NOTE:** to see a full example of the package in use, go to [dbt_feature_store_example](https://github.com/fal-ai/dbt_feature_store_example)
You can build models with these macros to maintain a feature store updated with your dbt runs.
## From fal client (coming soon)
Trigger feature calculations from the fal Python client to quickly iterate and discover the best features for your ML model from your notebook.
# Macros
### create_dataset ([source](/macros/create_dataset.sql))
This macro creates a table that holds the label and the historical features. This table should be ready to be used as training data without any additional transformations.
Constructor: `feature_store.create_dataset(label, features)`
- `label`: [feature_table object](#feature_table-object)
- `features`: list of [feature_table objects](#feature_table-object)

Example:
```jinja
SELECT *
FROM ({{ feature_store.create_dataset(
    {
        'table': source('dbt_bike', 'bike_is_winner'),
        'columns': ['is_winner']
    },
    [
        {
            'table': ref('bike_duration'),
            'columns': ['trip_duration_last_week', 'trip_count_last_week']
        }
    ]
) }})
```

### latest_timestamp ([source](/macros/latest_timestamp.sql))
This macro creates a table containing only the rows of a feature that have the latest timestamp. This is useful for making predictions with the latest information available for an entity.
Constructor: `feature_store.latest_timestamp(feature)`
- `feature`: [feature_table object](#feature_table-object)
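
To make the intended result concrete, here is a minimal Python sketch of "latest timestamp rows". It is an illustration only, not the SQL the macro generates. It assumes that "latest" means one row per entity, namely the row with the greatest timestamp; the `latest_rows` helper and the sample rows are hypothetical.

```python
from datetime import date


def latest_rows(rows, entity_column, timestamp_column):
    """Keep, for each entity, only the row with the greatest timestamp.

    Assumption: this is the per-entity reading of "latest timestamp rows".
    """
    latest = {}
    for row in rows:
        key = row[entity_column]
        # Replace the stored row when this one is newer for the same entity.
        if key not in latest or row[timestamp_column] > latest[key][timestamp_column]:
            latest[key] = row
    return list(latest.values())


# Hypothetical feature rows shaped like the bike_duration model.
feature_rows = [
    {"bike_id": 1, "start_date": date(2022, 1, 1), "trip_count_last_week": 3},
    {"bike_id": 1, "start_date": date(2022, 1, 8), "trip_count_last_week": 5},
    {"bike_id": 2, "start_date": date(2022, 1, 3), "trip_count_last_week": 2},
]

latest = latest_rows(feature_rows, "bike_id", "start_date")
```

Under that assumption, each entity contributes exactly one row, which is the shape a prediction job would typically consume.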
## Building block Macros
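
The parameter descriptions below indicate that the two building blocks are combined: `next_timestamp` pre-computes a column that `label_feature_join` then consumes. The Python sketch that follows shows one plausible reading of that combination, a point-in-time join. It is not the SQL the macros emit. It assumes that the next timestamp is the timestamp of the following feature row for the same entity, and that a label matches the feature row whose interval `[timestamp, next_timestamp)` contains the label timestamp; the boundary handling, helper names, and sample rows are all hypothetical.

```python
from datetime import date


def with_next_timestamp(rows, entity_column, timestamp_column):
    """Attach to each feature row the timestamp of the next row for the same entity."""
    ordered = sorted(rows, key=lambda r: (r[entity_column], r[timestamp_column]))
    result = []
    for i, row in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        same_entity = nxt is not None and nxt[entity_column] == row[entity_column]
        # None marks the most recent row of an entity, which has no successor.
        result.append({**row, "next_timestamp": nxt[timestamp_column] if same_entity else None})
    return result


def label_feature_match(label, feature, entity_column, label_ts, feature_ts):
    """True when the feature row was the current one at the label's timestamp.

    Assumption: inclusive lower bound, exclusive upper bound.
    """
    return (
        label[entity_column] == feature[entity_column]
        and feature[feature_ts] <= label[label_ts]
        and (feature["next_timestamp"] is None or label[label_ts] < feature["next_timestamp"])
    )


# Hypothetical rows shaped like the bike example used in this README.
feature_rows = [
    {"bike_id": 1, "start_date": date(2022, 1, 1), "trip_duration_last_week": 10},
    {"bike_id": 1, "start_date": date(2022, 1, 8), "trip_duration_last_week": 20},
    {"bike_id": 2, "start_date": date(2022, 1, 3), "trip_duration_last_week": 30},
]
label_rows = [
    {"bike_id": 1, "date": date(2022, 1, 5), "is_winner": True},
    {"bike_id": 1, "date": date(2022, 1, 10), "is_winner": False},
    {"bike_id": 2, "date": date(2022, 1, 2), "is_winner": True},
]

features = with_next_timestamp(feature_rows, "bike_id", "start_date")
dataset = [
    {**label, "trip_duration_last_week": f["trip_duration_last_week"]}
    for label in label_rows
    for f in features
    if label_feature_match(label, f, "bike_id", "date", "start_date")
]
```

In this sketch the label for bike 2 is dropped because no feature row exists at or before its date. Matching only against feature values that were already known at label time is what keeps future information out of the training data.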
### next_timestamp ([source](/macros/next_timestamp.sql))
Constructor: `feature_store.next_timestamp(entity_column, timestamp_column)`
- `entity_column`: name of the entity id column used to join label tables and feature tables
- `timestamp_column`: name of the timestamp/date column used to join label tables and feature tables

### label_feature_join ([source](/macros/label_feature_join.sql))
Constructor: `feature_store.label_feature_join(label_entity_column, label_timestamp_column, feature_entity_column, feature_timestamp_column, feature_next_timestamp_column)`
- `label_entity_column`: column name of the entity id in the label table (the entity that predictions are made for); this column is used to join labels to features
- `label_timestamp_column`: column name of the timestamp/date in the label table; this column is used to join labels to features
- `feature_entity_column`: column name of the entity id in the feature table (the entity that predictions are made for); this column is used to join labels to features
- `feature_timestamp_column`: column name of the timestamp/date in the feature table; this column is used to join labels to features
- `feature_next_timestamp_column`: column pre-calculated (normally in a CTE) by calling the macro [feature_store.next_timestamp(feature_entity_column, feature_timestamp_column)](#next_timestamp-source)

# feature_table object
A feature_table object is a Python dict with the following properties:
- `table`: a `ref`, `source` or name of a CTE defined in the query
- `columns`: a list of columns from the `table` relation to appear in the final query
- `entity_column` (optional): column name of the entity id that is used for predictions; this column is used to join labels to features
- `timestamp_column` (optional): column name of the timestamp/date; this column is used to join labels to features

If you pass a [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref/) or [source](https://docs.getdbt.com/reference/dbt-jinja-functions/source/) in the `table` property, you can skip the `entity_column` and `timestamp_column` properties, as they will be loaded from the [schema.yml](https://docs.getdbt.com/reference/resource-properties/schema) `meta` for models or sources.
```yml
version: 2

sources:
  - name: dbt_bike
    tables:
      - name: bike_is_winner
        meta:
          # source example
          fal:
            feature_store:
              entity_column: bike_id
              timestamp_column: date

models:
  - name: bike_duration
    meta:
      # model example
      fal:
        feature_store:
          entity_column: bike_id
          timestamp_column: start_date
```
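
The fallback described above can be pictured with a short Python sketch. It is not the package's code: `resolve_columns` is a hypothetical helper, and the rule that explicit properties take precedence over `meta` is an assumption, since the text only says the properties can be skipped when `meta` provides them.

```python
def resolve_columns(feature_table, relation_meta):
    """Return (entity_column, timestamp_column) for a feature_table dict.

    Assumption: explicit keys win; otherwise fall back to meta.fal.feature_store.
    """
    store_meta = relation_meta.get("fal", {}).get("feature_store", {})
    entity = feature_table.get("entity_column", store_meta.get("entity_column"))
    timestamp = feature_table.get("timestamp_column", store_meta.get("timestamp_column"))
    if entity is None or timestamp is None:
        raise ValueError("entity_column and timestamp_column must be set explicitly or in meta")
    return entity, timestamp


# meta as declared for the bike_duration model in the schema.yml example.
bike_duration_meta = {
    "fal": {"feature_store": {"entity_column": "bike_id", "timestamp_column": "start_date"}}
}

# A ref-style table can omit both column properties.
from_meta = resolve_columns(
    {"table": "bike_duration", "columns": ["trip_count_last_week"]},
    bike_duration_meta,
)

# A CTE has no schema.yml meta, so both properties are given explicitly
# (the CTE and column names here are hypothetical).
from_explicit = resolve_columns(
    {"table": "my_cte", "columns": ["x"], "entity_column": "id", "timestamp_column": "ts"},
    {},
)
```

The second call mirrors the case where `table` is the name of a CTE: with no `meta` to read from, the two column properties have to be supplied in the dict itself.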