https://github.com/fal-ai/dbt_feature_store

Build your feature store with macros right within your dbt repository

dbt dbt-packages feature-store

Host: GitHub
URL: https://github.com/fal-ai/dbt_feature_store
Owner: fal-ai
Created: 2022-02-11T14:07:30.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-12-16T14:08:38.000Z (over 2 years ago)
Last Synced: 2025-06-26T09:26:06.398Z (19 days ago)
Topics: dbt, dbt-packages, feature-store
Language: Python
Homepage: https://hub.getdbt.com/fal-ai/feature_store
Size: 38.1 KB
Stars: 38
Watchers: 4
Forks: 4
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-dbt - dbt-feature-store - Macros to build a feature store right within your dbt project. (Packages)

README

# dbt_feature_store

* [dbt_feature_store](#dbt_feature_store)
* [About](#about)
* [Usage](#usage)
   * [Inside of dbt Models](#inside-of-dbt-models)
   * [From fal client (coming soon)](#from-fal-client-coming-soon)
* [Macros](#macros)
   * [create_dataset (source)](#create_dataset-source)
   * [latest_timestamp (source)](#latest_timestamp-source)
   * [Building block Macros](#building-block-macros)
      * [next_timestamp (source)](#next_timestamp-source)
      * [label_feature_join (source)](#label_feature_join-source)
* [feature_table object](#feature_table-object)

# About

This package contains dbt macros to help you build a feature store right within your dbt repository.

# Usage

## Inside of dbt Models

> **NOTE:** to see a full example of the package in use, go to [dbt_feature_store_example](https://github.com/fal-ai/dbt_feature_store_example)

You can build models with these macros to maintain a feature store updated with your dbt runs.

## From fal client (coming soon)

Trigger feature calculations from the fal Python client to quickly iterate and discover the best features for your ML model from your notebook.

# Macros

### create_dataset ([source](/macros/create_dataset.sql))

This macro creates a table that holds the label and the historical features. The table should be usable as training data without any additional transformations.

Constructor: `feature_store.create_dataset(label, features)`

- `label`: [feature_table object](#feature_table-object)

- `features`: list of [feature_table objects](#feature_table-object)

Example:

```jinja
SELECT *
FROM (
  {{ feature_store.create_dataset(
      {
        'table': source('dbt_bike', 'bike_is_winner'),
        'columns': ['is_winner']
      },
      [
        {
          'table': ref('bike_duration'),
          'columns': ['trip_duration_last_week', 'trip_count_last_week']
        }
      ]
  ) }}
)
```

### latest_timestamp ([source](/macros/latest_timestamp.sql))

This macro creates a table containing only the rows with the latest timestamp for a feature. This is useful for making predictions with the latest information available for an entity.

Constructor: `feature_store.latest_timestamp(feature)`

- `feature`: [feature_table object](#feature_table-object)
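
Example (a minimal sketch, not taken from the package itself): it assumes `latest_timestamp` renders a query that can be wrapped in a subquery the same way as `create_dataset`, and it reuses the `bike_duration` model from the example above.

```jinja
{# Assumption: latest_timestamp renders a SELECT that is valid inside a subquery. #}
SELECT *
FROM (
  {{ feature_store.latest_timestamp(
      {
        'table': ref('bike_duration'),
        'columns': ['trip_duration_last_week', 'trip_count_last_week']
      }
  ) }}
)
```

A model built this way could serve as the input for predictions, since it is meant to keep only the most recent feature rows.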

## Building block Macros

### next_timestamp ([source](/macros/next_timestamp.sql))

Constructor: `feature_store.next_timestamp(entity_column, timestamp_column)`

- `entity_column`: name of the column holding the row id, used to join label tables and feature tables

- `timestamp_column`: name of the column holding the row timestamp/date, used to join label tables and feature tables

### label_feature_join ([source](/macros/label_feature_join.sql))

Constructor: `feature_store.label_feature_join(label_entity_column, label_timestamp_column, feature_entity_column, feature_timestamp_column, feature_next_timestamp_column)`

- `label_entity_column`: column name of the entity id in the label table that is used for predictions; this column is used to join labels to features

- `label_timestamp_column`: column name of the timestamp/date in the label table; this column is used to join labels to features

- `feature_entity_column`: column name of the entity id in the feature table that is used for predictions; this column is used to join labels to features

- `feature_timestamp_column`: column name of the timestamp/date in the feature table; this column is used to join labels to features

- `feature_next_timestamp_column`: a column pre-calculated (normally in a CTE) by calling the macro [feature_store.next_timestamp(feature_entity_column, feature_timestamp_column)](#next_timestamp-source)
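
The two building blocks are meant to be combined. The sketch below shows one plausible way to do so, under stated assumptions: that `next_timestamp` renders a column expression, that `label_feature_join` renders a boolean join condition, and that both accept table-qualified column names. The aliases `l` and `f` and the column `next_start_date` are illustrative names, not part of the package.

```jinja
{# Sketch only. Assumes next_timestamp renders a column expression and
   label_feature_join renders a boolean join condition. #}
WITH features AS (
  SELECT
    bike_id,
    start_date,
    trip_duration_last_week,
    {# Pre-calculate the next timestamp per entity inside the CTE #}
    {{ feature_store.next_timestamp('bike_id', 'start_date') }} AS next_start_date
  FROM {{ ref('bike_duration') }}
)

SELECT
  l.bike_id,
  l.date,
  l.is_winner,
  f.trip_duration_last_week
FROM {{ source('dbt_bike', 'bike_is_winner') }} AS l
LEFT JOIN features AS f
  ON {{ feature_store.label_feature_join(
        'l.bike_id', 'l.date',
        'f.bike_id', 'f.start_date', 'f.next_start_date'
  ) }}
```

For most cases `create_dataset` is likely the simpler choice; the building blocks are useful when the join needs to live inside a larger hand-written query.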

# feature_table object

A feature_table object is a Python dict with the following properties:

- `table`: a `ref`, `source` or name of a CTE defined in the query

- `columns`: a list of columns from the `table` relation to appear in the final query

- `entity_column` (optional): column name of the entity id that is used for predictions; this column is used to join labels to features

- `timestamp_column` (optional): column name of the timestamp/date; this column is used to join labels to features

If you pass a [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref/) or [source](https://docs.getdbt.com/reference/dbt-jinja-functions/source/) in the `table` property, you can skip the `entity_column` and `timestamp_column` properties, as they will be loaded from the [schema.yml](https://docs.getdbt.com/reference/resource-properties/schema) `meta` for models or sources.

```yml
version: 2

sources:
  - name: dbt_bike
    tables:
      - name: bike_is_winner
        meta:
          # source example
          fal:
            feature_store:
              entity_column: bike_id
              timestamp_column: date

models:
  - name: bike_duration
    meta:
      # model example
      fal:
        feature_store:
          entity_column: bike_id
          timestamp_column: start_date
```
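
When `table` is the name of a CTE, there is no `schema.yml` entry to read `meta` from, so the two optional properties presumably have to be passed explicitly. The following is a minimal sketch under that assumption; the CTE name `recent_trips` and its date filter are hypothetical, and it also assumes the CTE is visible inside the subquery that wraps the macro output.

```jinja
{# Sketch only: recent_trips and the date filter are illustrative. #}
WITH recent_trips AS (
  SELECT bike_id, start_date, trip_count_last_week
  FROM {{ ref('bike_duration') }}
  WHERE start_date >= '2022-01-01'
)

SELECT *
FROM (
  {{ feature_store.create_dataset(
      {
        'table': source('dbt_bike', 'bike_is_winner'),
        'columns': ['is_winner']
      },
      [
        {
          'table': 'recent_trips',
          'columns': ['trip_count_last_week'],
          'entity_column': 'bike_id',
          'timestamp_column': 'start_date'
        }
      ]
  ) }}
)
```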
