# dbt-ml-preprocessing

A package for dbt that enables standardization of data sets. You can use it to build a feature store in your data warehouse without using external libraries like Spark's MLlib or Python's scikit-learn.

The package contains a set of macros that mirror the functionality of the [scikit-learn preprocessing module](https://scikit-learn.org/stable/modules/preprocessing.html). They were originally developed as part of the 2019 Medium article [Feature Engineering in Snowflake](https://medium.com/omnata/feature-engineering-in-snowflake-4312032e0d53).

The macros have been tested in Snowflake, Redshift, BigQuery, SQL Server and PostgreSQL 13.2 (see the table below for per-database support). The test case expectations were built using scikit-learn (see the `*.py` files in [integration_tests/data/sql](integration_tests/data/sql)), so you can expect behavioural parity with it.

| :warning: There are now several better alternatives to this package. If you're using Snowflake, Snowflake now offers the [snowflake-ml-python](https://docs.snowflake.com/en/developer-guide/snowpark-ml/index) package, which is fully supported and much more comprehensive. Within dbt, the Python models feature allows Snowflake, BigQuery and Databricks users to use scikit-learn directly. |
| --- |

The macros are:

| scikit-learn function | macro name | Snowflake | BigQuery | Redshift | SQL Server | PostgreSQL | Example |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html#sklearn.preprocessing.KBinsDiscretizer) | k_bins_discretizer | Y | Y | Y | Y | Y | ![example](images/k_bins.gif) |
| [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder) | label_encoder | Y | Y | Y | Y | Y | ![example](images/label_encoder.gif) |
| [MaxAbsScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MaxAbsScaler.html#sklearn.preprocessing.MaxAbsScaler) | max_abs_scaler | Y | Y | Y | Y | Y | [![example](images/max_abs_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#maxabsscaler) |
| [MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler) | min_max_scaler | Y | Y | Y | Y | Y | [![example](images/min_max_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#minmaxscaler) |
| [Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer) | normalizer | Y | Y | Y | Y | Y | [![example](images/normalizer.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#normalizer) |
| [OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder) | one_hot_encoder | Y | Y | Y | Y | Y | ![example](images/one_hot_encoder.gif) |
| [QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer) | quantile_transformer | Y | Y | N | N | Y | [![example](images/quantile_transformer.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#quantiletransformer-uniform-output) |
| [RobustScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.RobustScaler.html#sklearn.preprocessing.RobustScaler) | robust_scaler | Y | Y | Y | Y | Y | [![example](images/robust_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#robustscaler) |
| [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) | standard_scaler | Y | Y | Y | N | Y | [![example](images/standard_scaler.png)](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html#standardscaler) |

_\* 2D charts are taken from [scikit-learn.org](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html); the GIFs are my own._

## Installation
To use this in your dbt project, create or modify `packages.yml` to include:
```yaml
packages:
  - package: "omnata-labs/dbt_ml_preprocessing"
    version: [">=1.0.2"]
```
_(replace the version number with the latest release)_

Then run `dbt deps` to install the package.

### dbt 1.0.0 compatibility
dbt-ml-preprocessing version 1.2.0 is the first version to support (and require) dbt 1.0.0.

If you are not ready to upgrade to dbt 1.0.0, please use dbt-ml-preprocessing version 1.0.2.

## Usage
To read the macro documentation and see examples, [generate your docs](https://docs.getdbt.com/reference/commands/cmd-docs/); the macro documentation appears in the Projects tree under `dbt_ml_preprocessing`:

![docs screenshot](images/docs_screenshot.png)
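The macros in the table above reproduce scikit-learn's transformations. As a minimal sketch for spot-checking a macro's output on a small table, the Python below implements scikit-learn's documented formulas for several of them, assuming default parameters (`feature_range=(0, 1)` for MinMaxScaler, `quantile_range=(25.0, 75.0)` for RobustScaler, `norm='l2'` for Normalizer). The helper names (`min_max_scale`, `robust_scale` and so on) are hypothetical and belong to neither this package nor scikit-learn, and this is not the SQL the macros generate.

```python
# Pure-Python sketch of scikit-learn's documented preprocessing formulas,
# assuming default parameters. Hypothetical helpers for spot-checking values;
# this is NOT the SQL generated by dbt_ml_preprocessing.
from math import sqrt
from statistics import mean, pstdev


def _safe(scale: float) -> float:
    # scikit-learn replaces a zero scale with 1.0 to avoid division by zero
    return scale if scale else 1.0


def min_max_scale(xs: list[float]) -> list[float]:
    # MinMaxScaler, feature_range=(0, 1): (x - min) / (max - min)
    lo, hi = min(xs), max(xs)
    return [(x - lo) / _safe(hi - lo) for x in xs]


def max_abs_scale(xs: list[float]) -> list[float]:
    # MaxAbsScaler: x / max(|x|)
    m = _safe(max(abs(x) for x in xs))
    return [x / m for x in xs]


def standard_scale(xs: list[float]) -> list[float]:
    # StandardScaler: (x - mean) / std, using the POPULATION std (ddof=0)
    mu, sd = mean(xs), _safe(pstdev(xs))
    return [(x - mu) / sd for x in xs]


def _percentile(sorted_xs: list[float], q: float) -> float:
    # Linear interpolation between closest ranks (NumPy's default method)
    pos = (len(sorted_xs) - 1) * q
    i = int(pos)
    j = min(i + 1, len(sorted_xs) - 1)
    return sorted_xs[i] + (sorted_xs[j] - sorted_xs[i]) * (pos - i)


def robust_scale(xs: list[float]) -> list[float]:
    # RobustScaler: (x - median) / IQR, where IQR = q75 - q25
    s = sorted(xs)
    med = _percentile(s, 0.5)
    iqr = _safe(_percentile(s, 0.75) - _percentile(s, 0.25))
    return [(x - med) / iqr for x in xs]


def label_encode(labels: list[str]) -> list[int]:
    # LabelEncoder: each label -> its index among the sorted distinct classes
    index = {c: i for i, c in enumerate(sorted(set(labels)))}
    return [index[v] for v in labels]


def l2_normalize_row(row: list[float]) -> list[float]:
    # Normalizer (norm='l2') works per row (sample), not per column;
    # an all-zero row is left unchanged
    n = _safe(sqrt(sum(v * v for v in row)))
    return [v / n for v in row]


if __name__ == "__main__":
    column = [1.0, 2.0, 3.0, 4.0, 10.0]
    for fn in (min_max_scale, max_abs_scale, standard_scale, robust_scale):
        print(fn.__name__, [round(v, 3) for v in fn(column)])
    print(label_encode(["b", "a", "c", "a"]))  # [1, 0, 2, 0]
    print(l2_normalize_row([3.0, 4.0]))  # [0.6, 0.8]
```

Note that `standard_scale` uses `pstdev` because StandardScaler divides by the population standard deviation; a SQL implementation likewise needs a population (not sample) standard deviation to reproduce scikit-learn's values.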