https://github.com/getindata/kedro-snowflake
Kedro Snowflake / Snowpark plugin
kedro machine-learning mlops snowflake snowpark
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/getindata/kedro-snowflake
- Owner: getindata
- License: apache-2.0
- Created: 2022-11-25T10:39:58.000Z (over 2 years ago)
- Default Branch: develop
- Last Pushed: 2024-07-19T08:53:25.000Z (10 months ago)
- Last Synced: 2025-04-09T20:11:30.127Z (about 1 month ago)
- Topics: kedro, machine-learning, mlops, snowflake, snowpark
- Language: Python
- Homepage: https://kedro-snowflake.readthedocs.io/
- Size: 5.62 MB
- Stars: 13
- Watchers: 10
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-kedro - kedro-snowflake - Enables running full Kedro pipelines in Snowflake. ([Kedro plugins](https://docs.kedro.org/en/stable/extend_kedro/plugins.html))
README
# Kedro Snowflake Pipelines plugin
We help companies turn their data into assets
## About
This plugin lets you run full Kedro pipelines in Snowflake. It currently supports:
* Kedro starter, to get you up to speed fast
* automatically creating Snowflake Stored Procedures from Kedro nodes (using Snowpark SDK)
* translating a Kedro pipeline into a Snowflake task graph
* running a Kedro pipeline fully within Snowflake, without any external system
* using Kedro's official `SnowparkTableDataSet`
* automatically storing intermediate data as Transient Tables (if Snowpark's DataFrames are used)
* **(New!)** [MLflow](https://mlflow.org/) integration with Snowflake with examples in _Snowflights_ Kedro starter
## Documentation
For detailed documentation refer to https://kedro-snowflake.readthedocs.io/
## Usage
### With starter
1. Install the plugin
```bash
pip install "kedro-snowflake>=0.1.0"
```
2. Create a new project with our Kedro starter ❄️ _Snowflights_ 🚀:
```bash
kedro new --starter=snowflights --checkout=master
```
Then answer the interactive prompts ⬇️
```
Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed.
[Snowflights]:
Snowflake Account
=================
Please enter the name of your Snowflake account.
This is the part of the URL before .snowflakecomputing.com
[]: abc-123
Snowflake User
==============
Please enter the name of your Snowflake user.
[]: user2137
Snowflake Warehouse
===================
Please enter the name of your Snowflake warehouse.
[]: compute-wh
Snowflake Database
==================
Please enter the name of your Snowflake database.
[DEMO]:
Snowflake Schema
================
Please enter the name of your Snowflake schema.
[DEMO]:
Snowflake Password Environment Variable
=======================================
Please enter the name of the environment variable that contains your Snowflake password.
Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
[SNOWFLAKE_PASSWORD]:
Pipeline Name Used As A Snowflake Task Prefix
=============================================
[default]:
Enable Mlflow Integration (See Documentation For The Configuration Instructions)
================================================================================
[False]:
The project name 'Snowflights' has been applied to:
- The project title in /tmp/snowflights/README.md
- The folder created for your project in /tmp/snowflights
- The project's python package in /tmp/snowflights/src/snowflights
```
3. Run the project
```bash
cd snowflights
kedro snowflake run --wait-for-completion
```
### In existing Kedro project
1. Install the plugin
```bash
pip install "kedro-snowflake>=0.1.0"
```
2. Initialize the plugin
```bash
kedro snowflake init
```
3. Run the project
```bash
kedro snowflake run --wait-for-completion
```
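
Both workflows above end with the same `kedro snowflake run --wait-for-completion` command. The following is a minimal sketch of a pre-flight wrapper around that command, assuming the project reads the Snowflake password from an environment variable named `SNOWFLAKE_PASSWORD` (the default offered by the starter prompt; yours may differ) and that the script is run from the project root. The helper names `build_run_command` and `run_in_snowflake` are hypothetical and not part of the plugin.

```python
import os
import subprocess

# Assumption: default variable name offered by the starter prompt.
PASSWORD_ENV_VAR = "SNOWFLAKE_PASSWORD"


def build_run_command(env, password_var=PASSWORD_ENV_VAR):
    """Return the CLI command as a list, or raise if the password variable is unset or empty."""
    if not env.get(password_var):
        raise RuntimeError(f"Environment variable {password_var} is not set")
    return ["kedro", "snowflake", "run", "--wait-for-completion"]


def run_in_snowflake(env=None):
    """Run the pipeline through the CLI (hypothetical helper, not a plugin API)."""
    env = dict(os.environ) if env is None else env
    # check=True raises CalledProcessError if the CLI exits with a non-zero code.
    subprocess.run(build_run_command(env), check=True, env=env)
```

Calling `run_in_snowflake()` fails fast with a clear message when the password variable is missing, instead of failing later during the Snowflake connection.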
### Kedro pipeline in Snowflake Tasks
Execution:
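
To illustrate the idea of translating a pipeline into a task graph, here is a conceptual sketch in plain Python. It is not the plugin's implementation: the node names, dataset names, and the helpers `task_predecessors` and `execution_order` are all hypothetical. It only shows how the dependencies between nodes (which node produces the inputs of another) can be turned into "run after" relationships between tasks.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each node lists the datasets it reads and writes.
NODES = {
    "preprocess": {"inputs": ["raw_flights"], "outputs": ["clean_flights"]},
    "train": {"inputs": ["clean_flights"], "outputs": ["model"]},
    "evaluate": {"inputs": ["clean_flights", "model"], "outputs": ["metrics"]},
}


def task_predecessors(nodes):
    """Map each node to the set of nodes that produce its inputs."""
    producer = {out: name for name, spec in nodes.items() for out in spec["outputs"]}
    return {
        name: {producer[i] for i in spec["inputs"] if i in producer}
        for name, spec in nodes.items()
    }


def execution_order(nodes):
    """Return one valid order in which the tasks could run."""
    return list(TopologicalSorter(task_predecessors(nodes)).static_order())
```

In this sketch, `raw_flights` has no producer, so `preprocess` becomes a root task, and `evaluate` must wait for both `preprocess` and `train`.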