Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/getindata/kedro-snowflake
Kedro Snowflake / Snowpark plugin
- Host: GitHub
- URL: https://github.com/getindata/kedro-snowflake
- Owner: getindata
- License: apache-2.0
- Created: 2022-11-25T10:39:58.000Z (about 2 years ago)
- Default Branch: develop
- Last Pushed: 2024-07-19T08:53:25.000Z (5 months ago)
- Last Synced: 2024-10-29T05:34:53.353Z (about 2 months ago)
- Topics: kedro, machine-learning, mlops, snowflake, snowpark
- Language: Python
- Homepage: https://kedro-snowflake.readthedocs.io/
- Size: 5.62 MB
- Stars: 13
- Watchers: 10
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- awesome-kedro - kedro-snowflake - Enables running full Kedro pipelines in Snowflake. ([Kedro plugins](https://docs.kedro.org/en/stable/extend_kedro/plugins.html))
README
# Kedro Snowflake Pipelines plugin
[![Python Version](https://img.shields.io/pypi/pyversions/kedro-snowflake)](https://github.com/getindata/kedro-snowflake)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![SemVer](https://img.shields.io/badge/semver-2.0.0-green)](https://semver.org/)
[![PyPI version](https://badge.fury.io/py/kedro-snowflake.svg)](https://pypi.org/project/kedro-snowflake/)
[![Downloads](https://pepy.tech/badge/kedro-snowflake)](https://pepy.tech/project/kedro-snowflake)
[![Maintainability Rating](https://sonarcloud.io/api/project_badges/measure?project=getindata_kedro-snowflake&metric=sqale_rating)](https://sonarcloud.io/summary/new_code?id=getindata_kedro-snowflake)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=getindata_kedro-snowflake&metric=coverage)](https://sonarcloud.io/summary/new_code?id=getindata_kedro-snowflake)
[![Documentation Status](https://readthedocs.org/projects/kedro-snowflake/badge/?version=latest)](https://kedro-snowflake.readthedocs.io/en/latest/?badge=latest)

We help companies turn their data into assets
## About
This plugin allows you to run full Kedro pipelines in Snowflake. Right now it supports:
* Kedro starter, to get you up to speed fast
* automatically creating Snowflake Stored Procedures from Kedro nodes (using Snowpark SDK)
* translating a Kedro pipeline into a Snowflake task graph
* running a Kedro pipeline fully within Snowflake, without any external system
* using Kedro's official `SnowparkTableDataSet`
* automatically storing intermediate data as Transient Tables (if Snowpark's DataFrames are used)
* **(New!)** [MLflow](https://mlflow.org/) integration with Snowflake with examples in _Snowflights_ Kedro starter
## Documentation
For detailed documentation refer to https://kedro-snowflake.readthedocs.io/
## Usage
### With starter
1. Install the plugin
```bash
pip install "kedro-snowflake>=0.1.0"
```
2. Create a new project with our Kedro starter ❄️ _Snowflights_ 🚀:
```bash
kedro new --starter=snowflights --checkout=master
```
And answer the interactive prompts ⬇️
```
Project Name
============
Please enter a human readable name for your new project.
Spaces, hyphens, and underscores are allowed.
[Snowflights]:
Snowflake Account
=================
Please enter the name of your Snowflake account.
This is the part of the URL before .snowflakecomputing.com
[]: abc-123
Snowflake User
==============
Please enter the name of your Snowflake user.
[]: user2137
Snowflake Warehouse
===================
Please enter the name of your Snowflake warehouse.
[]: compute-wh
Snowflake Database
==================
Please enter the name of your Snowflake database.
[DEMO]:
Snowflake Schema
================
Please enter the name of your Snowflake schema.
[DEMO]:
Snowflake Password Environment Variable
=======================================
Please enter the name of the environment variable that contains your Snowflake password.
Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
[SNOWFLAKE_PASSWORD]:
Pipeline Name Used As A Snowflake Task Prefix
=============================================
[default]:
Enable Mlflow Integration (See Documentation For The Configuration Instructions)
================================================================================
[False]:
The project name 'Snowflights' has been applied to:
- The project title in /tmp/snowflights/README.md
- The folder created for your project in /tmp/snowflights
- The project's python package in /tmp/snowflights/src/snowflights
```
3. Run the project
```bash
cd snowflights
kedro snowflake run --wait-for-completion
```
### In existing Kedro project
1. Install the plugin
```bash
pip install "kedro-snowflake>=0.1.0"
```
2. Initialize the plugin
```bash
kedro snowflake init
```
3. Run the project
```bash
kedro snowflake run --wait-for-completion
```
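Before running the project, it can help to confirm that the Snowflake password is actually available to the process. The sketch below is not part of the plugin. It assumes the password is supplied through an environment variable named `SNOWFLAKE_PASSWORD` (the default offered by the starter prompts; your project may use another name or `credentials.yml`), and the helper names `build_run_command` and `run_pipeline` are hypothetical.

```python
import os
import subprocess

# Assumption: the default variable name offered by the starter prompt.
PASSWORD_ENV_VAR = "SNOWFLAKE_PASSWORD"


def build_run_command(env=None, password_var=PASSWORD_ENV_VAR):
    """Return the argv for `kedro snowflake run --wait-for-completion`.

    Raises RuntimeError when the password variable is missing or empty,
    so the problem surfaces before anything is sent to Snowflake.
    """
    env = os.environ if env is None else env
    if not env.get(password_var):
        raise RuntimeError(f"Environment variable {password_var} is not set")
    return ["kedro", "snowflake", "run", "--wait-for-completion"]


def run_pipeline():
    """Check the environment, then invoke the CLI from the project root."""
    subprocess.run(build_run_command(), check=True)
```

Calling `run_pipeline()` from the project directory behaves like step 3, except that it fails early with a readable message when the variable is missing.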
### Kedro pipeline in Snowflake Tasks
Execution:
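The idea of mapping pipeline nodes onto chained Snowflake tasks can be pictured with a small sketch that does not depend on the plugin. This is not the plugin's implementation. It only renders `CREATE TASK ... AFTER ...` statements as strings from a plain dependency mapping. The node names, the `_SPROC` suffix, the `COMPUTE_WH` warehouse and the way the prefix is applied are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_task_statements(dependencies, warehouse, prefix="default"):
    """Render one CREATE TASK statement per node, parents first.

    `dependencies` maps a node name to the set of nodes it depends on.
    Task and procedure names below are hypothetical, not the plugin's.
    """

    def task_name(node):
        return f"{prefix}_{node}".upper()

    statements = []
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(dependencies).static_order():
        parents = sorted(dependencies.get(node, ()))
        after = ""
        if parents:
            after = " AFTER " + ", ".join(task_name(p) for p in parents)
        statements.append(
            f"CREATE OR REPLACE TASK {task_name(node)} "
            f"WAREHOUSE = {warehouse}{after} "
            f"AS CALL {task_name(node)}_SPROC()"
        )
    return statements


# Hypothetical three-node pipeline: evaluate needs both earlier nodes.
example = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"preprocess", "train"},
}
for statement in build_task_statements(example, "COMPUTE_WH"):
    print(statement)
```

Each node becomes one task that calls one stored procedure, and the `AFTER` clause carries the node's upstream dependencies, so Snowflake itself enforces the execution order.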