https://github.com/snowplow/snowplow-rdb-loader
Stores Snowplow enriched events in Redshift, Snowflake and Databricks
redshift scala snowplow spark
- Host: GitHub
- URL: https://github.com/snowplow/snowplow-rdb-loader
- Owner: snowplow
- License: other
- Created: 2017-07-27T23:05:09.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2024-08-27T08:24:20.000Z (5 months ago)
- Last Synced: 2024-08-27T11:17:22.367Z (5 months ago)
- Topics: redshift, scala, snowplow, spark
- Language: Scala
- Homepage:
- Size: 6.41 MB
- Stars: 31
- Watchers: 16
- Forks: 16
- Open Issues: 86
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG
# Relational Database Loader
[![Build Status][build-image]][build]
[![Release][release-image]][releases]
[![License][license-image]][license]

## Introduction
This project contains applications required to load Snowplow data into various data warehouses.
It consists of two types of applications: Transformers and Loaders.
### Transformers
Transformers read Snowplow enriched events, transform them into a format ready to be loaded into a data warehouse, then write them to the corresponding blob storage.
There are two types of Transformers: Batch and Streaming.
#### Stream Transformer
Stream Transformers read enriched events from the corresponding stream service, transform them, then write the transformed events to a specified blob storage path.
They write transformed events in periodic windows.

There are three Stream Transformer applications: Transformer Kinesis, Transformer Pubsub and Transformer Kafka. As the names suggest, they are the variants for AWS, GCP and Azure respectively.
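
To illustrate what writing in periodic windows can look like, here is a minimal Python sketch. It is not the project's implementation: the fixed 5-minute window length, windowing by arrival timestamp, the `run=` folder naming and the function names `window_start`, `window_path` and `group_by_window` are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumed window length in seconds; a real deployment would take this from configuration.
WINDOW_SECONDS = 300


def window_start(ts: datetime, window_seconds: int = WINDOW_SECONDS) -> datetime:
    """Round a timestamp down to the start of the window that contains it."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % window_seconds, tz=timezone.utc)


def window_path(base: str, ts: datetime) -> str:
    """Build a hypothetical output folder for the window that contains ts."""
    return f"{base.rstrip('/')}/run={window_start(ts):%Y-%m-%d-%H-%M-%S}/"


def group_by_window(events):
    """Group (timestamp, payload) pairs by the start of their window."""
    windows = defaultdict(list)
    for ts, payload in events:
        windows[window_start(ts)].append(payload)
    return dict(windows)
```

In this sketch, every event that arrives within the same 5-minute span ends up under the same output folder, so a downstream consumer can treat each folder as one complete batch.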
#### Batch Transformer
The Batch Transformer is a [Spark][spark] job and only works with AWS services. It reads enriched events from a given S3 path, transforms them, then writes the transformed events to a specified S3 path.
### Loaders
Transformers send a message to a message queue once they have finished transforming a batch and writing it to blob storage.
This message contains information about the transformed data, such as where it is stored and what it looks like.

Loaders subscribe to the message queue. When a message is received, it is parsed and the necessary details are extracted to load the transformed events into the destination.
Loaders construct the SQL statements needed to load the transformed events, then send these statements to the specified destination.

At the moment, we have loader applications for Redshift, Databricks and Snowflake.
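
The message-to-SQL flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not the loaders' actual code: the message fields `base` and `compression`, the `TransformedBatch` type and both function names are hypothetical, and the generated statement is only a simplified Redshift-style `COPY`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformedBatch:
    """Hypothetical summary of one transformed batch, as announced on the queue."""
    base: str         # blob storage folder holding the transformed events
    compression: str  # compression of the written files, e.g. "GZIP" or "NONE"


def parse_message(body: str) -> TransformedBatch:
    """Parse a queue message and keep only the fields needed for loading."""
    payload = json.loads(body)
    return TransformedBatch(base=payload["base"], compression=payload["compression"])


def build_copy_statement(batch: TransformedBatch, table: str, iam_role: str) -> str:
    """Assemble a simplified Redshift-style COPY statement for the batch.

    Values are interpolated directly for brevity; real code should validate
    identifiers and paths before building SQL from them.
    """
    parts = [f"COPY {table}", f"FROM '{batch.base}'", f"IAM_ROLE '{iam_role}'"]
    if batch.compression.upper() == "GZIP":
        parts.append("GZIP")
    return " ".join(parts)
```

A loader following this pattern would call `parse_message` on each message it receives and pass the result to `build_copy_statement`, then execute the resulting statement against the warehouse. The table name and IAM role would come from the loader's own configuration.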
## Find out more
| Technical Docs | Setup Guide | Roadmap & Contributing |
|----------------------------|----------------------|------------------------|
| ![i1][techdocs-image] | ![i2][setup-image] | ![i3][roadmap-image] |
| [Technical Docs][techdocs] | [Setup Guide][setup] | [Roadmap][roadmap] |

## Copyright and license
Copyright (c) 2012-present Snowplow Analytics Ltd. All rights reserved.
Licensed under the [Snowplow Limited Use License Agreement][license]. _(If you are uncertain how it applies to your use case, check our answers to [frequently asked questions][faq].)_
[techdocs-image]: https://d3i6fms1cm1j0i.cloudfront.net/github/images/techdocs.png
[setup-image]: https://d3i6fms1cm1j0i.cloudfront.net/github/images/setup.png
[roadmap-image]: https://d3i6fms1cm1j0i.cloudfront.net/github/images/roadmap.png
[setup]: https://docs.snowplow.io/docs/getting-started-on-snowplow-open-source/
[techdocs]: https://docs.snowplow.io/docs/pipeline-components-and-applications/loaders-storage-targets/snowplow-rdb-loader/
[roadmap]: https://github.com/snowplow/snowplow/projects/7
[spark]: http://spark.apache.org/
[build-image]: https://github.com/snowplow/snowplow-rdb-loader/workflows/CI/badge.svg
[build]: https://github.com/snowplow/snowplow-rdb-loader/actions/workflows/ci.yml
[release-image]: https://img.shields.io/badge/release-6.1.1-blue.svg?style=flat
[releases]: https://github.com/snowplow/snowplow-rdb-loader/releases
[license]: https://docs.snowplow.io/limited-use-license-1.0
[license-image]: https://img.shields.io/badge/license-Snowplow--Limited-Use-blue.svg?style=flat
[faq]: https://docs.snowplow.io/docs/contributing/limited-use-license-faq/