
https://github.com/shiyis/data-labs

This repo hosts data collection, wrangling, modeling, engineering practices and labs.

aws data-engineering sam serverless serverless-functions terraform



#### data-engineering-things

This repo collects the AWS data engineering exercises and general data pipeline tutorials I have done. It holds only the submodule mappings; the actual content of each exercise lives in the linked repos.

[aws-sam-cicd](https://github.com/shiyis/aws-serverless-etl-cicd) is a simple AWS data pipeline that streams, validates, and loads tweets.

[twitter-archive](https://github.com/shiyis/twitter-archive) is a GitHub Actions workflow that retrieves tweets using a YAML configuration.

[terraform-labs](https://github.com/shiyis/terraform-labs) is a data engineering schema configuration written in Terraform HCL.

[dra-data](https://github.com/shiyis/dra-data) is an open source data collection built with the _flat_ GitHub Action and a manifest file.

[pyspark-etl-example](https://github.com/AlexIoannides/pyspark-example-project/tree/eeee0c2b9af79fdd7c5d86fe56466c147b487e26) is a PySpark ETL example that extracts, transforms, and loads dummy data.

[yelp-to-xml](https://github.com/shiyis/data-labs/tree/master/yelp-to-xml) is a small data collection app/lab for Yelp reviews, which are converted to XML, cleaned, wrangled, and managed.
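
To give a rough idea of what a reviews-to-XML step like the one in yelp-to-xml might look like, here is a minimal sketch using only the Python standard library. It is not taken from the lab itself: the function name `reviews_to_xml`, the field names (`business`, `stars`, `text`), and the sample data are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def reviews_to_xml(reviews):
    """Build a <reviews> element from a list of review dicts.

    The dict keys used here (business, stars, text) are assumed for
    illustration and are not the lab's actual schema.
    """
    root = ET.Element("reviews")
    for review in reviews:
        text = (review.get("text") or "").strip()
        if not text:
            continue  # cleaning step: drop reviews with no usable text
        node = ET.SubElement(
            root,
            "review",
            business=review["business"],
            stars=str(review["stars"]),  # XML attributes must be strings
        )
        node.text = " ".join(text.split())  # collapse runs of whitespace
    return root


# Made-up sample records, not real Yelp data.
sample = [
    {"business": "Cafe A", "stars": 4, "text": "  Great   coffee.\n"},
    {"business": "Cafe B", "stars": 2, "text": ""},
]

xml_string = ET.tostring(reviews_to_xml(sample), encoding="unicode")
print(xml_string)
```

The empty review is dropped and the remaining one is whitespace-normalized before serialization, which mirrors the "converted, cleaned" order described above in a very simplified form.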