https://github.com/ccao-data/data-architecture
Codebase for CCAO data infrastructure construction and management
- Host: GitHub
- URL: https://github.com/ccao-data/data-architecture
- Owner: ccao-data
- Created: 2023-06-28T16:11:25.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-11-13T16:12:38.000Z (1 day ago)
- Last Synced: 2024-11-13T16:33:33.491Z (1 day ago)
- Topics: aws, aws-athena, aws-s3, data-architecture, data-engineering
- Language: R
- Homepage: https://ccao-data.github.io/data-architecture/
- Size: 30.6 MB
- Stars: 6
- Watchers: 0
- Forks: 4
- Open Issues: 74
Metadata Files:
- Readme: README.md
- Codeowners: .github/CODEOWNERS
README
# CCAO Data Infrastructure
This repository stores the code for the CCAO Data Department's ETL
pipelines and data lakehouse. This infrastructure supports the Data Team's
modeling, reporting, and data integrity work.

## Quick Links
- [:file_folder: dbt Data Catalog](https://ccao-data.github.io/data-architecture/#!/overview) -
Documentation for all CCAO data lakehouse tables and views
- [:nut_and_bolt: dbt README](/dbt/README.md) - How to develop CCAO data
infrastructure using dbt
- [:test_tube: dbt Tests and QC Reports](dbt/README.md#-how-to-add-and-run-tests-and-qc-reports) -
How to add and run data tests, unit tests, and QC reports using dbt
- [:pencil: dbt Generic Test Documentation](/dbt/tests/generic/README.md) -
Definitions for CCAO generic dbt tests, which are functions that we use to define our QC tests

## Repository Structure
- [./dbt](./dbt) contains the models and tests that build our Athena data lakehouse;
dbt mainly acts as a transformation and documentation layer on top of our raw data
- [./docs](./docs) contains design documents and other supplemental documentation
- [./etl](./etl) contains ETL scripts used to load raw and lightly cleaned
data into the lakehouse as dbt sources
- [./socrata](./socrata) contains column transformations for the CCAO's
[Open Data Portal](https://datacatalog.cookcountyil.gov/) assets
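
The layout above can be checked against a local clone with a short script. This is a minimal sketch and not part of the repository: the function name `missing_dirs` and the `EXPECTED_DIRS` mapping are hypothetical, and the script assumes it is run from the root of a clone.

```python
from pathlib import Path

# Top-level directories named in the Repository Structure list above.
# The mapping itself is illustrative, not something the repository ships.
EXPECTED_DIRS = {
    "dbt": "models and tests that build the Athena data lakehouse",
    "docs": "design documents and supplemental documentation",
    "etl": "ETL scripts that load raw data into the lakehouse as dbt sources",
    "socrata": "column transformations for Open Data Portal assets",
}


def missing_dirs(repo_root):
    """Return the expected top-level directories absent from repo_root, sorted."""
    root = Path(repo_root)
    return sorted(name for name in EXPECTED_DIRS if not (root / name).is_dir())


if __name__ == "__main__":
    # "." assumes the script is run from the root of a local clone.
    for name in missing_dirs("."):
        print(f"missing: ./{name} ({EXPECTED_DIRS[name]})")
```

An empty result from `missing_dirs` means all four directories described above are present.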