https://github.com/legomb/fedex-assignment
Analytics Engineering assignment for FedEx using dbt, cube.dev, and Superset
- Host: GitHub
- URL: https://github.com/legomb/fedex-assignment
- Owner: legomb
- Created: 2024-02-26T16:15:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-07T09:26:04.000Z (6 months ago)
- Last Synced: 2025-04-07T09:27:49.344Z (6 months ago)
- Topics: cube, dbt, dbt-core, devcontainer, devcontainers, diagrams-as-code, docker, docker-compose, duckdb, kaggle, kaggle-dataset, mermaid, sqlfluff, superset, task, testing, tests
- Language: Dockerfile
- Homepage:
- Size: 6.56 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# FedEx Analytics Engineering Assignment
## Overview
This is my submission for the FedEx Analytics Engineering Assignment.
It features a self-contained environment with a data pipeline that ingests, cleans, and enriches the [Amazon E-Commerce Sales Dataset from Kaggle](https://www.kaggle.com/datasets/thedevastator/unlock-profits-with-e-commerce-sales-data) and makes the results available to BI tools.
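The clean, enriched, and Kimball stages in the pipeline diagram are dbt model layers. As a rough sketch of how those layers could be built stage by stage, the script below runs `dbt build` once per layer. It is not the repo's own entry point (that is `task demo:run-full-demo`), and it rests on hypothetical assumptions: that the dbt project root is `transform` and that each layer is a folder named `clean`, `enriched`, or `kimball` under `transform/models`. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```sh
#!/bin/sh
# Sketch: build the dbt layers from the pipeline diagram in order.
# HYPOTHETICAL layout (not confirmed by this README): the dbt project root is
# ./transform and each layer lives in its own folder under transform/models.
# Commands are only printed unless DRY_RUN=0 is set.

DBT_PROJECT_DIR="${DBT_PROJECT_DIR:-transform}"
LAYERS="clean enriched kimball"   # hypothetical folder names, in pipeline order

run() {
  # Print the command in dry-run mode (the default); execute it otherwise.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_layer() {
  # `path:` selects every model in the folder; `dbt build` also runs its tests.
  run dbt build --project-dir "$DBT_PROJECT_DIR" --select "path:models/$1"
}

for layer in $LAYERS; do
  # Stop at the first layer that fails, since later layers depend on it.
  build_layer "$layer" || break
done
```

Because dbt already orders models by their `ref()` dependencies, a single `dbt build --project-dir transform` would build every layer in the right order. The per-layer loop only makes the stages explicit and stops at the first failing layer.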
```mermaid
flowchart LR
raw["`Raw data
(.csv file)`"]
clean["`Clean models
(dbt)`"]
enriched["`Enriched models
(dbt)`"]
kimball["`Kimball models
(dbt)`"]
SemanticLayer["`Semantic Layer
(Cube.dev)`"]
BI["`BI layer
(Apache Superset)`"]

raw --> clean --> enriched --> kimball --> SemanticLayer --> BI
```

### Components
This project includes a workflow with:
- Data transformations using [dbt](https://www.getdbt.com/)
- Data storage using [DuckDB](https://duckdb.org/)
- Semantic Layer models using [cube.dev](https://cube.dev/)
- BI dashboards using [Superset](https://superset.apache.org/)
- A basic data catalog using [dbt docs](https://docs.getdbt.com/docs/collaborate/documentation)
- A local development environment using a VS Code devcontainer, linters, and Docker Compose.

### Out of scope
Due to time constraints, the following areas are incomplete or out of scope:
- Proper security handling for production, such as not committing the `.env` file and using secrets. (The `.env` file is committed here for demo purposes.)
- Superset runs and is connected to Cube, so it can be used to create dashboards, but no ready-made dashboards are included in this repo.
- Devcontainer linters are not configured.
- Data cleansing and testing are limited.
- The PySpark part of this exercise was skipped by agreement.

## Quick reference
- `REQUIREMENTS.md`: Original requirements.
- `transform/models`: Data transformation models (dbt).
- `cube/schema`: Semantic Layer models, to be used by BI dashboard apps (Cube.dev).
- `superset`: Superset (BI dashboards).
- `docker-compose.yml`: Local environment definition.
- `taskfile.yml`: Available actions, to be used by maintainers and eventually by the CI/CD pipeline.

## How to run this demo
### Requirements
- Visual Studio Code with the Dev Containers extension
- Docker

### Instructions
1. Open this repo in VS Code. Open the command palette (`Shift+Cmd+P` on macOS) and select `Dev Containers: Rebuild and Reopen in Container`. This spins up the environment, including the devcontainer, Cube, and Superset.
2. Open a terminal in the devcontainer and run:

   ```sh
   task demo:run-full-demo
   ```

3. Then:
   - To see an overview of the **data transformation models and their metadata and lineage**, open the local dbt docs instance at [http://localhost:8080](http://localhost:8080).
   - To view and manage the **semantic model data cubes and views**, open the local Cube instance at [http://localhost:4000/](http://localhost:4000/).
   - To view and manage **BI dashboards**, open the local Superset instance at [http://localhost:8088/login/](http://localhost:8088/login/) and log in with username `admin` and password `admin`. It is connected to Cube, so you can create your own dashboards, but no ready-made dashboards are included in this repo yet.
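After `task demo:run-full-demo` finishes, a quick way to confirm that the three services are reachable is to probe the URLs above. This is a minimal sketch, not part of the repo. It assumes `curl` is installed and that it runs somewhere those URLs resolve, for example on the same machine where you open them in a browser. It treats any HTTP status from 200 to 399 as "up" and does not check logins or data.

```sh
#!/bin/sh
# Sketch: check that the local demo services answer over HTTP.
# Assumes curl is installed and that the URLs from step 3 resolve from this shell.

check_url() {
  # -s silences progress/errors, -o discards the body, -w prints only the
  # HTTP status code; curl prints 000 if it could not connect at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")
  code=${code:-000}   # also treat "curl not found" as a failed connection
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo "UP   $1 ($code)"
  else
    echo "DOWN $1 ($code)"
    return 1
  fi
}

failed=0
for url in http://localhost:8080 http://localhost:4000/ http://localhost:8088/login/; do
  check_url "$url" || failed=$((failed + 1))
done
echo "$failed service(s) not reachable"
```

A status of `000` means curl could not connect at all, which usually points to a service that is not running or a port that is not forwarded.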