https://github.com/legomb/fedex-assignment

Analytics Engineering assignment for FedEx using dbt, cube.dev, and Superset

# FedEx Analytics Engineering Assignment

## Overview

This is my submission for the FedEx Analytics Engineering Assignment.

It provides a self-contained environment with a data pipeline that ingests, cleans, and enriches the [Amazon E-Commerce Sales Dataset from Kaggle](https://www.kaggle.com/datasets/thedevastator/unlock-profits-with-e-commerce-sales-data) and makes the results available for BI.

```mermaid
flowchart LR
raw["`Raw data
(.csv file)`"]
clean["`Clean models
(dbt)`"]
enriched["`Enriched models
(dbt)`"]
kimball["`Kimball models
(dbt)`"]
SemanticLayer["`Semantic Layer
(Cube.dev)`"]
BI["`BI layer
(Apache Superset)`"]

raw --> clean --> enriched --> kimball --> SemanticLayer --> BI
```
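As an illustration of the clean layer above, a dbt staging model for this pipeline might look like the following sketch. The source name, column names, and cleaning rules are assumptions for illustration, not the actual models in `transform/models`.

```sql
-- Hypothetical clean-layer model, e.g. transform/models/clean/clean_sales.sql.
-- Source and column names are illustrative, not taken from this repo.
with source as (
    select * from {{ source('amazon', 'sales_raw') }}
)

select
    order_id,
    cast(order_date as date) as order_date,
    trim(sku) as sku,
    cast(qty as integer) as quantity,
    cast(amount as decimal(10, 2)) as amount
from source
where order_id is not null
```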

### Components

This project includes a workflow with:

- Data transformations using [dbt](https://www.getdbt.com/)
- Data storage using [DuckDB](https://duckdb.org/)
- Semantic Layer models using [cube.dev](https://cube.dev/)
- BI dashboards using [Superset](https://superset.apache.org/)
- A basic data catalog using [dbt docs](https://docs.getdbt.com/docs/collaborate/documentation)
- A local development environment using a VS Code devcontainer, linters, and Docker Compose
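To show how the semantic layer sits on top of the dbt models, a Cube data model in YAML form might look like this sketch. The cube name, table, and members are hypothetical, not the actual contents of `cube/schema`.

```yaml
# Hypothetical Cube data model; all names are illustrative only.
cubes:
  - name: sales
    sql_table: main.fct_sales   # assumed Kimball fact table built by dbt

    measures:
      - name: total_amount
        sql: amount
        type: sum
      - name: order_count
        type: count

    dimensions:
      - name: order_date
        sql: order_date
        type: time
```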

### Out of scope

Due to time constraints, the following areas are incomplete/out of scope:

- Proper security handling for production, such as not committing the `.env` file and using secrets. (The `.env` file is committed for demo purposes.)
- Superset works and has a connection to Cube, so it can be used to create dashboards, but no ready-made dashboards are included in this repo.
- Devcontainer linters are not configured.
- Limited data cleansing and testing.
- The PySpark part of this exercise was agreed to be skipped.

## Quick reference

- `REQUIREMENTS.md`: Original requirements.
- `transform/models`: Data transformation models (dbt).
- `cube/schema`: Semantic Layer models, to be used by BI dashboard apps (Cube.dev).
- `superset`: Superset (BI dashboards).
- `docker-compose.yml`: Local environment definition.
- `taskfile.yml`: Available actions, to be used by maintainers and eventually CI/CD.
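For context, a Taskfile entry for the demo command used below might be structured like this sketch. The real `taskfile.yml` in the repo defines the actual steps; the commands here are assumptions.

```yaml
# Hypothetical Taskfile (go-task) fragment; the repo's taskfile.yml may differ.
version: '3'

tasks:
  demo:run-full-demo:
    desc: Build the dbt models and serve the docs
    cmds:
      - dbt deps
      - dbt build          # run + test the models
      - dbt docs generate
      - dbt docs serve --port 8080
```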

## How to run this demo

### Requirements

- Visual Studio Code
- Docker

### Instructions

1. Open this repo in VSCode. Open the command palette (`Shift+Cmd+P` on macOS) and select `Dev Containers: Rebuild and Reopen in Container`. This will spin up the environment, including the devcontainer, Cube, and Superset.
2. Open a terminal in the devcontainer and run:

```sh
task demo:run-full-demo
```

3. Then:
- To see an overview of the **data transformation models and their metadata & lineage**, access the local dbt docs instance by navigating to [http://localhost:8080](http://localhost:8080).
- To view and manage the **semantic model data cubes and views**, open the local cube instance by navigating to [http://localhost:4000/](http://localhost:4000/).
- To view and manage **BI dashboards**, open the local Superset instance by navigating to [http://localhost:8088/login/](http://localhost:8088/login/) and log in with username `admin` and password `admin`. Superset has a connection to Cube, so you can create your own dashboards, but no ready-made dashboards are included in this repo.