https://github.com/albumprinter/storio-analyticsengineer-techassignment
- Host: GitHub
- URL: https://github.com/albumprinter/storio-analyticsengineer-techassignment
- Owner: albumprinter
- License: apache-2.0
- Created: 2025-10-21T14:29:34.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-10-24T14:31:08.000Z (4 months ago)
- Last Synced: 2025-10-24T16:26:43.445Z (4 months ago)
- Topics: build-na, deploy-na, domain-data-ai, hosting-na, owner-data-platform, system-data, type-application
- Language: Python
- Size: 235 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
# Storio - Data Principal Engineer Assignment
This repo contains a scuffed version of `jaffle_shop`, a fictional ecommerce store. This project will be used to test your refactoring skills. In a nutshell: This dbt project transforms raw data from an app database into models ready for analytics.
However, the project is not well-structured, and the code is not very readable. Your task is to refactor the code to make it more readable and maintainable.
Things to consider:
- Warehouse layers (a sketch follows below)
- Code readability
- Testing
The project contains [seeds](https://docs.getdbt.com/docs/building-a-dbt-project/seeds) that include some (fake) raw data from a fictional app, along with some basic dbt [models](https://docs.getdbt.com/docs/building-a-dbt-project/building-models), tests, and docs for this data.
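To make the warehouse-layers consideration concrete, here is a minimal sketch of what introducing an explicit staging layer could look like. The folder layout, file name, seed name (`raw_customers`), and column names are assumptions for illustration, not part of the current repo:
```shell
# Hypothetical layering sketch: staging models rename and type raw columns once,
# so downstream (mart) models such as customers_with_order_info stay readable.
mkdir -p models/staging models/marts

cat > models/staging/stg_customers.sql <<'SQL'
select
    id as customer_id,
    first_name,
    last_name
from {{ ref('raw_customers') }}
SQL
```
Mart models would then select from the `stg_*` models via `ref`, and tests and docs can be attached per layer.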
## Running this project
Prerequisites: Python >= 3.8
1. Install the project in a virtual environment using your favorite Python/env management tool (a `uv`-based example follows this list)
- `uv`
- `pipenv`
- `poetry`
- `venv`
- ...
2. `dbt build`
3. `dbt docs generate`
4. `dbt docs serve`

If you use `uv` without activating the virtual environment, prefix each command with `uv run`.
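For example, with `uv` (a sketch; it assumes the project targets the dbt DuckDB adapter, which the `jaffle_shop.duckdb` workflow below suggests, and installs it directly in case the repo does not declare its own dependencies):
```shell
# Create a project-local virtual environment
uv venv

# Install dbt with the DuckDB adapter, plus duckcli for ad-hoc queries
# (assumption: adjust this if the repo ships a pyproject.toml or requirements file)
uv pip install dbt-duckdb duckcli

# Build everything, then generate and serve the docs
uv run dbt build
uv run dbt docs generate
uv run dbt docs serve
```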
## Verifying your environment
1. Ensure your [profile](https://docs.getdbt.com/reference/profiles.yml) is set up correctly from the command line:
```shell
dbt --version
dbt debug
```
2. Load the CSVs with the demo data set, run the models, and test the output of the models using the [dbt build](https://docs.getdbt.com/reference/commands/build) command:
```shell
dbt build
```
3. Query the data:
Launch a DuckDB command-line client such as `duckcli`:
```shell
duckcli jaffle_shop.duckdb
```
Run a query at the prompt and exit:
```sql
select * from customers_with_order_info where customer_id = 42;
exit;
```
Alternatively, run the query as a one-liner:
```shell
duckcli jaffle_shop.duckdb -e "select * from customers_with_order_info where customer_id = 42"
```
or:
```shell
echo 'select * from customers_with_order_info where customer_id = 42' | duckcli jaffle_shop.duckdb
```
4. Generate and view the documentation for the project:
```shell
dbt docs generate
dbt docs serve
```
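If any of the steps above fail, a quick way to see what dbt actually created is to list the relations in the DuckDB file (a sketch reusing the `duckcli` flags shown in step 3):
```shell
# List every table/view in the default schema
duckcli jaffle_shop.duckdb -e "show tables"

# Or include the schema names, which is useful once models are split into layers
duckcli jaffle_shop.duckdb -e "select table_schema, table_name, table_type from information_schema.tables"
```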
## Running `build` steps independently
1. Load the CSVs with the demo data set. This materializes the CSVs as tables in your target schema. Note that a typical dbt project **does not require this step** since dbt assumes your raw data is already in your warehouse.
```shell
dbt seed
```
2. Run the models:
```shell
dbt run
```
> **NOTE:** If you decide to run this project in your own data warehouse (outside of this DuckDB demo) and steps fail, you may need to make small changes to the SQL in the models folder to match your target database's SQL dialect. This is especially likely if you are using a community-contributed adapter.
3. Test the output of the models using the [test](https://docs.getdbt.com/reference/commands/test) command:
```shell
dbt test
```
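While refactoring, it is often faster to run or test only the part of the DAG you are touching, using dbt's [node selection](https://docs.getdbt.com/reference/node-selection/syntax) syntax (a sketch; `stg_customers` is a hypothetical model name, `customers_with_order_info` comes from the query examples above):
```shell
# Build a model together with everything it depends on (seeds included)
dbt build --select +customers_with_order_info

# Run one (hypothetical) staging model and everything downstream of it
dbt run --select stg_customers+

# Test a single model
dbt test --select customers_with_order_info
```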
## Browsing the data
Some options:
- [DuckDB UI](https://duckdb.org/docs/stable/extensions/ui.html)
- [duckcli](https://pypi.org/project/duckcli/)
- [DuckDB CLI](https://duckdb.org/docs/installation/?environment=cli)
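For example, with the official DuckDB CLI installed (a separate binary, not part of the Python environment above):
```shell
# Open the database file interactively
duckdb jaffle_shop.duckdb

# Or run a one-off query and exit
duckdb jaffle_shop.duckdb -c "select count(*) from customers_with_order_info"

# Recent DuckDB versions can also launch the browser-based UI from the CLI
duckdb -ui jaffle_shop.duckdb
```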