{"id":13622440,"url":"https://github.com/ftupas/dbt-spotify-analytics","last_synced_at":"2025-04-15T06:30:32.139Z","repository":{"id":41398146,"uuid":"303425005","full_name":"ftupas/dbt-spotify-analytics","owner":"ftupas","description":"Containerized end-to-end analytics of Spotify data using Python, dbt, Postgres, and Metabase","archived":false,"fork":false,"pushed_at":"2022-07-08T12:20:49.000Z","size":794,"stargazers_count":123,"open_issues_count":1,"forks_count":32,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-08T09:43:26.472Z","etag":null,"topics":["dbt","docker-containers","metabase","postgres","spotify-data"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ftupas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-12T14:54:48.000Z","updated_at":"2024-10-19T12:08:05.000Z","dependencies_parsed_at":"2022-09-21T07:41:43.680Z","dependency_job_id":null,"html_url":"https://github.com/ftupas/dbt-spotify-analytics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ftupas%2Fdbt-spotify-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ftupas%2Fdbt-spotify-analytics/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ftupas%2Fdbt-spotify-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ftupas%2Fdbt-spotify-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ftupas","download_
url":"https://codeload.github.com/ftupas/dbt-spotify-analytics/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249020582,"owners_count":21199582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dbt","docker-containers","metabase","postgres","spotify-data"],"created_at":"2024-08-01T21:01:19.236Z","updated_at":"2025-04-15T06:30:32.129Z","avatar_url":"https://github.com/ftupas.png","language":"Python","funding_links":[],"categories":["Python","Sample Projects"],"sub_categories":[],"readme":"# Spotify User Analytics\r\n\r\n## Introduction\r\nIn this project, we will be analyzing our listening history, top tracks \u0026 artists, and genres from Spotify. Here are the tools that we will be using:\r\n- Python - Scraping data from Spotify API endpoints and saving files to CSV\r\n- Postgres - Our database wherein data will be stored into and queried from\r\n- dbt (Data Build Tool) - Data modeling tool to transform our data in staging to fact, dimension tables, and views\r\n- Metabase - Dashboarding tool to analyze our data\r\n- Docker - Containerizing our applications i.e. 
Postgres, dbt, and Metabase\r\n\r\n## Project Files\r\n- app\r\n    - main.py - Our main ETL script that fetches data from the Spotify API endpoints and saves them to CSV\r\n    - util.py - Utility helper file that contains a custom class **SpotifyUtil**\r\n    - config_template.py - This is where we will store our credentials\r\n- dbt\r\n    - models - Contains the SQL scripts and schema.yml files that will be used when we run our transformations\r\n    - dbt_entrypoint.sh - Script that will serve as our entrypoint when running the `dbt` container\r\n    - Dockerfile - Contains the commands to create the custom Docker image\r\n    - dbt_project.yml - YAML file to configure dbt\r\n    - packages.yml - YAML file for test dependencies\r\n    - profiles.yml - YAML file to configure connection of `dbt` to `postgres`\r\n- metabase\r\n    - metabase.db - Metadata database of Metabase for the dashboard\r\n- docker-compose.yml - YAML file to orchestrate the Docker containers\r\n\r\n## Workflow\r\nThe diagram below illustrates the system design and the workflow.\r\n\r\n![system_design](images/system_design.png)\r\n\r\nLet's break this down into major steps:\r\n- Setup\r\n- Get Spotify data\r\n- Build Docker containers\r\n- Transform, model, and load data to Postgres DB using dbt\r\n- Serve to Metabase dashboard\r\n\r\n## Setup\r\n- `cd` to this directory\r\n- Open a terminal and create a Python virtual environment using:\r\n\r\n\r\n    ```\r\n    Windows\r\n    \u003e python -m venv venv\r\n\r\n    Mac/Linux\r\n    $ make build\r\n\r\n    ```\r\n    then activate it by executing \r\n\r\n    ```\r\n    Windows:\r\n    \u003e venv\\Scripts\\activate.bat\r\n    ```\r\n    (For Windows) Install dependencies using:\r\n    ```\r\n    \u003e python -m pip install -r requirements.txt\r\n    ```\r\n- While dependencies are being installed, navigate to [Spotify Developer Page](https://developer.spotify.com/dashboard/login) and log in\r\n- Create an app and note down 
the `Client ID` and `Client Secret`, and make sure to add a redirect URI in `Settings`, e.g. `http://localhost:8888/callback/`\r\n- Fill in the details in [config_template.py](app/config_template.py) and rename it to `config.py`\r\n\r\n## Get Spotify data\r\n- Run the main Python script to fetch the data from Spotify using:\r\n\r\n    ```\r\n    Windows\r\n    \u003e python app\\main.py\r\n\r\n    Mac/Linux\r\n    $ make run\r\n    ```\r\n- While the script is running, it will redirect you to a webpage that looks like the one below; click `AGREE` to continue\r\n    \r\n    ![spotify](images/token.png)\r\n\r\n    p.s. follow [me](https://open.spotify.com/user/12139930362) for nice tunes! 😁\r\n\r\n## Build Docker containers\r\nNow that we have the CSV files in the `data` folder, we can build our Docker containers using this command:\r\n```\r\ndocker-compose up\r\n```\r\n\r\nThis command will build our `dbt`, `postgres`, and `metabase` containers. This will also run our data loading, transformations, and modeling in the background.\r\n\r\n## Transform, model, and load data to Postgres DB using dbt\r\nDuring `docker-compose up`, dbt runs the following commands:\r\n- `dbt init spotify_analytics`: Creates the project folder\r\n- `dbt debug`: Checks the connection with the Postgres database\r\n- `dbt deps`: Installs the test dependencies\r\n- `dbt seed`: Loads the CSV files into staging tables in the `postgres` database\r\n- `dbt run`: Runs the transformations and loads the data into the database\r\n- `dbt docs generate`: Generates the documentation of the dbt project\r\n- `dbt docs serve`: Serves the documentation on a webserver\r\n\r\nNavigate to http://localhost:8080 to view the documentation, which includes the lineage graph, a DAG (Directed Acyclic Graph).\r\n\r\n![DAG](images/dbt_docs.png)\r\n\r\nThis shows us how the CSV files have been transformed into the fact tables, dimension tables, and views.\r\n\r\n## Serve to Metabase dashboard\r\n\r\nNow that the data is loaded and transformed in 
our database, we can now view it at http://localhost:3000.\r\nYou may need to log in; the credentials are:\r\n\r\n```\r\nemail: dbt@spotify.com\r\npassword: password1\r\n```\r\n\r\n![login](images/metabase_login.png)\r\n\r\nThen you can navigate through, play around, and analyze your data.\r\n\r\n## Questions\r\n\r\n- What are the most common tracks in my playlists?\r\n- What is the average length of my playlists?\r\n- What are my favourite (most listened, top 5) genres in my playlists?\r\n- What are my favourite (most listened, top 10) artists in my playlists?\r\n- Was I born in the right decade? (most common release years of tracks in my playlists)\r\n- Which two keys please me the most? (2 most common keys of tracks in my playlists)\r\n- How much of a hipster am I? (average popularity of tracks in my playlists)\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fftupas%2Fdbt-spotify-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fftupas%2Fdbt-spotify-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fftupas%2Fdbt-spotify-analytics/lists"}