{"id":22868475,"url":"https://github.com/kellyjadams/spotify-data-analyze","last_synced_at":"2025-05-07T14:20:39.893Z","repository":{"id":267530537,"uuid":"901530615","full_name":"kellyjadams/spotify-data-analyze","owner":"kellyjadams","description":"A serverless data pipeline that logs my Spotify listening history to BigQuery using Cloud Run, then visualizes trends with Looker Studio. Built with Python, Flask, Docker, and GCP..","archived":false,"fork":false,"pushed_at":"2025-05-06T03:58:41.000Z","size":51,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-06T04:33:58.850Z","etag":null,"topics":["data-analysis","data-engineering"],"latest_commit_sha":null,"homepage":"https://lookerstudio.google.com/reporting/e2f6d5f3-c3cf-4687-ba01-d3a47a15998c","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kellyjadams.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-12-10T20:39:05.000Z","updated_at":"2025-05-06T03:58:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"663928bc-d83f-4e74-9c88-06426013658c","html_url":"https://github.com/kellyjadams/spotify-data-analyze","commit_stats":null,"previous_names":["kellyjadams/spotify-data-analyze"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kellyjadams%2Fspotify-data-analyze","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kellyjadams%2Fspotify-data-analyze/tags","releas
es_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kellyjadams%2Fspotify-data-analyze/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kellyjadams%2Fspotify-data-analyze/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kellyjadams","download_url":"https://codeload.github.com/kellyjadams/spotify-data-analyze/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252892511,"owners_count":21820649,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","data-engineering"],"created_at":"2024-12-13T12:35:16.204Z","updated_at":"2025-05-07T14:20:39.886Z","avatar_url":"https://github.com/kellyjadams.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spotify Listening Logger\n\n\nThis is a personal data project focusing on **data engineering** and **analytics engineering** skills.\n\n- **Data engineering**: ETL pipelines, orchestration, and cloud infrastructure  \n- **Analytics engineering**: **BigQuery SQL**, data modeling, building analysis-ready datasets\n\nIt automatically logs my Spotify listening history every minute and stores it in **BigQuery** for analysis.\n\nLinks: \n- [Blog Post](https://www.kellyjadams.com/post/spotify-listening-logger)\n\n## Why I Built This\n\nAt the end of the year, I want to compare what's logged in BigQuery with my annual **Spotify Wrapped**.\n\nI wanted to explore how to:\n\n- Build a serverless data pipeline using **Cloud Run**, **Docker**, and **GitHub 
Actions**\n- Manage secrets securely with `.env` files and **GitHub Secrets**\n- Schedule and orchestrate updates using **Cloud Scheduler**\n- Structure and transform data for analysis in **BigQuery**\n- Design pipelines that are scalable and support near real-time insights\n\nThis project brings together core **cloud** and **analytics engineering** tools to build something end-to-end—from ingestion to analysis.\n\n## Key Features\n\n- **Data Ingestion \u0026 Deployment**\n  - Polls Spotify’s now-playing endpoint every minute using a serverless Cloud Run app\n  - Built with a containerized Python app using Flask + Spotipy\n  - Deployed manually using a shell script; Cloud Scheduler handles orchestration by triggering the Cloud Run endpoint every minute.\n- **Data Storage \u0026 Modeling**\n  - Streams listening history into **BigQuery**\n  - Cleans and deduplicates plays for session-level analysis\n  - Stores detailed metadata: artist, album, genre, popularity, duration\n- **Analysis \u0026 Visualization**\n  - Designed analysis-ready datasets using **BigQuery SQL**\n\n## Technical Skills\n\n- **Data Pipeline Design**: Built a serverless ETL pipeline from Spotify API to BigQuery using Python and Cloud Run\n- **Cloud-Native ETL**: Extracted, transformed, and loaded data on a schedule using Cloud Scheduler and containerized Flask app\n- **Python**: Wrote ingestion and transformation logic using the `Spotipy` library\n- **REST API**: Created a lightweight endpoint to trigger ingestion using Flask\n- **BigQuery**: Designed table schema and streamed structured data for analysis\n- **Docker + Cloud Run**: Packaged and deployed the app as a scalable container\n- **Environment variable management**: Handled secrets securely using `.env` and GitHub Secrets\n\n## Project Structure\n\n```\nspotify-data-analyze/\n├── analysis/\n│   ├── queries/\n│   ├── views/\n│   │   ├── deduped_plays_pacific.sql\n├── cloud/\n│   └── playback/\n│       ├── main.py\n│       ├── Dockerfile\n│  
     ├── requirements.txt\n│       └── deploy.sh\n├── scripts/\n│   ├── create_bigquery_table.py\n│   ├── delete_bigquery_table.py\n│   └── load_env.py\n├── .env\n└── .github/workflows/\n    └── cloud-deploy.yml\n```\n\n## Environment \u0026 Deployment\n\nThis project uses a `.env` file for Spotify and GCP credentials. These variables are injected during Cloud Run deployment.\n\nI deployed the app manually using:\n\n```bash\ncd cloud/playback\n./deploy.sh\n```\n\nI automated ingestion by setting up a **Cloud Scheduler** job to hit the app endpoint every minute.\n\nThe data is stored in a **BigQuery** table with fields like `track`, `artist`, `genre`, and `popularity`. See `scripts/create_bigquery_table.py` for schema setup.\n\n## Next Steps\n\nBelow are my next steps:\n- Automate CI/CD deployment via GitHub Actions\n- Finalize my Looker Studio dashboard that automatically retrieves data from my BigQuery tables\n- Analyze stats using BigQuery SQL queries\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkellyjadams%2Fspotify-data-analyze","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkellyjadams%2Fspotify-data-analyze","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkellyjadams%2Fspotify-data-analyze/lists"}