{"id":24003402,"url":"https://github.com/dmschauer/aws-cdk-save-spotify-data","last_synced_at":"2026-04-20T13:07:34.924Z","repository":{"id":184843415,"uuid":"445960890","full_name":"dmschauer/AWS-CDK-save-spotify-data","owner":"dmschauer","description":"Use AWS CDK (Python) to periodically call Spotify APIs and store artist data using serverless services (Lambda, API-GW, DynamoDB, S3, EventBridge)","archived":false,"fork":false,"pushed_at":"2022-01-09T16:42:10.000Z","size":492,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-02T06:45:05.762Z","etag":null,"topics":["aws","aws-cdk","aws-lambda","dynamodb","python","serverless","spotify"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dmschauer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-01-09T00:40:13.000Z","updated_at":"2022-10-28T13:23:27.000Z","dependencies_parsed_at":"2023-07-30T15:40:24.322Z","dependency_job_id":null,"html_url":"https://github.com/dmschauer/AWS-CDK-save-spotify-data","commit_stats":null,"previous_names":["dmschauer/aws-cdk-save-spotify-data"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dmschauer/AWS-CDK-save-spotify-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmschauer%2FAWS-CDK-save-spotify-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmschauer%2FAWS-CDK-save-spotify-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmschauer%2FAWS-CDK-save-spotify-dat
a/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmschauer%2FAWS-CDK-save-spotify-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dmschauer","download_url":"https://codeload.github.com/dmschauer/AWS-CDK-save-spotify-data/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dmschauer%2FAWS-CDK-save-spotify-data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32048474,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T11:35:06.609Z","status":"ssl_error","status_checked_at":"2026-04-20T11:34:48.899Z","response_time":94,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-cdk","aws-lambda","dynamodb","python","serverless","spotify"],"created_at":"2025-01-08T01:10:47.959Z","updated_at":"2026-04-20T13:07:34.909Z","avatar_url":"https://github.com/dmschauer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Save Spotify data using AWS CDK\n\nThis project showcases the use of AWS CDK for automatically deploying a stack of services that query an API, transform the returned data and save it into two separate data sinks: an S3 bucket and a DynamoDB table.\nAll pieces of infrastructure and the entire logic are 
defined within the project.\n\nIt was built using Python 3.9.9 on Windows 10.\n\n## Description\n\nThe stack sets up a Lambda function, links it with an API Gateway for on-demand execution and schedules it to run once a day at 8 AM UTC via an EventBridge event.\nIt also sets up the S3 bucket and DynamoDB table to store the data.\nThe Lambda runs Python 3.8 and uses the Layer ```arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:1``` supplied by AWS for its libraries.\nThe Lambda function queries the Spotify Web API for current data about a list of artists (see Configuration). For each artist it does the following:\n- Query the Spotify API for the Spotify artist ID corresponding to the artist name\n- Query the API again for the artist's 10 most popular tracks at that time, as defined by the Spotify popularity score\n- Save the returned data as JSON in an S3 bucket (```top_tracks_{artist_name}_YYYY-mm-dd.json```)\n    - All files are saved in a subdirectory called ```/history```\n- Save the returned data in a DynamoDB table a) as is and b) in a reduced format with only the most important information (track names, popularity scores)\n    - The table has a Partition Key on the artist_name and a Sort Key on the date in YYYY-mm-dd format. 
The other information is saved in three attributes called artist_id, top_tracks and top_tracks_main_info.\n\n![save-spotify-data-CDK](./images/save-spotify-data-CDK.png)\n\n## Running it\n\nTo deploy the app you can follow the instructions laid out in the official AWS CDK examples repo: https://github.com/aws-samples/aws-cdk-examples/tree/master/python\nThe steps are also listed here, with some details relevant to this specific project:\n- Make sure you have the AWS CLI installed on your machine and already linked to your AWS account using an account with sufficient permissions: ```aws --version```\n- Ensure CDK is installed: ```npm install -g aws-cdk```\n- Create a Python virtual environment: ```python3 -m venv .venv```\n- Activate the virtual environment\n    - On macOS or Linux: ```source .venv/bin/activate```\n    - On Windows: ```.venv\\Scripts\\activate.bat```\n- Install the required dependencies: ```pip install -r requirements.txt```\n- Get Spotify credentials and edit the settings file (see below)\n- Synthesize (`cdk synth`) or deploy (`cdk deploy`) the example: ```cdk deploy```\n\n## Configuration\n\nYou will need Spotify Web API credentials before you deploy the stack, specifically both the Client ID and the Client Secret (https://developer.spotify.com/documentation/web-api/).\nThere are two files to configure:\n- Follow the instructions in ```./settings/spotify_settings_template.py```. Here you will set the ```CLIENT_ID``` and ```CLIENT_SECRET```.\n- Type in some artists that interest you in ```./lambda_save_spotify_data/config/artists.csv```. It's pre-configured with 10 popular artists. The Lambda function will first call the API to get each artist's Spotify artist ID and subsequently query it for more data. Note that the first artist ID returned will be used. So if two artists share the same name and you are interested in the less popular one, this won't work. 
It's a fairly quick change, but you would need to refactor the code to specify the Spotify artist ID manually instead of the artist name in plain human language.\n\n## Dispose of the stack afterwards\n\n```$ cdk destroy```\n\n## Cost\n\nThis is a serverless stack and running it is almost entirely within the Always Free tier.\nThe only part that will definitely cost you money is the caching used in the Lambda function. At the time of this writing it would cost you about $0.05/12 = $0.004 for each 5-minute interval within which you call the function.\n\nTo make it completely free (as long as you otherwise remain within the Always Free usage limits), comment out the deploy_options defined within the SaveSpotifyDataCdkStack in ```./save_spotify_data_cdk/save_spotify_data_cdk_stack.py```. This will switch off the caching.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmschauer%2Faws-cdk-save-spotify-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdmschauer%2Faws-cdk-save-spotify-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdmschauer%2Faws-cdk-save-spotify-data/lists"}