{"id":25918316,"url":"https://github.com/akashparley/spotify","last_synced_at":"2026-03-05T08:03:30.262Z","repository":{"id":272452045,"uuid":"872469603","full_name":"AkashParley/Spotify","owner":"AkashParley","description":"This project focuses on creating an interactive Tableau dashboard using a Spotify dataset, featuring track performance metrics like streams, views, and likes, along with attributes such as artist, album, and track features (e.g., energy, danceability). SQL queries were used to prepare the data, enabling insights into track popularity, artist perfor","archived":false,"fork":false,"pushed_at":"2025-11-06T11:52:07.000Z","size":4855,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-30T17:27:09.876Z","etag":null,"topics":["excel","sql","tableau"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AkashParley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-14T13:45:53.000Z","updated_at":"2025-11-06T11:52:11.000Z","dependencies_parsed_at":"2025-01-14T14:54:15.059Z","dependency_job_id":"341e157c-a91b-46a4-8a4b-ec9c9b3ebb34","html_url":"https://github.com/AkashParley/Spotify","commit_stats":null,"previous_names":["akashparley/spotify"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AkashParley/Spotify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashParley%2FSpotify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/AkashParley%2FSpotify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashParley%2FSpotify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashParley%2FSpotify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AkashParley","download_url":"https://codeload.github.com/AkashParley/Spotify/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkashParley%2FSpotify/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30115662,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T03:40:26.266Z","status":"ssl_error","status_checked_at":"2026-03-05T03:39:15.902Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["excel","sql","tableau"],"created_at":"2025-03-03T14:18:50.942Z","updated_at":"2026-03-05T08:03:30.236Z","avatar_url":"https://github.com/AkashParley.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Spotify Advanced SQL Project and Query Optimization\n\nProject Category: Advanced\n[Click Here to get Dataset](https://www.kaggle.com/datasets/sanjanchaudhari/spotify-dataset)\n\n![ ](https://github.com/user-attachments/assets/ef3e4d64-82af-4192-adbd-5529e235e8ea)\n\n## Overview\nThis 
project involves analyzing a Spotify dataset with various attributes about tracks, albums, and artists using **SQL**. It covers an end-to-end process of normalizing a denormalized dataset, performing SQL queries of varying complexity (easy, medium, and advanced), and optimizing query performance. The primary goals of the project are to practice advanced SQL skills and generate valuable insights from the dataset.\n\n- **Dashboard Link:** [Tableau Dashboard](https://public.tableau.com/views/Book2_17289134476450/Dashboard1?:language=en-US\u0026:sid=\u0026:redirect=auth\u0026:display_count=n\u0026:origin=viz_share_link)\n\n![](https://github.com/user-attachments/assets/63551996-f9aa-4cfe-ab27-e4bbbbd98afd)\n\n\n\n```sql\n-- create table\nDROP TABLE IF EXISTS spotify;\nCREATE TABLE spotify (\n    artist VARCHAR(255),\n    track VARCHAR(255),\n    album VARCHAR(255),\n\n    album_type VARCHAR(50),\n    danceability FLOAT,\n    energy FLOAT,\n    loudness FLOAT,\n    speechiness FLOAT,\n    acousticness FLOAT,\n    instrumentalness FLOAT,\n    liveness FLOAT,\n    valence FLOAT,\n    tempo FLOAT,\n    duration_min FLOAT,\n    title VARCHAR(255),\n    channel VARCHAR(255),\n    views FLOAT,\n    likes BIGINT,\n    comments BIGINT,\n    licensed BOOLEAN,\n    official_video BOOLEAN,\n    stream BIGINT,\n    energy_liveness FLOAT,\n    most_played_on VARCHAR(50)\n);\n```\n## Project Steps\n\n### 1. Data Exploration\nBefore diving into SQL, it’s important to understand the dataset thoroughly. The dataset contains attributes such as:\n- `Artist`: The performer of the track.\n- `Track`: The name of the song.\n- `Album`: The album to which the track belongs.\n- `Album_type`: The type of album (e.g., single or album).\n- Various metrics such as `danceability`, `energy`, `loudness`, `tempo`, and more.\n\n### 4. Querying the Data\nAfter the data is inserted, various SQL queries can be written to explore and analyze the data. 
Queries are categorized into **easy**, **medium**, and **advanced** levels to help progressively develop SQL proficiency.\n\n#### Easy Queries\n- Simple data retrieval, filtering, and basic aggregations.\n  \n#### Medium Queries\n- More complex queries involving grouping, aggregation functions, and joins.\n  \n#### Advanced Queries\n- Nested subqueries, window functions, CTEs, and performance optimization.\n\n### 5. Query Optimization\n\nTo improve query performance, we carried out the following optimization process:\n\n- **Initial Query Performance Analysis Using `EXPLAIN`**\n    - We began by analyzing the performance of a query using the `EXPLAIN` function.\n    - The query retrieved tracks based on the `artist` column, and the performance metrics were as follows:\n        - Execution time (E.T.): **7 ms**\n        - Planning time (P.T.): **0.17 ms**\n    - Below is the **screenshot** of the `EXPLAIN` result before optimization:\n      ![EXPLAIN Before Index](https://github.com/najirh/najirh-Spotify-Data-Analysis-using-SQL/blob/main/spotify_explain_before_index.png)\n\n- **Index Creation on the `artist` Column**\n    - To optimize the query performance, we created an index on the `artist` column. 
This speeds up retrieval of rows when a query filters on `artist`.\n    - **SQL command** for creating the index:\n      ```sql\n      CREATE INDEX idx_artist ON spotify(artist);\n      ```\n\n- **Performance Analysis After Index Creation**\n    - After creating the index, we ran the same query again and observed a significant improvement in execution time:\n        - Execution time (E.T.): **0.153 ms**\n        - Planning time (P.T.): **0.152 ms**\n    - Below is the **screenshot** of the `EXPLAIN` result after index creation:\n      ![EXPLAIN After Index](https://github.com/najirh/najirh-Spotify-Data-Analysis-using-SQL/blob/main/spotify_explain_after_index.png)\n\n- **Graphical Performance Comparison**\n    - The graphs below compare the query execution time before and after index creation.\n    - **Graph view** shows the drop in execution time (7 ms to 0.153 ms); planning time changed only slightly (0.17 ms to 0.152 ms):\n      ![Performance Graph](https://github.com/najirh/najirh-Spotify-Data-Analysis-using-SQL/blob/main/spotify_graphical%20view%203.png)\n      ![Performance Graph](https://github.com/najirh/najirh-Spotify-Data-Analysis-using-SQL/blob/main/spotify_graphical%20view%202.png)\n      ![Performance Graph](https://github.com/najirh/najirh-Spotify-Data-Analysis-using-SQL/blob/main/spotify_graphical%20view%201.png)\n\nThis optimization shows how indexing can drastically reduce query execution time for lookups on the indexed column in the Spotify project.\n\n---\n\n## 15 Practice Questions\n\n### Easy Level\n1. Retrieve the names of all tracks that have more than 1 billion streams.\n2. List all albums along with their respective artists.\n3. Get the total number of comments for tracks where `licensed = TRUE`.\n4. Find all tracks that belong to the album type `single`.\n5. Count the total number of tracks by each artist.\n\n### Medium Level\n1. Calculate the average danceability of tracks in each album.\n2. 
Find the top 5 tracks with the highest energy values.\n3. List all tracks along with their views and likes where `official_video = TRUE`.\n4. For each album, calculate the total views of all associated tracks.\n5. Retrieve the names of tracks that have been streamed more on Spotify than on YouTube.\n\n### Advanced Level\n1. Find the top 3 most-viewed tracks for each artist using window functions.\n2. Write a query to find tracks where the liveness score is above the average.\n3. **Use a `WITH` clause to calculate the difference between the highest and lowest energy values for tracks in each album.**\n```sql\n-- Per-album energy range: highest minus lowest energy\nWITH cte AS (\n\tSELECT\n\t\talbum,\n\t\tMAX(energy) AS highest_energy,\n\t\tMIN(energy) AS lowest_energy\n\tFROM spotify\n\tGROUP BY album\n)\nSELECT\n\talbum,\n\thighest_energy - lowest_energy AS energy_diff\nFROM cte\nORDER BY energy_diff DESC;\n```\n   \n4. Find tracks where the energy-to-liveness ratio is greater than 1.2.\n5. Calculate the cumulative sum of likes for tracks ordered by the number of views, using window functions.\n\n---\n\n## Technology Stack\n- **Database**: PostgreSQL\n- **SQL Queries**: DDL, DML, Aggregations, Joins, Subqueries, Window Functions\n- **Tools**: pgAdmin 4 (or any SQL editor), PostgreSQL (via Homebrew, Docker, or direct installation)\n\n## How to Run the Project\n1. Install PostgreSQL and pgAdmin (if not already installed).\n2. Set up the database schema and tables using the provided normalization structure.\n3. Insert the sample data into the respective tables.\n4. Execute SQL queries to solve the listed problems.\n5. 
Explore query optimization techniques for large datasets.\n\n---\n\n## Next Steps\n- **Visualize the Data**: Use a data visualization tool like **Tableau** or **Power BI** to create dashboards based on the query results.\n- **Expand Dataset**: Add more rows to the dataset for broader analysis and scalability testing.\n- **Advanced Querying**: Dive deeper into query optimization and explore the performance of SQL queries on larger datasets.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakashparley%2Fspotify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fakashparley%2Fspotify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fakashparley%2Fspotify/lists"}