{"id":25026367,"url":"https://github.com/hadjuse/spotify-recommendation-knn","last_synced_at":"2026-05-03T04:34:20.721Z","repository":{"id":154488319,"uuid":"627551769","full_name":"hadjuse/Spotify-recommendation-KNN","owner":"hadjuse","description":"Here is a personal project where i use KNN project to classify songs ","archived":false,"fork":false,"pushed_at":"2023-04-24T20:20:45.000Z","size":4732,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-05T17:48:02.720Z","etag":null,"topics":["data-science","graphics","interpretation","knn-classification","machine-learning","matplotlib","music","numpy","plotly","spotify"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hadjuse.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-04-13T17:47:35.000Z","updated_at":"2023-05-20T07:13:35.000Z","dependencies_parsed_at":null,"dependency_job_id":"827251c5-f3ed-4041-bd73-19ad7a3e2454","html_url":"https://github.com/hadjuse/Spotify-recommendation-KNN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadjuse%2FSpotify-recommendation-KNN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadjuse%2FSpotify-recommendation-KNN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadjuse%2FSpotify-recommendation-KNN/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hadjuse%2FSpotify-recommendation-KNN/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hadjuse","download_url":"https://codeload.github.com/hadjuse/Spotify-recommendation-KNN/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246342985,"owners_count":20761947,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","graphics","interpretation","knn-classification","machine-learning","matplotlib","music","numpy","plotly","spotify"],"created_at":"2025-02-05T17:36:10.960Z","updated_at":"2026-05-03T04:34:15.699Z","avatar_url":"https://github.com/hadjuse.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cp align=\"center\" width=\"100%\"\u003e\n    \u003cimg width=\"25%\" src=\"https://github.com/hadjuse/Spotify-recommendation-KNN/blob/main/images/logo.png\"\u003e\n\u003c/p\u003e\n\n\n# Spotify-classification-song-KNN\nThis is a personal project containing plots and an analysis of a dataset of songs and their characteristics.\nThe aim of the project is to predict the mode (major or minor) of a sample of songs from 2 characteristics: danceability and energy.\n\n## Summary:\n  1. **[Data cleaning](#Data-cleaning)**\n  2. **[Data selection](#Data-selection)**\n  3. **[Interpretation, Classification and plot](#ploty)**\n        - [Interpretation](#i)\n        - [Classification Data](#c)\n  4. 
**[Machine Learning: KNN](#train)**\n        - [Optimisation](#opti)\n  5. **[Visualisation of the results](#result)**\n       - [Confusion matrix](#confusion)\n       - [Decision boundary](#boundary)\n# \u003ca name = \"Data-cleaning\"\u003e\u003c/a\u003eData cleaning\nFirst of all, we import all the necessary libraries:\n```python\nimport numpy as np  # used later for sampling and for the decision boundary grid\nimport pandas as pd\nimport matplotlib.pyplot as plt\nimport plotly.express as px\nfrom sklearn import preprocessing\nimport seaborn as sns\nfrom sklearn.metrics import confusion_matrix\nfrom sklearn import metrics\n```\nThere is no cleaning to do here because the data is already organized and clean, so we simply load it:\n```python\n# The file is tab-separated\ndata = pd.read_csv(\"hit_songs/Hit Songs/spotify_hits_dataset_complete.csv\", sep='\\t', parse_dates=True)\nprint(data.shape)\ndata.head()\n```\n# \u003ca name = \"Data-selection\"\u003e\u003c/a\u003eData selection\nThe interesting data are the song name, the artist name, and the different information about the song, such as its key or BPM.\nCode:\n```python\n# First 25 songs, columns 12 to 22 (song information such as acousticness, liveness and valence)\nsong_information = data.iloc[0:25, 12:23]\n```\n# \u003ca name = \"ploty\"\u003e\u003c/a\u003eInterpretation of the data\nLet's plot some information, like the popularity of the first 25 songs:\n```python\n# NOTE: artist_information is not defined in this README; it is assumed to be\n# a slice of data (first 25 songs) that contains the \"popularity\" column.\nartist_information[\"popularity\"].plot.bar()\n```\n![](images/popularity.png)\n\nThis one shows some technical details of the songs:\n```python\nsong_information[[\"acousticness\", \"liveness\", \"valence\"]].plot.bar(figsize=(10,7))\n```\n![](images/alv.png)\n\n# \u003ca name=\"i\"\u003e\u003c/a\u003eInterpretation\nWe can give an interpretation of these two charts.\n- The first chart shows the popularity of each song that we picked.\n- The second chart shows, for each song, one bar for each of 3 characteristics.\n\n\nLet's explain these 3 characteristics:\n1. **acousticness**:\n\n    - Indicates the probability that a song is acoustic.\n\n\n2. 
**liveness**:\n\n    - Detects the presence of an audience in a song.\n      The higher the liveness value, the higher the\n      probability of a song being performed live.\n\n\n3. **valence**:\n\n    - Describes the positiveness within a song.\n      High valence values represent happier songs,\n      whereas low values characterize the opposite.\n\nFinally, we can say that the most popular song in this subset, *côte ouest* (**n°17**), has a low liveness but is quite **acoustic** and has an average **valence**.\nThe less popular songs in this selection have a lower probability of being acoustic.\n# \u003ca name=\"c\"\u003e\u003c/a\u003eClassification of the songs\nThis section classifies the songs according to their:\n\n1. **danceability**:\n    - Combines tempo, rhythm and other elements\n      to indicate whether a song is suitable for dancing.\n\n2. **mode**:\n    - Whether the song is in a minor key or a major key.\n3. **energy**:\n    - Represents the intensity and activity of a song by\n      combining information such as dynamic range, perceived\n      loudness, timbre, onset rate, and general entropy.\n\nLet's separate the songs into 2 groups.\nCode:\n```python\ndf = data[[\"mode\", \"danceability\", \"energy\"]].iloc[0: 750]\n# Songs whose danceability and energy are both at least 0.5 (\"danceable\")\ndf_dance = df[(df[\"danceability\"] \u003e= 0.5) \u0026 (df[\"energy\"] \u003e= 0.5)]\n# Songs whose danceability and energy are both at most 0.5 (not \"danceable\")\ndf_less_dance = df[(df[\"danceability\"] \u003c= 0.5) \u0026 (df[\"energy\"] \u003c= 0.5)]\n\n# First figure: the two groups on the same axes\nax = df_dance.plot.scatter(x=\"danceability\", y=\"energy\", color=\"DarkBlue\", label=\"Danceable and energy\")\ndf_less_dance.plot.scatter(x=\"danceability\", y=\"energy\", color=\"DarkGreen\", label=\"less Danceable and energy\", ax=ax)\n# Second figure: all the songs, colored by mode\ndf.plot.scatter(x=\"danceability\", y=\"energy\", c=\"mode\" ,cmap=\"viridis\", s=50, figsize=(10,7))\n```\nThe idea above is to split the songs into 2 groups to show how many of them are more likely to be danceable or 
not.\n![](images/danceability.png)\n\nAs we can see, many more songs in the dataset are likely to be energetic and danceable.\n\nNow, let's see how the songs are distributed according to their \u003ca name=\"c\"\u003e\u003c/a\u003emodes (i.e. major or minor):\n![](images/mode.png)\nWe see that the songs are well distributed between the modes.\n# Final interpretation:\n1. There are more songs in a minor key than in a major key.\n2. There are more songs that are danceable and energetic than songs that are not.\n\n# \u003ca name=\"train\"\u003e\u003c/a\u003eMachine learning using KNN\nLet's train a KNN model in Python to classify the songs using 3 characteristics: danceability and energy as features, and mode as the label.\n## Sampling data\nCode:\n```python\ndf1 = data[[\"song_name\",\"danceability\", \"energy\", \"mode\"]].iloc[0: 700]\ndf1.hist(figsize=(10,10))\n# Draw 500 row positions at random (with replacement) among the rows of df1\nsample = np.random.randint(df1.shape[0], size=500)\nx = df1[[\"danceability\", \"energy\"]].iloc[sample].to_numpy() # features: danceability and energy of each song\ny = df1[\"mode\"].iloc[sample] # label: mode of each song\n# Encode the labels as integers\nlab = preprocessing.LabelEncoder()\ny_transformed = lab.fit_transform(y)\n```\n![](images/hist.png)\n\n## Train set and test set:\n### We'll first train our model with K = 6.\n```python\nfrom sklearn import neighbors\nfrom sklearn.model_selection import train_test_split\n# Keep 80% of the sample for training and 20% for testing\nxtrain, xtest, ytrain, ytest = train_test_split(x, y_transformed, train_size=0.8)\nknn = neighbors.KNeighborsClassifier(n_neighbors=6)\nknn.fit(xtrain, ytrain.ravel())\n```\nLet's see the first performance of our model:\n```python\n# score() returns the accuracy, so the error rate is 1 - accuracy\nerror = 1 - knn.score(xtest,ytest)\nprint(f\"Error = {error}\")\n```\n    output: Error = 0.43999999999999995\nWell... 
our model is wrong almost 1 time out of 2, which is not good.\nSo let's look for the best K in order to get a better prediction score.\n## \u003ca name=\"opti\"\u003e\u003c/a\u003eOptimisation\n\n```python\n# Test accuracy (in %) for each K from 2 to 14\nscores = []\nfor k in range(2,15):\n    knn = neighbors.KNeighborsClassifier(k)\n    scores.append(100*(knn.fit(xtrain, ytrain.ravel()).score(xtest, ytest)))\nplt.plot(range(2,15), scores, 'o-')\nplt.show()\n```\n![](images/score_curve.png)\n## New score for k = 2\n```python\nbest_k = neighbors.KNeighborsClassifier(n_neighbors=2)\nbest_k.fit(xtrain, ytrain.ravel())\nprint(f\"Training set score: {best_k.score(xtrain,ytrain)}\")\npredicted = best_k.predict(xtest)\nprint(f\"Test set score: {best_k.score(xtest, ytest)}\")\n```\n    training set score: 0.83\n    test set score: 0.7\n\nIt's a little bit better.\n# \u003ca name=\"result\"\u003e\u003c/a\u003eVisualisation\n```python\n# Rows are the actual labels, columns are the predicted labels\ncnf_matrix = metrics.confusion_matrix(ytest,predicted)\np = sns.heatmap(pd.DataFrame(cnf_matrix), annot=True, cmap=\"YlGnBu\" ,fmt='g')\nplt.title('Confusion matrix', y=1.1)\nplt.ylabel('Actual label')\nplt.xlabel('Predicted label')\n```\n![](images/cfm.png)\n\n### \u003ca name=\"confusion\"\u003e\u003c/a\u003eHere's the confusion matrix for the predicted values.\n\nThe top left square shows that **43** songs predicted as being in the major key are really in the major key. The top right square shows that our model predicted that **7** songs don't belong to the major key although they really do.\nThe bottom left square represents the songs which were predicted as being in the major key but are really in the minor key. 
Thus, the bottom right square represents the songs that were predicted as being in the minor key and really are in the minor key.\n\n## \u003ca name=\"boundary\"\u003e\u003c/a\u003eDecision boundary\n### Final visualisation of the decision boundary\n```python\nh = .02  # step size of the grid\ncolors = \"bry\"\n# Grid covering the range of both features, with a margin of 1\nx_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1\ny_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1\nxx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n                     np.arange(y_min, y_max, h))\n\n# Predict the class of every grid point and draw the regions\nZ = best_k.predict(np.c_[xx.ravel(), yy.ravel()])\nZ = Z.reshape(xx.shape)\ncs = plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)\nplt.axis('tight')\n\n# Plot the sampled songs on top, one color per class\nfor i, color in zip(best_k.classes_, colors):\n    idx = np.where(y_transformed == i)[0]\n    plt.scatter(x[idx, 0], x[idx, 1], c=color, edgecolor='black', s=20)\n```\n![](images/knn_results.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadjuse%2Fspotify-recommendation-knn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhadjuse%2Fspotify-recommendation-knn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhadjuse%2Fspotify-recommendation-knn/lists"}