https://github.com/leftcoastnerdgirl/unsupervised_learning

This project is the first step in machine learning, using K Means and Principal Component Analysis.
# Unsupervised learning using K Means and Principal Component Analysis (PCA)

This project uses unsupervised learning techniques, including k-means clustering and PCA, to group cryptocurrencies by their price changes.

# Prepare the Data
- Use the `StandardScaler` class from scikit-learn to normalize the data from the CSV file.
- Create a DataFrame with the scaled data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
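
The two preparation steps can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the coin names and values are synthetic stand-ins for the CSV file, and the variable names `df_market` and `df_scaled` are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CSV data; coin names and values are made up.
df_market = pd.DataFrame(
    {
        "coin_id": ["coin_a", "coin_b", "coin_c", "coin_d"],
        "price_change_percentage_24h": [1.0, -0.5, 4.2, 0.3],
        "price_change_percentage_7d": [7.6, 10.4, -6.1, 0.2],
    }
).set_index("coin_id")

# Scale every column to zero mean and unit variance.
scaled_values = StandardScaler().fit_transform(df_market)

# Rebuild a DataFrame, reusing the original column names and the coin_id index.
df_scaled = pd.DataFrame(
    scaled_values, columns=df_market.columns, index=df_market.index
)
```

`StandardScaler` returns a plain NumPy array, which is why the column names and the index have to be reattached by hand.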

![image](https://github.com/user-attachments/assets/7b6a61c1-0516-405f-bb0d-14ed11397755)

# Data discovery
- Find the best value for k using the original scaled DataFrame.
- Use the elbow method to find the best value for k using the following steps:
  - Create a list with the number of k values from 1 to 11.
  - Create an empty list to store the inertia values.
  - Create a for loop to compute the inertia with each possible value of k.
  - Create a dictionary with the data to plot the elbow curve.
  - Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
- Answer the following question in your notebook: What is the best value for k?
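
A rough sketch of the elbow loop described above, under some assumptions: the data is a synthetic set of three blobs standing in for the scaled DataFrame, "1 to 11" is read as inclusive, and the `n_init` and `random_state` settings are illustrative choices rather than values taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled DataFrame: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [-5, 5]])
points = np.vstack([c + rng.normal(scale=0.5, size=(15, 2)) for c in centers])
df_scaled = pd.DataFrame(points, columns=["feature_1", "feature_2"])

k_values = list(range(1, 12))  # 1 through 11, assuming the range is inclusive
inertia = []  # one inertia value per k

for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(df_scaled)
    inertia.append(model.inertia_)

# Dictionary and DataFrame holding the data for the elbow curve.
elbow_data = {"k": k_values, "inertia": inertia}
df_elbow = pd.DataFrame(elbow_data)
```

If hvPlot is installed, the curve could then be drawn with something like `df_elbow.hvplot.line(x="k", y="inertia")` after `import hvplot.pandas`. The elbow is the value of k after which the inertia stops dropping sharply.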

![image](https://github.com/user-attachments/assets/e4fac361-aca8-457b-8f2e-3e5f39c230a6)


# Cluster Cryptocurrencies with K-means Using the Original Scaled Data
- Use the following steps to cluster the cryptocurrencies for the best value for k on the original scaled data:
  - Initialize the K-means model with the best value for k.
  - Fit the K-means model using the original scaled DataFrame.
  - Predict the clusters to group the cryptocurrencies using the original scaled DataFrame.
  - Create a copy of the original data and add a new column with the predicted clusters.
- Create a scatter plot using hvPlot as follows:
  - Set the x-axis as "price_change_percentage_24h" and the y-axis as "price_change_percentage_7d".
  - Color the graph points with the labels found using K-means.
  - Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
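
The clustering and plotting steps might look roughly like this. The helper names `add_clusters` and `plot_clusters`, the `cluster` column name, the synthetic data, and `k=2` are all assumptions made for illustration. The sketch copies whichever DataFrame is passed in, and the plotting helper imports hvPlot lazily so the clustering part runs without it.

```python
import pandas as pd
from sklearn.cluster import KMeans


def add_clusters(df_scaled: pd.DataFrame, k: int) -> pd.DataFrame:
    """Return a copy of df_scaled with a 'cluster' column of K-means labels."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(df_scaled)
    labels = model.predict(df_scaled)
    df_clustered = df_scaled.copy()  # leave the input untouched
    df_clustered["cluster"] = labels
    return df_clustered


def plot_clusters(df_clustered: pd.DataFrame):
    """Scatter plot of 24h vs 7d price change, colored by cluster (needs hvPlot)."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df_clustered.hvplot.scatter(
        x="price_change_percentage_24h",
        y="price_change_percentage_7d",
        by="cluster",
        hover_cols=["coin_id"],
    )


# Synthetic scaled data with two obvious groups; coin names are made up.
df_scaled = pd.DataFrame(
    {
        "coin_id": ["coin_a", "coin_b", "coin_c", "coin_d", "coin_e", "coin_f"],
        "price_change_percentage_24h": [-1.0, -1.1, -0.9, 1.0, 1.1, 0.9],
        "price_change_percentage_7d": [-1.0, -0.9, -1.1, 1.0, 0.9, 1.1],
    }
).set_index("coin_id")

df_clustered = add_clusters(df_scaled, k=2)
```

In a notebook, calling `plot_clusters(df_clustered)` as the last expression of a cell would display the chart.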

![image](https://github.com/user-attachments/assets/776cb605-af01-4191-8445-5238525a9b30)

# Optimize Clusters with Principal Component Analysis
- Using the original scaled DataFrame, perform a PCA and reduce the features to three principal components.
- Retrieve the explained variance to determine how much information can be attributed to each principal component, and then answer the following question in your notebook:
  - What is the total explained variance of the three principal components?
- Create a new DataFrame with the PCA data and set the "coin_id" index from the original DataFrame as the index for the new DataFrame.
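
As a sketch of the PCA steps above, assuming a synthetic five-feature DataFrame in place of the real scaled data (the feature and coin names are invented, and "PC3" is an assumed name for the third component):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the scaled DataFrame: 20 coins, 5 features.
rng = np.random.default_rng(1)
df_scaled = pd.DataFrame(
    rng.normal(size=(20, 5)),
    columns=[f"feature_{i}" for i in range(1, 6)],
    index=pd.Index([f"coin_{i}" for i in range(20)], name="coin_id"),
)

# Reduce the features to three principal components.
pca = PCA(n_components=3)
components = pca.fit_transform(df_scaled)

# Share of the total variance captured by each component, and their sum.
explained = pca.explained_variance_ratio_
total_explained = explained.sum()

# New DataFrame with the PCA data, indexed by coin_id like the original.
df_pca = pd.DataFrame(
    components, columns=["PC1", "PC2", "PC3"], index=df_scaled.index
)
```

`total_explained` is the number the question asks for: the fraction of the original variance retained after keeping three components.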

![image](https://github.com/user-attachments/assets/a10297d7-280d-4f50-a845-ca93a3676f34)

# Find the Best Value for k Using the PCA Data
Use the elbow method on the PCA data to find the best value for k using the following steps:
- Create a list with the number of k values from 1 to 11.
- Create an empty list to store the inertia values.
- Create a for loop to compute the inertia with each possible value of k.
- Create a dictionary with the data to plot the elbow curve.
- Plot a line chart with all the inertia values computed with the different values of k to visually identify the optimal value for k.
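
Reading the elbow off a chart can be ambiguous, so one optional complement is to tabulate how much the inertia falls at each step. The sketch below is an addition and not part of the original notebook: it uses synthetic four-cluster data, assumes "1 to 11" is inclusive, and the `pct_drop` column is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic data: four blobs in five dimensions, reduced to three components.
rng = np.random.default_rng(2)
centers = 10 * np.eye(4, 5)
data = np.vstack([c + rng.normal(scale=0.5, size=(12, 5)) for c in centers])
df_pca = pd.DataFrame(
    PCA(n_components=3).fit_transform(data), columns=["PC1", "PC2", "PC3"]
)

k_values = list(range(1, 12))  # 1 through 11
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df_pca)
    inertia.append(model.inertia_)

df_elbow_pca = pd.DataFrame({"k": k_values, "inertia": inertia})

# Percentage drop in inertia relative to the previous k (NaN for the first row).
df_elbow_pca["pct_drop"] = -df_elbow_pca["inertia"].pct_change() * 100
```

A large `pct_drop` followed by consistently small ones suggests where the curve bends, though the line chart remains the primary check.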

![image](https://github.com/user-attachments/assets/f339c749-7fb7-44d9-bb5b-9a553f49b8d4)

# Cluster Cryptocurrencies with K-means Using the PCA Data
Use the following steps to cluster the cryptocurrencies for the best value for k on the PCA data:
- Initialize the K-means model with the best value for k.
- Fit the K-means model using the PCA data.
- Predict the clusters to group the cryptocurrencies using the PCA data.
- Create a copy of the DataFrame with the PCA data and add a new column to store the predicted clusters.
- Create a scatter plot using hvPlot as follows:
  - Set the x-axis as "PC1" and the y-axis as "PC2".
  - Color the graph points with the labels found using K-means.
  - Add the "coin_id" column in the hover_cols parameter to identify the cryptocurrency represented by each data point.
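
A compact sketch of the same workflow on PCA data. It uses `fit_predict`, a scikit-learn shortcut that fits the model and returns the training labels in one call, in place of the separate fit and predict steps listed above. The data is synthetic, and `best_k = 2`, the `cluster` column, and the `plot_pca_clusters` helper are placeholders rather than values from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the PCA DataFrame: two groups of five made-up coins.
rng = np.random.default_rng(3)
pcs = np.vstack(
    [
        rng.normal(loc=-3, scale=0.3, size=(5, 3)),
        rng.normal(loc=3, scale=0.3, size=(5, 3)),
    ]
)
df_pca = pd.DataFrame(
    pcs,
    columns=["PC1", "PC2", "PC3"],
    index=pd.Index([f"coin_{i}" for i in range(10)], name="coin_id"),
)

best_k = 2  # placeholder; use the value read off the elbow curve
model = KMeans(n_clusters=best_k, n_init=10, random_state=0)

# Copy the PCA data and store the predicted cluster for each coin.
df_pca_clustered = df_pca.copy()
df_pca_clustered["cluster"] = model.fit_predict(df_pca)


def plot_pca_clusters(df: pd.DataFrame):
    """Scatter plot of PC1 vs PC2, colored by cluster (needs hvPlot)."""
    import hvplot.pandas  # noqa: F401  (registers the .hvplot accessor)

    return df.hvplot.scatter(
        x="PC1", y="PC2", by="cluster", hover_cols=["coin_id"]
    )
```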

![image](https://github.com/user-attachments/assets/b0cb69bf-7d3d-4d4d-a64d-a48f35a57e25)