
https://github.com/sleeplessglory/big-data

Projects regarding big data analysis, presented within Jupyter Notebook

big-data data-analysis data-visualization jupyter python


## 💿 Introduction
This repository contains the big data analysis projects I've completed previously.
They were all re-uploaded on a single day, since I originally didn't intend to publish them on GitHub.
Below I'll guide you through each of them.
## 📑 Pandas and sklearn
In this project I learned how to apply these libraries to big data analysis.
Feel free to open the project folder and check out the .ipynb file, which includes the output rendered by Jupyter Notebook.
Here's a sneak peek:

**Records where the average age of houses in the area is over 50 years and the population is over 2500 people**

![image](https://github.com/user-attachments/assets/47c386f1-08f1-46ba-81af-c02d5076222c)
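The filter in the caption above can be written as a boolean mask in pandas. This is a minimal sketch, assuming California-housing-style columns named `housing_median_age` and `population`; the column names and rows are hypothetical, and the notebook may use a different dataset.

```python
import pandas as pd

# Hypothetical sample rows; the real notebook's data and column names may differ.
housing = pd.DataFrame(
    {
        "housing_median_age": [52, 41, 52, 15, 55],
        "population": [2600, 3100, 1200, 2900, 2700],
        "median_house_value": [452600, 358500, 352100, 341300, 342200],
    }
)

# Combine both conditions with & and wrap each comparison in parentheses,
# because & binds more tightly than > in Python.
old_and_populous = housing[
    (housing["housing_median_age"] > 50) & (housing["population"] > 2500)
]
print(old_and_populous)
```

The same selection can also be written as `housing.query("housing_median_age > 50 and population > 2500")`, which some find easier to read for longer conditions.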

## 📆 Statistics
This project covers basic descriptive statistics and goes a bit further, with distribution plots and a normality check.
Head to the corresponding folder to see all results in the Jupyter Notebook file. Some of them are included here:

**Histograms for numerical data**

![image](https://github.com/user-attachments/assets/443430d6-e98a-4630-b54f-d8f9a23a10ef)


**Standard and average deviations**

![image](https://github.com/user-attachments/assets/1d51ee27-fee6-4713-a167-cdf5e0da7708)
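Both measures describe spread around the mean: the standard deviation is based on squared deviations (divided by n - 1 in the sample version), while the average, or mean absolute, deviation averages the absolute distances from the mean. A short pandas sketch on hypothetical expense values, not the notebook's data:

```python
import pandas as pd

# Hypothetical expense values used only for illustration.
expenses = pd.Series([1200.0, 1500.0, 900.0, 2100.0, 1800.0])

mean = expenses.mean()

# pandas .std() is the sample standard deviation (ddof=1) by default;
# ddof=0 gives the population version, which is numpy.std's default.
sample_std = expenses.std()
population_std = expenses.std(ddof=0)

# Average (mean absolute) deviation: mean distance of each value from the mean.
mean_abs_dev = (expenses - mean).abs().mean()

print(f"mean={mean:.1f}, sample std={sample_std:.1f}, "
      f"population std={population_std:.1f}, mean abs dev={mean_abs_dev:.1f}")
```

Because it squares the deviations, the standard deviation reacts more strongly to outliers than the mean absolute deviation does.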


**Normality check of the expenses distribution**

![image](https://github.com/user-attachments/assets/7a36e0a7-4424-408c-86e3-d4c81e998386)
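The README doesn't name the normality test used. One common option is the Shapiro-Wilk test from SciPy (`scipy.stats.shapiro`), sketched here on synthetic data standing in for an expenses column; the data, `ALPHA` and the helper `check_normality` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic stand-ins for an "expenses" column: one roughly normal, one right-skewed.
normal_like = rng.normal(loc=13_000, scale=3_000, size=300)
right_skewed = rng.lognormal(mean=9.0, sigma=0.8, size=300)

ALPHA = 0.05  # significance level


def check_normality(sample, alpha=ALPHA):
    """Shapiro-Wilk test; the null hypothesis is that the sample is normally distributed."""
    statistic, p_value = stats.shapiro(sample)
    verdict = "reject normality" if p_value < alpha else "fail to reject normality"
    return statistic, p_value, verdict


for name, sample in [("normal_like", normal_like), ("right_skewed", right_skewed)]:
    w, p, verdict = check_normality(sample)
    print(f"{name}: W={w:.3f}, p={p:.3g} -> {verdict}")
```

A small p-value is evidence against normality, but a large one does not prove the data are normal. Histograms and Q-Q plots (`scipy.stats.probplot`) are useful alongside the test, especially for large samples, where even tiny deviations become significant.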

## 🎯 t-SNE multidimensional visualisation
This project explores nonlinear dimensionality reduction for visualising multidimensional data, using the t-SNE algorithm.
Here are some of the results; all rendered outputs are available in the corresponding folder of the repository.
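A minimal scikit-learn sketch of t-SNE with perplexity 5, as in the results below. The Iris dataset and the scaling step are assumptions for illustration, not necessarily what the notebook uses.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Iris (150 samples, 4 features) is an assumed stand-in for the notebook's data.
X, y = load_iris(return_X_y=True)

# Scale features so no single feature dominates the pairwise distances.
X_scaled = StandardScaler().fit_transform(X)

# perplexity roughly sets how many neighbours each point "attends" to;
# small values such as 5 emphasise very local structure. It must be
# smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=5, random_state=0)
embedding = tsne.fit_transform(X_scaled)

print(embedding.shape)  # (150, 2): one 2-D point per input row
```

To draw the plot, pass `embedding[:, 0]` and `embedding[:, 1]` to `matplotlib.pyplot.scatter`, coloured by `y`. t-SNE is stochastic, so `random_state` keeps the layout stable between runs; distances between well-separated clusters in the embedding should not be over-interpreted.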

**Perplexity 5 visualisation**

![image](https://github.com/user-attachments/assets/d0858019-22f5-42f4-8c84-8e26e189a496)


**Multidimensional data visualisation**

![image](https://github.com/user-attachments/assets/7bec554b-6066-4a9d-8633-2467b636c82c)

## 🏰 Clustering
In this project, three clustering algorithms are applied: k-means, agglomerative hierarchical clustering and DBSCAN.
Let's check out some of the results:

**Clustered data visualisation**

![image](https://github.com/user-attachments/assets/1eace2f0-5472-47bb-9564-5e01ff606af1)


**K-means clustering algorithm**

![image](https://github.com/user-attachments/assets/a5ff45f7-8984-43f9-83cc-923297fda92b)


**Agglomerative hierarchical clustering algorithm**

![image](https://github.com/user-attachments/assets/57cae624-67cb-44be-9eed-456692d55b91)


**DBSCAN clustering algorithm**

![image](https://github.com/user-attachments/assets/e9236571-a4c3-4f9a-9572-8ef142175556)
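The three algorithms above can be compared on the same data with scikit-learn. This sketch uses synthetic blobs (an assumption; the notebook's dataset and parameters aren't stated here) and scores each result against the known labels with the adjusted Rand index.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Synthetic 2-D data with three groups (an assumption for illustration).
X, y_true = make_blobs(
    n_samples=750, centers=[[1, 1], [-1, -1], [1, -1]], cluster_std=0.4, random_state=0
)
X = StandardScaler().fit_transform(X)

models = {
    # K-means: needs the number of clusters up front; n_init reruns with different seeds.
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0),
    # Agglomerative: merges clusters bottom-up; Ward linkage minimises within-cluster variance.
    "agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    # DBSCAN: density-based; eps is the neighbourhood radius, label -1 marks noise.
    "dbscan": DBSCAN(eps=0.3, min_samples=10),
}

labels = {name: model.fit_predict(X) for name, model in models.items()}

for name, pred in labels.items():
    n_clusters = len(set(pred) - {-1})
    ari = adjusted_rand_score(y_true, pred)
    print(f"{name}: {n_clusters} clusters, ARI vs. true labels = {ari:.2f}")
```

K-means and agglomerative clustering need the cluster count in advance, whereas DBSCAN infers it from point density and can leave outliers unassigned; its result is sensitive to `eps` and `min_samples`.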

## 🎤 Association rules learning
This project applies association rule learning (ARL).
Check out some of the results:

**Relative frequency visualisation**

![image](https://github.com/user-attachments/assets/7a841d5c-8f21-4542-b3fe-2d88faa65b7d)
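In association rule learning, an item's relative frequency is its support: the fraction of transactions that contain it. A minimal sketch with `collections.Counter`, assuming transactions are stored as Python sets; the basket data is hypothetical.

```python
from collections import Counter

# Hypothetical shopping-basket transactions; the notebook's dataset may differ.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
    {"milk", "butter", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "eggs"},
]

# Count each item once per transaction (sets already remove duplicates).
item_counts = Counter(item for basket in transactions for item in basket)

# Relative frequency (support) = share of transactions containing the item.
n_transactions = len(transactions)
relative_frequency = {
    item: count / n_transactions for item, count in item_counts.most_common()
}

for item, freq in relative_frequency.items():
    print(f"{item:<7} {freq:.2f}")
```

Items below a chosen minimum support are usually discarded before mining itemsets and rules, which is what keeps the search tractable.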


**Algorithm execution times**

![image](https://github.com/user-attachments/assets/2a3d9b06-dcc4-41be-bf36-97d6c8e03f03)
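The README doesn't say which algorithms were timed. Apriori and FP-Growth are common choices (for example, `apriori` and `fpgrowth` in mlxtend's `frequent_patterns` module), but that is an assumption. The dependency-free sketch below times a minimal Apriori against a brute-force baseline with `time.perf_counter`; the functions `apriori`, `brute_force` and `timed`, and the data, are hypothetical illustrations.

```python
import time
from itertools import combinations

# Hypothetical shopping-basket transactions used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
    {"milk", "butter", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "milk", "eggs"},
]


def support_of(itemset, baskets):
    """Share of baskets that contain every item of the itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def apriori(transactions, min_support):
    """Level-wise search: only frequent (k-1)-itemsets are extended to size k."""
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent, k = {}, 1
    while candidates:
        level = {}
        for c in candidates:
            s = support_of(c, baskets)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join: merge pairs of frequent itemsets whose union has exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def brute_force(transactions, min_support):
    """Baseline without pruning: score every possible itemset."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support_of(frozenset(combo), baskets)
            if s >= min_support:
                frequent[frozenset(combo)] = s
    return frequent


def timed(func, *args, repeats=200):
    """Return func's result and its average wall-clock time per call in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args)
    return result, (time.perf_counter() - start) / repeats


frequent_apriori, t_apriori = timed(apriori, transactions, 0.4)
frequent_brute, t_brute = timed(brute_force, transactions, 0.4)

for itemset, s in sorted(frequent_apriori.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), f"{s:.2f}")
print(f"apriori: {t_apriori * 1e6:.1f} us, brute force: {t_brute * 1e6:.1f} us")
```

On a toy dataset like this, timings are dominated by Python overhead. The advantage of pruning grows with the number of distinct items, since brute force scores all 2^n - 1 itemsets over n items.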