{"id":24300578,"url":"https://github.com/sleeplessglory/big-data","last_synced_at":"2026-04-16T21:32:19.419Z","repository":{"id":272487574,"uuid":"916464018","full_name":"sleeplessglory/big-data","owner":"sleeplessglory","description":"Projects regarding big data analysis, presented within Jupyter Notebook","archived":false,"fork":false,"pushed_at":"2025-01-16T16:40:02.000Z","size":11705,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-06T12:52:43.113Z","etag":null,"topics":["big-data","data-analysis","data-visualization","jupyter","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sleeplessglory.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-14T06:29:41.000Z","updated_at":"2025-01-16T16:40:03.000Z","dependencies_parsed_at":"2025-01-14T19:56:08.017Z","dependency_job_id":null,"html_url":"https://github.com/sleeplessglory/big-data","commit_stats":null,"previous_names":["sleeplessglory/big-data"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sleeplessglory/big-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sleeplessglory%2Fbig-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sleeplessglory%2Fbig-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sleeplessglory%2Fbig-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/sleeplessglory%2Fbig-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sleeplessglory","download_url":"https://codeload.github.com/sleeplessglory/big-data/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sleeplessglory%2Fbig-data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31905432,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"ssl_error","status_checked_at":"2026-04-16T18:21:47.142Z","response_time":69,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-analysis","data-visualization","jupyter","python"],"created_at":"2025-01-16T23:14:43.520Z","updated_at":"2026-04-16T21:32:19.380Z","avatar_url":"https://github.com/sleeplessglory.png","language":"Jupyter Notebook","readme":"## 💿 Introduction\nThis repository contains all the big data analysis projects I've done previously. 
\nThese projects were all reuploaded within a single day, since I hadn't intended to show them on GitHub before.\nNow I'll guide you through them.\n## 📑 Pandas and sklearn\nIn this project I learned how to apply these libraries to big data analysis.\nFeel free to head to the folder and check out the .ipynb file, which includes the output rendered by Jupyter Notebook.\nHere's a sneak peek:\n\u003cbr\u003e**Records where the average age of houses in the area is over 50 years and the population is over 2500 people**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/47c386f1-08f1-46ba-81af-c02d5076222c)\n\n## 📆 Statistics\nThis project implements some basic statistics and more.\nHead to the corresponding folder to check out the Jupyter Notebook file for all the results. I'll include some of them here:\n\u003cbr\u003e**Histograms for numerical data**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/443430d6-e98a-4630-b54f-d8f9a23a10ef)\n\n\u003cbr\u003e**Standard and average deviations**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/1d51ee27-fee6-4713-a167-cdf5e0da7708)\n\n\u003cbr\u003e**Normality check of the expenses distribution**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/7a36e0a7-4424-408c-86e3-d4c81e998386)\n\n## 🎯 t-SNE multidimensional visualisation\nThis project is about nonlinear dimensionality reduction methods for visualising multidimensional data. The t-SNE algorithm is used for this purpose.\nLet's check out some of the results. 
All rendered outputs are available in the corresponding folder of the repository.\n\u003cbr\u003e**Perplexity 5 visualisation**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/d0858019-22f5-42f4-8c84-8e26e189a496)\n\n\u003cbr\u003e**Multidimensional data visualisation**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/7bec554b-6066-4a9d-8633-2467b636c82c)\n\n## 🏰 Clustering\nIn this project, clustering algorithms have been applied.\nLet's check out some of the results:\n\u003cbr\u003e**Clustered data visualisation**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/1eace2f0-5472-47bb-9564-5e01ff606af1)\n\n\u003cbr\u003e**K-means clustering algorithm**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/a5ff45f7-8984-43f9-83cc-923297fda92b)\n\n\u003cbr\u003e**Agglomerative hierarchical clustering algorithm**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/57cae624-67cb-44be-9eed-456692d55b91)\n\n\u003cbr\u003e**DBSCAN clustering algorithm**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/e9236571-a4c3-4f9a-9572-8ef142175556)\n\n## 🎤 Association rule learning\nThis project is about association rule learning (ARL).\nCheck out some of the results:\n\u003cbr\u003e**Relative frequency visualisation**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/7a841d5c-8f21-4542-b3fe-2d88faa65b7d)\n\n\u003cbr\u003e**Algorithm execution time**\n\u003cbr\u003e![image](https://github.com/user-attachments/assets/2a3d9b06-dcc4-41be-bf36-97d6c8e03f03)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsleeplessglory%2Fbig-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsleeplessglory%2Fbig-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsleeplessglory%2Fbig-data/lists"}