{"id":20020590,"url":"https://github.com/soroush-04/python-data-visualization","last_synced_at":"2026-04-15T22:34:47.377Z","repository":{"id":182740732,"uuid":"669014066","full_name":"soroush-04/Python-data-visualization","owner":"soroush-04","description":"Implementation of various data visualization approaches on real-world datasets","archived":false,"fork":false,"pushed_at":"2023-09-27T17:55:55.000Z","size":8603,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-12T16:12:22.863Z","etag":null,"topics":["big-data","data-science","data-visualization","python"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/soroush-04.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-07-21T06:26:34.000Z","updated_at":"2023-12-25T12:05:39.000Z","dependencies_parsed_at":"2023-09-27T23:08:13.171Z","dependency_job_id":null,"html_url":"https://github.com/soroush-04/Python-data-visualization","commit_stats":null,"previous_names":["soroush-04/intoxication-behavior-visualization","soroush-04/data-visualization","soroush-04/python-data-visualization"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soroush-04%2FPython-data-visualization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soroush-04%2FPython-data-visualization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soroush-04%2FPython-data-visualization/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/soroush-04%2FPython-data-visualization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/soroush-04","download_url":"https://codeload.github.com/soroush-04/Python-data-visualization/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241454525,"owners_count":19965405,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-science","data-visualization","python"],"created_at":"2024-11-13T08:33:06.801Z","updated_at":"2026-04-15T22:34:47.346Z","avatar_url":"https://github.com/soroush-04.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Visualization Projects\n---\n\n\nThis repository contains detailed explanations of various data visualization approaches I have used to uncover hidden patterns within datasets and attributes.\n\n\u003cu\u003eTable of contents\u003c/u\u003e\n=======\n\n\u003c!--ts--\u003e\n  - [Project 1 - Intoxication Behavior](#project-1---intoxication-behavior)\n      - [Problem Statement](#problem-statement-1)\n      - [Data Analysis \u0026 Preprocessing](#project1-2)\n      - [Visualization](#visualization1)\n  - [Project 2 - Breast Cancer Diagnosis](#project-2---breast-cancer-diagnosis)\n      - [Problem Statement](#problem-statement-2)\n      - [Data Analysis \u0026 Preprocessing](#project2-2)\n      - [Visualization](#visualization2)\n\u003c!--te--\u003e\n\n---\n\n## Project 1 - Intoxication Behavior\n#### Problem Statement \u003ca 
id=\"problem-statement-1\"\u003e\u003c/a\u003e\nIn the realm of public health and safety, the issue of intoxication behavior poses significant challenges. Understanding and effectively addressing this problem is crucial for ensuring the well-being of individuals and communities alike. The main focus of this data visualization project is on detecting hidden patterns in intoxication behavior data. By uncovering these patterns, we aim to provide valuable insights for improving public health and safety measures.\n\n#### Data Analysis \u0026 Preprocessing \u003ca id=\"project1-2\"\u003e\u003c/a\u003e\n\nThe data comes from a research study conducted with 13 participants and is both large and intricate. The primary challenge in visualizing it is that the valuable data is split across two distinct datasets: accelerometer and TAC (Transdermal Alcohol Content). The accelerometer dataset comprises 14,057,567 records across five dimensions: Participant ID (PID), time, and participant acceleration in the X, Y, and Z directions. The accelerometer data was collected using mobile devices for each participant. The TAC dataset, in contrast, contains PID and TAC values at specific timestamps, providing a critical link between an individual's alcohol level and their movement patterns. Our core objective is to establish meaningful relationships between TAC levels and participant mobility, contributing to a deeper understanding of the data.\n\n\n#### Visualization  \u003ca id=\"visualization1\"\u003e\u003c/a\u003e\n\nThe first figure below shows the movement behavior of user BK7610. This visualization is generated from the X, Y, and Z dimensions of the accelerometer dataset. It shows how acceleration in each direction varies over time, offering insights across various timestamps throughout the day. 
For a comprehensive understanding of the participants' behavior, similar plots have been created for all other users and are individually presented. Explore the remaining visualizations below.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-1.png\" alt=\"Image 2-2\" width=500 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-2.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-3.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-4.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-5.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-6.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-7.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nFollowing our analysis of the accelerometer dataset, our focus now shifts to the TAC dataset, where we explore the relationship between each participant's TAC levels and the temporal variations within them. As depicted here, we present a comprehensive view of the TAC level fluctuations over the course of a day for all participants. 
Each participant is represented by a specific color, and a red threshold line at 0.08, signifying the legal TAC limit, enhances the interpretability of the visualization. Values above this threshold indicate intoxication, while values below signify sobriety, offering valuable insights into participants' alcohol levels during specific time intervals.\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-8.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nThe next visualizations show the sober and intoxicated data in both frequency and percentage form, along with the percentage distribution of sober and intoxicated values for each participant. Together, these visualizations capture the essential insights within the dataset and highlight its most significant relationships. Moreover, the detailed, individual visualizations provided for each participant serve as a valuable resource for future research.\n\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-9.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project1/1-10.png\" alt=\"Image 2-2\" width=700 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\n\n---\n## Project 2 - Breast Cancer Diagnosis \n\n#### Problem Statement \u003ca id=\"problem-statement-2\"\u003e\u003c/a\u003e\nThe dataset provides valuable insights into breast cancer diagnoses, with a focus on various attributes related to tumor characteristics. 
The main goal of this project is to detect hidden patterns between various attributes within the dataset by utilizing data visualization methods.\n\n#### Data Analysis \u0026 Preprocessing \u003ca id=\"project2-2\"\u003e\u003c/a\u003e\nThe dataset includes 569 samples with 32 dimensions, comprising patient IDs, diagnosis status, and 30 tumor-related features. The dataset can be divided into three subsets, each consisting of 10 features:\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-1.png\" alt=\"Image 2-1\" width=\"300\" style=\"margin: 0 10px;\"\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n  \u003cimg src=\"./Images/Project2/2-2.png\" alt=\"Image 2-2\" width=\"360\" style=\"margin: 0 10px;\"\u003e\n\u003c/div\u003e\n\n\n\n\u003cbr\u003e\u003cbr\u003e\nTumor Geometry\n- Radius: Mean distance from center to perimeter points\n- Texture: Standard deviation of gray-scale values\n- Perimeter: Perimeter measurement\n- Area: Area measurement\n\u003cbr\u003e\n\nTumor Smoothness\n- Smoothness: Local variation in radius lengths\n- Compactness: Computed as (perimeter^2) / area - 1.0\n- Concavity: Severity of concave portions of the contour\n- Concave Points: Number of concave portions of the contour\n\u003cbr\u003e\n\nTumor Symmetry\n- Symmetry: Symmetry of tumor\n- Fractal Dimension: Fractal dimension of tumor\n \u003cbr\u003e\n\n\u003c!-- ![hey](./test.jpg) --\u003e\n\nTo effectively utilize this dataset for visualization and future modeling, it's essential to ensure that all variables are in numeric format. The dataset primarily comprises numeric attributes, with one exception: the \"Diagnosis\" column. 
It takes two values: \u003cbr\u003e\n- \"Malignant\" cases indicate tumors that are potentially dangerous and have spread significantly.\n- \"Benign\" cases refer to tumors that are less dangerous and have not spread extensively.\n\nTo prepare the dataset for analysis, we will convert the \"Diagnosis\" variable into a numeric format, enabling us to employ various visualization techniques and modeling approaches effectively.\n\n#### Visualization  \u003ca id=\"visualization2\"\u003e\u003c/a\u003e\nAfter the preprocessing phase, we apply several visualization techniques to build a clearer understanding of the data.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-3.png\" alt=\"Image 2-1\" width=540 style=\"margin: 0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nNext, the analysis involves visualizing the remaining 30 features to explore their relationships. Histogram visualizations are provided for the mean, standard error (SE), and worst-case features. These visualizations offer a clear view of each variable's behavior.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-4.png\" alt=\"Image 2-1\" width=500 style=\"display:inline-block; margin:0 2px;\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-5.png\" alt=\"Image 2-2\" width=500 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-6.png\" alt=\"Image 2-1\" width=500 style=\"display:inline-block; margin:0 2px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nDensity plots were also used to show the distribution of each feature. These plots effectively reveal exponential behavior within various variables. 
For instance, the plots clearly indicate that the variables \"area,\" \"compactness,\" and \"fractal_dimension\" exhibit exponential distributions.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-7.png\" alt=\"Image 2-1\" width=500 style=\"display:inline-block; margin:0 2px;\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-8.png\" alt=\"Image 2-2\" width=500 style=\"display:inline-block; margin:0 10px;\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-9.png\" alt=\"Image 2-1\" width=500 style=\"display:inline-block; margin:0 2px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nWith this understanding of the dataset, we can move on to a correlation heatmap. The heatmap shows the correlations between all features of the dataset and can potentially be used to guide dimension reduction.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-10.png\" alt=\"Image 2-1\" width=600 style=\"display:inline-block; margin:0 2px;\"\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nFor feature selection, the three subsets (mean, SE, and worst) were compared with one another. Notably, correlations emerged within these subsets.\nFeature selection then continued iteratively until no fully correlated features, characterized by a correlation coefficient of 1, remained in the heatmap. 
\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"./Images/Project2/2-11.png\" alt=\"Image 2-1\" width=600 style=\"display:inline-block; margin:0 2px;\"\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoroush-04%2Fpython-data-visualization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsoroush-04%2Fpython-data-visualization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsoroush-04%2Fpython-data-visualization/lists"}