{"id":29311499,"url":"https://github.com/lashawnfofung/super-heroes-analysis-project","last_synced_at":"2025-07-07T08:15:04.717Z","repository":{"id":302596657,"uuid":"1012773829","full_name":"LashawnFofung/Super-Heroes-Analysis-Project","owner":"LashawnFofung","description":"This portfolio project involves a detailed analysis of 732 superhero records from the heroes_information.csv dataset, comprising 11 columns of unique characteristics for each hero. The primary goal is to showcase key insights derived from this rich dataset, demonstrating proficiency in data analysis using SQL. ","archived":false,"fork":false,"pushed_at":"2025-07-03T07:20:04.000Z","size":707,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-03T08:30:42.718Z","etag":null,"topics":["data-analysis","datasets","mysql-database","mysql-server","mysql-workbench","sql"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LashawnFofung.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-02T21:30:46.000Z","updated_at":"2025-07-03T07:20:08.000Z","dependencies_parsed_at":"2025-07-03T08:32:14.494Z","dependency_job_id":"4b44f78a-e8b3-4024-9294-2f880c514da0","html_url":"https://github.com/LashawnFofung/Super-Heroes-Analysis-Project","commit_stats":null,"previous_names":["lashawnfofung/super-heroes-analysis-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LashawnFofung/Super-Heroes-Analysis-Pr
oject","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LashawnFofung%2FSuper-Heroes-Analysis-Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LashawnFofung%2FSuper-Heroes-Analysis-Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LashawnFofung%2FSuper-Heroes-Analysis-Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LashawnFofung%2FSuper-Heroes-Analysis-Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LashawnFofung","download_url":"https://codeload.github.com/LashawnFofung/Super-Heroes-Analysis-Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LashawnFofung%2FSuper-Heroes-Analysis-Project/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264040975,"owners_count":23548077,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-analysis","datasets","mysql-database","mysql-server","mysql-workbench","sql"],"created_at":"2025-07-07T08:15:02.936Z","updated_at":"2025-07-07T08:15:04.706Z","avatar_url":"https://github.com/LashawnFofung.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003ch1\u003eSuperheroes Analysis Project\u003c/h1\u003e\n\u003ch1\u003e\u003c/h1\u003e\n\n\u003ch2\u003eTable of Contents\u003c/h2\u003e\n  \n   - [\u003cb\u003eOverview\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#overview)\n   - 
[\u003cb\u003eDataset\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#dataset)\n   - [\u003cb\u003eTask Performed\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#task-performed)\n   - [\u003cb\u003eTech Stack\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#tech-stack)\n   - [\u003cb\u003eAbout SQL\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#about-sql)\n   - [\u003cb\u003eData Source\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#data-source)\n   - [\u003cb\u003eTableau Public: Superheroes Analysis Project\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/README.md#tableau-public-dashboard)\n\n\n\u003ch1\u003e\u003c/h1\u003e\n\n\n\u003ch2\u003eOverview\u003c/h2\u003e\n\nAs an enthusiast of superheroes, particularly those from Marvel and DC Comics, I found the `heroes_information.csv` dataset to be an incredibly exciting resource for exploration. This project delves into the characteristics of various comic book heroes, aiming to uncover fascinating insights into their attributes and distributions. It served as an excellent opportunity to refresh and apply my SQL skills to a dataset I genuinely enjoyed analyzing. The analysis was conducted in July 2025.\n\nThis portfolio project involves a detailed analysis of 732 superhero records from the heroes_information.csv dataset, comprising 11 columns of unique characteristics for each hero. The primary goal is to showcase key insights derived from this rich dataset, demonstrating proficiency in data analysis using SQL.\n\nLeveraging MySQL and MySQL Workbench, this project undertakes comprehensive exploratory data analysis (EDA) to uncover various patterns and distributions within the superhero universe. 
The analysis includes examining the distribution of heroes by publisher, gender, and alignment, calculating average physical attributes, and performing more advanced inquiries such as understanding hero composition by race within publishers and identifying publishers with diverse character alignments.\n\n\u003ch1\u003e\u003c/h1\u003e\n\n\u003ch2\u003eDataset\u003c/h2\u003e\n\n\u003cb\u003e`Size`:\u003c/b\u003e 49.2KB\n\nThis project utilizes the `heroes_information.csv` dataset, which contains detailed information about various comic book heroes. Each row in the dataset represents a unique hero, and the columns provide different attributes about them. The dataset has 732 rows (each representing a unique hero) and 11 columns (each providing an attribute).\n\nKey features and columns of the dataset include:\n\n  -  `name`: The name of the hero.\n  \n  -  `Gender`: The gender of the hero.\n  \n  -  `Eye color`: The eye color of the hero.\n  \n  -  `Race`: The race or species of the hero.\n  \n  -  `Hair color`: The hair color of the hero.\n  \n  -  `Height`: The height of the hero (note: '-99.0' typically indicates missing or unknown values).\n  \n  -  `Publisher`: The comic book publisher associated with the hero (e.g., Marvel Comics, DC Comics).\n  \n  -  `Skin color`: The skin color of the hero.\n  \n  -  `Alignment`: The moral alignment of the hero (e.g., 'Good', 'Bad', 'Neutral').\n  \n  -  `Weight`: The weight of the hero (note: '-99.0' typically indicates missing or unknown values).\n\nThis dataset is ideal for performing various analytical tasks related to character demographics, publisher distribution, and other attributes within the superhero universe.\n\n\n\u003ch1\u003e\u003c/h1\u003e\n\n\n\u003ch2\u003eTask Performed\u003c/h2\u003e\n\nThis project involved performing comprehensive exploratory data analysis (EDA) on the `heroes_information.csv` dataset using SQL. 
The primary objective was to gain insights into various characteristics of the comic book heroes, their publishers, and their attributes.\n\nKey analytical tasks performed include:\n\n  - \u003cb\u003eDemographic Analysis:\u003c/b\u003e Investigating the distribution of heroes across different publishers, genders, and alignments (Good, Bad, Neutral).\n\n  - \u003cb\u003ePhysical Attribute Analysis:\u003c/b\u003e Calculating the average height and weight of heroes, with careful handling of missing data represented by '-99.0'.\n\n  - \u003cb\u003eGrouped Physical Attributes:\u003c/b\u003e Determining the average height and weight broken down by both gender and alignment to understand trends within specific hero categories.\n\n  - \u003cb\u003ePublisher and Race Composition:\u003c/b\u003e Analyzing the composition of heroes by race for each publisher to understand character diversity within different comic universes.\n\n  - \u003cb\u003eAlignment Diversity per Publisher:\u003c/b\u003e Identifying and querying publishers that feature heroes spanning multiple distinct alignments, highlighting publishers with diverse character moral landscapes.\n\nAll analyses were conducted using standard SQL queries, leveraging clauses, keywords, and aggregate functions such as `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `ORDER BY`, `COUNT`, `AVG`, `DISTINCT`, and `HAVING` to extract meaningful patterns and statistics from the dataset.\n\n\u003ci\u003eReview the Data Exploration SQL Scripts:\u003c/i\u003e \n  - `Basic`: [\u003cb\u003eHERE\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/SQL%20Queries/Basic%20Queries.sql)\n  - `Advanced`: [\u003cb\u003eHERE\u003c/b\u003e](https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/SQL%20Queries/Advanced%20Queries.sql)\n\n\u003ch1\u003e\u003c/h1\u003e\n\n\u003ch2\u003eTech Stack\u003c/h2\u003e\n \n  - \u003cb\u003eMySQL\u003c/b\u003e\n    - This is an open-source relational database management system (RDBMS). 
It's widely used for web applications and is known for its speed, reliability, and ease of use. It stores data in tables with rows and columns.\n  - \u003cb\u003eMySQL Workbench\u003c/b\u003e\n    - This is a visual database design tool that integrates SQL development, administration, database design, creation, and maintenance for the MySQL database system. It provides a graphical interface for users to interact with their MySQL databases.\n  - \u003cb\u003eTableau\u003c/b\u003e\n    - Tableau is a leading business intelligence (BI) and data visualization software. Its primary purpose is to help people and organizations \"see and understand their data.\"\n    \n\n\u003ch1\u003e\u003c/h1\u003e\n\n\u003ch2\u003eAbout SQL\u003c/h2\u003e\n\nSQL is a standard programming language used to manage relational databases and perform various operations on the data within them. Key capabilities of SQL include:\n\n  - \u003cb\u003eQuerying Data:\u003c/b\u003e Retrieving specific information from a database.\n\n  - \u003cb\u003eData Manipulation:\u003c/b\u003e Inserting new data, updating existing data, and deleting data.\n\n  - \u003cb\u003eData Definition:\u003c/b\u003e Creating, modifying, and deleting database schemas (tables, indexes, views, etc.).\n\n  - \u003cb\u003eData Control:\u003c/b\u003e Managing database access permissions.\n\n\u003ch1\u003e\u003c/h1\u003e\n\n\u003ch2\u003eTableau Public Dashboard\u003c/h2\u003e\n\nI created a Tableau Public dashboard to analyze superhero data, focusing on publisher distribution and on average hero height and weight by gender and alignment. My process involved cleaning the dataset by handling missing values and standardizing text, then building individual visualizations in Tableau Desktop to answer specific questions like \"Hero Count by Publisher\" and \"Average Hero Physique (by Gender \u0026 Alignment)\". 
Finally, I designed an interactive dashboard, arranging charts logically and adding filters for user exploration, before publishing it to Tableau Public for wider access. Tableau is a powerful data visualization tool that allows users to connect to various data sources, create interactive dashboards and worksheets, and share their insights visually, without requiring extensive coding knowledge.\n\nTo view the interactive dashboard: [\u003cb\u003eTableau Public: Superheroes Analysis Project\u003c/b\u003e](https://public.tableau.com/views/SuperheroesAnalysisProject/Dashboard1?:language=en-US\u0026:sid=\u0026:redirect=auth\u0026:display_count=n\u0026:origin=viz_share_link)\n\n\u003cbr\u003e\n\n\u003cimg src=\"https://github.com/LashawnFofung/Super-Heroes-Analysis-Project/blob/main/Images/Superheroes%20Analysis%20Dashboard%20screenshot.png\" width=\"550\" alt=\"Tableau Public Dashboard Superhero Analysis\"\u003e\n\n\u003ch1\u003e\u003c/h1\u003e\n\n\u003ch2\u003eData Source\u003c/h2\u003e\n\nThe `heroes_information.csv` dataset utilized in this project is part of a larger Super Heroes Dataset originally sourced from Kaggle:\n\n\u003cbr\u003e\n\n\u003cb\u003eKaggle Dataset Link:\u003c/b\u003e https://www.kaggle.com/datasets/claudiodavi/superhero-set\n\n\u003cbr\u003e\n\n\u003cb\u003eAbout the Dataset's Context:\u003c/b\u003e\nThis dataset was compiled to provide an overview of comic book heroes, focusing on their physical and power characteristics. It aims to help researchers and enthusiasts identify trends and patterns, particularly in the context of increasing diversity within popular culture's superheroes.\n\n\u003cbr\u003e\n\n\u003cb\u003eContent Note:\u003c/b\u003e\nThe data was collected in June 2017 from SuperHeroDb and has not been updated since, meaning it might not reflect the most current information. 
The Kaggle dataset originally contains two files; this project specifically uses the one detailing hero characteristics (e.g., gender, eye color, height, publisher, alignment, weight).\n\n\u003cbr\u003e\n\n\u003cb\u003eAcknowledgements:\u003c/b\u003e\n\n-  The original data was scraped from SuperHeroDb.\n-  The analysis of the dataset was conducted in July 2025.\n\n\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flashawnfofung%2Fsuper-heroes-analysis-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flashawnfofung%2Fsuper-heroes-analysis-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flashawnfofung%2Fsuper-heroes-analysis-project/lists"}