{"id":25549523,"url":"https://github.com/sameer6690/data_analytics_bootcamp_hdnb","last_synced_at":"2026-02-14T13:30:13.068Z","repository":{"id":278505565,"uuid":"932093006","full_name":"Sameer6690/Data_Analytics_Bootcamp_HDNB","owner":"Sameer6690","description":"This is an analytics project on the \"Titanic - Machine Learning From Disaster\" dataset's train.csv file. I performed data cleaning with MS Excel before using SQL to query results based on the questions provided for the completion of the project. Finally I visualized the data on Google Looker Studio.","archived":false,"fork":false,"pushed_at":"2025-02-20T05:56:00.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-20T06:28:15.162Z","etag":null,"topics":["bigquery","excel","looker-studio","sql"],"latest_commit_sha":null,"homepage":"https://lookerstudio.google.com/reporting/a5ca317c-23c9-45af-a8e2-2ceb80aaaab2","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Sameer6690.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-13T10:58:06.000Z","updated_at":"2025-02-20T05:56:03.000Z","dependencies_parsed_at":"2025-02-20T06:38:19.642Z","dependency_job_id":null,"html_url":"https://github.com/Sameer6690/Data_Analytics_Bootcamp_HDNB","commit_stats":null,"previous_names":["sameer6690/data_analytics_bootcamp_hdnb"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameer6690%2FData_Analytics_Bootc
amp_HDNB","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameer6690%2FData_Analytics_Bootcamp_HDNB/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameer6690%2FData_Analytics_Bootcamp_HDNB/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Sameer6690%2FData_Analytics_Bootcamp_HDNB/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Sameer6690","download_url":"https://codeload.github.com/Sameer6690/Data_Analytics_Bootcamp_HDNB/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239816514,"owners_count":19701753,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigquery","excel","looker-studio","sql"],"created_at":"2025-02-20T10:18:29.132Z","updated_at":"2026-02-14T13:30:12.994Z","avatar_url":"https://github.com/Sameer6690.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Titanic Dataset Analysis and Dashboard Assessment\n\n## Overview\nThis project analyzes the Titanic dataset using **SQL in Google BigQuery** and visualizes key insights through a **dashboard**. The analysis explores survival rates based on different factors such as **passenger class, gender, fare, age, and embarkation port**. The dashboard provides a visual representation of the findings to enhance data-driven storytelling.\n\n## Dataset\nThe dataset used in this project is the **Titanic Dataset (train.csv) from Kaggle**. 
It contains essential details about the passengers, including their **survival status, age, gender, class, and fare**.\n[Dataset Link](https://www.kaggle.com/competitions/titanic)\n\n### Key Fields:\n- `PassengerId`: Unique identifier for each passenger.\n- `Survived`: 0 = Did not survive, 1 = Survived.\n- `Pclass`: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd).\n- `Name`: Passenger's name.\n- `Sex`: Gender of the passenger.\n- `Age`: Age of the passenger.\n- `SibSp`: Number of siblings and spouses aboard.\n- `Parch`: Number of parents and children aboard.\n- `Ticket`: Ticket number.\n- `Fare`: Fare paid for the ticket.\n- `Cabin`: Cabin number (if available).\n- `Embarked`: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).\n\n---\n\n## Project Workflow\n### 1. Data Preparation\n- **Import Data into BigQuery**: Upload **train.csv** into a BigQuery project.\n- **Data Cleaning**:\n  - Handle missing values in `Age`, `Fare`, and `Embarked` columns.\n  - Ensure correct data types (e.g., `Pclass` and `Survived` as integers, `Fare` and `Age` as numeric values).\n\n### 2. Data Analysis (SQL Queries in BigQuery)\nThe following questions were answered using SQL queries:\n\n1. **Overall Survival Rate**: What percentage of passengers survived?\n2. **Survival by Passenger Class**: What is the survival rate for each class (1st, 2nd, 3rd)?\n3. **Survival by Gender**: How does gender impact survival rates?\n4. **Fare vs. Survival**: What is the average fare paid by survivors vs. non-survivors?\n5. **Age vs. Survival**: What is the average age of survivors and non-survivors?\n6. **Survival by Embarkation Port**: How does the port of embarkation affect survival rates?\n7. **Family Size vs. Survival**: How does family size (sum of `SibSp` and `Parch`) influence survival chances?\n8. 
**Top 10 Survivors by Fare Paid**: Which 10 surviving passengers paid the highest fares?\n\n## Technologies Used\n- **Google BigQuery**: SQL-based data analysis.\n- **Google Looker Studio** (formerly Google Data Studio): Dashboard visualization.\n\n![Titanic Dashboard](./HDNB_Project_Dashboard.png)\n[Dashboard Link](https://lookerstudio.google.com/reporting/a5ca317c-23c9-45af-a8e2-2ceb80aaaab2)\n\n\n## How to Use\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/Sameer6690/Data_Analytics_Bootcamp_HDNB.git\n   ```\n2. Import `train.csv` into **Google BigQuery**.\n3. Run SQL queries to analyze survival trends.\n4. Build the dashboard in Google Looker Studio from the query results.\n\n## Insights \u0026 Conclusion\nThis analysis provides valuable insights into the factors influencing Titanic passengers' survival. The dashboard effectively visualizes key trends, making data interpretation easier.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsameer6690%2Fdata_analytics_bootcamp_hdnb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsameer6690%2Fdata_analytics_bootcamp_hdnb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsameer6690%2Fdata_analytics_bootcamp_hdnb/lists"}