https://github.com/ashishsingh789/data_visualization

Data visualization project using Python to analyze categorical and continuous variables. Includes bar charts, histograms, and scatter plots. Libraries used: pandas, matplotlib, and seaborn.

analysis barchart data data-science data-visualization histogram matplotlib pandas-dataframe scatter-plot seaborn


# Data Visualization Task - Prodigy Infotech Internship #

This repository contains the analysis and visualizations created as part of my internship at Prodigy Infotech. The objective was to explore and visualize data to understand the distribution of various categorical and continuous variables.

# Problem Statement

Task:

Create a bar chart or histogram to visualize the distribution of a categorical or continuous variable, such as the distribution of ages or genders in a population.

# Solution Overview

In this project, I worked on analyzing and visualizing the distribution of variables from a given dataset. Multiple types of charts were created to represent these distributions effectively. Below are the key steps involved in the analysis:

# 1. Data Cleaning

- Removed missing or null values to ensure a clean dataset for visualization.
- Converted data types where necessary (e.g., integers for categorical values such as 'Survived' or 'Gender').
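
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code from `visualization_code.py`: the small in-memory frame stands in for `train.csv`, the column names are taken from the examples in this README, and the integer codes chosen for 'Gender' are an assumption.

```python
import pandas as pd

# Hypothetical sample standing in for train.csv (values are made up).
df = pd.DataFrame({
    "Survived": [0.0, 1.0, 1.0, None, 0.0],
    "Gender": ["male", "female", "female", "male", None],
    "Age": [22.0, 38.0, None, 35.0, 54.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

# Assumed integer encoding for 'Gender'; the original script may differ.
GENDER_CODES = {"male": 0, "female": 1}


def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values, then store categorical columns as integers."""
    cleaned = frame.dropna().copy()
    cleaned["Survived"] = cleaned["Survived"].astype(int)
    cleaned["Gender"] = cleaned["Gender"].map(GENDER_CODES).astype(int)
    return cleaned


cleaned = clean(df)
print(cleaned.dtypes)
```

Dropping every row that has any missing value is the simplest option; with a real dataset it may discard many rows, so filling or dropping per column could be a better fit.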

# 2. Data Analysis

I performed the following types of analyses:

- Categorical Data Visualization: Created bar charts to show distributions of variables like gender, survival status, etc.
- Continuous Data Visualization: Visualized the distribution of numerical variables like age and fare using histograms and scatter plots.
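
As an illustration of the two analyses above, the sketch below draws a bar chart for a categorical column and a histogram for a continuous one with pandas and matplotlib. The sample data, the column names, and the output filename `distributions.png` are assumptions made for the example, not taken from the project's script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the real analysis reads train.csv.
df = pd.DataFrame({
    "Gender": ["male", "female", "female", "male", "male", "female"],
    "Age": [22, 38, 26, 35, 54, 2],
})

# Categorical variable: count the rows in each category.
gender_counts = df["Gender"].value_counts()

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart of the category counts.
ax_bar.bar(gender_counts.index.tolist(), gender_counts.to_numpy())
ax_bar.set_title("Gender distribution")
ax_bar.set_xlabel("Gender")
ax_bar.set_ylabel("Count")

# Histogram of the continuous variable, split into 5 equal-width bins.
ax_hist.hist(df["Age"].to_numpy(), bins=5, edgecolor="black")
ax_hist.set_title("Age distribution")
ax_hist.set_xlabel("Age")
ax_hist.set_ylabel("Count")

fig.tight_layout()
fig.savefig("distributions.png")  # hypothetical output filename
```

The number of histogram bins is a free choice here; a different bin count can change how the shape of the distribution reads.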

# 3. Visualizations Created

- Bar Chart: Visualization of categorical variables like 'Gender' and 'Survival Status'.
- Box Plot: Analysis of continuous data like 'Fare', grouped by survival status.
- Scatter Plot: To observe the relationship between 'Fare' and survival.

Each visualization provides insight into patterns in the data and shows how the different variables are distributed.
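
A box plot and scatter plot of 'Fare' against survival might look like the sketch below. It uses plain matplotlib rather than seaborn to keep the example small, and the sample values, column names, and the output filename `fare_vs_survival_sketch.png` are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the real analysis reads train.csv.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0, 0, 1, 0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 8.46, 51.86, 21.08],
})

# Split the fares into one array per survival outcome (0, then 1).
fare_by_outcome = [
    df.loc[df["Survived"] == outcome, "Fare"].to_numpy() for outcome in (0, 1)
]

fig, (ax_box, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Box plot: one box per survival outcome, drawn at positions 1 and 2.
ax_box.boxplot(fare_by_outcome)
ax_box.set_xticklabels(["Did not survive", "Survived"])
ax_box.set_ylabel("Fare")
ax_box.set_title("Fare by survival status")

# Scatter plot: fare on the x-axis, survival (0 or 1) on the y-axis.
ax_scatter.scatter(df["Fare"], df["Survived"], alpha=0.6)
ax_scatter.set_yticks([0, 1])
ax_scatter.set_xlabel("Fare")
ax_scatter.set_ylabel("Survived")
ax_scatter.set_title("Fare vs. survival")

fig.tight_layout()
fig.savefig("fare_vs_survival_sketch.png")  # hypothetical output filename
```

Because survival only takes the values 0 and 1, points in the scatter plot overlap heavily; the `alpha` transparency makes dense regions easier to see.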

# 4. Tools & Libraries Used

- Python: For data manipulation and visualization.
- Libraries:
  - pandas for data handling.
  - matplotlib and seaborn for creating visualizations.

# Files

- train.csv: The dataset used for analysis.
- visualization_code.py: Python script with all the code used for data analysis and generating visualizations.
- fare_vs_survival.png: Image showing the relationship between fare and survival.
- Additional visualizations such as bar charts and histograms.

# Data Analysis and Visualization

![Male vs Female Population Distribution in Rural Areas by State (Ascending Order)](https://github.com/user-attachments/assets/3f1dfb3f-dc56-49f3-8b51-244409d591dc)

![Male vs Female Population Distribution in Urban Areas by State (Ascending Order)](https://github.com/user-attachments/assets/5644abc0-5b5b-490f-a223-4a4dda0157a2)

![Rural Population Distribution by State](https://github.com/user-attachments/assets/eb31eebe-bdfd-47a3-8bb4-1aeb970270dc)

![Total Female Population by State (Rural + Urban)](https://github.com/user-attachments/assets/679e4586-503e-4de1-9809-3b9eee1f0472)

![Total Male Population by State (Rural + Urban)](https://github.com/user-attachments/assets/541e6938-b298-4df1-8b63-65a358e2aad9)

![Urban Population Distribution by State](https://github.com/user-attachments/assets/adac0642-33e8-4e24-9c0e-5a658c46b5e8)

![Sex Ratio by Capital City of States](https://github.com/user-attachments/assets/dd8eb763-a010-4a27-9936-de9edc43a42a)