Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/madhurimarawat/data-visualization-using-python
This repository contains data visualization programs on various datasets done using python.
automobile-dataset bar-plots color-codes data-visualization dimensionality-reduction geopandas geospatial-data house-pricing-dataset jittering left-skwed matplotlib pandas partial-transparency principal-component-analysis python3 right-skwed seaborn skewed-data time-series-analysis trend-line-chart
- Host: GitHub
- URL: https://github.com/madhurimarawat/data-visualization-using-python
- Owner: madhurimarawat
- Created: 2023-07-23T14:16:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-21T06:51:12.000Z (about 1 year ago)
- Last Synced: 2024-01-28T17:40:39.260Z (11 months ago)
- Topics: automobile-dataset, bar-plots, color-codes, data-visualization, dimensionality-reduction, geopandas, geospatial-data, house-pricing-dataset, jittering, left-skwed, matplotlib, pandas, partial-transparency, principal-component-analysis, python3, right-skwed, seaborn, skewed-data, time-series-analysis, trend-line-chart
- Language: Jupyter Notebook
- Homepage:
- Size: 14.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data-Visualization-using-python
This repository contains data visualization programs on various datasets done using python.
# Data Visualization
![What-is-Data-Visualization-Blog-Header](https://github.com/madhurimarawat/Data-Visualization-using-python/assets/105432776/ea30609d-c156-4c80-b701-05194192e6a6)
--> Data visualization is the graphical representation of information and data (for example, charts, graphs, and maps).
--> Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data.
--> Data visualization tools and technologies are essential for analyzing massive amounts of information and making data-driven decisions.
--> The idea of using pictures to understand data has been around for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

---
Various forms of Data Visualization
---
# About Python Programming
--> Python is a high-level, general-purpose, and very popular programming language.
--> Python (currently Python 3) is used in web development, machine learning applications, and many other cutting-edge areas of the software industry.
--> Python is available across widely used platforms like Windows, Linux, and macOS.
--> One of Python's biggest strengths is its large standard library.

---
# Mode of Execution Used
--> Colaboratory, or "Colab" for short, is a product from Google Research that allows anybody to write and execute Python code in a Jupyter notebook through the browser.
--> Visit Colab at:
--> Create an account using a Google account.
--> Once the account is created, we can start coding in Colab directly.
--> It supports Python and R.
--> Files are saved directly in Google Drive.

---
## Table Of Contents
### 1. [House Pricing Dataset - Aesthetics Mapping](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%201%20House%20pricing%20dataset.ipynb)
**Description:** In this experiment, we download the House Pricing dataset from Kaggle and map the values to various aesthetics using visualizations such as color, shape, and size to represent the data features.
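As a rough sketch of what mapping values to aesthetics can look like, the snippet below rescales a numeric feature to marker sizes and assigns one color per category. The helper names (`scale_to_range`, `category_colors`, `plot_aesthetics`) and the sample values are hypothetical and not taken from the notebook; the size range and palette are arbitrary choices.

```python
def scale_to_range(values, low=20.0, high=200.0):
    """Linearly rescale values to [low, high], e.g. for scatter marker sizes."""
    v_min, v_max = min(values), max(values)
    if v_min == v_max:
        # All values equal: use the midpoint so every marker has the same size.
        return [(low + high) / 2 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) / span * (high - low) for v in values]


def category_colors(categories, palette=("tab:blue", "tab:orange", "tab:green")):
    """Assign each distinct category a color from the palette (cycled if needed)."""
    distinct = sorted(set(categories))
    lookup = {cat: palette[i % len(palette)] for i, cat in enumerate(distinct)}
    return [lookup[cat] for cat in categories]


def plot_aesthetics(x, y, size_feature, categories):
    """Scatter plot with size and color aesthetics (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=scale_to_range(size_feature), c=category_colors(categories))
    return fig


# Made-up sample values for illustration only.
areas = [1000, 1500, 2000]
furnished = ["yes", "no", "yes"]
sizes = scale_to_range(areas)
colors = category_colors(furnished)
```

Calling `plot_aesthetics(areas, prices, areas, furnished)` with a matching `prices` list would draw larger markers for larger areas and one color per furnishing status.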
### 2. [Rainfall Prediction - Different Color Scales](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%202%20Rainfall%20Prediction%20Color%20Scales.ipynb)
**Description:** This experiment involves using different color scales to visualize the Rainfall Prediction dataset. We explore the impact of various color palettes and their readability in different visual contexts.
### 3. [Bar Plots for Dataset Variables](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%203%20Bar%20plots%20for%20variable.ipynb)
**Description:** We create different bar plots to represent categorical variables from a given dataset, providing insights into the distribution and comparison across categories.
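A minimal sketch of the idea, assuming a plain list of category labels: count how often each category occurs, then draw one bar per category. The `body_styles` values are made up for illustration, and `category_counts` and `plot_bar` are hypothetical helper names rather than functions from the notebook.

```python
from collections import Counter


def category_counts(values):
    """Return (labels, counts) sorted by descending count, then by label."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    labels = [label for label, _ in ordered]
    heights = [count for _, count in ordered]
    return labels, heights


def plot_bar(values, title="Category counts"):
    """Draw one bar per category (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    labels, heights = category_counts(values)
    fig, ax = plt.subplots()
    ax.bar(labels, heights)
    ax.set_xlabel("Category")
    ax.set_ylabel("Count")
    ax.set_title(title)
    return fig


# Made-up sample data for illustration only.
body_styles = ["sedan", "hatchback", "sedan", "wagon", "sedan", "hatchback"]
labels, heights = category_counts(body_styles)
```

Sorting the bars by count is a design choice that makes comparison across categories easier; alphabetical order may suit ordinal categories better.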
### 4. [Skewed Data - Detection and Removal](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%204%20Skewedness%20and%20Removal%20of%20Skewedness.ipynb)
**Description:** This experiment demonstrates how to identify skewed data, visualize its distribution, and apply transformations to remove skewness for more accurate analysis.
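One way to sketch this, assuming non-negative right-skewed data: measure skewness with the moment coefficient and reduce it with a `log(1 + x)` transform. The notebook may use a different transformation; the `prices` values and the helper names below are illustrative assumptions.

```python
import math


def skewness(values):
    """Moment coefficient of skewness (biased version): m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Apply log(1 + x), a common choice for right-skewed, non-negative data."""
    return [math.log1p(v) for v in values]


# Made-up right-skewed sample: most values are small, a few are very large.
prices = [1, 2, 2, 3, 3, 3, 4, 5, 8, 40, 120]
before = skewness(prices)
after = skewness(log_transform(prices))
print(f"skewness before: {before:.2f}, after: {after:.2f}")
```

A positive value indicates a right (positive) skew and a negative value a left skew; values near zero suggest a roughly symmetric distribution.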
### 5. [Time Series Visualization for Sales Data](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%205%20Time%20Series%20Visualization.ipynb)
**Description:** A time series visualization is performed on a sales dataset, showcasing trends, seasonality, and patterns in the data over time.
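A common way to make the trend in a series easier to see is a simple moving average. The sketch below assumes evenly spaced observations; `monthly_sales` is made-up data and `moving_average` is a hypothetical helper, not code from the notebook.

```python
def moving_average(values, window):
    """Simple moving average; the result has len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up monthly sales figures for illustration only.
monthly_sales = [10, 12, 14, 11, 13, 15, 12, 14, 16]
trend = moving_average(monthly_sales, window=3)
```

Plotting `monthly_sales` and `trend` on the same axes shows the raw fluctuations against the smoothed upward movement; a larger window smooths more but reacts more slowly to change.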
### 6. [Scatterplot with Dimension Reduction Suggestions](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%206%20Dimension%20Reduction.ipynb)
**Description:** A scatterplot is created for a dataset, followed by recommendations for dimension reduction techniques such as PCA or t-SNE to simplify the data while preserving key information.
### 7. [Geospatial Data and Projections](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%207%20Geospatial%20Data%20Projections.ipynb)
**Description:** This experiment covers the use of geospatial data and applying various projections to visualize geographical datasets accurately on different types of maps.
### 8. [Trend Line with Confidence Band](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%208%20Trend%20Line%20and%20Confidence%20Band.ipynb)
**Description:** A trend line is plotted with a confidence band to showcase the relationship between variables in a dataset, offering insights into trends and uncertainty around predictions.
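The sketch below fits a least-squares line and computes a band for the mean response at each x. It assumes a normal approximation (`z=1.96` for roughly 95%), which is only reasonable for larger samples; a t-quantile would be more accurate for small ones. The sample data and function names are hypothetical, and the notebook may build its band differently.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def confidence_band(xs, ys, z=1.96):
    """Return (fitted, lower, upper) for the mean response at each x.

    z=1.96 is a normal approximation to a 95% band (an assumption).
    """
    n = len(xs)
    slope, intercept = fit_line(xs, ys)
    x_mean = sum(xs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    fitted = [intercept + slope * x for x in xs]
    residual_ss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    s = math.sqrt(residual_ss / (n - 2))  # residual standard error
    half = [z * s * math.sqrt(1 / n + (x - x_mean) ** 2 / sxx) for x in xs]
    lower = [f - h for f, h in zip(fitted, half)]
    upper = [f + h for f, h in zip(fitted, half)]
    return fitted, lower, upper


def plot_trend(xs, ys):
    """Scatter, trend line, and shaded band (requires matplotlib; xs sorted)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fitted, lower, upper = confidence_band(xs, ys)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.plot(xs, fitted)
    ax.fill_between(xs, lower, upper, alpha=0.3)
    return fig


# Made-up sample data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
slope, intercept = fit_line(xs, ys)
fitted, lower, upper = confidence_band(xs, ys)
```

The band is narrowest near the mean of x and widens toward the ends, reflecting greater uncertainty about the line away from the center of the data.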
### 9. [Partial Transparency and Jittering](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%209%20Partial%20Transparency%20and%20Jittering.ipynb)
**Description:** This experiment illustrates the use of partial transparency and jittering in scatter plots to handle overlapping points and improve clarity in dense data visualizations.
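As an illustration, jittering can be sketched as adding a small amount of uniform noise to each coordinate, and partial transparency as an `alpha` value below 1 in the scatter call. The `jitter` helper, its default `amount`, and the sample `ratings` are assumptions made for this example.

```python
import random


def jitter(values, amount=0.2, seed=0):
    """Add uniform noise in [-amount, amount] to each value (reproducible via seed)."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


def plot_jittered(xs, ys, alpha=0.4):
    """Scatter plot with jittered points and partial transparency (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    ax.scatter(jitter(xs, seed=1), jitter(ys, seed=2), alpha=alpha)
    return fig


# Made-up discrete ratings: many points share exactly the same value.
ratings = [1, 1, 1, 2, 2, 3, 3, 3, 3]
jittered = jitter(ratings)
```

The jitter amount should stay small relative to the spacing between real values, so that displaced points are not mistaken for different data.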
### 10. [Usage of Different Color Codes](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%2010%20Use%20of%20Color%20Codes.ipynb)
**Description:** The experiment explores how different color codes (RGB, HEX, and named colors) can be applied to enhance data visualizations, improving the visual appeal and understanding of complex datasets.
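The relationship between the color code formats can be sketched with a few small conversion helpers. The function names and the tiny `NAMED` table are hypothetical and cover only a handful of colors for illustration.

```python
def hex_to_rgb(code):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints in 0..255."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError("expected a color like '#1f77b4'")
    return tuple(int(code[i : i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints in 0..255 to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def rgb_to_unit(rgb):
    """Scale 0..255 channels to 0..1 floats."""
    return tuple(channel / 255 for channel in rgb)


# A tiny illustrative subset of named colors and their HEX codes.
NAMED = {"red": "#ff0000", "blue": "#0000ff", "black": "#000000"}

orange_rgb = hex_to_rgb("#ff8000")
red_unit = rgb_to_unit(hex_to_rgb(NAMED["red"]))
```

Matplotlib accepts named colors, HEX strings, and RGB tuples with channels in the 0 to 1 range, which is why `rgb_to_unit` is included.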
---
## Various Libraries in Python for Data Visualization
To install a Python library, use the following command:
```
pip install library_name
```

---
# Dataset Used
Housing Dataset
--> Dataset is taken from:
--> CSV file which contains house pricing data.
--> Price of each house with respect to area and other basic amenities.
Rainfall Prediction Dataset
--> Dataset is taken from:
--> CSV file which contains the rainfall data.
--> Sub-division wise monthly data for 115 years from 1901-2015.
Business Dataset
--> Dataset is taken from:
--> Business financial data provides sales, purchases, salaries and wages, and operating profit estimates for most market industries in New Zealand, and information on stocks for selected industries.
--> This collection uses a combination of survey, tax, and other administrative data.
Sales Dataset
--> Dataset is taken from:
--> CSV file which contains the sales data.
Mineral Ores Around the World Dataset
--> Dataset is taken from:
--> Dataset of minerals found around the world.
Automobile Dataset
--> Dataset is taken from:
--> This contains data about various automobiles in comma-separated values (CSV) format.
--> The CSV file contains details of each automobile, such as mileage, length, and body style, among other attributes.
--> It has the following dimensions: [60 rows x 6 columns].
--> The CSV file is already preprocessed, so there is no need for data cleaning.
NBA Players Dataset
--> Dataset is taken from:
--> This contains data about various NBA players in comma-separated values (CSV) format.
--> The CSV file contains details of each player, such as height, weight, team, and position, among other attributes.
--> It has the following dimensions: [457 rows x 9 columns].
--> The CSV file is already preprocessed, so there is no need for data cleaning.
---
# Libraries Used
Short description of all libraries used.
- NumPy (Numerical Python) - Provides a collection of mathematical functions that operate on arrays and matrices.
- Pandas (Panel Data / Python Data Analysis) - Mostly used for analyzing, cleaning, exploring, and manipulating data.
- Matplotlib - A data visualization and graphical plotting library.
- Seaborn - An extension of the Matplotlib library, used to create more attractive and informative statistical graphics.
- SciPy (Scientific Python) - Used for scientific computation. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.
- Scikit-learn - A machine learning library that provides tools for tasks such as classification and prediction.
- GeoPandas - As the name suggests, GeoPandas extends the popular data science library pandas by adding support for geospatial data.
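
Before running the notebooks outside Colab, it can help to check which of the libraries listed above are importable. The following is a small sketch using only the standard library; `missing_libraries` is a hypothetical helper, and the mapping pairs each pip package name with its import name.

```python
from importlib.util import find_spec

# pip package name -> module name used in `import` statements.
LIBRARIES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "geopandas": "geopandas",
}


def missing_libraries(libraries=LIBRARIES):
    """Return the pip names of libraries that cannot be imported."""
    return [
        pip_name
        for pip_name, module in libraries.items()
        if find_spec(module) is None
    ]


missing = missing_libraries()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All listed libraries are available.")
```

Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`, which is why the mapping keeps both names.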
---
## Thanks for Visiting
Drop a star if you find this repository useful.
If you have any doubts or suggestions, feel free to reach out.
How to reach me: [![Linkedin Badge](https://img.shields.io/badge/-madhurima-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/madhurima-rawat/)