https://github.com/madhurimarawat/data-visualization-using-python


# Data-Visualization-using-python

This repository contains data visualization programs, written in Python, for various datasets.

## Data Visualization

![What-is-Data-Visualization-Blog-Header](https://github.com/madhurimarawat/Data-Visualization-using-python/assets/105432776/ea30609d-c156-4c80-b701-05194192e6a6)



--> Data visualization is the representation of information and data in a pictorial or graphical format (for example, charts, graphs, and maps).


--> Data visualization tools provide an accessible way to see and understand trends, patterns, and outliers in data.


--> Data visualization tools and technologies are essential to analyzing massive amounts of information and making data-driven decisions.


--> The concept of using pictures to understand data has been around for centuries. General types of data visualization include charts, tables, graphs, maps, and dashboards.

---

## Various Forms of Data Visualization

---

# About Python Programming

--> Python is a high-level, general-purpose, and very popular programming language.


--> Python (the current major version is Python 3) is used in web development, machine learning applications, and many other cutting-edge areas of the software industry.


--> Python is available across widely used platforms like Windows, Linux, and macOS.


--> One of Python's biggest strengths is its large standard library.

---

# Mode of Execution Used: Google Colab

--> Colaboratory, or “Colab” for short, is a product from Google Research that allows anybody to write and execute Python code through the browser in Jupyter notebooks.


--> Visit Colab at: [Google Colab](https://colab.research.google.com/)


--> Sign in with a Google account.


--> Once signed in, we can start coding in Colab directly.


--> It supports Python and R.


--> Notebooks are saved directly to Google Drive.

---

## Table Of Contents 📔 🔖 📑

### 1. [House Pricing Dataset - Aesthetics Mapping](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%201%20House%20pricing%20dataset.ipynb)

**Description:** In this experiment, we download the House Pricing dataset from Kaggle and map its features to visual aesthetics such as color, shape, and size.
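A minimal sketch of the mapping step, assuming two hypothetical columns: a numeric `area` mapped to marker size by min-max scaling, and a categorical `furnishing` status mapped to color. The sample values, size range, and palette are made up for illustration and are not taken from the notebook.

```python
def scale_to_sizes(values, min_size=20.0, max_size=200.0):
    """Min-max scale numbers into scatter marker sizes (in points squared)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: give every marker the midpoint size.
        return [(min_size + max_size) / 2] * len(values)
    return [min_size + (v - lo) / (hi - lo) * (max_size - min_size) for v in values]


def map_categories_to_colors(categories, palette=("tab:blue", "tab:orange", "tab:green", "tab:red")):
    """Give each distinct category a palette color, in order of first appearance."""
    lookup = {}
    for cat in categories:
        if cat not in lookup:
            # Cycle through the palette if there are more categories than colors.
            lookup[cat] = palette[len(lookup) % len(palette)]
    return [lookup[cat] for cat in categories], lookup


# Hypothetical sample rows (not the real Kaggle data).
area = [1500, 2400, 3000, 4200]
price = [2_500_000, 3_900_000, 4_400_000, 6_100_000]
furnishing = ["furnished", "unfurnished", "furnished", "semi-furnished"]

sizes = scale_to_sizes(area)
colors, legend = map_categories_to_colors(furnishing)
```

With Matplotlib, the results plug straight into `ax.scatter(area, price, s=sizes, c=colors)`; Seaborn's `scatterplot` can do the same mapping itself through its `hue` and `size` parameters.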

### 2. [Rainfall Prediction - Different Color Scales](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%202%20Rainfall%20Prediction%20Color%20Scales.ipynb)

**Description:** This experiment involves using different color scales to visualize the Rainfall Prediction dataset. We explore the impact of various color palettes and their readability in different visual contexts.
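Under the hood, a sequential color scale interpolates between colors as the value grows. The sketch below builds a simple two-color linear scale by hand; the endpoint colors (light yellow to dark blue) and the rainfall figures are assumptions chosen for illustration. Matplotlib colormaps such as `viridis` or `Blues` do this for you, and `viridis` is designed to be perceptually uniform, which a plain RGB blend like this one is not.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB tuples (0-255) for t in [0, 1]."""
    t = max(0.0, min(1.0, t))  # clamp so out-of-range values saturate
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def sequential_scale(values, low_color=(255, 255, 204), high_color=(8, 48, 107)):
    """Map each value to a color between low_color (at min) and high_color (at max)."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # avoid division by zero for constant data
    return [lerp_color(low_color, high_color, (v - vmin) / span) for v in values]


# Hypothetical monthly rainfall totals in mm.
rainfall_mm = [0.0, 12.5, 50.0, 310.0]
colors = sequential_scale(rainfall_mm)
```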

### 3. [Bar Plots for Dataset Variables](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%203%20Bar%20plots%20for%20variable.ipynb)

**Description:** We create different bar plots to represent categorical variables from a given dataset, providing insights into the distribution and comparison across categories.
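The data preparation behind a bar plot of a categorical variable is a frequency count. A minimal standard-library sketch, using a hypothetical `industry` column:

```python
from collections import Counter


def category_counts(values, top_n=None):
    """Count occurrences of each category, sorted from most to least frequent."""
    counts = Counter(values).most_common(top_n)  # top_n=None keeps every category
    labels = [label for label, _ in counts]
    heights = [n for _, n in counts]
    return labels, heights


# Hypothetical categorical column, e.g. an industry label per record.
industry = ["Retail", "Manufacturing", "Retail", "Construction", "Retail", "Manufacturing"]
labels, heights = category_counts(industry)
```

`ax.bar(labels, heights)` then draws the bars; in pandas, `df["industry"].value_counts().plot.bar()` does the counting and plotting in one step (`df` and the column name are hypothetical).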

### 4. [Skewed Data - Detection and Removal](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%204%20Skewedness%20and%20Removal%20of%20Skewedness.ipynb)

**Description:** This experiment demonstrates how to identify skewed data, visualize its distribution, and apply transformations that reduce skewness for more accurate analysis.
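One common way to detect and reduce right skew, sketched under assumptions: skewness is measured with the moment coefficient `g1 = m3 / m2**1.5` (positive means a long right tail), and a log transform (`log1p`) is the fix being tried. The sample values are made up. Note that pandas' `Series.skew()` applies a small-sample correction, so its numbers differ slightly from this formula.

```python
import math


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: most values small, a few very large.
prices = [1, 1, 2, 2, 3, 5, 8, 13, 50, 100]

before = skewness(prices)
# log1p = log(1 + x) compresses the long right tail and is safe for zeros.
after = skewness([math.log1p(p) for p in prices])
```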

### 5. [Time Series Visualization for Sales Data](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%205%20Time%20Series%20Visualization.ipynb)

**Description:** A time series visualization is performed on a sales dataset, showcasing trends, seasonality, and patterns in the data over time.
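A trailing moving average is one simple way to expose the trend in a sales series by smoothing out month-to-month noise. A minimal sketch with made-up monthly figures; the 3-month window is an arbitrary choice.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window get None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(values))
    ]


# Hypothetical monthly sales figures.
sales = [120, 135, 128, 150, 170, 165, 190]
trend = moving_average(sales, window=3)
```

pandas gives the same values via `Series.rolling(window=3).mean()`, with `NaN` instead of `None` for the first two points; plotting `sales` and `trend` on the same axes shows the raw series against its smoothed trend.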

### 6. [Scatterplot with Dimension Reduction Suggestions](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%206%20Dimension%20Reduction.ipynb)

**Description:** A scatterplot is created for a dataset, followed by recommendations for dimension reduction techniques such as PCA or t-SNE to simplify the data while preserving key information.
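PCA can be sketched in a few lines of NumPy via the singular value decomposition: centre the features, take the top two right-singular vectors, and project onto them to get 2-D coordinates for a scatterplot. This is a minimal illustration on synthetic data, not the notebook's code; scikit-learn's `PCA(n_components=2).fit_transform(X)` gives equivalent coordinates (possibly with flipped signs, since each component's sign is arbitrary).

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto its first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # PCA works on mean-centred features
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    projected = Xc @ vt[:2].T                # (n_samples, 2) scatterplot coordinates
    explained_ratio = s**2 / np.sum(s**2)    # share of total variance per component
    return projected, explained_ratio[:2]


# Synthetic data: 4 features driven by 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(100, 2))
mixing = np.array([[1.0, 0.5, 0.0, 2.0],
                   [0.0, 1.0, 1.5, -1.0]])
X = factors @ mixing + 0.05 * rng.normal(size=(100, 4))

coords, ratio = pca_2d(X)
```

If `ratio.sum()` is close to 1, as here, the 2-D scatterplot of `coords` keeps almost all of the variation in the original four features.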

### 7. [Geospatial Data and Projections](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%207%20Geospatial%20Data%20Projections.ipynb)

**Description:** This experiment covers working with geospatial data and applying various map projections to visualize geographical datasets accurately on different types of maps.
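As one concrete example of a projection, the spherical (Web) Mercator formulas used by most web map tiles (EPSG:3857) fit in a few lines. This is a sketch for illustration; with GeoPandas the same reprojection is normally done with `gdf.to_crs(epsg=3857)`, where `gdf` is a hypothetical GeoDataFrame in longitude/latitude. Mercator preserves shapes locally but inflates areas toward the poles, which is why equal-area projections are often preferred for comparing regions.

```python
import math

# Sphere radius used by Web Mercator: the WGS 84 semi-major axis, in metres.
EARTH_RADIUS_M = 6_378_137.0


def web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    if abs(lat_deg) >= 90:
        # y grows without bound at the poles (web maps clip near +/-85.05 degrees).
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Hypothetical point: longitude 77.2, latitude 28.6 (roughly New Delhi).
x, y = web_mercator(77.2, 28.6)
```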

### 8. [Trend Line with Confidence Band](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%208%20Trend%20Line%20and%20Confidence%20Band.ipynb)

**Description:** A trend line is plotted with a confidence band to showcase the relationship between variables in a dataset, offering insights into trends and uncertainty around predictions.
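A minimal sketch of an ordinary least-squares trend line with an analytic 95% confidence band for the fitted mean, assuming roughly normal residuals. It uses the normal critical value 1.96 as an approximation; for small samples the Student t value (for example `scipy.stats.t.ppf(0.975, n - 2)`) is more accurate. Seaborn's `regplot` instead estimates its band by bootstrap resampling, so its band can differ slightly. The sample data are made up.

```python
import math


def trend_line_with_band(xs, ys, z=1.96):
    """Fit y = intercept + slope * x and a confidence band for the mean response."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the residual spread")
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    fitted, lower, upper = [], [], []
    for x in xs:
        y_hat = intercept + slope * x
        # The band is narrowest at the mean of x and widens toward the edges.
        half_width = z * s * math.sqrt(1 / n + (x - x_mean) ** 2 / sxx)
        fitted.append(y_hat)
        lower.append(y_hat - half_width)
        upper.append(y_hat + half_width)
    return slope, intercept, fitted, lower, upper


# Hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, fitted, lower, upper = trend_line_with_band(xs, ys)
```

To draw it: `ax.plot(xs, fitted)` for the line plus `ax.fill_between(xs, lower, upper, alpha=0.2)` for the band (with `xs` sorted so the band renders cleanly).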

### 9. [Partial Transparency and Jittering](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%209%20Partial%20Transparency%20and%20Jittering.ipynb)

**Description:** This experiment illustrates the use of partial transparency and jittering in scatter plots to handle overlapping points and improve clarity in dense data visualizations.
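A minimal sketch of both ideas: jittering adds small random noise to a discrete variable so stacked points separate, and partial transparency (Matplotlib's `alpha`) lets dense regions show up darker. The `doors` and `mileage` values and the jitter amount of 0.15 are hypothetical.

```python
import random


def jitter(values, amount=0.1, seed=None):
    """Add uniform noise in [-amount, amount] so overlapping points spread out."""
    rng = random.Random(seed)  # a private, seedable generator makes plots reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical discrete x values with many repeats, and a y value per point.
doors = [2, 2, 2, 4, 4, 4, 4, 2, 4, 2]
mileage = [31, 30, 31, 24, 25, 24, 26, 30, 25, 31]
doors_jittered = jitter(doors, amount=0.15, seed=0)
```

Then `ax.scatter(doors_jittered, mileage, alpha=0.3)` draws the semi-transparent, jittered points. Only the plotted positions change, so the jittered values should not be used for analysis. Seaborn's `stripplot` offers the same idea through its `jitter` parameter.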

### 10. [Usage of Different Color Codes](https://github.com/madhurimarawat/Data-Visualization-using-python/blob/main/Experiment%2010%20Use%20of%20Color%20Codes.ipynb)

**Description:** The experiment explores how different color codes (RGB, HEX, and named colors) can be applied to enhance data visualizations, improving the visual appeal and understanding of complex datasets.
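The same color can be written several ways. Below is a small sketch of converting between HEX strings and RGB tuples, plus the 0-1 float form that Matplotlib expects for RGB tuples. Matplotlib already ships equivalents (`matplotlib.colors.to_rgb` and `matplotlib.colors.to_hex`); these helpers only show what those conversions do.

```python
def hex_to_rgb(code):
    """'#FF8000' or 'ff8000' -> (255, 128, 0)."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    # Each pair of hex digits is one 0-255 channel: red, green, blue.
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """(255, 128, 0) -> '#ff8000'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def rgb_to_unit(rgb):
    """Matplotlib expects RGB tuples as floats in [0, 1], not 0-255 integers."""
    return tuple(channel / 255 for channel in rgb)


orange = hex_to_rgb("#FF8000")
```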

---

## Various Libraries in Python for Data Visualization

To install a Python library, use this command:

```bash
pip install library_name
```
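After installing, a quick way to check which of the libraries used in this repository are still missing is to ask Python's import system, without actually importing anything. A small sketch; note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`, which is why the import and pip names are kept separately.

```python
import importlib.util

# Import name -> pip package name for the libraries listed in this README.
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "geopandas": "geopandas",
}


def missing_packages(required=REQUIRED):
    """Return pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("All libraries are installed.")
```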


---

# Dataset Used

## Housing Dataset


--> Dataset is taken from: Housing Dataset


--> CSV file which contains house pricing data.


--> Price of each house with respect to its area and other basic amenities.

## Rainfall Prediction Dataset


--> Dataset is taken from: Rainfall Prediction Dataset


--> CSV file which contains the rainfall data.


--> Subdivision-wise monthly data for 115 years (1901-2015).

## Business Dataset


--> Dataset is taken from: Business Dataset


--> Business financial data provides sales, purchases, salaries and wages, and operating profit estimates for most market industries in New Zealand, and information on stocks for selected industries.


--> This collection uses a combination of survey, tax, and other administrative data.

## Sales Dataset


--> Dataset is taken from: Sales Dataset


--> CSV file which contains the sales data.

## Mineral Ores Around the World Dataset


--> Dataset is taken from: Minerals Dataset


--> Dataset of minerals found around the world.

## Automobile Dataset


--> Dataset is taken from: 🔗 Automobile Dataset


--> This contains data about various automobiles in Comma-Separated Values (CSV) format.


--> The CSV file contains details of each automobile, such as mileage, length, and body style, among other attributes.


--> Dimensions: 60 rows × 6 columns.


--> The CSV file is already preprocessed, so there is no need for data cleaning.

## NBA Players Dataset


--> Dataset is taken from: 🔗 NBA Dataset


--> This contains data about various NBA players in Comma-Separated Values (CSV) format.


--> The CSV file contains details of each player, such as height, weight, team, and position, among other attributes.


--> Dimensions: 457 rows × 9 columns.


--> The CSV file is already preprocessed, so there is no need for data cleaning.
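To confirm that a downloaded CSV matches the dimensions stated above and really has no empty cells, a standard-library sketch is enough. The sample text and column names are hypothetical; with pandas the equivalent checks are `df.shape` and `df.isna().sum().sum()` after `df = pd.read_csv(path)`.

```python
import csv
import io


def csv_summary(text):
    """Return (rows, columns, empty_cells) for CSV text that starts with a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # csv.reader yields [] for blank lines
    empty = sum(1 for row in rows for cell in row if not cell.strip())
    # Rows shorter than the header are also missing values.
    empty += sum(max(0, len(header) - len(row)) for row in rows)
    return len(rows), len(header), empty


# Hypothetical two-row sample with one empty cell.
sample = "body-style,length,mileage\nsedan,168.8,21\nhatchback,171.2,\n"
summary = csv_summary(sample)  # -> (2, 3, 1)
```

For a real file, read it first, for example `with open("automobile.csv", newline="") as f: csv_summary(f.read())` (the file name is hypothetical).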

---

# Libraries Used


A short description of each library used:



  • NumPy (Numerical Python) - Provides a collection of mathematical functions
    that operate on arrays and matrices.

  • Pandas (Panel Data / Python Data Analysis) - Mostly used for analyzing,
    cleaning, exploring, and manipulating data.

  • Matplotlib - A data visualization and graphical plotting library.

  • Seaborn - A library built on top of Matplotlib, used to create more attractive and
    informative statistical graphics.

  • SciPy (Scientific Python) - Used for scientific computation. SciPy contains modules for optimization, linear algebra, integration, interpolation, special
    functions, FFT, and signal and image processing.

  • Scikit-learn - A machine learning library that provides tools for many
    machine learning tasks, such as classification, regression, and clustering.

  • GeoPandas - As the name suggests, GeoPandas extends the popular data science library pandas by adding support for geospatial data.

---

## Thanks for Visiting 😄

Drop a 🌟 if you find this repository useful.


If you have any doubts or suggestions, feel free to reach out to me.


📫 How to reach me: [![Linkedin Badge](https://img.shields.io/badge/-madhurima-blue?style=flat&logo=Linkedin&logoColor=white)](https://www.linkedin.com/in/madhurima-rawat/)