
https://github.com/soham7998/data-analysis-projects

A collection of data analysis projects that I completed, each of which gave me hands-on experience. The projects showcase different concepts, visualizations, and more.

Topics: data, data-analysis, data-science, machine-learning, nlp, python, soham, visualization


# Data-Analysis-Projects
# 1) Netflix EDA

![NFLX](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/3b070b75-8717-4133-bed6-fdf62fd6a790)

This dataset and repository cover all Netflix original films released as of June 1, 2021, along with all Netflix documentaries and specials. The data was web-scraped from a Wikipedia page and then combined with a dataset of the corresponding IMDb scores. IMDb scores are voted on by community members, and most of the films have more than 1,000 reviews.
The dataset consists of:

• Title

• Genre

• Premiere date

• IMDb scores

• Runtime

• Languages
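
A minimal sketch of the first EDA steps on data of this shape, using only the Python standard library. The column names, the "Month D, YYYY" premiere-date format, and the three sample rows are assumptions for illustration, not taken from the project's dataset. In a notebook, pandas (`read_csv`, `groupby`) would usually do the same work.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows: column names, date format and values are all assumptions.
SAMPLE_CSV = """Title,Genre,Premiere,Runtime,IMDB Score,Language
Film A,Documentary,"August 5, 2019",58,6.3,English
Film B,Thriller,"August 21, 2020",81,5.8,Spanish
Film C,Documentary,"December 26, 2019",79,7.1,English
"""


def load_films(text):
    """Parse CSV text into dicts with typed Premiere (date), Runtime (int) and IMDB Score (float)."""
    films = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Premiere"] = datetime.strptime(row["Premiere"], "%B %d, %Y").date()
        row["Runtime"] = int(row["Runtime"])
        row["IMDB Score"] = float(row["IMDB Score"])
        films.append(row)
    return films


def mean_score_by_genre(films):
    """Average IMDb score per genre, rounded to two decimals."""
    scores = defaultdict(list)
    for film in films:
        scores[film["Genre"]].append(film["IMDB Score"])
    return {genre: round(mean(vals), 2) for genre, vals in scores.items()}


def releases_per_year(films):
    """Count premieres per calendar year."""
    counts = defaultdict(int)
    for film in films:
        counts[film["Premiere"].year] += 1
    return dict(counts)


films = load_films(SAMPLE_CSV)
print(mean_score_by_genre(films))
print(releases_per_year(films))
```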

# 2) Football EDA
This project explores football data through a range of activities, including exploratory data analysis (EDA), data visualization, and other topics. It consists mainly of Jupyter Notebooks written in Python.

# 3) Twitter Sentiment Analysis
This is a natural language processing (NLP) problem in which sentiment analysis is performed by classifying tweets as positive or negative. It draws on text mining, text analysis, data analysis, and data visualization.
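
The section doesn't say which classifier the project used. As one possible approach, the sketch below cleans tweets (lowercasing, removing URLs, @mentions and non-letter characters) and trains a bag-of-words multinomial Naive Bayes model with add-one smoothing. The regexes, the `BagOfWordsNB` class (a hypothetical from-scratch stand-in, not a library class) and the four training tweets are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase, strip URLs and @mentions, drop '#' and other non-letters, then tokenize."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


class BagOfWordsNB:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of tweets
        for text, label in zip(texts, labels):
            self.word_counts[label].update(clean_tweet(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in clean_tweet(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
train_texts = [
    "I love this phone @shop",
    "What a great day #happy",
    "Worst service ever http://t.co/x",
    "I hate waiting, so bad",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = BagOfWordsNB().fit(train_texts, train_labels)
print(model.predict("love this great phone"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.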

![1693288099245](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/594eca24-9c82-4a95-8642-c283d2155d71)

# 4) Power BI Dashboard
**Power BI Sales Dashboard for Global Super Store**

• The project involves creating an interactive Power BI Sales Dashboard using Global_super_store sales data.

• The ETL process was performed with Power Query to clean and transform the data.

• DAX was used for creating calculated measures and calculated columns.

• Visualizations and reports were built with cards, charts, and slicers to give end users insights that are easy to understand.

• The tools used were Microsoft Power BI and MS Excel.

![Super Sales Dashboard](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/ebb4f8e1-e1d2-4ee8-a70f-7c40c9aa1e49)

# 5) Data Science EDA

**The Data Science Job Salaries dataset contains 11 columns, including the following:**

• work_year: The year the salary was paid.

• experience_level: The experience level in the job during the year.

• employment_type: The type of employment for the role.

• job_title: The role worked in during the year.

• salary: The total gross salary amount paid.

• employee_residence: The employee's primary country of residence during the work year, as an ISO 3166 country code.

• remote_ratio: The overall amount of work done remotely.

• company_location: The country of the employer's main office or contracting branch.

• company_size: The median number of people that worked for the company during the year.
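
A minimal sketch of one typical question for this dataset: the median salary per group. The rows are invented and use only a few of the columns above, and the helper name `median_by` is hypothetical. Note that `salary` is only comparable across rows when every row is in the same currency.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows reusing column names from the list above; the values are invented.
# Comparing `salary` across rows only makes sense if every row uses the same currency.
rows = [
    {"experience_level": "EN", "salary": 60000, "remote_ratio": 100},
    {"experience_level": "EN", "salary": 70000, "remote_ratio": 0},
    {"experience_level": "MI", "salary": 90000, "remote_ratio": 0},
    {"experience_level": "SE", "salary": 120000, "remote_ratio": 100},
    {"experience_level": "SE", "salary": 140000, "remote_ratio": 50},
    {"experience_level": "SE", "salary": 130000, "remote_ratio": 100},
]


def median_by(rows, key, value="salary"):
    """Group rows by `key` and return the median of `value` for each group, ordered by key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: median(vals) for k, vals in sorted(groups.items())}


print(median_by(rows, "experience_level"))
print(median_by(rows, "remote_ratio"))
```

The median is used rather than the mean because salary distributions usually have a long right tail that pulls the mean upward.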

![ds](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/fd6cf30d-5895-441b-b555-a5ab5a4fab3f)

# 6) IPL Data Analysis Using Apache Spark
Here is what I did:

• Learned the basics of Apache Spark (architecture, transformations, actions, lazy evaluation)

• Created a Databricks account and learned the basics of the platform

• Learned the Structured API and how to write transformation functions

• Used SQL to analyze IPL data

• Built visualizations to gain more insights
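
To illustrate the transformation/action split and lazy evaluation listed above: in Spark, transformations such as `map` and `filter` only extend the execution plan, and nothing runs until an action such as `collect` or `count` is called. The plain-Python toy below mimics that behavior with generators. `LazyPipeline` is a hypothetical teaching class, not part of PySpark, and it ignores partitioning, query optimization and distributed execution.

```python
class LazyPipeline:
    """Toy stand-in for a Spark dataset: transformations are recorded, actions execute."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)  # recorded (kind, function) pairs, i.e. the "plan"

    # Transformations: return a new pipeline and compute nothing.
    def map(self, fn):
        return LazyPipeline(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        return LazyPipeline(self.source, self.steps + (("filter", pred),))

    # Actions: run the whole recorded plan over the source data.
    def _run(self):
        data = iter(self.source)
        for kind, fn in self.steps:
            # Inside a method, `map`/`filter` refer to Python's lazy builtins.
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []


def double(x):
    calls.append(x)  # record every time the function actually runs
    return x * 2


plan = LazyPipeline([1, 2, 3, 4]).map(double).filter(lambda v: v > 4)
print(calls)           # [] : building the plan ran nothing
print(plan.collect())  # [6, 8] : the action triggers execution
print(calls)           # [1, 2, 3, 4]
```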

The goal of this project is to build an overall understanding of Apache Spark and of the functions used to write transformation blocks, and on top of that to use SQL to analyze the data and build visualizations.
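
As a self-contained stand-in for the SQL part, the sketch below runs a typical IPL-style aggregation (top run-scorers) against an in-memory SQLite database. The `deliveries` table, its columns and its rows are hypothetical. On Databricks, a similar query could be run with `spark.sql(...)` after registering a DataFrame with `createOrReplaceTempView("deliveries")`, although Spark SQL and SQLite differ in some details.

```python
import sqlite3

# Hypothetical table and rows; real IPL ball-by-ball data has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries (match_id INTEGER, batting_team TEXT, batter TEXT, runs INTEGER)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?, ?)",
    [
        (1, "Team A", "Player 1", 4),
        (1, "Team A", "Player 2", 1),
        (1, "Team B", "Player 3", 6),
        (2, "Team A", "Player 1", 6),
        (2, "Team B", "Player 3", 0),
        (2, "Team B", "Player 4", 2),
    ],
)

TOP_BATTERS_SQL = """
SELECT batter, batting_team, SUM(runs) AS total_runs, COUNT(*) AS deliveries_count
FROM deliveries
GROUP BY batter, batting_team
ORDER BY total_runs DESC, batter
LIMIT ?
"""


def top_batters(conn, n):
    """Return (batter, team, total_runs, deliveries_count) rows for the n highest run-scorers."""
    return conn.execute(TOP_BATTERS_SQL, (n,)).fetchall()


for row in top_batters(conn, 3):
    print(row)
```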

![Screenshot 2024-05-03 162548](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/74102fef-8da2-48f0-b962-c65a33b4a4af)

# 7) Loan Eligibility Prediction
**Data Loading and Exploration:**
Imported necessary libraries and loaded the dataset from a CSV file.
Explored the dataset with head(), info(), shape, and describe() methods to understand its structure and summary statistics.

**Handling Missing Values:**
Identified missing values using isnull().sum().
Filled missing values in categorical columns (e.g., Gender, Married) with the mode, and in numerical columns (e.g., LoanAmount, Loan_Amount_Term) with mean or mode as appropriate.
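
A standard-library sketch of the two steps above: counting missing values per column and filling them with the mode (categorical) or the mean (numerical). The sample rows are invented; the column names come from the description. In pandas this is typically `df.isnull().sum()` followed by `fillna`.

```python
from statistics import mean, mode

# Hypothetical rows using column names from the description; None marks a missing value.
rows = [
    {"Gender": "Male", "Married": "Yes", "LoanAmount": 120.0},
    {"Gender": None, "Married": "No", "LoanAmount": 100.0},
    {"Gender": "Male", "Married": None, "LoanAmount": None},
    {"Gender": "Female", "Married": "Yes", "LoanAmount": 140.0},
]


def count_missing(rows):
    """Count None values per column (a stdlib analogue of df.isnull().sum())."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + (value is None)
    return counts


def impute(rows, categorical, numerical):
    """Return new rows with None filled by the column mode (categorical) or mean (numerical)."""
    fills = {col: mode(r[col] for r in rows if r[col] is not None) for col in categorical}
    fills.update({col: mean(r[col] for r in rows if r[col] is not None) for col in numerical})
    return [
        {col: fills[col] if value is None and col in fills else value for col, value in row.items()}
        for row in rows
    ]


print(count_missing(rows))
filled = impute(rows, categorical=["Gender", "Married"], numerical=["LoanAmount"])
print(filled)
```

The fill values are computed from the non-missing entries only, and the original rows are left unchanged.
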

**Feature Engineering:**
Created new features such as TotalIncome by summing ApplicantIncome and CoapplicantIncome.
Transformed skewed data using logarithmic scaling (LoanAmount_log and TotalIncome_log).
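
A minimal sketch of the feature-engineering step, assuming a natural logarithm (the description doesn't say which base was used) and strictly positive inputs. The helper `add_income_features` and the sample values are hypothetical. The log compresses the long right tail typical of income and loan amounts.

```python
import math


def add_income_features(row):
    """Return a copy of `row` with TotalIncome and natural-log versions of the skewed columns.

    Assumes ApplicantIncome + CoapplicantIncome and LoanAmount are positive (log(0) is undefined).
    """
    out = dict(row)
    out["TotalIncome"] = row["ApplicantIncome"] + row["CoapplicantIncome"]
    out["TotalIncome_log"] = math.log(out["TotalIncome"])
    out["LoanAmount_log"] = math.log(row["LoanAmount"])
    return out


# Hypothetical applicant.
example = add_income_features({"ApplicantIncome": 5000, "CoapplicantIncome": 1500, "LoanAmount": 128.0})
print(example["TotalIncome"], round(example["TotalIncome_log"], 3), round(example["LoanAmount_log"], 3))
```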

**Data Visualization:**
Used histograms and boxplots to visualize the distribution of ApplicantIncome, CoapplicantIncome, LoanAmount, and their logarithmic transformations.
Examined the relationship between Credit_History and Loan_Status using cross-tabulation.
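
A small sketch of the cross-tabulation, plus the approval rate for each `Credit_History` value. The helper `crosstab` and the six rows are hypothetical; in pandas, `pd.crosstab(df['Credit_History'], df['Loan_Status'])` produces the same counts as a DataFrame.

```python
from collections import Counter


def crosstab(rows, index, columns):
    """Count co-occurrences of two columns as a nested dict {index_value: {column_value: count}}."""
    counts = Counter((r[index], r[columns]) for r in rows)
    index_vals = sorted({i for i, _ in counts})
    column_vals = sorted({c for _, c in counts})
    return {i: {c: counts[(i, c)] for c in column_vals} for i in index_vals}


# Hypothetical applicants: Credit_History is a 1.0/0.0 flag, Loan_Status is "Y" or "N".
rows = [
    {"Credit_History": 1.0, "Loan_Status": "Y"},
    {"Credit_History": 1.0, "Loan_Status": "Y"},
    {"Credit_History": 1.0, "Loan_Status": "N"},
    {"Credit_History": 0.0, "Loan_Status": "N"},
    {"Credit_History": 0.0, "Loan_Status": "N"},
    {"Credit_History": 1.0, "Loan_Status": "Y"},
]

table = crosstab(rows, "Credit_History", "Loan_Status")
approval_rate = {k: v["Y"] / sum(v.values()) for k, v in table.items()}
print(table)
print(approval_rate)
```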

**Data Preparation:**
Selected relevant features for model training and separated the target variable (Loan_Status).
Split the data into training and testing sets using train_test_split.
Encoded categorical variables into numerical values using LabelEncoder.
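
A standard-library sketch of the data-preparation order described above: split first, then encode. `split_rows` and `fit_label_encoder` are hypothetical stand-ins for scikit-learn's `train_test_split` and `LabelEncoder` (which also assigns integer codes to classes in sorted order). The encoder is fitted on the training rows only, so a value that appears only in the test split would raise a `KeyError`; the sample data has every category often enough that it always reaches the training split.

```python
import random


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle a copy of `rows` with a fixed seed, then return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def fit_label_encoder(values):
    """Map each distinct value to an integer code in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


# Hypothetical rows: 6 "Yes" and 4 "No", so both categories survive a 2-row test split.
rows = [
    {"Married": married, "Loan_Status": status}
    for married, status in [
        ("Yes", "Y"), ("No", "N"), ("Yes", "Y"), ("Yes", "N"), ("No", "Y"),
        ("Yes", "Y"), ("No", "N"), ("Yes", "Y"), ("No", "Y"), ("Yes", "N"),
    ]
]

train, test = split_rows(rows, test_size=0.2, seed=42)
married_codes = fit_label_encoder(r["Married"] for r in train)  # fit on training data only
X_train = [married_codes[r["Married"]] for r in train]
X_test = [married_codes[r["Married"]] for r in test]
print(len(train), len(test), married_codes)
```

Fixing the seed makes the split reproducible, which plays the same role as `random_state` in scikit-learn.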

**Model Training and Evaluation:**
Trained a Naive Bayes classifier on the training set.
Evaluation on the test set would typically use metrics such as accuracy, precision, recall, and F1-score, although the evaluation step is not explicitly shown in the code.
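
The description names only "the Naive Bayes Classifier". For numeric features scikit-learn's `GaussianNB` is a common choice, but which variant the project used is an assumption here. The sketch below is a from-scratch Gaussian Naive Bayes plus the four metrics mentioned above, so the training and evaluation steps can be seen end to end. `GaussianNaiveBayes`, `binary_metrics`, the small constant `eps` added to each variance, and the toy feature values are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: per class, a prior plus a mean and variance for each feature."""

    def __init__(self, eps=1e-9):
        self.eps = eps  # added to every variance so constant features do not divide by zero

    def fit(self, X, y):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))  # transpose: one tuple per feature
            means = [sum(c) / len(c) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(c) + self.eps for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def _log_posterior(self, x, label):
        """Unnormalized log posterior: log prior + sum of per-feature Gaussian log densities."""
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
            for xi, mu, var in zip(x, means, variances)
        )

    def predict(self, X):
        return [max(self.stats, key=lambda label: self._log_posterior(x, label)) for x in X]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (0.0 when a ratio is undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical features: [Credit_History, TotalIncome_log]; label 1 = approved, 0 = rejected.
X_train = [[1, 8.7], [1, 8.9], [0, 8.8], [1, 9.0], [0, 8.2], [0, 8.4], [1, 8.3], [0, 8.1]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = [[1, 8.8], [0, 8.2], [1, 8.3], [0, 8.9]]
y_test = [1, 0, 1, 1]

model = GaussianNaiveBayes().fit(X_train, y_train)
y_pred = model.predict(X_test)
print(y_pred)
print(binary_metrics(y_test, y_pred))
```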

![image](https://github.com/soham7998/Data-Analysis-Projects/assets/112894790/caab9cf9-55ac-4fb1-9283-429d16a06000)