https://github.com/abhihirekhan/abhi-s-data-science-portfolio

A list of data science projects completed by me for academic, self learning, and creative purposes

datascience ipynb-jupyter-notebook jupyter jupyter-kernels jupyter-notebook keras-tensorflow machine-learning numpy pandas portfolio python tensorflow


# Abhi-s-Data-Science-Portfolio
A list of data science projects I completed for academic, self-learning, and creative purposes.
The projects are presented as IPython notebooks.
To learn more, check out my [Medium profile](https://medium.com/@abhijeet.herokhan/abhis-data-science-portfolio-2e6014635e08).
The projects are written in Python, either as Jupyter notebooks or as Kaggle kernels.
If you like what you see and want to chat about the portfolio, work opportunities, or collaboration, send an email to [email protected]

Tools

* Python: NumPy, Pandas, Seaborn, Matplotlib, Plotly
* Machine Learning: scikit-learn, SciPy, TensorFlow, Keras

Please contact me on [LinkedIn](https://in.linkedin.com/in/abhijeet-hirekhan-15903a132), [Facebook](https://www.facebook.com/abhijeet.hirekhan), or [Twitter](https://twitter.com/kidwiththeheat) if you are looking to hire a data scientist.

## Projects

### [1st Kernal EDA on Census of India 2011](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/1st%20Kernal%20EDA%20on%20%20Census%20of%20India%202011)
* The Census of India is a rich database that can tell the stories of over a billion Indians. It matters not only for research but also commercially, for organizations that want to understand India's complex yet closely knit heterogeneity.
* The 2011 India census data includes population/demographic data and housing data for each district.
* This is an EDA of the Indian census.
* **Keywords** (Python, Indian census, Census of India 2011, India, EDA)

---

### [A geographical analysis on China's Wuhan Coronavirus](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/A%20geographical%20analysis%20on%20China's%20Wuhan%20Coronavirus)

* 2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating that person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people (source: CDC).
* Wuhan Coronavirus: a geographical analysis. With the news that the World Health Organization has declared the novel coronavirus outbreak a public health emergency, general fear among the public has increased. Many countries have heightened their measures to fight the virus while the situation in China remains sensitive. More than 20 countries and territories outside mainland China have confirmed cases of the virus, spanning Asia, Europe, North America, and the Middle East, as India, Italy, and the Philippines reported their first cases on Thursday.
* This kernel looks at how the virus has spread from Wuhan across the world.
* **Keywords** (Python, China, Wuhan, COVID-19)

---

### [Airbnb Data - New York City Complete Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Airbnb%20Data%20-%20New%20York%20City%20Complete%20Analysis)

* Since 2008, guests and hosts have used Airbnb to expand travel possibilities and offer a more unique, personalized way of experiencing the world. This dataset describes the listing activity and metrics in NYC, NY for 2019.
* This EDA covers hosts, geographical availability, and the metrics needed to make predictions and draw conclusions.
* **Keywords** (Python, analysis of Airbnb, New York City)

---

### [Analysis Coronavirus](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Analysis%20Coronavirus)

* Coronaviruses (CoV) are a large family of viruses that cause illness ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans.
* This EDA provides visualizations and additional information on the novel coronavirus.
* **Keywords** (Python, analysis, COVID-19, EDA)

---

### [Analysis of Rainfall in India](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Analysis%20of%20Rainfall%20in%20%20India)

* This EDA analyzes a dataset of monthly rainfall for 36 meteorological sub-divisions of India.
* **Keywords** (Python, NumPy, pandas, matplotlib.pyplot)


---

### [Analysis on NSE India's stocks (Indices)](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Analysis%20on%20NSE%20India's%20stocks%20(Indices))

* The NIFTY 50 index is the National Stock Exchange of India's benchmark stock market index for the Indian equity market. It is a well-diversified 50-stock index accounting for 22 sectors of the economy. It is used for a variety of purposes such as benchmarking fund portfolios, index-based derivatives, and index funds.
* EDA on a data frame with 8 variables: index, date, time, open, high, low, close, and id. It holds minute-by-minute trading data for each trading date from 2013 to 2016. Prices are in Indian rupees (INR).
* **Keywords** (Python, pandas, seaborn, squarify)
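
A minimal sketch of how minute-level bars like these could be collapsed into daily open/high/low/close values with pandas. The sample rows and the date/time string formats are assumptions made for illustration; only the column names come from the description above.

```python
import pandas as pd

# Tiny synthetic sample; the real file's date/time formats are an assumption.
rows = [
    # index,   date,       time,    open,  high,  low,   close
    ("NIFTY", "20130401", "09:16", 100.0, 101.0, 99.5, 100.5),
    ("NIFTY", "20130401", "09:17", 100.5, 102.0, 100.0, 101.5),
    ("NIFTY", "20130402", "09:16", 101.0, 101.5, 98.0, 99.0),
    ("NIFTY", "20130402", "09:17", 99.0, 100.0, 98.5, 99.5),
]
df = pd.DataFrame(rows, columns=["index", "date", "time", "open", "high", "low", "close"])

# Combine the date and time strings into one timestamp so rows can be ordered.
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"], format="%Y%m%d %H:%M")

# One row per index and trading day: first open, highest high, lowest low, last close.
daily = (
    df.sort_values("timestamp")
      .groupby(["index", "date"])
      .agg(open=("open", "first"), high=("high", "max"),
           low=("low", "min"), close=("close", "last"))
      .reset_index()
)
print(daily)
```

Sorting by timestamp before grouping matters here, because `first` and `last` depend on row order within each day.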

---

### [Analysis onCorona Cases in India](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Analysis%20onCorona%20Cases%20in%20India)

* The first COVID-19 case in India was reported on 30 January 2020 in a student who arrived in Kerala from Wuhan. Two more cases were reported in Kerala over the next two days. For almost a month no new cases were reported in India; then, on 8 March, five new cases were reported in Kerala, and since then cases have been rising, affecting 14 states. This is the data on COVID-19 patients in India up to 31 March 2020.
* EDA of the current situation in the Indian peninsula.
* **Keywords** (Python, NumPy, matplotlib.pyplot, pandas, seaborn)

---

### [Candidates info- 2019 elections EDA](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Candidates%20info-%202019%20elections%20EDA)

* Analysis of the number of Indian candidates in the 2019 general election.
* With over 600 million voters voting for 8,500+ candidates across 543 constituencies, the general elections in the world's largest democracy are a potential goldmine of data. While separate datasets exist for the votes each candidate received and for each candidate's personal information, there was no comprehensive dataset that included both.
* **Keywords** (Python, election candidates, India)

---

### [Corona Virus Dataset Exploratory Data Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Corona%20Virus%20Dataset%20Exploratory%20Data%20Analysis)

* A SARS-like virus outbreak originating in Wuhan, China, is spreading into neighboring Asian countries and as far afield as Australia, the US, and Europe.
* On 31 December 2019, the Chinese authorities reported a case of pneumonia with an unknown cause in Wuhan, Hubei province, to the World Health Organisation (WHO)’s China Office. As more and more cases emerged, totaling 44 by 3 January, the country’s National Health Commission isolated the virus causing fever and flu-like symptoms and identified it as a novel coronavirus, now known to the WHO as 2019-nCoV.
* Findings from the dataset:
  * Only 0.4% of patients died, but only 0.7% recovered, and around 98.9% are still under isolation. So the probability of death is low, but recovering from this virus is also difficult.
  * Most of the patients who died did so within 5 days of confirmation, so treatment has to start immediately after confirmation.
  * Patients aged between 35 and 45 are more likely to be released, though this does not hold in all cases.
  * Most patients older than 55 did not survive.
* **Keywords** (Python, COVID-19, EDA)
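
Outcome shares like the ones quoted above can be computed with a normalized value count. This is only a sketch: the `state` column name, its labels, and the ten sample rows are hypothetical and are not taken from the notebook or its dataset.

```python
import pandas as pd

# Hypothetical patient table with one outcome label per patient.
patients = pd.DataFrame(
    {"state": ["isolated"] * 7 + ["released"] * 2 + ["deceased"]}
)

# Share of each outcome as a percentage of all patients.
shares = patients["state"].value_counts(normalize=True).mul(100).round(1)
print(shares)
```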

---

### [Covid simple Exploratory Data Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Covid%20simple%20Exploratory%20Data%20Analysis)

* Files in the competition dataset:
  * train.csv: the training data up to Mar 18, 2020.
  * test.csv: the dates to predict; there is a week of overlap with the training data for the initial Public leaderboard. Once submissions are paused, the Public leaderboard will update based on the last 28 days of predicted data.
  * submission.csv: a sample submission in the correct format; again, predictions should be cumulative.
* The evaluation data for this competition comes from Johns Hopkins CSSE.
* **Keywords** (Python, simple EDA, COVID-19)
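
Because predictions are expected to be cumulative, daily counts have to be turned into running totals per region before submission. The sketch below shows one way to do that with pandas; the column names (`region`, `date`, `new_cases`, `confirmed_cumulative`) and the sample numbers are hypothetical and do not reflect the competition's actual schema.

```python
import pandas as pd

# Hypothetical daily new-case counts for two regions.
daily = pd.DataFrame({
    "region": ["X", "X", "X", "Y", "Y", "Y"],
    "date": pd.to_datetime(["2020-03-19", "2020-03-20", "2020-03-21"] * 2),
    "new_cases": [2, 3, 0, 1, 1, 4],
})

# Running total within each region, in date order.
daily = daily.sort_values(["region", "date"])
daily["confirmed_cumulative"] = daily.groupby("region")["new_cases"].cumsum()
print(daily)
```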

---

### [Data Visualization on Indian Premier League(IPL)](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Data%20Visualization%20on%20Indian%20Premier%20League(IPL))

* The Indian Premier League (IPL) is a professional Twenty20 cricket league in India contested during April and May of every year by teams representing Indian cities and some states. The league was founded by the Board of Control for Cricket in India (BCCI) in 2008. The IPL is the most-attended cricket league in the world and in 2014 ranked sixth by average attendance among all sports leagues. There have been ten seasons of the IPL tournament.
* Data: the dataset has 5 files:
  1. DIM_PLAYER.csv: details of all the players who have played in the IPL, along with their country, date of birth, and batting/bowling style.
  2. DIM_PLAYER_MATCH.csv: various stats of players, such as team name, captaincy, keeper, etc.
  3. DIM_TEAM.csv: IPL team names and IDs.
  4. FACT_BALL_BY_BALL.csv: ball-by-ball details.
  5. DIM_MATCH.csv: match details.
* **Keywords** (Python, IPL, data visualization, Indian Premier League)
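
The file names suggest a fact table (ball-by-ball events) surrounded by dimension tables (players, teams, matches). A rough sketch of joining a fact table to a dimension table and aggregating is shown below. The in-memory frames stand in for FACT_BALL_BY_BALL.csv and DIM_PLAYER.csv, and every column name in them is hypothetical, since the real schemas are not described here.

```python
import pandas as pd

# Hypothetical stand-in for DIM_PLAYER.csv.
players = pd.DataFrame({
    "player_id": [1, 2],
    "player_name": ["Player A", "Player B"],
})

# Hypothetical stand-in for FACT_BALL_BY_BALL.csv.
balls = pd.DataFrame({
    "match_id": [10, 10, 10, 11],
    "striker_id": [1, 1, 2, 2],
    "runs_scored": [4, 1, 6, 2],
})

# Attach player names to each ball, then total the runs per player.
runs = (
    balls.merge(players, left_on="striker_id", right_on="player_id", how="left")
         .groupby("player_name", as_index=False)["runs_scored"].sum()
         .sort_values("runs_scored", ascending=False)
)
print(runs)
```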

---

### [EDA on India's Education](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/EDA%20%20on%20India's%20Education)

* The dataset merges the 2011 census of Indian cities with a population of more than 1 lakh (100,000) with the city-wise number of graduates from the same census, to visualize where the future cities of India stand today. I will try to add more columns (fertility rate, religion distribution, health standards, number of schools, mortality rate) in the future, and I hope people will contribute.
* EDA on this merged dataset.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)

---

### [EDA Tutorial on Data Cleaning](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/EDA%20Tutorial%20on%20Data%20Cleaning)

* This data is quite challenging to clean, and you may not agree with my approach.
* An EDA focused on data cleaning.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)

---

### [EDA and Data Cleaning for Beginner](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/EDA%20and%20Data%20Cleaning%20for%20Beginner)

* In this kernel I follow along with a number of well-established guides for exploratory data analysis and data cleaning, applied to the House Prices: Advanced Regression Techniques dataset.
* Learn how to better understand data before applying any machine learning models.
* Learn data cleaning best practices.
* Practice documenting my process of EDA and data cleaning.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot, scipy, scipy.stats, sklearn.linear_model, sklearn.preprocessing)

---

### [Exploratory Analysis of India's Air Quality based on Tableau](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Exploratory%20Analysis%20of%20India's%20%20Air%20Quality%20based%20on%20Tableau)

* High SO2 and NO2 emissions by state:
  * Uttaranchal, Jharkhand, and Sikkim have high SO2 emissions.
  * West Bengal, Delhi, and Jharkhand have high NO2 emissions.
* Monte Carlo simulation used to simulate the winner of the election.
* Compared simulated results with exchange rate fluctuations to see if the market is efficient.
* Visualization with the help of Tableau.
* **Keywords** (Python, Tableau)

---

### [Indian Census 2001 Data Exploration Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Indian%20Census%202001%20Data%20Exploration%20Analysis)

* Exploration of Indian census data.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)

---

### [Indian Census Data Exploratory Data Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Indian%20Census%20Data%20Exploratory%20Data%20Analysis)

* Exploratory data analysis of the Census of India.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)



---

### [Indian Startups Python Explorations](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Indian%20Startups%20Python%20%20Explorations)

* This EDA is based on a dataset with funding information for Indian startups from January 2015 to August 2017. It includes columns with the date funded, the city the startup is based in, the names of the funders, and the amount invested (in USD).
* Possible questions that could be answered:
  * How does the funding ecosystem change with time?
  * Do cities play a major role in funding?
  * Which industries are favored by investors for funding?
  * Who are the important investors in the Indian ecosystem?
  * How much funding do startups generally get in India?
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)
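
Several of the questions above reduce to grouping and aggregating. The sketch below shows the general pattern on a few made-up rows; the column names (`date`, `city`, `amount_usd`) and all values are hypothetical and only mirror the kinds of columns described.

```python
import pandas as pd

# Made-up funding rounds for illustration only.
funding = pd.DataFrame({
    "date": ["2015-01-15", "2015-06-01", "2016-03-10", "2017-08-20"],
    "city": ["Bangalore", "Mumbai", "Bangalore", "New Delhi"],
    "amount_usd": [1_000_000, 500_000, 2_000_000, 750_000],
})
funding["date"] = pd.to_datetime(funding["date"])

# How does funding change with time? Total amount per calendar year.
by_year = funding.groupby(funding["date"].dt.year)["amount_usd"].sum()

# Do cities play a major role? Total amount per city, largest first.
by_city = funding.groupby("city")["amount_usd"].sum().sort_values(ascending=False)

# How much funding does a startup generally get? The median is robust to outliers.
median_round = funding["amount_usd"].median()

print(by_year, by_city, median_round, sep="\n\n")
```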

---

### [Indians Suicide Report Exploratory Data Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Indians%20Suicide%20Report%20Exploratory%20Data%20Analysis)

* EDA of the suicide rate in the Indian population.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)

---

### [Most followed 100 personalities on Twitter in 2019 Exploratory Data Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Most%20followed%20100%20%20personalities%20on%20Twitter%20in%202019%20%20Exploratory%20Data%20Analysis)

* EDA based on a dataset of the 100 most-followed personalities on Twitter as of December 2019.
* The dataset was scraped from friendorfollow.com. The initial dataset contained only the number of followers, the number of accounts followed, and the number of tweets. The nationality, industry, and activity fields were added manually. For people with dual citizenship, the country where the person is currently most active is used.
* Twitter, founded in 2006, is an online social networking and microblogging service that allows users to post text-based status updates and messages of up to 280 characters in length. These messages are known as tweets. As of the second quarter of 2019, Twitter had 139 million monetizable daily active users (mDAU) worldwide.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot)

---

### [Myers-Briggs 16-Personality Analysis](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Myers-Briggs%2016-Personality%20Analysis)

* The Myers-Briggs Type Indicator (MBTI) is a personality type system that divides everyone into 16 distinct personality types across 4 axes:
  * Introversion (I) – Extroversion (E)
  * Intuition (N) – Sensing (S)
  * Thinking (T) – Feeling (F)
  * Judging (J) – Perceiving (P)
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot, seaborn)
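
Since each four-letter type is one letter per axis, a type code can be split by position to study the axes separately. The sketch below counts letters per axis for a handful of sample codes; the `axis_counts` helper and the sample list are illustrative and are not taken from the notebook.

```python
from collections import Counter

# The four axes, in the order their letters appear in a type code.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def axis_counts(types):
    """Count how often each letter occurs on each of the four axes."""
    counts = [Counter() for _ in AXES]
    for mbti in types:
        code = mbti.strip().upper()
        # Reject codes that are not exactly one valid letter per axis.
        if len(code) != 4 or any(ch not in pair for ch, pair in zip(code, AXES)):
            raise ValueError(f"not a valid MBTI type: {mbti!r}")
        for counter, letter in zip(counts, code):
            counter[letter] += 1
    return counts


sample = ["INTJ", "ENFP", "INFP", "ISTJ"]
for pair, counter in zip(AXES, axis_counts(sample)):
    print(pair, dict(counter))
```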


---

### [Super Market Analysis simple EDA](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Super%20Market%20Analysis%20simple%20EDA)

* EDA on a supermarket dataset.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot, seaborn)

---

### [Twitter Data Analysis of Trump-Hilary for Presidential Election 2016](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/tree/master/Twitter%20Data%20Analysis%20of%20Trump-Hilary%20%20for%20Presidential%20Election%202016)

* EDA and analysis of Trump-Hillary tweets for the 2016 presidential election.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot, seaborn)

---

### [Web Scraping based on Python and BeautifulSoup](https://github.com/abhihirekhan/Abhi-s-Data-Science-Portfolio/blob/master/Web%20Scraping%20based%20on%20Python%20and%20BeautifulSoup.ipynb)

* Simple web scraping with Python and BeautifulSoup.
* **Keywords** (Python, pandas, NumPy, matplotlib.pyplot, seaborn)
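
A minimal sketch of the parsing step with BeautifulSoup, working on an inline HTML string so that it runs without network access. The HTML snippet is made up for illustration; a real scrape would first download the page and should respect the site's terms of use.

```python
from bs4 import BeautifulSoup

# Made-up page content standing in for a downloaded HTML document.
html = """
<html><body>
  <h1>Example page</h1>
  <ul>
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

# Parse with the parser bundled with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

# Extract the page heading and every link's text and target.
title = soup.find("h1").get_text(strip=True)
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print(title)
print(links)
```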

---