https://github.com/macabdul9/flipr-challenge
flipr hackathon challenge
boosting-algorithms covid-19 deep-learning forecasting lstm-neural-networks machine-learning neural-network regression-models time-series-analysis
- Host: GitHub
- URL: https://github.com/macabdul9/flipr-challenge
- Owner: macabdul9
- Created: 2020-03-20T18:04:16.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-03-23T08:06:22.000Z (almost 5 years ago)
- Last Synced: 2024-04-28T04:49:17.542Z (10 months ago)
- Topics: boosting-algorithms, covid-19, deep-learning, forecasting, lstm-neural-networks, machine-learning, neural-network, regression-models, time-series-analysis
- Language: Jupyter Notebook
- Size: 7.38 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# flipr-challenge
flipr hackathon challenge

# Repository Structure
- ./data/
  - Contains the dataset files (these should have been excluded via .gitignore)
- ./src/
  - Two notebooks, one for each task
- ./predictions/
  - Three .csv files containing the predicted values: Infect_prob, the Diuresis value on 27-03-2020 predicted by the time series model, and Infect_prob computed from the new Diuresis value (predicted by the time series model)
- ./assets/
  - Contains the figures plotted in the tasks

## Actual Infect_prob vs Predicted Infect_prob (100 samples)
![Actual Infect_prob vs Predicted Infect_prob](assets/actualvspredicted.svg)

## Diuresis Forecasting
![Diuresis Forecasting for 27-03-2020](assets/TimeSeriesForecasting.png)

# Flipr Hackathon Hiring Program 4.
## Module 04: Machine Learning
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute
respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease was first identified in
2019 in Wuhan, China, and has since spread globally, resulting in the 2019–20
coronavirus pandemic. Epidemiologists are teaming up with data scientists to stem the
spread of the novel coronavirus by tapping big data, machine learning and other digital
tools. The goal is to get real-time forecasts and other critical information to front-line
health-care workers and public policy makers as the outbreak unfolds. The objective of
the Hackathon is to predict the probability of a person getting infected by COVID-19.
## Background
Coronaviruses are a family of hundreds of viruses that can cause fever, respiratory
problems, and sometimes gastrointestinal symptoms too. The 2019 novel
coronavirus is one of seven members of this family known to infect humans, and
the third in the past three decades to jump from animals to humans. Since emerging
in China in December, this new coronavirus has caused a global health emergency,
sickening almost 200,000 people worldwide, and so far killing more than 9,000. As
of March 19, about 10,000 cases had been reported in the US, and 155 people have
died.

In Wuhan, home to 11 million people, the initial number of cases was 40,
estimated by a group of researchers led by Natsuko Imai of Imperial College. The
number of exposed was assumed to be 20 times this number. The basic
reproduction number (BRN) is the expected number of cases directly generated
by one case. A BRN greater than one indicates that the outbreak is self-sustaining,
while a BRN less than one indicates that the number of new cases decreases over
time and eventually the outbreak will stop. Ideally, the BRN should be reduced in
order to slow down an epidemic. The BRN in the first three phases was estimated
to be 3.1, 2.6, and 1.9, respectively. In the _Cell Discovery_ article, the BRN is
assumed to have decreased to 0.9 or 0.5 in phase IV, based on previous
experience in SARS. According to an article in _Science_ in 2003, the BRN of SARS
decreased from 2.7 to 0.25 after the patients were isolated and the infection
started being controlled.
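
To make the BRN threshold concrete, here is a minimal sketch in Python. It assumes a deliberately simplified geometric model in which every case generates exactly `brn` new cases in the next generation; this is an illustration of the definition, not the model used in the cited articles. The function names are hypothetical.

```python
def expected_cases(initial_cases, brn, generations):
    """Expected new cases per generation under a simple geometric model.

    Assumption: every case directly generates `brn` new cases in the next
    generation, with no depletion of susceptibles and no interventions.
    """
    return [initial_cases * brn ** g for g in range(generations + 1)]


def is_self_sustaining(brn):
    """A BRN greater than one means each generation is larger than the last."""
    return brn > 1.0


if __name__ == "__main__":
    # BRN values quoted in the text: phases I-III, then the assumed phase IV values.
    for brn in (3.1, 2.6, 1.9, 0.9, 0.5):
        cases = expected_cases(initial_cases=40, brn=brn, generations=5)
        trend = "grows" if is_self_sustaining(brn) else "shrinks"
        print(f"BRN={brn}: outbreak {trend}, generation 5 has about {cases[-1]:.1f} new cases")
```

Starting from the 40 initial cases mentioned above, the per-generation count grows for every BRN above one and decays toward zero for the phase IV values below one.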
The better we can track the virus, the better we can fight it. By analyzing
the different parameters responsible for the outbreak of coronavirus, we can take
control measures more quickly.

## Problem Statement
India has 197 total cases, of which 4 deaths have been reported and 173
cases are still active. In the hope of controlling the epidemic, this
machine learning problem is designed to cater to the need for a prediction model
that can predict the **probability of a person getting infected by COVID-19**.

The whole world is participating in the fight against this pandemic. The
healthcare data science community can have a big impact on combating this
disease. There have been many excellent efforts to use data
visualization and Monte Carlo simulations to help combat the spread of this
pandemic. The expected prediction model would address a complementary
and important aspect of health policy: identifying those most at risk. By
combining these and many other excellent efforts in the
healthcare technology space, we hope to mitigate the effects of this terrible
disease.

Part-01:
The objective of the first part of the problem statement is to predict the
probability of a person getting infected by COVID-19 on 20th March 2020. The
output file 01 should contain only people_ID and the respective infect_prob
for the test data.

Part-02:
The Diuresis of a person is a time-dependent parameter, for which you have to
come up with a time-series prediction model. Using the Diuresis predicted by
the model, you need to calculate the infect_prob on 27th March 2020 for every
people_ID in the test data. The output file 02 should contain only people_ID
and the respective infect_prob on 27th March.

```
There are 3 files provided:
```
**1. Variable_Description.xlsx**:
This file contains a description of all the variables available in the dataset.

**2. Training_data.xlsx**:
This is the training dataset on which the model has to be trained. It contains
the parameters of a person on 20th March 2020.

**3. Test_data.xlsx**:
This is the test data on which the accuracy of the model will be computed.

## Competition Rules
- There should be only **one submission per participant**.
- Private sharing of code is not permitted. In case of plagiarism, the
  participant shall be disqualified.
- Those attempting both parts should send 2 separate .csv/.xlsx files,
  containing **people_ID** and **infect_prob** on 20th March and 27th March
  respectively.
- The **solution_sheet** should also be attached along with the results.
- Share all your files in this Google form link:
  https://docs.google.com/forms/d/18SkI7vbSc-dHdlnjLMtYsbZZ4kN_vk5XIFxGEyp2QDc/viewform?edit_requested=true
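
The Part-02 flow described in the problem statement (forecast Diuresis, recompute infect_prob, write a two-column file) can be sketched roughly as follows. This is only an illustration under several assumptions: the Diuresis history is assumed to be daily and is made up here, a naive least-squares linear trend stands in for whatever time-series model is actually used, and `toy_infect_prob` is a hypothetical placeholder for the model trained in Part-01, which would normally use all the features rather than Diuresis alone. Only the column names `people_ID` and `infect_prob` come from the task description.

```python
import csv
import io


def linear_trend_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced values and extrapolate it.

    Assumption: observations are equally spaced (e.g. daily), so the x-axis
    is simply 0, 1, ..., n-1.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


def write_submission(rows, out):
    """Write (people_ID, infect_prob) pairs as a two-column CSV."""
    writer = csv.writer(out)
    writer.writerow(["people_ID", "infect_prob"])
    for people_id, infect_prob in rows:
        writer.writerow([people_id, f"{infect_prob:.4f}"])


def toy_infect_prob(diuresis):
    """Hypothetical placeholder for the Part-01 model; not a real relationship."""
    return min(1.0, max(0.0, diuresis / 1000.0))


if __name__ == "__main__":
    # Made-up Diuresis histories, one value per day, ending on 20th March.
    history = {1: [400.0, 410.0, 420.0], 2: [300.0, 290.0, 280.0]}
    rows = []
    for people_id, series in history.items():
        # 27th March is 7 days after the last observation on 20th March.
        diuresis_27 = linear_trend_forecast(series, steps_ahead=7)
        rows.append((people_id, toy_infect_prob(diuresis_27)))
    buffer = io.StringIO()
    write_submission(rows, buffer)
    print(buffer.getvalue())
```

Passing an open file handle instead of the `io.StringIO` buffer would write the same rows to disk as output file 02; output file 01 has the same shape, without the forecasting step.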