Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/francescosaveriozuppichini/search-covid-papers-with-deep-learning
A semantic browser using deep learning to search in COVID papers
- Host: GitHub
- URL: https://github.com/francescosaveriozuppichini/search-covid-papers-with-deep-learning
- Owner: FrancescoSaverioZuppichini
- Created: 2020-04-25T09:05:21.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T10:59:40.000Z (6 months ago)
- Last Synced: 2024-11-30T11:50:15.496Z (about 2 months ago)
- Topics: covid-19, deep-learning, nlp, python
- Language: Jupyter Notebook
- Homepage:
- Size: 47.3 MB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# COVID Search Papers
![alt](https://github.com/FrancescoSaverioZuppichini/Search-COVID-papers-with-Deep-Learning/blob/develop/medium/images/cl.gif?raw=true)
A semantic browser that uses deep learning to search more than 50k papers about the recent COVID-19 disease. It uses a deep learning model from [HuggingFace's `transformers`](https://github.com/huggingface/transformers) to embed each paper into a fixed-size `[768]` vector. We load all the papers and their embeddings into Elasticsearch; search is performed by computing cosine similarity between the query embedding and all the document embeddings.
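The ranking step described above can be sketched in a few lines of plain Python (function and variable names here are hypothetical; in the project itself this computation is delegated to Elasticsearch):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_papers(query_embedding, paper_embeddings):
    # Return paper ids sorted by similarity to the query, best first.
    scores = {
        paper_id: cosine_similarity(query_embedding, emb)
        for paper_id, emb in paper_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

In the real pipeline each vector has 768 dimensions (the transformer's output size); toy vectors behave the same way.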
My Medium article: []
## Getting Started
We assume you have Elasticsearch installed and running on your machine. We provide the embeddings and the index file from here (TODO).

### Fill up the database
To recreate the database, first install [elasticsearch-dump](https://github.com/taskrabbit/elasticsearch-dump).
Then, download the mapping and the data files from [here](https://drive.google.com/file/d/1ab_1e7lPOjQ4my3ok-7ARvBIwkJyJ8f_/view?usp=sharing) and unzip them. Fire up a terminal, `cd` into the unzipped folder, and from there run:
```
elasticdump \
--input=./covid_mapping.json \
--output=http://localhost:9200/covid \
--type=mapping
```

and
```
elasticdump \
--input=./covid_data.json \
--output=http://localhost:9200/covid \
--type=data
```

This may take a while.
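Once the data is loaded, you can sanity-check the index with a cosine-similarity query. A sketch of the request body, assuming the embedding field is named `embed` (the actual field name in the dump may differ):

```python
import json

# script_score query using Elasticsearch's built-in cosineSimilarity function.
# The query vector below is a placeholder; a real one comes from the model
# and has 768 dimensions.
query_vector = [0.1] * 768

body = {
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # "+ 1.0" keeps scores non-negative, as Elasticsearch requires.
                "source": "cosineSimilarity(params.query_vector, 'embed') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
}

# POST this body as JSON to http://localhost:9200/covid/_search
payload = json.dumps(body)
```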
### Run command line interface
#### Python
Run:
```
pip install -r requirements.txt
python main.py
```

#### Docker (suggested)
To create and run the container:
```
# at root level
docker build -t covid-search .
docker run --net="host" -i covid-search
```

### Dump the database
We dumped the database using [elasticsearch-dump](https://github.com/taskrabbit/elasticsearch-dump) by running:
```
elasticdump \
--input=http://localhost:9200/covid \
--output=./covid_mapping.json \
--type=mapping
```

and
```
elasticdump \
--input=http://localhost:9200/covid \
--output=./covid_data.json \
--type=data
```