Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/francescosaveriozuppichini/search-covid-papers-with-deep-learning
A semantic browser using deep learning to search in COVID papers
- Host: GitHub
- URL: https://github.com/francescosaveriozuppichini/search-covid-papers-with-deep-learning
- Owner: FrancescoSaverioZuppichini
- Created: 2020-04-25T09:05:21.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-07-25T10:59:40.000Z (6 months ago)
- Last Synced: 2024-11-30T11:50:15.496Z (about 2 months ago)
- Topics: covid-19, deep-learning, nlp, python
- Language: Jupyter Notebook
- Homepage:
- Size: 47.3 MB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
README
# COVID Search Papers
![alt](https://github.com/FrancescoSaverioZuppichini/Search-COVID-papers-with-Deep-Learning/blob/develop/medium/images/cl.gif?raw=true)
A semantic browser that uses deep learning to search more than 50k papers about the recent COVID-19 disease. It uses a deep learning model from [HuggingFace's `transformers`](https://github.com/huggingface/transformers) to embed each paper into a fixed-size `[768]` vector. We load all the papers and their embeddings into Elasticsearch; search is performed by computing cosine similarity between the query embedding and all the document embeddings.
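The ranking step described above can be sketched in a few lines of plain Python (function and variable names here are hypothetical; in the project itself this computation is delegated to Elasticsearch):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_papers(query_embedding, paper_embeddings):
    # Return paper ids sorted by similarity to the query, best first.
    scores = {
        paper_id: cosine_similarity(query_embedding, emb)
        for paper_id, emb in paper_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

In the real pipeline each vector has 768 dimensions (the transformer's output size); toy vectors behave the same way.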
My Medium article: []
## Getting Started
We assume you have Elasticsearch installed and running on your machine. We provide the embeddings and the index file from here (TODO).

### Fill up the database
To recreate the database, first install [elasticsearch-dump](https://github.com/taskrabbit/elasticsearch-dump).
Then, download the mapping and the data files from [here](https://drive.google.com/file/d/1ab_1e7lPOjQ4my3ok-7ARvBIwkJyJ8f_/view?usp=sharing) and unzip them. Fire up a terminal, `cd` into the unzipped folder, and from there run:
```
elasticdump \
--input=./covid_mapping.json \
--output=http://localhost:9200/covid \
--type=mapping
```

and
```
elasticdump \
--input=./covid_data.json \
--output=http://localhost:9200/covid \
--type=data
```

This may take a while.
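Once the data is loaded, you can sanity-check the index with a cosine-similarity query. A sketch of the request body, assuming the embedding field is named `embed` (the actual field name in the dump may differ):

```python
import json

# script_score query using Elasticsearch's built-in cosineSimilarity function.
# The query vector below is a placeholder; a real one comes from the model
# and has 768 dimensions.
query_vector = [0.1] * 768

body = {
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                # "+ 1.0" keeps scores non-negative, as Elasticsearch requires.
                "source": "cosineSimilarity(params.query_vector, 'embed') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    }
}

# POST this body as JSON to http://localhost:9200/covid/_search
payload = json.dumps(body)
```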
### Run command line interface
#### Python
Run:
```
pip install -r requirements.txt
python main.py
```

#### Docker (suggested)
To create and run the container:
```
# at root level
docker build -t covid-search .
docker run --net="host" -i covid-search
```

### Dump the database
We dumped the database using [elasticsearch-dump](https://github.com/taskrabbit/elasticsearch-dump) by running:
```
elasticdump \
--input=http://localhost:9200/covid \
--output=./covid_mapping.json \
--type=mapping
```

and
```
elasticdump \
--input=http://localhost:9200/covid \
--output=./covid_data.json \
--type=data
```