https://github.com/kaaass/pubmedtoolkit

# Pubmed Toolkit
A bundle of Python scripts for searching PubMed, downloading PDFs, and analyzing the results.

## Installation

Before installing, make sure that Python 3 and pip are installed.

1. Clone the repository

```bash
git clone https://github.com/kaaass/PubmedToolkit.git
cd PubmedToolkit
```

2. Install the dependencies using pip

```bash
pip install -r requirements.txt
```

## pubmed_central.py

Download PDFs from PubMed Central, given either PMIDs on the command line or a PMID source file. See "PMID Source File Schema" for the format of the source file.

- Supports resuming an interrupted run
- Supports retrying failed tasks
- Supports a proxy pool to work around anti-scraping measures

### Usage

```
usage: pubmed_central.py [-h] [-o OUTPUT_DIR] [--resume] [--retry] [--use-proxy]
                         [PMIDs or PMID source file [PMIDs or PMID source file ...]]

Download PDFs from pubmed central by PMIDs

positional arguments:
  PMIDs or PMID source file
                        PMIDs to download, or filepath of PMID source file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        output directory
  --resume              Allow resume from an exist lock file
  --retry               Retry the tasks in the failed file
  --use-proxy           Use proxy pool to access Pubmed Central
```

### Examples

1. Download the PDFs for PMIDs 29138661 and 29123944

```bash
python pubmed_central.py 29138661 29123944
```

2. Download the PDFs listed in a PMID source file

```bash
python pubmed_central.py data.json
```

3. Resume an interrupted task

```bash
python pubmed_central.py --resume data.json
```

### PMID Source File Schema

A PMID source file is a JSON file that stores an array of objects. It can be generated by `pubmed_search.py`.

```javascript
[
  {
    "pmid": 0, // PMID
    // Other attributes will be ignored
  },
  // ...
]
```

To read other formats, you might need to edit the function `load_source_file`.
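As an illustration of the schema, here is a minimal sketch of reading PMIDs from such a file. It is not the toolkit's `load_source_file`, whose implementation is not shown here; the name `load_pmids` is hypothetical. It assumes the file is plain JSON (the `//` comments in the schema above are explanatory only and would not parse).

```python
import json
from pathlib import Path


def load_pmids(path):
    """Return the PMIDs listed in a JSON source file, in file order.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("source file must contain a JSON array of objects")
    pmids = []
    for record in records:
        # Only "pmid" is read; every other attribute is ignored.
        pmids.append(int(record["pmid"]))
    return pmids
```

A loader for another format would only need to return the same kind of list.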

### Todo

- [ ] Support schema: one PMID per line
- [ ] Support schema: BibTeX library
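Until the one-PMID-per-line schema is supported, a text file in that form can be converted into a JSON source file first. The following is a workaround sketch, not part of the toolkit: the function name `text_to_source_file` and the file name `pmids.txt` are hypothetical, and skipping lines that start with `#` is a convention assumed here.

```python
import json
from pathlib import Path


def text_to_source_file(text_path, json_path):
    """Convert a one-PMID-per-line text file into a JSON source file.

    Returns the number of PMIDs written.
    """
    records = []
    for line in Path(text_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (assumed convention)
        records.append({"pmid": int(line)})
    Path(json_path).write_text(json.dumps(records, indent=2), encoding="utf-8")
    return len(records)
```

After `text_to_source_file("pmids.txt", "data.json")`, the resulting file can be passed to `python pubmed_central.py data.json`.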

## pubmed_search.py

*WARNING: This script is incomplete; you might need to edit the source code to use it.*

Search PubMed with a given query and save the resulting entries as a JSON file.

### Usage

Set the variable `query` to your search query. The query can be built with https://www.ncbi.nlm.nih.gov/pubmed/advanced.

Change the parameter `max_results` to specify the maximum number of results.

Run `python pubmed_search.py`; the results will be stored in `data.json`.
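If editing the script is not an option, a source file can also be produced by querying the NCBI E-utilities `esearch` endpoint directly. The sketch below is an alternative, not what `pubmed_search.py` does: the function names are hypothetical, and it stores only the `pmid` attribute, which is all the source file schema requires.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, max_results=100):
    """Build an esearch URL that returns up to max_results PMIDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch(response_text):
    """Turn an esearch JSON response into source file records."""
    id_list = json.loads(response_text)["esearchresult"]["idlist"]
    return [{"pmid": int(pmid)} for pmid in id_list]


def search_to_file(query, max_results=100, out_path="data.json"):
    """Run the search over the network and write the records to out_path."""
    with urlopen(build_esearch_url(query, max_results)) as response:
        records = parse_esearch(response.read().decode("utf-8"))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)
    return len(records)
```

Only `search_to_file` touches the network; the other two functions can be checked offline against a saved response.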

## [WIP] pubmed_info.py

Download metadata and figures, and extract text from PDFs.

## Thanks

1. https://github.com/gijswobben/pymed/
2. https://github.com/zotero/translators