https://github.com/kaaass/pubmedtoolkit
A bundle of scripts searching, downloading pdf, and analyzing from Pubmed.
- Host: GitHub
- URL: https://github.com/kaaass/pubmedtoolkit
- Owner: kaaass
- License: mit
- Created: 2020-05-09T05:58:08.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-07-06T20:39:22.000Z (over 2 years ago)
- Last Synced: 2023-03-02T22:41:43.679Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 32.2 KB
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# Pubmed Toolkit
A bundle of Python scripts for searching PubMed, downloading PDFs, and analyzing the results.

## Installation
Before installing, make sure Python 3 and pip are installed.
1. Clone the repository
```bash
git clone https://github.com/kaaass/PubmedToolkit.git
cd PubmedToolkit
```

2. Install the dependencies using pip
```bash
pip install -r requirements.txt
```

## pubmed_central.py
Download PDFs from PubMed Central by PMIDs or from a PMID source file. See "PMID Source File Schema" for details of the source file format.
- Supports resuming from a breakpoint
- Supports retrying failed tasks
- Supports a proxy pool to work around anti-scraping measures

### Usage
```
usage: pubmed_central.py [-h] [-o OUTPUT_DIR] [--resume] [--retry] [--use-proxy]
                         [PMIDs or PMID source file [PMIDs or PMID source file ...]]

Download PDFs from pubmed central by PMIDs

positional arguments:
  PMIDs or PMID source file
                        PMIDs to download, or filepath of PMID source file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        output directory
  --resume              Allow resume from an exist lock file
  --retry               Retry the tasks in the failed file
  --use-proxy           Use proxy pool to access Pubmed Central
```

### Examples
1. Download PDFs for PMIDs 29138661 and 29123944
```bash
python pubmed_central.py 29138661 29123944
```

2. Download PDFs for the PMIDs listed in a source file
```bash
python pubmed_central.py data.json
```

3. Resume an interrupted task
```bash
python pubmed_central.py --resume data.json
```

### PMID Source File Schema
A PMID source file is a JSON file that stores an array of objects. It can be generated by `pubmed_search.py`.
```javascript
[
{
"pmid": 0, // PMID
// Other attributes will be ignored
},
// ...
]
```

For other formats, you may need to edit the function `load_source_file`.
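Each positional argument is either a PMID or the path of a source file. A minimal sketch of expanding those arguments into one PMID list, assuming an all-digit argument is a PMID and anything else is a path to a JSON file in the schema above. `resolve_pmids` and `load_pmids_from_json` are hypothetical names for illustration; they are not the script's own `load_source_file`, whose signature may differ.

```python
import json
from typing import Iterable, List


def load_pmids_from_json(path: str) -> List[str]:
    """Read a PMID source file: a JSON array of objects with a "pmid" key.

    Hypothetical helper. Other attributes in each object are ignored,
    as described in the schema above.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    # Keep only each entry's PMID, normalised to a string.
    return [str(entry["pmid"]) for entry in entries if "pmid" in entry]


def resolve_pmids(args: Iterable[str]) -> List[str]:
    """Expand positional arguments into a de-duplicated list of PMIDs.

    Assumption: an all-digit argument is a PMID; anything else is
    treated as the path of a PMID source file.
    """
    pmids: List[str] = []
    seen = set()
    for arg in args:
        candidates = [arg] if arg.isdigit() else load_pmids_from_json(arg)
        for pmid in candidates:
            if pmid not in seen:  # keep first-seen order, drop repeats
                seen.add(pmid)
                pmids.append(pmid)
    return pmids
```

Supporting another format, such as one PMID per line, would mean adding a branch next to the JSON loader that picks a parser based on the file's contents or extension.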
### Todo
- [ ] Support schema: one PMID per line
- [ ] Support schema: BibTeX library

## pubmed_search.py
*WARNING: This is an incomplete script; you may need to edit the source code to use it.*
Searches PubMed for entries matching a given query and saves their information as a JSON file.
### Usage
Set the variable `query` to your search query. Queries can be built with https://www.ncbi.nlm.nih.gov/pubmed/advanced.
Set the parameter `max_results` to the maximum number of results.
Run `python pubmed_search.py`; the results are stored in `data.json`.
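`pubmed_search.py` is configured by editing `query` and `max_results`, and the Thanks section credits pymed, so the script may go through that library. As an alternative illustration only, here is a minimal sketch that calls NCBI's E-utilities `esearch` endpoint directly and writes the resulting PMIDs in the PMID source file schema. The function names are hypothetical, and unlike `pubmed_search.py` this keeps only the PMID of each entry.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities esearch endpoint, used here instead of pymed (assumption).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, max_results: int) -> str:
    """Build an esearch request returning up to max_results PMIDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch(body: str) -> List[str]:
    """Extract the list of PMIDs from an esearch JSON response."""
    return json.loads(body)["esearchresult"]["idlist"]


def to_source_file(pmids: List[str]) -> str:
    """Serialise PMIDs as a JSON array of {"pmid": <number>} objects."""
    return json.dumps([{"pmid": int(p)} for p in pmids], indent=2)


def search(query: str, max_results: int = 100) -> List[str]:
    """Run the search over the network and return the matching PMIDs."""
    with urlopen(build_esearch_url(query, max_results)) as resp:
        return parse_esearch(resp.read().decode("utf-8"))
```

Writing `to_source_file(search(query, max_results))` to `data.json` would produce a file that the `pubmed_central.py data.json` example can consume. Keeping the HTTP call separate from parsing makes the parsing testable offline.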
## [WIP] pubmed_info.py
Downloads metadata and figures, and extracts text from PDFs.
## Thanks
1. https://github.com/gijswobben/pymed/
2. https://github.com/zotero/translators