https://github.com/nathanielng/research_tools
Coding Tools for Scientific Research
- Host: GitHub
- URL: https://github.com/nathanielng/research_tools
- Owner: nathanielng
- License: mit
- Created: 2019-08-22T12:37:05.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-20T12:01:30.000Z (over 4 years ago)
- Last Synced: 2024-11-08T13:12:23.350Z (3 months ago)
- Language: Python
- Homepage: https://nathanielng.github.io/research_tools
- Size: 44.9 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Coding Tools for Scientific Research
## 1. Background
This is a set of coding tools for scientific research. The intention is to have at least three parts:
1. an article lookup / API client
2. a parsing and natural language processing engine
3. a bot linking to Twitter feeds and Discord

### 1.1 Article Lookup / API Client
Article lookup will be carried out using an API Client for scientific papers.
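
As a rough illustration of what such a client might look like, the sketch below builds lookup URLs for a DOI and fetches JSON using only the Python standard library. It is not code from this repository: the function names are hypothetical, and the two base URLs (the Open Citations COCI endpoint and the public Crossref REST endpoint) are assumptions about which services would be targeted. It also uses plain `urllib` rather than a dedicated client library.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URLs for the two services (not taken from this repository).
COCI_BASE = "https://opencitations.net/index/coci/api/v1"
CROSSREF_BASE = "https://api.crossref.org"


def coci_citations_url(doi: str) -> str:
    """Build the COCI URL that lists citations received by a DOI."""
    # Keep "/" unescaped, since a DOI always contains a prefix/suffix slash.
    return f"{COCI_BASE}/citations/{urllib.parse.quote(doi, safe='/')}"


def crossref_work_url(doi: str) -> str:
    """Build the Crossref URL that returns metadata for a DOI."""
    return f"{CROSSREF_BASE}/works/{urllib.parse.quote(doi, safe='/')}"


def fetch_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(crossref_work_url(doi))` would then perform the actual lookup. Keeping URL construction separate from the network call makes the URL logic easy to check offline.
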
Initially, development will be prioritized towards the following:

1. Open Citations API at [opencitations.net](http://opencitations.net/index/coci/api/v1)
2. [Crossref API](https://www.crossref.org/services/metadata-delivery/rest-api/) via [crossref_commons_py](https://gitlab.com/crossref/crossref_commons_py)

Tentative APIs currently under exploration:
1. Elsevier API at [dev.elsevier.com/](http://dev.elsevier.com/)
2. [CORE](https://core.ac.uk/services/api/)
3. [Microsoft Academic Services](https://docs.microsoft.com/en-us/academic-services/)

### 1.2 Parsing & Natural Language Processing Engine
For the NLP aspect, initial work will focus on Materials Science,
via the [mat2vec](https://github.com/materialsintelligence/mat2vec) library,
with a general direction to eventually cover at least the following 4 areas:

1. Materials Science
2. Engineering
3. Physics
4. Chemistry

For a start, the initial tool is `pdf_extract.py`, which extracts the text of a `pdf` file.
The text is parsed for DOI (digital object identifier) data and arXiv IDs.
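
The identifier search can be sketched with regular expressions, as below. This is a minimal illustration rather than the logic actually used in `pdf_extract.py`: the patterns and the function names `find_dois` and `find_arxiv_ids` are assumptions, and the arXiv pattern only covers new-style IDs written with an `arXiv:` prefix.

```python
import re

# Assumed patterns (not taken from pdf_extract.py).
# A DOI starts with "10.", a 4-9 digit registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
# New-style arXiv IDs such as "arXiv:1907.01234" or "arXiv:1907.01234v2".
ARXIV_PATTERN = re.compile(r'arXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)', re.IGNORECASE)


def find_dois(text):
    """Return the unique DOIs found in text, in order of first appearance."""
    found = []
    for match in DOI_PATTERN.finditer(text):
        # Strip trailing punctuation that usually belongs to the sentence.
        # Caveat: this also trims a ")" that is genuinely part of a DOI.
        doi = match.group(0).rstrip(".,;)")
        if doi not in found:
            found.append(doi)
    return found


def find_arxiv_ids(text):
    """Return the unique arXiv IDs found in text, in order of first appearance."""
    found = []
    for match in ARXIV_PATTERN.finditer(text):
        arxiv_id = match.group(1)
        if arxiv_id not in found:
            found.append(arxiv_id)
    return found
```

Text extracted from a PDF often contains line breaks or ligatures inside identifiers, so a real implementation would likely need to normalize the text before matching.
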
At a later stage, the text may be used as an input to the parsing & Natural Language Processing (NLP) engine.

## 2. Usage
### 2.1 Parser
#### 2.1.1 Extracting a DOI from a file
```bash
# $FILE is the path to a single PDF; quoted so that paths with spaces work
python src/pdf_extract.py --file "$FILE"
```

#### 2.1.2 Extracting multiple DOIs from a folder
```bash
# $FOLDER is a directory of PDFs; quoted so that paths with spaces work
python src/pdf_extract.py --path "$FOLDER"
```

### 2.2 Discord Bot
A bot based on some of the tools here will eventually be made available for testing at the following Discord: https://discord.gg/ZPnKCkU