https://github.com/anasaito/um6p-chemistry-kg
Constructing a knowledge graph from chemistry research papers' abstracts using NLP and graphs
- Host: GitHub
- URL: https://github.com/anasaito/um6p-chemistry-kg
- Owner: AnasAito
- Created: 2021-07-29T22:08:54.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-07-29T23:00:39.000Z (almost 4 years ago)
- Last Synced: 2025-03-29T22:11:38.033Z (about 1 month ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 1.68 MB
- Stars: 7
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Knowledge graph from UM6P chemistry papers using NLP and graphs

## Problem:
Literature review and documentation: finding good, relevant scientific resources for your research can be challenging and time-consuming.
## Solution:
Build a knowledge graph connecting the scientific entities that occur in chemistry papers. Users can then start from a simple query (a chemical substance name or a laboratory procedure) and find the papers relevant to it.
## How to build it?
The pipeline starts by extracting scientific entities from the abstracts of chemistry papers, using an NLP model fine-tuned on a biomedical corpus and maintained by researchers at the Allen Institute for AI. Those entities are then cross-matched with the UMLS database in order to label them.
Finally, we build the knowledge graph by connecting keywords that co-occur in the same paper abstract.
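
The last step, linking keywords that co-occur in the same abstract, can be sketched in a few lines of plain Python. This is a minimal illustration, not the notebook's code: the function name `cooccurrence_edges` and the toy `papers` dictionary are assumptions, and the edge weight here simply counts the abstracts a pair of keywords shares.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(papers):
    """Count, for each unordered keyword pair, the abstracts containing both."""
    edges = Counter()
    for keywords in papers.values():
        # Deduplicate and sort so (a, b) and (b, a) collapse into one edge.
        unique = sorted(set(keywords))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy input: paper ID -> keywords extracted from its abstract.
papers = {
    "p1": ["phosphate", "adsorption", "zeolite"],
    "p2": ["phosphate", "adsorption"],
    "p3": ["zeolite", "catalysis"],
}

edges = cooccurrence_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Weighting edges by the number of shared abstracts is one possible choice; an unweighted graph would only need the set of pairs.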
## Try it yourself!
I uploaded commented notebooks detailing the pipeline:
> Chimestry-extract
In this notebook, I use the scispaCy pipeline to extract and label keywords from paper abstracts, then save them to a JSON file with the paper ID and other metadata.
- Data format (you can find it in `chimestry_papers.json`):
```json
{
  "paper_id": {
    "year": 2020,
    "title": "D",
    "paper_type": "Article",
    "keywords": [
      {
        "label": "",
        "canonical form": "",
        "type": ""
      },...
    ]
  },...
}
```
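
As an illustration of how this format might be consumed, the sketch below builds an inverted index from keyword to paper IDs, which is the kind of lookup a query by substance name needs. The sample records, the `index_by_keyword` helper, and the choice to key on a lowercased `canonical form` are assumptions, not taken from the notebooks.

```python
import json
from collections import defaultdict

# Hypothetical sample in the documented shape; all values are made up.
raw = """
{
  "paper_1": {
    "year": 2020,
    "title": "Example title A",
    "paper_type": "Article",
    "keywords": [
      {"label": "phosphate", "canonical form": "Phosphates", "type": "chemical"},
      {"label": "adsorption", "canonical form": "Adsorption", "type": "process"}
    ]
  },
  "paper_2": {
    "year": 2019,
    "title": "Example title B",
    "paper_type": "Article",
    "keywords": [
      {"label": "phosphates", "canonical form": "Phosphates", "type": "chemical"}
    ]
  }
}
"""


def index_by_keyword(papers):
    """Map each lowercased canonical form to the set of paper IDs using it."""
    index = defaultdict(set)
    for paper_id, paper in papers.items():
        for keyword in paper.get("keywords", []):
            index[keyword["canonical form"].lower()].add(paper_id)
    return index


papers = json.loads(raw)
index = index_by_keyword(papers)
print(sorted(index["phosphates"]))
```

In practice, `raw` would be replaced by the contents of `chimestry_papers.json`, read with `json.load`.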
> Chimestry-process
The model used is fine-tuned on biomedical data, so some keywords may be mislabeled; we use manual utilities to fix them.
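
A minimal sketch of what such a manual fix could look like, assuming keywords follow the JSON format above. The override table, the blocklist, and the `fix_keywords` helper are hypothetical; the notebook's actual utilities may work differently.

```python
# Hypothetical override table: lowercased canonical form -> corrected type.
TYPE_OVERRIDES = {"lead": "chemical"}

# Hypothetical blocklist of canonical forms that are too generic to keep.
BLOCKLIST = {"study"}


def fix_keywords(keywords):
    """Return a cleaned copy of a keyword list without mutating the input."""
    fixed = []
    for keyword in keywords:
        name = keyword["canonical form"].lower()
        if name in BLOCKLIST:
            continue  # drop keywords judged uninformative
        corrected = dict(keyword)  # shallow copy so the original stays intact
        if name in TYPE_OVERRIDES:
            corrected["type"] = TYPE_OVERRIDES[name]
        fixed.append(corrected)
    return fixed


# Made-up example: "Lead" wrongly typed by a biomedical model.
keywords = [
    {"label": "lead", "canonical form": "Lead", "type": "gene"},
    {"label": "study", "canonical form": "Study", "type": "activity"},
    {"label": "zeolite", "canonical form": "Zeolite", "type": "chemical"},
]
cleaned = fix_keywords(keywords)
print([(k["canonical form"], k["type"]) for k in cleaned])
```

Keeping the corrections in plain dictionaries and sets makes them easy to review and extend by hand.
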
> Chimestry-graph
In this notebook, we give the code for the knowledge graph creation.
## Data
The data is extracted from the Web of Science database; the papers used for the graph creation are chemistry papers.
## Results
- We use Gephi for graph visualization.
- The graph can eventually be used to feed a client app (to be done in the future).
- You can find the edge list in the data folder.
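
If you want to produce a similar edge list yourself, the sketch below serializes weighted edges as CSV. It assumes the `Source`, `Target`, `Weight` column headers that Gephi's spreadsheet import recognizes; the `edges_to_csv` helper and the sample edges are hypothetical, and the file in the data folder may use a different layout.

```python
import csv
import io


def edges_to_csv(edges):
    """Serialize {(source, target): weight} into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    # Sort for a stable, diff-friendly output order.
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Made-up edges: keyword pairs with the number of abstracts they share.
sample_edges = {
    ("adsorption", "phosphate"): 2,
    ("catalysis", "zeolite"): 1,
}
csv_text = edges_to_csv(sample_edges)
print(csv_text)
```

Writing `csv_text` to a `.csv` file gives something that can be loaded through Gephi's spreadsheet import as an edges table.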