https://github.com/jsmatias/aiod-paper-metadata-extractor
A Python service to retrieve metadata and extract keywords from scientific papers
- Host: GitHub
- URL: https://github.com/jsmatias/aiod-paper-metadata-extractor
- Owner: jsmatias
- Created: 2023-10-05T10:10:01.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-07T10:35:48.000Z (8 months ago)
- Last Synced: 2024-03-07T11:57:11.653Z (8 months ago)
- Topics: pdfparser, python
- Language: Jupyter Notebook
- Homepage:
- Size: 2.82 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AIoD Paper Metadata Extractor
Extracts metadata from PDF files.
1. Extracts the DOI from the text using a regular expression.
2. Sends a GET request to an API to retrieve the metadata for that paper.
3. Cross-validates the DOI by matching the retrieved title against the text content of the PDF.
4. Tries to extract the keywords from the following sources, in this order:
   a. The PDF metadata
   b. The text itself, using a regex pattern
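
The steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the regex patterns, the function names, and the `keywords` dictionary key are all assumptions made for the example. The API request in step 2 is left out because the README does not name the API, so the retrieved title is passed in as a plain string.

```python
import re
from typing import List, Optional

# Assumed pattern: the common "10.<registrant>/<suffix>" DOI shape.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")

# Assumed pattern: a "Keywords:" label followed by a delimited list on one line.
KEYWORDS_PATTERN = re.compile(r"key\s?words\s*[:\-]\s*(.+)", re.IGNORECASE)


def find_doi(text: str) -> Optional[str]:
    """Step 1: return the first DOI-like string in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def _normalize(value: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks do not matter."""
    return re.sub(r"\s+", " ", value).strip().lower()


def title_matches(title: str, text: str) -> bool:
    """Step 3: check that the retrieved title appears in the PDF text."""
    return bool(title.strip()) and _normalize(title) in _normalize(text)


def _split_keywords(raw: str) -> List[str]:
    """Split a delimited keyword string on commas and semicolons."""
    parts = (part.strip(" .") for part in re.split(r"[;,]", raw))
    return [part for part in parts if part]


def extract_keywords(pdf_metadata: dict, text: str) -> List[str]:
    """Step 4: prefer keywords from the PDF metadata, then fall back to the text."""
    # "keywords" is a hypothetical key; real PDF libraries name this field differently.
    from_metadata = _split_keywords(pdf_metadata.get("keywords") or "")
    if from_metadata:
        return from_metadata
    match = KEYWORDS_PATTERN.search(text)
    return _split_keywords(match.group(1)) if match else []
```

For example, given the extracted text of a PDF in `text` and a title returned by the metadata API in `title`, `find_doi(text)` yields the candidate DOI, `title_matches(title, text)` confirms that the DOI belongs to this paper, and `extract_keywords({}, text)` falls back to the regex when the PDF metadata carries no keywords.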