https://github.com/demetersson83/arxiv-parser
A set of scripts for parsing scientific articles from arXiv.
- Host: GitHub
- URL: https://github.com/demetersson83/arxiv-parser
- Owner: DemetersSon83
- License: mit
- Created: 2021-03-18T19:40:35.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-18T19:48:38.000Z (over 4 years ago)
- Last Synced: 2025-02-27T17:04:25.366Z (7 months ago)
- Topics: arxiv-api, metadata-extraction, scientific-papers, scientific-publications, scientific-research
- Language: Python
- Homepage:
- Size: 4.88 KB
- Stars: 0
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# arXiv Parser
(C) 2021 Mark M. Bailey, PhD
## About
This set of scripts parses arXiv search results retrieved through the arXiv API. The 'arxiv_scraper.py' script saves the Atom XML output returned by the API as a set of JSON files. The 'arxiv_parse.py' script then merges all of those JSON files into a single JSON file, removing the arXiv query metadata. Together, the scripts are useful for collecting data for meta-analyses of large bodies of scientific work.
## Future Work
At some point, maybe I will build this into a library.
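The About section describes a two-step workflow: query the arXiv API and save each Atom page as JSON, then merge the pages while discarding the query metadata. The repository's actual code and JSON layout are not documented in this README, so what follows is only a hypothetical, standard-library-only sketch of how that workflow might look as library functions. The function names (`fetch_atom`, `atom_to_dict`, `merge_json_files`), the `query`/`entries` JSON layout, and the selection of entry fields are all assumptions, not the interface of 'arxiv_scraper.py' or 'arxiv_parse.py'.

```python
"""Hypothetical sketch (not the original scripts): fetch arXiv Atom pages,
convert each page to a JSON-friendly dict, then merge the pages while
dropping the per-query metadata."""
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespaces that appear in arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
    "arxiv": "http://arxiv.org/schemas/atom",
}
API_URL = "https://export.arxiv.org/api/query"


def fetch_atom(search_query: str, start: int = 0, max_results: int = 100) -> str:
    """Download one page of results from the arXiv API as Atom XML text."""
    params = urllib.parse.urlencode(
        {"search_query": search_query, "start": start, "max_results": max_results}
    )
    with urllib.request.urlopen(f"{API_URL}?{params}") as resp:
        return resp.read().decode("utf-8")


def _text(elem: ET.Element, path: str):
    """Return the whitespace-normalized text of a child element, or None."""
    found = elem.find(path, NS)
    if found is None or not found.text:
        return None
    return " ".join(found.text.split())


def atom_to_dict(xml_text: str) -> dict:
    """Split an Atom page into assumed 'query' metadata and 'entries' (papers)."""
    root = ET.fromstring(xml_text)
    query = {
        "title": _text(root, "atom:title"),
        "total_results": _text(root, "opensearch:totalResults"),
        "start_index": _text(root, "opensearch:startIndex"),
    }
    entries = []
    for entry in root.findall("atom:entry", NS):
        entries.append({
            "id": _text(entry, "atom:id"),
            "title": _text(entry, "atom:title"),
            "published": _text(entry, "atom:published"),
            "summary": _text(entry, "atom:summary"),
            "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        })
    return {"query": query, "entries": entries}


def merge_json_files(paths) -> list:
    """Concatenate the 'entries' of several saved pages, discarding 'query'."""
    merged = []
    for path in paths:
        page = json.loads(Path(path).read_text(encoding="utf-8"))
        merged.extend(page["entries"])
    return merged
```

In use, one might call `atom_to_dict(fetch_atom("cat:cs.CL", start=s))` for successive values of `s`, write each result to its own file with `json.dump`, and then pass those file paths to `merge_json_files`. The arXiv API documentation asks clients to wait about three seconds between consecutive requests, so a `time.sleep(3)` inside that loop is advisable.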