https://github.com/datadavev/txt2pid
Extract a list of identifiers from text
- Host: GitHub
- URL: https://github.com/datadavev/txt2pid
- Owner: datadavev
- License: agpl-3.0
- Created: 2024-04-01T18:41:00.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-02T13:19:35.000Z (about 2 years ago)
- Last Synced: 2025-02-02T09:27:24.109Z (over 1 year ago)
- Language: Python
- Size: 23.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# txt2pid
Extract a list of identifiers from text.
For example, using the CLI to extract identifiers from a web page and print them as JSON Lines (one JSON object per line):
```
curl -s "https://sis.web.cern.ch/submit-and-publish/persistent-identifiers/pids-for-objects" | \
pandoc -s -r html -t plain | \
python -m txt2pid
{"offset": 14567, "source": "ark:/13030/tf5p30086k", "scheme": "ark", "content": "13030/tf5p30086k"}
{"offset": 15033, "source": "arXiv:1207.7214", "scheme": "arXiv", "content": "1207.7214"}
{"offset": 15929, "source": "10.23731/CYRM-2019-007", "scheme": "doi", "content": "10.23731/CYRM-2019-007"}
{"offset": 15986, "source": "10.7483/OPENDATA.CMS.6O84.WLN8", "scheme": "doi", "content": "10.7483/OPENDATA.CMS.6O84.WLN8"}
{"offset": 16037, "source": "10.5281/zenodo.821635", "scheme": "doi", "content": "10.5281/zenodo.821635"}
{"offset": 16081, "source": "10.1016/j.physletb.2012.08.020", "scheme": "doi", "content": "10.1016/j.physletb.2012.08.020"}
{"offset": 17060, "source": "hdl:2381/12775", "scheme": "hdl", "content": "2381/12775"}
{"offset": 18819, "source": "urn:isbn:0451450523", "scheme": "isbn", "content": "0451450523"}
```
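
Each output line is a standalone JSON object, so the result can be processed one record at a time. The following is a minimal sketch of grouping the extracted identifiers by scheme, assuming the CLI output has been captured as a string. The helper name `group_by_scheme` is hypothetical and not part of txt2pid; it relies only on the `scheme` and `content` fields shown in the example output.

```python
import json
from collections import defaultdict

# Three records copied from the example output above.
SAMPLE = """\
{"offset": 15033, "source": "arXiv:1207.7214", "scheme": "arXiv", "content": "1207.7214"}
{"offset": 16037, "source": "10.5281/zenodo.821635", "scheme": "doi", "content": "10.5281/zenodo.821635"}
{"offset": 17060, "source": "hdl:2381/12775", "scheme": "hdl", "content": "2381/12775"}
"""


def group_by_scheme(jsonl_text):
    """Hypothetical helper: map each scheme to a list of its identifier contents."""
    groups = defaultdict(list)
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        groups[record["scheme"]].append(record["content"])
    return dict(groups)


if __name__ == "__main__":
    for scheme, identifiers in group_by_scheme(SAMPLE).items():
        print(scheme, identifiers)
```

To process live output instead of the sample string, the same function could be given the text read from standard input (for example `sys.stdin.read()`) after piping the CLI output into the script.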