https://github.com/davidar/oai-sync
Simple script for syncing with OAI repositories
https://github.com/davidar/oai-sync
Last synced: about 1 year ago
JSON representation
Simple script for syncing with OAI repositories
- Host: GitHub
- URL: https://github.com/davidar/oai-sync
- Owner: davidar
- Created: 2015-08-18T07:53:10.000Z (almost 11 years ago)
- Default Branch: master
- Last Pushed: 2015-08-25T09:57:24.000Z (almost 11 years ago)
- Last Synced: 2025-02-09T23:47:56.401Z (over 1 year ago)
- Language: Perl
- Size: 125 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
This is a quick and dirty script for harvesting metadata from an [OAI-PMH](https://en.wikipedia.org/wiki/Open_Archives_Initiative_Protocol_for_Metadata_Harvesting) enabled repository.
It relies on the `Net::OAI::Harvester` perl module, and is derived from the
`oai-listrecords` script provided in the examples for that library.
Example for harvesting arXiv metadata (honouring flow control directives):
```bash
mkdir arxiv{0,1,2}
# initial sync
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
--dumpDir=./arxiv0/
# resume an interrupted sync with the given token
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
--dumpDir=./arxiv1/ --resumptionToken='XXXXXX|XXXXXX'
# sync any new changes since the date of the last sync
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
--dumpDir=./arxiv2/ --from='2015-03-14'
# split into individual records
for F in ./arxiv*/*.xml; do [ -s $F ] && ./oai-split.sh < $F; done
```