https://github.com/davidar/oai-sync

Simple script for syncing with OAI repositories

Host: GitHub
URL: https://github.com/davidar/oai-sync
Owner: davidar
Created: 2015-08-18T07:53:10.000Z (almost 11 years ago)
Default Branch: master
Last Pushed: 2015-08-25T09:57:24.000Z (almost 11 years ago)
Last Synced: 2025-02-09T23:47:56.401Z (over 1 year ago)
Language: Perl
Size: 125 KB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

This is a quick and dirty script for harvesting metadata from an [OAI-PMH](https://en.wikipedia.org/wiki/Open_Archives_Initiative_Protocol_for_Metadata_Harvesting)-enabled repository.

It relies on the `Net::OAI::Harvester` Perl module, and is derived from the
`oai-listrecords` script provided in the examples for that library.
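
A harvest like this amounts to a series of OAI-PMH `ListRecords` requests. The sketch below is only an illustration, not code from `oai-sync.pl` or `Net::OAI::Harvester`: the hypothetical helpers `urlencode` and `list_records_url` build the request URL for an initial sync, a `from`-bounded sync, or a resumed sync. It follows the OAI-PMH 2.0 rule that `resumptionToken` is an exclusive argument, so a resumed request carries only `verb` and the token.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: these helpers are hypothetical and are not
# part of oai-sync.pl or Net::OAI::Harvester.

# Percent-encode a query-string value, keeping RFC 3986 unreserved
# characters as-is. Handles ASCII input only.
urlencode() {
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9.~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build an OAI-PMH ListRecords URL.
# Usage: list_records_url BASE_URL PREFIX [FROM] [TOKEN]
# resumptionToken is exclusive in OAI-PMH 2.0, so when TOKEN is given,
# PREFIX and FROM are left out of the request.
list_records_url() {
  local base=$1 prefix=$2 from=${3:-} token=${4:-}
  local url="$base?verb=ListRecords"
  if [[ -n $token ]]; then
    url+="&resumptionToken=$(urlencode "$token")"
  else
    url+="&metadataPrefix=$(urlencode "$prefix")"
    [[ -n $from ]] && url+="&from=$(urlencode "$from")"
  fi
  printf '%s\n' "$url"
}

list_records_url http://export.arxiv.org/oai2 arXiv 2015-03-14
# prints http://export.arxiv.org/oai2?verb=ListRecords&metadataPrefix=arXiv&from=2015-03-14
```

Percent-encoding matters mostly for resumption tokens, which are opaque strings that may contain characters such as `|`.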

Example of harvesting arXiv metadata (honouring flow-control directives):

```bash
mkdir arxiv{0,1,2}

# initial sync
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
--dumpDir=./arxiv0/

# resume an interrupted sync with the given token
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
--dumpDir=./arxiv1/ --resumptionToken='XXXXXX|XXXXXX'

# sync any new changes since the date of the last sync
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
--dumpDir=./arxiv2/ --from='2015-03-14'

# split into individual records
for F in ./arxiv*/*.xml; do [ -s "$F" ] && ./oai-split.sh < "$F"; done
```
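
To find a token for the resume step, one option is to read it back out of the last dump file. This is a hedged sketch: it assumes the files written to `--dumpDir` contain the raw `ListRecords` XML responses, with each `<resumptionToken>` element on a single line, which this README does not confirm. The hypothetical `last_resumption_token` helper prints the last non-empty token in a file; an empty or self-closing `<resumptionToken/>`, which OAI-PMH uses to mark the end of a list, produces no output.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of oai-sync): print the last non-empty
# <resumptionToken> value found in an OAI-PMH ListRecords dump file.
# Assumes each token element sits on one line; XML entities such as &amp;
# are NOT decoded.
last_resumption_token() {
  sed -n 's/.*<resumptionToken[^>]*>\([^<][^<]*\)<\/resumptionToken>.*/\1/p' "$1" |
    tail -n 1
}

# Example (the dump file name is hypothetical):
# TOKEN=$(last_resumption_token ./arxiv0/last-response.xml)
# [ -n "$TOKEN" ] && ./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 \
#   --metadataPrefix=arXiv --dumpDir=./arxiv1/ --resumptionToken="$TOKEN"
```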
