https://github.com/caltechlibrary/irdm_harvester
Automatically harvest publications for an InvenioRDM repository
- Host: GitHub
- URL: https://github.com/caltechlibrary/irdm_harvester
- Owner: caltechlibrary
- License: other
- Created: 2023-04-24T17:04:14.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-04-11T15:18:42.000Z (about 1 year ago)
- Last Synced: 2025-04-13T05:49:48.769Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 738 KB
- Stars: 1
- Watchers: 5
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Support: SUPPORT.md
- Codemeta: codemeta.json
# InvenioRDM Harvester
This is a harvester that can automatically collect and submit works to an
InvenioRDM repository. It currently works with the CaltechAUTHORS repository and looks at CrossRef and ORCID.
[License](https://choosealicense.com/licenses/bsd-3-clause)
[Latest release](https://github.com/irdm_harvester/template/releases)
[CaltechDATA record](https://data.caltech.edu/records/c14ab-m2d78/latest)
## Table of contents
* [Introduction](#introduction)
* [Installation](#installation)
* [Usage](#usage)
* [Known issues and limitations](#known-issues-and-limitations)
* [Getting help](#getting-help)
* [Contributing](#contributing)
* [License](#license)
* [Authors and history](#authors-and-history)
* [Acknowledgments](#acknowledgments)
## Introduction
The harvester currently collects works from the following sources:
- CrossRef, by ROR identifier
- ORCID
- CrossRef, by individual DOI
## Usage
The harvests are typically run through [GitHub Actions](https://github.com/caltechlibrary/irdm_harvester/actions)
but can also be run from the command line.
You need a CaltechAUTHORS token available in the environment variable
`RDMTOK`. For a CrossRef ROR harvest, type:
```bash
python harvest.py crossref
```
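Because every harvest depends on the token, a wrapper script may want to fail early when `RDMTOK` is missing. The following is a minimal sketch of such a check, not code from `harvest.py`; the helper name `require_token` is hypothetical.

```python
import os


def require_token(environ=os.environ, name="RDMTOK"):
    """Return the repository token, or raise a clear error if it is unset.

    `require_token` is a hypothetical helper for illustration; it is not
    part of this repository.
    """
    # Treat an empty or whitespace-only value the same as a missing one.
    token = environ.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable before harvesting")
    return token
```

Calling `require_token()` before launching a harvest turns a missing token into an immediate, readable error.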
You can harvest a specific DOI with:
```bash
python harvest.py -doi 10.7717/peerj-cs.1023
```
For an ORCID harvest, type:
```bash
python harvest.py orcid -orcid 0000-0001-9266-5146
```
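A mistyped ORCID iD can be caught before a harvest is started, because the last character of an ORCID iD is an ISO 7064 MOD 11-2 check character. The sketch below validates an iD locally. It is an illustration only: whether `harvest.py` performs any such validation is not stated here, and the function names are hypothetical.

```python
import re

# Four groups of four characters; the final character may be a digit or X.
_ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", re.ASCII)


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if `orcid` is well formed and its check character matches."""
    if not _ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]
```

For example, `is_valid_orcid("0000-0001-9266-5146")` accepts the iD used in the command above, while changing its last digit makes the check fail.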
For all harvests there is an `-actor` flag, which gets included in the message when the record is added to the queue.
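When harvests are launched from another script, the three command forms above can be assembled in one place. This is a rough sketch rather than part of the repository: `build_harvest_command` is a hypothetical helper, and it assumes that `-actor` takes a single string value, which the text above implies but does not state.

```python
def build_harvest_command(kind, identifier=None, actor=None):
    """Build the argument list for one of the harvest commands shown above.

    Hypothetical helper; `kind` is "crossref", "doi", or "orcid".
    """
    cmd = ["python", "harvest.py"]
    if kind == "crossref":
        cmd.append("crossref")
    elif kind == "doi":
        if not identifier:
            raise ValueError("a DOI harvest needs a DOI")
        # Mirrors the documented form: python harvest.py -doi <DOI>
        cmd += ["-doi", identifier]
    elif kind == "orcid":
        if not identifier:
            raise ValueError("an ORCID harvest needs an ORCID iD")
        cmd += ["orcid", "-orcid", identifier]
    else:
        raise ValueError(f"unknown harvest type: {kind}")
    if actor:
        # Assumption: -actor is followed by one value naming who ran the harvest.
        cmd += ["-actor", actor]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, with `RDMTOK` set in the environment.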
## Installation
For command-line use, you need the latest version of `irdmtools` installed:
`curl https://caltechlibrary.github.io/irdmtools/installer.sh | sh`
Then install the Python requirements with:
`pip install -r requirements.txt`
## Known issues and limitations
While this approach should work for any InvenioRDM repository, it has only been tested on
CaltechAUTHORS. If you're interested in using it with a different repository, reach out;
we would be happy to make it more flexible.
Publishers use a wide variety of URLs for licenses. We are currently adding
variants to the `license.csv` file, a custom file that maps license URLs to
InvenioRDM license names. It is almost certainly incomplete.
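To illustrate why URL variants matter, here is a minimal sketch of a lookup that normalizes a license URL before matching it against a CSV mapping. Everything in it is an assumption made for illustration: the column names `url` and `license`, the sample row, and the normalization rules are hypothetical, and the harvester's actual handling of `license.csv` may differ.

```python
import csv
import io


def normalize_license_url(url):
    """Reduce common variants (scheme, www, case, trailing slash) to one key."""
    key = url.strip().lower()
    for prefix in ("https://", "http://"):
        if key.startswith(prefix):
            key = key[len(prefix):]
            break
    if key.startswith("www."):
        key = key[len("www."):]
    return key.rstrip("/")


def load_license_map(csv_file):
    """Read rows with hypothetical `url` and `license` columns into a dict."""
    reader = csv.DictReader(csv_file)
    return {normalize_license_url(row["url"]): row["license"] for row in reader}


def lookup_license(url, mapping):
    """Return the mapped license name, or None if the URL is not yet listed."""
    return mapping.get(normalize_license_url(url))


# Illustrative data only; the real license.csv may use other columns and values.
SAMPLE_CSV = "url,license\nhttps://creativecommons.org/licenses/by/4.0/,cc-by-4.0\n"
sample_map = load_license_map(io.StringIO(SAMPLE_CSV))
```

With this sample, `lookup_license("http://creativecommons.org/licenses/by/4.0", sample_map)` resolves to the same name as the `https` form with a trailing slash, while an unlisted URL returns `None` and would need a new row in the file.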
## Getting help
Open an issue in the repository's Issues tab.
## Contributing
Pull requests are appreciated.
## License
Software produced by the Caltech Library is Copyright © 2022 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the [LICENSE](LICENSE) file for more information.
## Authors and history
GitHub Action created by Tom Morrell. Robert Doiel and Tom Morrell wrote
the underlying `irdmtools` package.
## Acknowledgments
This work was funded by the California Institute of Technology Library.