https://github.com/caltechlibrary/epxml_to_datacite
Transform Eprints XML to DataCite XML and mint DOIs in Eprints repositories
datacite datacite-xml eprints
- Host: GitHub
- URL: https://github.com/caltechlibrary/epxml_to_datacite
- Owner: caltechlibrary
- License: other
- Created: 2018-04-13T21:34:44.000Z (about 8 years ago)
- Default Branch: main
- Last Pushed: 2023-12-14T19:19:53.000Z (over 2 years ago)
- Last Synced: 2025-04-12T03:17:21.076Z (about 1 year ago)
- Topics: datacite, datacite-xml, eprints
- Language: Python
- Homepage:
- Size: 26.6 MB
- Stars: 5
- Watchers: 5
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff
- Codemeta: codemeta.json
# epxml_to_datacite
[DOI](https://data.caltech.edu/badge/latestdoi/129455716)
Convert Eprints XML to DataCite XML and mint DOIs. Only tested on Caltech repositories.
## Contents
- caltech_thesis - Generate DataCite metadata and DOIs from CaltechTHESIS
- caltech_authors_tech_report - Generate DataCite metadata and DOIs from
CaltechAUTHORS tech reports
- caltech_authors_to_data - Make DataCite metadata for data files in
CaltechAUTHORS
## Setup
### Prerequisites
You need Python 3.7 or later on your machine
([Miniconda](https://docs.conda.io/en/latest/miniconda.html) is a great
installation option). To check whether Python is installed, open a terminal or
Anaconda Prompt window and type `python -V`, which should print version 3.7
or greater. It's best to download this software using git. To install git, type
`conda install git` in your terminal or Anaconda Prompt window.
### Clone epxml_to_datacite
In File Explorer or Finder, find where you want the epxml_to_datacite folder to live on your computer
(this could be the Desktop or Documents folder, for example). Type `cd `
in Anaconda Prompt or terminal and drag the location from the file browser into
the terminal window. The path to the location
will show up, so your terminal will show a command like
`cd /Users/tmorrell/Desktop`. Hit enter. Then type
`git clone https://github.com/caltechlibrary/epxml_to_datacite.git`. Once you
hit enter you'll see an epxml_to_datacite folder. Type `cd epxml_to_datacite`.
### Install
Now that you're in the epxml_to_datacite folder, type `python setup.py install`
to install dependencies.
If you're on a Mac, you'll need to authorize the underlying eputil application.
Open the `epxml_to_datacite` directory in Finder, open the `epxml_support`
directory, then right-click on `eputil` and select 'Open'. Confirm that you
authorize the executable. This is a one-time installation step.
If you will be minting DOIs, use a text editor to create a file called `pw`
that contains your DataCite password. The username is hardcoded in the
script, since non-Caltech users will have to modify the script to work with
their Eprints installation. If you don't have a text editor on your machine, type
`conda install -c swc nano`.
### Updating
When there is a new version of the software, go to the epxml_to_datacite
folder in Anaconda Prompt or terminal and type `git pull`. You shouldn't need to redo
the installation steps unless there are major updates.
## Options
There are three different scripts:
- `caltech_thesis.py`
- `caltech_authors_to_data.py` (Prepares metadata from CaltechAUTHORS for submission to CaltechDATA)
- `caltech_authors_tech_report.py` (Prepares metadata from CaltechAUTHORS tech reports with `monograph` item type (Report or Paper))
In this documentation we use `caltech_thesis.py` as the example script, but in most cases you can substitute one of the other sources.
## Basic operation
If you have Eprints XML files (from thesis.library.caltech.edu/rest/eprint/1234.xml, for example), put them in the epxml_to_datacite folder. Type
`python caltech_thesis.py`
and you'll get a '\_datacite.xml' file for each XML file in the folder.
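To preview which files a run would pick up, a small helper along these lines may be useful. It is only a sketch: the function name `pending_inputs` is hypothetical, and it assumes each output is named by appending `_datacite.xml` to the input file's stem, which the description above suggests but does not spell out.

```python
from pathlib import Path


def pending_inputs(folder):
    """Return (input, expected_output) path pairs for XML files in folder.

    Assumption: each output is named <input stem>_datacite.xml.
    """
    pairs = []
    for xml_path in sorted(Path(folder).glob("*.xml")):
        # Skip files that already look like generated DataCite output
        if xml_path.name.endswith("_datacite.xml"):
            continue
        expected = xml_path.with_name(xml_path.stem + "_datacite.xml")
        pairs.append((xml_path, expected))
    return pairs
```

Calling `pending_inputs(".")` from inside the epxml_to_datacite folder lists each source file next to the output name you would expect to see after running the script.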
## Downloading Eprints XML
You can use Eprints ids (e.g. 9690) to download Eprints xml files by adding a
`-ids` option to any command.
`python caltech_thesis.py -ids 9690`
Alternatively, you can use the `-id_file` option to provide a TSV file in which
the first column contains the Eprints ids.
`python caltech_thesis.py -id_file ids.tsv`
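The following sketch illustrates what an id file and the REST address pattern look like in practice. It is not the scripts' own code: `read_eprint_ids` and `eprint_xml_url` are hypothetical helpers, the `https` scheme is an assumption, and the sketch does not skip a header row, so it assumes the TSV file has none.

```python
import csv


def read_eprint_ids(tsv_path):
    """Return the first column of a tab-separated file as a list of id strings."""
    ids = []
    with open(tsv_path, newline="", encoding="utf8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and rows whose first column is empty
            if row and row[0].strip():
                ids.append(row[0].strip())
    return ids


def eprint_xml_url(eprint_id, host="thesis.library.caltech.edu"):
    """Build the REST address for one record (the https scheme is an assumption)."""
    return f"https://{host}/rest/eprint/{eprint_id}.xml"
```

Checking the ids this way before a run can catch stray whitespace or empty rows in the file.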
## Mint DOIs
You can also have the script submit the metadata to DataCite and add the DOI to the source repository. To do this, add the `-mint`
option to the command line. If you want to make test DOIs, add the `-test` option as well.
`python caltech_thesis.py -mint -ids 9690`
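If you drive the scripts from another Python program, the command line can be assembled from the options described so far. This is a minimal sketch: `build_command` is a hypothetical helper, and passing several ids after `-ids` separated by spaces is an assumption, since the examples here only show a single id.

```python
import sys


def build_command(script="caltech_thesis.py", ids=None, id_file=None,
                  mint=False, test=False):
    """Assemble an argument list from the options named in this README."""
    command = [sys.executable, script]
    if mint:
        command.append("-mint")
    if test:
        command.append("-test")
    if ids:
        # Assumption: several ids may follow -ids, separated by spaces
        command += ["-ids", *[str(i) for i in ids]]
    if id_file:
        command += ["-id_file", str(id_file)]
    return command
```

The resulting list can be passed to `subprocess.run(..., check=True)` from inside the epxml_to_datacite folder. Adding `test=True` while experimenting keeps you from minting real DOIs by accident.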
## Custom Prefixes
`caltech_authors_tech_report.py` has support for alternative DOI prefixes. By
adding the `-prefix` option you can mint a DOI under any of the DataCite prefixes
controlled by the library.
`python caltech_authors_tech_report.py -prefix 10.26206 -ids 99015`
Custom prefixes can also trigger metadata changes. For example, the publisher
for prefix 10.26206 is the Keck Institute for Space Studies.
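One way to picture a prefix-driven metadata change is a lookup table keyed by prefix. The sketch below is illustrative only and is not taken from the script: the table, both function names, and the `publisher` dictionary key are hypothetical, and only the 10.26206 entry comes from this README.

```python
# Hypothetical lookup table; only the 10.26206 entry comes from this README
PUBLISHER_BY_PREFIX = {
    "10.26206": "Keck Institute for Space Studies",
}


def publisher_for(prefix, default_publisher):
    """Return the publisher override for a DOI prefix, or the supplied default."""
    return PUBLISHER_BY_PREFIX.get(prefix, default_publisher)


def apply_prefix(metadata, prefix, default_publisher):
    """Return a copy of a metadata dict with the publisher set for the prefix.

    The "publisher" key is a placeholder, not the script's actual structure.
    """
    updated = dict(metadata)
    updated["publisher"] = publisher_for(prefix, default_publisher)
    return updated
```

Keeping overrides in a single table would make it easy to see which prefixes change metadata, though the script itself may organize this differently.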
### Advanced
You can also import the metadata transformation function into another Python
script by including `from caltech_thesis import epxml_to_datacite` at the top of your new script.
Then you will be able to call `epxml_to_datacite(eprint)`, where `eprint` is an
Eprints XML record parsed with `xmltodict`, for example:
```
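import xmltodict
from caltech_thesis import epxml_to_datacite
# Parse the Eprints XML file and pull out the single eprint record
with open('10271.xml', encoding="utf8") as infile:
    eprint = xmltodict.parse(infile.read())['eprints']['eprint']
datacite = epxml_to_datacite(eprint)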
```