{"id":23473054,"url":"https://github.com/caltechlibrary/epxml_to_datacite","last_synced_at":"2025-04-14T18:42:27.442Z","repository":{"id":111429171,"uuid":"129455716","full_name":"caltechlibrary/epxml_to_datacite","owner":"caltechlibrary","description":"Transform Eprints XML to DataCite XML and mint DOIs in Eprints repositories","archived":false,"fork":false,"pushed_at":"2023-12-14T19:19:53.000Z","size":27869,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-12T03:17:21.076Z","etag":null,"topics":["datacite","datacite-xml","eprints"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/caltechlibrary.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":"codemeta.json","zenodo":null}},"created_at":"2018-04-13T21:34:44.000Z","updated_at":"2024-07-25T00:18:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"56e3da61-e04d-487a-bc30-6767bf6191da","html_url":"https://github.com/caltechlibrary/epxml_to_datacite","commit_stats":null,"previous_names":[],"tags_count":25,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fepxml_to_datacite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fepxml_to_datacite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/caltechlibrary%2Fepxml_to_datacite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/caltechlibrary%2Fepxml_to_datacite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/caltechlibrary","download_url":"https://codeload.github.com/caltechlibrary/epxml_to_datacite/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248938393,"owners_count":21186397,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["datacite","datacite-xml","eprints"],"created_at":"2024-12-24T17:14:54.986Z","updated_at":"2025-04-14T18:42:27.415Z","avatar_url":"https://github.com/caltechlibrary.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# epxml_to_datacite\n\n[![DOI](https://data.caltech.edu/badge/129455716.svg)](https://data.caltech.edu/badge/latestdoi/129455716)\n\nConvert Eprints XML to DataCite XML and mint DOIs.  Only tested on Caltech repositories.\n\n## Contents\n\n- caltech_thesis - Generate DataCite metadata and DOIs from CaltechTHESIS\n- caltech_authors_tech_report - Generate DataCite metadata and DOIs from\n  CaltechAUTHORS tech reports\n- caltech_authors_to_data - Make DataCite metadata for data files in\n  CaltechAUTHORS\n\n## Setup\n\n### Prerequisites\n\nYou need to have Python 3.7 on your machine\n([Miniconda](https://docs.conda.io/en/latest/miniconda.html) is a great\ninstallation option).  Test whether you have python installed by opening a terminal or\nanaconda prompt window and typing `python -V`, which should print version 3.7\nor greater. It's best to download this software using git.  
To install git, type\n`conda install git` in your terminal or anaconda prompt window.  \n\n### Clone epxml_to_datacite\n\nFind where you want the epxml_to_datacite folder to live on your computer in File Explorer or Finder\n(this could be the Desktop or Documents folder, for example).  Type `cd ` \nin anaconda prompt or terminal and drag the location from the file browser into\nthe terminal window.  The path to the location\nwill show up, so your terminal will show a command like \n`cd /Users/tmorrell/Desktop`.  Hit enter.  Then type \n`git clone https://github.com/caltechlibrary/epxml_to_datacite.git`. Once you\nhit enter you'll see an epxml_to_datacite folder.  Type `cd epxml_to_datacite`\n\n### Install\n\nNow that you're in the epxml_to_datacite folder, type `python setup.py install`\nto install dependencies.\n\nIf you're on a Mac, you'll need to authorize the underlying eputil application.\nOpen the `epxml_to_datacite` directory in Finder, open the `epxml_support`\ndirectory, then right-click on `eputil` and select 'Open'. Agree that you\nauthorize the executable. This is a one-time installation step.\n\nIf you will be minting DOIs, use a text editor to create a file called `pw`\nthat contains your DataCite password.  The username is hardcoded in the\nscript, since non-Caltech users will have to modify the script to work with\ntheir Eprints installation. If you don't have a text editor on your machine, type\n`conda install -c swc nano`\n\n### Updating\n\nWhen there is a new version of the software, go to the epxml_to_datacite\nfolder in anaconda prompt or terminal and type `git pull`.  
You shouldn't need to re-do\nthe installation steps unless there are major updates.\n\n## Options\n\nThere are three scripts:\n\n- `caltech_thesis.py`\n- `caltech_authors_to_data.py` (Prepares metadata from CaltechAUTHORS for submission to CaltechDATA)\n- `caltech_authors_tech_report.py` (Prepares metadata from CaltechAUTHORS tech reports with `monograph` item type (Report or Paper))\n\nIn this documentation we use `caltech_thesis.py` as the example script, but in most cases you can substitute one of the other scripts.\n\n## Basic operation\n\nIf you have Eprints XML files (from thesis.library.caltech.edu/rest/eprint/1234.xml, for example), put them in the epxml_to_datacite folder.  Type\n\n`python caltech_thesis.py`\n\nand you'll get a '\\_datacite.xml' file for each XML file in the folder.\n\n## Downloading Eprints XML\n\nYou can use Eprints ids (e.g. 9690) to download Eprints XML files by adding a\n`-ids` option to any command.\n\n`python caltech_thesis.py -ids 9690`\n\nAlternatively, you can use the `-id_file` option to provide a TSV file whose\nfirst column is the Eprints id.\n\n`python caltech_thesis.py -id_file ids.tsv`\n\n## Mint DOIs\n\nYou can also have the script submit the metadata to DataCite and add the DOI to the source repository. Add the `-mint`\noption; to make test DOIs, also add the `-test` option to the command line.  \n\n`python caltech_thesis.py -mint -ids 9690`\n\n## Custom Prefixes\n\n`caltech_authors_tech_report.py` has support for alternative DOI prefixes. By\nadding the `-prefix` option you can mint a DOI for any of the DataCite prefixes\ncontrolled by the library.\n\n`python caltech_authors_tech_report.py -prefix 10.26206 -ids 99015`\n\nCustom prefixes can also trigger metadata changes.  
For example, the publisher\nfor prefix 10.26206 is the Keck Institute for Space Studies.\n\n### Advanced\n\nYou can also import the metadata transformation function into another Python\nscript by including `from caltech_thesis import epxml_to_datacite` at the top of your new script.\nThen you will be able to call `epxml_to_datacite(eprint)`, where `eprint` is an\nXML file parsed by something like:\n\n```\nimport xmltodict\nfrom caltech_thesis import epxml_to_datacite\n\nwith open('10271.xml', encoding=\"utf8\") as infile:\n    eprint = xmltodict.parse(infile.read())['eprints']['eprint']\ndatacite = epxml_to_datacite(eprint)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaltechlibrary%2Fepxml_to_datacite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcaltechlibrary%2Fepxml_to_datacite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcaltechlibrary%2Fepxml_to_datacite/lists"}