https://github.com/caltechlibrary/inveniordm-migrate
Scripts to migrate content into Invenio RDM
- Host: GitHub
- URL: https://github.com/caltechlibrary/inveniordm-migrate
- Owner: caltechlibrary
- License: other
- Created: 2020-03-05T18:18:16.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-04-11T22:45:57.000Z (about 3 years ago)
- Last Synced: 2025-04-13T05:50:04.217Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 912 KB
- Stars: 2
- Watchers: 6
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
Assorted scripts to migrate content to InvenioRDM and S3 data sources
=====================================================
This repo holds scripts used to migrate content into InvenioRDM. They have
generally been used for one-time migration activities, but may be useful in the
future.
[License: BSD 3-Clause](https://choosealicense.com/licenses/bsd-3-clause)
[Releases](https://github.com/caltechlibrary/inveniordm-migrate/releases)
Table of contents
-----------------
* [Usage](#usage)
* [Getting help](#getting-help)
* [License](#license)
* [Authors and history](#authors-and-history)
* [Acknowledgments](#acknowledgments)
Usage
-----
### CaltechDATA
`migrate_caltechdata.py` was used to move records from the TIND-managed
Invenio instance to InvenioRDM.
### CaltechTHESIS
`migrate_caltechthesis.py` was used to create some minimal test records in
InvenioRDM. It is not complete.
### OSN Migration
For large collections of data we sometimes need to move the data first, and
then create InvenioRDM records. An S3 object store like the Open Storage
Network is a great option. You can bulk move records efficiently with
[s5cmd](https://github.com/peak/s5cmd) and the management scripts.
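
s5cmd's `run` subcommand reads one command per line from a file. The sketch below shows one way such a file could be produced for a local directory tree. It is an illustration only and is not the code in `make_command.py`; the function names, the bucket name, and the key prefix are all hypothetical.

```python
import shlex
from pathlib import Path


def build_copy_commands(local_root, bucket, prefix=""):
    """Return one s5cmd `cp` line per file found under local_root.

    Hypothetical helper: bucket and prefix are placeholders, not values
    used by this repository.
    """
    root = Path(local_root)
    lines = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Preserve the directory layout below local_root in the object key.
        key = path.relative_to(root).as_posix()
        if prefix:
            key = f"{prefix.strip('/')}/{key}"
        dest = f"s3://{bucket}/{key}"
        # Quote both arguments so names containing spaces stay intact.
        lines.append(f"cp {shlex.quote(str(path))} {shlex.quote(dest)}")
    return lines


def write_commands(lines, out_file="commands.txt"):
    """Write the command lines to a file, one per line."""
    Path(out_file).write_text("\n".join(lines) + "\n")
```

Calling `write_commands(build_copy_commands("/data/2017", "my-bucket", "2017"))` would write a `commands.txt` with one `cp` line per file. The path and bucket here are examples only.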
Run `python make_command.py` to generate a list of files to sync. You'll need
to set the following environment variables:
```
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
S3_ENDPOINT_URL https://renc.osn.xsede.org
AWS_REGION us-east-1
```
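
Since a long transfer can fail late if a credential is missing, it may help to check the environment before starting. The following is a minimal sketch that assumes the four variable names listed above are the complete set; `missing_env_vars` is a hypothetical helper and not part of this repository.

```python
import os

# Variable names taken from the list above; assumed to be the full set.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_ENDPOINT_URL",
    "AWS_REGION",
)


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

If `missing_env_vars()` returns a non-empty list, set those variables before running the sync.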
Then run the generated commands with
`nohup ./s5cmd -numworkers 100 run commands.txt >> & log2017.txt ; echo Done >> & log2017.txt &`.
You can adjust the `numworkers` value depending on what your OS can handle.
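
The redirections in the command above look like csh-style `>>&`, which appends both standard output and standard error to the log. For shells without that syntax, the same logging can be sketched in Python. The helper names below are hypothetical, and the `-numworkers` spelling is copied from the command above, so check it against your s5cmd version. Unlike the `nohup ... &` form, this sketch blocks until s5cmd exits.

```python
import subprocess


def build_s5cmd_argv(commands_file="commands.txt", numworkers=100, s5cmd="./s5cmd"):
    """Build the argument list for the s5cmd invocation shown above."""
    return [s5cmd, "-numworkers", str(numworkers), "run", commands_file]


def run_logged(argv, log_file):
    """Run argv, appending stdout and stderr to log_file, then append 'Done'."""
    with open(log_file, "a") as log:
        # stderr is merged into stdout so both streams land in the same log.
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
        log.write("Done\n")
    return result.returncode
```

For example, `run_logged(build_s5cmd_argv(numworkers=50), "log2017.txt")` would run the transfer with 50 workers and return the s5cmd exit code.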
Getting help
------------
Raise an issue on the [issue tracker](https://github.com/caltechlibrary/inveniordm-migrate/issues).
License
-------
Software produced by the Caltech Library is Copyright (C) 2023, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the [LICENSE](LICENSE) file for more information.
Authors and history
---------------------------
These scripts were written by Tom Morrell.
Acknowledgments
---------------
This work was funded by the California Institute of Technology Library.