{"id":47326326,"url":"https://github.com/ubffm/oaipmharvest","last_synced_at":"2026-03-17T19:10:05.486Z","repository":{"id":115184118,"uuid":"473958854","full_name":"ubffm/oaipmharvest","owner":"ubffm","description":"Highly-configurable oai-pmh harvester","archived":false,"fork":false,"pushed_at":"2025-05-28T06:43:29.000Z","size":57,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-06T00:55:51.984Z","etag":null,"topics":["harvester","oai-pmh","oai-pmh-client"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ubffm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-25T10:08:52.000Z","updated_at":"2025-05-28T06:43:33.000Z","dependencies_parsed_at":"2025-03-18T11:43:16.802Z","dependency_job_id":null,"html_url":"https://github.com/ubffm/oaipmharvest","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/ubffm/oaipmharvest","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ubffm%2Foaipmharvest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ubffm%2Foaipmharvest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ubffm%2Foaipmharvest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ubffm%2Foaipmharvest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ubffm","download_url":"https://codeloa
d.github.com/ubffm/oaipmharvest/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ubffm%2Foaipmharvest/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30628862,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-17T17:32:55.572Z","status":"ssl_error","status_checked_at":"2026-03-17T17:32:38.732Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["harvester","oai-pmh","oai-pmh-client"],"created_at":"2026-03-17T19:10:04.506Z","updated_at":"2026-03-17T19:10:05.443Z","avatar_url":"https://github.com/ubffm.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# oaipmharvest\n\n## Description\n\n_oaipmharvest_ is a harvester for [OAI-PMH](https://www.openarchives.org/OAI/openarchivesprotocol.html) written in Python\nand based on [sickle](https://sickle.readthedocs.io) (for now). Its special focus lies on supporting advanced\nnon-standard use cases and endpoints that behave slightly out of the ordinary. If you just need the standard\nfeature set, you might be better off with something more mature and better tested.\n\n_oaipmharvest_ will connect to a given OAI endpoint and, by default, store its responses in an output folder. 
It lets you\nmake incremental requests against the given OAI endpoint or restrict the result set by date. In addition,\nit provides several features for dynamically constructing set specifiers from smaller parts.\n\n**This is an alpha release. Use with caution.**\n\n## Features\n\n* Configuration via TOML\n* Advanced configuration support for dynamic sets (e.g. those supported by [BASE](http://oai.base-search.net/))\n\n## Installation\n\nIf you want to use _oaipmharvest_ as a standalone application, installation via [pipx](https://github.com/pypa/pipx) is recommended.\n\n```\npipx install oaipmharvest\n```\n\nInstallation via other package managers is of course possible, too. This is especially recommended if _oaipmharvest_ is to be used as a library.\n\n```\npip install oaipmharvest\n```\n\n## Running\n\nTo run the application after installation, call the CLI command `oaipm_harvest`. Its help text is available\nvia `oaipm_harvest -h`.\n\n```\nusage: oaipm_harvest [-h] [--from FROM] [--until UNTIL] file\n\npositional arguments:\n  file                  Config file (TOML)\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --from FROM, -f FROM  Harvest only items that where published after the specified date\n  --until UNTIL, -u UNTIL\n                        Harvest only items that where published before the specified date\n```\n\nTo harvest a specific OAI-PMH endpoint, you have to provide a TOML config file. A config file for the\nmost basic use case could be `conf/my-journal.conf` and might contain:\n\n```\nendpoint_url = \"https://www.contributions-to-entomology.org/oai/\"\nmetadata_prefixes = [\"marcxml\"]\nout_dir = \"./out_cte\"\nuse_sets = false\n```\n\nwhere\n\n**endpoint\\_url** is the OAI base URL you want to connect to.\n\n**metadata\\_prefixes** is a list of metadata formats you want to download. 
Each format is passed unchanged to the OAI interface, so whether it is supported depends on the endpoint.\n\n**out\\_dir** is the directory where all downloaded data will be stored. If the given folder(s) do not exist, they will be created.\n\n**use\\_sets** is set to `false` in this basic example.\n\n## Licence\n\nAll parts of this code are copyrighted by the University Library JCS, Frankfurt a. M. The project is made available\nunder the Mozilla Public License 2.0.\n\n## Acknowledgement  \n\nThis project was originally created by the [Specialised Information Service for Linguistics](https://www.linguistik.de/en/)\nat the [University Library J. C. Senckenberg](https://www.ub.uni-frankfurt.de/) and funded by the German Research Foundation (DFG; project identifier [326024153](https://gepris.dfg.de/gepris/projekt/326024153?language=en)).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fubffm%2Foaipmharvest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fubffm%2Foaipmharvest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fubffm%2Foaipmharvest/lists"}