https://github.com/openbookpublishers/cit-ex
A tool to extract citation data from EPUBs and upload it to a metadata repository.
- Host: GitHub
- URL: https://github.com/openbookpublishers/cit-ex
- Owner: OpenBookPublishers
- License: gpl-3.0
- Created: 2022-12-13T12:19:41.000Z (about 2 years ago)
- Default Branch: master
- Last Pushed: 2024-11-11T12:47:53.000Z (about 2 months ago)
- Last Synced: 2024-11-11T13:23:18.423Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 104 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
# cit-ex
A tool to extract citation data from EPUBs and upload it to a metadata repository.

_cit-ex_ parses EPUB files looking for all the bibliographic references that match the HTML class(es) given on the command line. These references are then parsed for more granular data and uploaded to a metadata repository of choice.

As of v0.0.10, the unstructured citations are parsed for DOIs only, and the only supported metadata repository is [Thoth](https://thoth.pub/).
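The two steps described above (select elements by class, then look for a DOI in each reference) can be illustrated with a minimal standard-library sketch. This is not the _cit-ex_ implementation: the names `ClassTextCollector`, `find_references` and `read_epub_documents`, as well as the DOI regular expression, are assumptions made for the example.

```python
import re
import zipfile
from html.parser import HTMLParser

# Loose DOI pattern; an assumption, not the expression cit-ex uses.
DOI_RE = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")
# Elements that never have a closing tag, so they must not affect nesting.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying one of the target classes."""

    def __init__(self, classes):
        super().__init__()
        self.classes = set(classes)
        self.depth = 0  # > 0 while inside a matching element
        self.buffer = []
        self.references = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside a reference
            return
        tag_classes = (dict(attrs).get("class") or "").split()
        if self.classes.intersection(tag_classes):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the matching element has just closed
            self.references.append("".join(self.buffer).strip())
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def find_references(xhtml, classes):
    """Return (reference text, DOI or None) pairs for the given classes."""
    collector = ClassTextCollector(classes)
    collector.feed(xhtml)
    collector.close()
    results = []
    for reference in collector.references:
        match = DOI_RE.search(reference)
        # Strip punctuation that commonly trails a DOI at the end of a citation.
        doi = match.group(0).rstrip(".,;)") if match else None
        results.append((reference, doi))
    return results


def read_epub_documents(epub_path):
    """Yield the decoded (X)HTML documents stored in an EPUB (a zip archive)."""
    with zipfile.ZipFile(epub_path) as epub:
        for name in epub.namelist():
            if name.endswith((".xhtml", ".html", ".htm")):
                yield epub.read(name).decode("utf-8")
```

For instance, `find_references(document, ["biblio", "biblio2"])` could be called on each document yielded by `read_epub_documents("file.epub")`. References without a DOI are kept with `None` so that they can still be reported.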
## Installation
First, initialise a virtual environment:
$ `cd path/to/the/cit-ex/folder/`
$ `python3 -m venv .env`
Then, install the required dependencies:
(.env) $ `python3 -m pip install -r requirements.txt`
## Usage example
Given that your epub file is stored at _~/file.epub_ and the bibliographic references are marked with an HTML class _biblio_ in the EPUB:
(.env) $ `python3 cit-ex/main.py ~/file.epub -c biblio --dry-run`
If your references are marked as either _biblio_ or _biblio2_:
(.env) $ `python3 cit-ex/main.py ~/file.epub -c biblio biblio2 --dry-run`
### Usage example with Thoth
Make sure your login credentials are stored in the environment variables "THOTH_EMAIL" and "THOTH_PWD".
You also need to know the identifier (either DOI or UUID) of the work you wish to append the citation data to.

Given that these prerequisites are satisfied and your identifier is _10.11647/OBP.0288_, you can run the command:
(.env) $ `python3 cit-ex/main.py ~/file.epub -c biblio biblio2 -i 10.11647/OBP.0288 -r thoth`
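Before running the upload, it may help to check the prerequisites above. The following is a hypothetical pre-flight check, not part of _cit-ex_: the function names and the DOI pattern are assumptions, while the variable names `THOTH_EMAIL` and `THOTH_PWD` come from this README.

```python
import os
import re
import uuid

REQUIRED_VARS = ("THOTH_EMAIL", "THOTH_PWD")


def missing_credentials(environ=os.environ):
    """Return the names of the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def identifier_kind(identifier):
    """Classify a work identifier as 'uuid' or 'doi', else raise ValueError."""
    try:
        uuid.UUID(identifier)
        return "uuid"
    except ValueError:
        pass
    # Loose DOI shape check (an assumption): "10.", a registrant code, a suffix.
    if re.fullmatch(r"10\.\d{4,9}/\S+", identifier):
        return "doi"
    raise ValueError(f"not a DOI or UUID: {identifier!r}")
```

A script could call `missing_credentials()` and stop with a clear message if the returned list is not empty, rather than failing later during the upload.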
## Development setup
On top of the steps listed in "Installation", install the dev dependencies with:
(.env) $ `python -m pip install -r requirements-dev.txt`
## Extra packages
### OBP loader
The file `cit-ex/obp-loader.py` is an OBP-specific wrapper to load chapter-level citations to the repository (Thoth). It relies on each book chapter reporting the URL of its HTML edition. This file is downloaded, embedded into an EPUB and finally run through _cit-ex_.
The wrapper runs with:
(.env) $ `python3 obp-loader.py 10.11647/obp.0085`
where "10.11647/obp.0085" is the DOI of the book to be parsed.
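To illustrate the "embedded into an EPUB" step, here is a bare-bones sketch that wraps an already downloaded HTML string in an EPUB container using only the standard library. It is an assumption about how such wrapping could be done, not the code in `obp-loader.py`: the function name `wrap_html_in_epub` and the internal file layout are hypothetical, and the result is unlikely to pass strict EPUB validation (there is no navigation document, for example).

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="id">{identifier}</dc:identifier>
    <dc:title>{identifier}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="chapter" href="chapter.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter"/>
  </spine>
</package>
"""


def wrap_html_in_epub(html, identifier, target):
    """Write a single-chapter EPUB to target (a path or a binary file object)."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as epub:
        # The mimetype entry must come first and be stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML)
        epub.writestr("OEBPS/content.opf",
                      CONTENT_OPF.format(identifier=identifier))
        epub.writestr("OEBPS/chapter.xhtml", html)


# Usage: build the archive in memory instead of on disk.
buffer = io.BytesIO()
wrap_html_in_epub("<html><body><p class='biblio'>A reference</p></body></html>",
                  "10.11647/obp.0085", buffer)
```

The resulting file can then be passed to `cit-ex/main.py` like any other EPUB.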
#### Run OBP loader with Docker
Clone the repository and build the image with:
$ `docker build . -f Dockerfile-obp-loader -t openbookpublishers/cit-ex-obp-loader`
Deploy a container with:
```
docker run --rm \
-e THOTH_EMAIL=$THOTH_EMAIL \
-e THOTH_PWD=$THOTH_PWD \
openbookpublishers/cit-ex-obp-loader \
obp-loader.py 10.11647/obp.0337
```
Here `$THOTH_EMAIL` and `$THOTH_PWD` are your Thoth credentials and 10.11647/obp.0337 is the book-level DOI you wish to process.