Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/openbookpublishers/chapter-splitter
Splits PDF books by chapter and writes metadata into the generated files.
Last synced: 26 days ago
- Host: GitHub
- URL: https://github.com/openbookpublishers/chapter-splitter
- Owner: OpenBookPublishers
- License: gpl-3.0
- Created: 2019-02-13T16:26:55.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-24T09:44:27.000Z (5 months ago)
- Last Synced: 2024-07-24T11:21:20.825Z (5 months ago)
- Language: Python
- Size: 95.7 KB
- Stars: 7
- Watchers: 5
- Forks: 1
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
# chapter-splitter
*chapter-splitter* is a tool to split PDF books into individual chapters.

Chapter data must already have been submitted to [Crossref](https://www.crossref.org/) or [Thoth](https://thoth.pub/), so that `chapter-splitter` can query the server and retrieve information such as chapter page ranges, titles and authors to add to the output PDFs.
# Usage
The help page (`python3 ./main.py --help`) reports:
```
Usage: main.py [OPTIONS] DOI

Arguments:
  DOI  [required]

Options:
  --input-file PATH               [default: ./file.pdf]
  --output-folder PATH            [default: ./output/]
  --database TEXT                 [default: thoth]
  --write-urls / --no-write-urls  [default: write-urls]
  --help                          Show this message and exit.
```

A running command would look something like this:
$ `python3 ./main.py --input-file my_file.pdf --output-folder ~/output \
--database crossref 10.11647/obp.0309`

or, querying Thoth:

$ `python3 ./main.py --input-file my_file.pdf --output-folder ~/output \
--database thoth 10.11647/obp.0309`

`chapter-splitter` tries to append both the front cover of the original PDF and the copyright page to the output files. The positions of these pages in the original document are set with the environment variables `COVER_PAGE` and `COPYRIGHT_PAGE` (zero-based page numbers).
$ `COVER_PAGE=0`

$ `COPYRIGHT_PAGE=4`

The `--write-urls` option attempts to write the appropriate OBP-specific Landing Page URL and Full Text URL to Thoth for each chapter created. This requires Thoth login credentials, provided via the environment variables `THOTH_EMAIL` and `THOTH_PWD`.
$ `THOTH_EMAIL=[email protected]`

$ `THOTH_PWD=password`

$ `python3 ./main.py --input-file my_file.pdf --output-folder ~/output \
--database thoth --write-urls 10.11647/obp.0309`

## Running with docker
Running the command reported above in Docker would be:
```
docker run --rm \
  -e THOTH_EMAIL=[email protected] \
  -e THOTH_PWD=password \
  -v /path/to/local.pdf:/ebook_automation/file.pdf \
  -v /path/to/output:/ebook_automation/output \
  openbookpublishers/chapter-splitter \
  main.py 10.11647/obp.0309
```

Alternatively, you may clone the repo, build the image using `docker build . -t some/tag`, and run the command above, replacing `openbookpublishers/chapter-splitter` with `some/tag`.
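Both the local and the Docker invocations take their page numbers and Thoth credentials from the environment variables described under Usage. As a rough illustration of that configuration step, the sketch below reads and validates them in one place. The `Settings` class, the `read_settings` helper, and the fallback values (which only mirror the example values `0` and `4`) are assumptions made for this example; they are not taken from the chapter-splitter source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values read from the environment."""
    cover_page: int
    copyright_page: int
    thoth_email: Optional[str]
    thoth_pwd: Optional[str]


def read_settings(env: Mapping[str, str] = os.environ,
                  write_urls: bool = True) -> Settings:
    """Read page numbers and Thoth credentials from `env`.

    Page numbers are zero-based. The fallbacks (0 and 4) only mirror the
    README examples and are an assumption, not documented defaults.
    """
    def page(name: str, fallback: int) -> int:
        value = int(env.get(name, fallback))
        if value < 0:
            raise ValueError(f"{name} must be a zero-based page number, got {value}")
        return value

    email = env.get("THOTH_EMAIL")
    pwd = env.get("THOTH_PWD")
    # Writing URLs back to Thoth requires a login, so fail early without one.
    if write_urls and not (email and pwd):
        raise RuntimeError("--write-urls needs THOTH_EMAIL and THOTH_PWD to be set")

    return Settings(page("COVER_PAGE", 0), page("COPYRIGHT_PAGE", 4), email, pwd)
```

Passing a plain dictionary instead of `os.environ`, for example `read_settings({"COVER_PAGE": "0", "COPYRIGHT_PAGE": "4"}, write_urls=False)`, makes the same check usable in tests without touching the real environment.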
## Running locally
### Installation
*chapter-splitter* requires **exiftool** to be installed on your system. It is available in the official repositories of Debian and Debian-based distributions; run `apt-get install exiftool`.

Besides the Python standard library, *chapter-splitter* requires the extra libraries listed in `requirements.txt`. To install them (within a virtual environment, if you prefer), run `pip3 install -r requirements.txt`.
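Unlike the libraries in `requirements.txt`, exiftool is an external program, so `pip` neither installs nor checks for it. A minimal sketch of a pre-flight check using the standard library's `shutil.which` follows; the `require_tool` helper is a hypothetical name chosen for this example and is not part of chapter-splitter.

```python
import shutil


def require_tool(name: str = "exiftool") -> str:
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"{name} was not found on PATH; install it before running chapter-splitter"
        )
    return path
```

Calling `require_tool()` before processing a book turns a missing dependency into an immediate, readable error.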
## Dev
### Git hooks
Use `pre-commit.sh` as a pre-commit Git hook to build a test image that runs `flake8` to enforce PEP 8 style.

```
ln -sf ../../pre-commit.sh .git/hooks/pre-commit
```