https://github.com/martinus/epuboptim
A highly parallel EPUB compressor
- Host: GitHub
- URL: https://github.com/martinus/epuboptim
- Owner: martinus
- License: MIT
- Created: 2024-06-13T12:49:54.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-14T19:28:50.000Z (over 1 year ago)
- Last Synced: 2025-02-25T04:11:24.858Z (11 months ago)
- Language: Python
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# epuboptim - Optimize/compress EPUB files
`epuboptim` is a CLI tool that optimizes existing EPUB files by invoking third-party tools such as `jpegoptim`, `pngoptim`, etc.
# Why?
EPUBs are often very large, and handheld readers have limited resources (disk, CPU, RAM). EPUB files are just zip files, so recompressing
them is not too hard.
# Features?
* Configurable recompression tools
  * E.g. by mapping file extensions to tools
  * Rule-based syntax; rules are processed one after the other.
  * `.jpg, .jpeg => "jpegtran -copy none -optimize -progressive -maxmemory 100M -outfile {{file}}"`
  * Configuration in a JSON file (yes, just plain JSON; see the sketch below this list)
* Highly parallel, task based
  * A "find" task finds EPUB files and creates tasks to unzip them
  * The unzip task uncompresses the zip and creates tasks based on the configurable rules (JPEG recompression, PNG, ...)
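The exact rule syntax is still open; here is one possible shape for the JSON config, reusing the commands mentioned elsewhere in this README (the key names are illustrative only):

```
{
    "rules": [
        {
            "extensions": [".jpg", ".jpeg"],
            "command": "jpegtran -copy none -optimize -progressive -maxmemory 100M -outfile {{file}}"
        },
        {
            "extensions": [".png"],
            "command": "oxipng -a -o2 -s -Z -t1 {{file}}"
        }
    ]
}
```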
# How to parallelize?
The goals should be:
* Always use all CPU resources.
* For a single EPUB file, recompress the individual files in parallel, biggest files first.
* Process multiple EPUB files in parallel (but try to finish one book before working on the next).
How?
* Let's keep it simple: create a single queue of tasks, probably a `queue.LifoQueue` (https://docs.python.org/3/library/queue.html). LIFO keeps workers on the most recently unpacked book, so one book tends to finish before the next one is started.
* Find all EPUBs and push an "unzip" task to the queue for each.
* Workers take an item from the queue, process it, and create zero, one, or multiple work items that they push back onto the queue.
* The "unzip" task unzips, sorts files by size as a heuristic (biggest first), then creates a "recompress" work item for each file and pushes these to the queue. See the sketch after this list.
* The "recompress" tasks of a book share an atomic counter, so once the last "recompress" task of a book is done, it creates a new task to "rezip" the book.
* The "rezip" task compresses the minimized files into a new .epub file and replaces the old one.
# Commands
```
oxipng -a -o2 -s -Z -t1
```
`cargo install minhtml` (https://github.com/wilsonzlin/minify-html)
DON'T do it, it breaks things :-(
```
minhtml *.html *.htm *.xml *.css *.xhtml *.ncx
```
```
zip -r -9 -X ../zipnew.epub .
```
`-X` (exclude extra file attributes) is important when there are a huge number of files; it makes the archive smaller.
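The rezip step could also be done from Python with the standard `zipfile` module; a sketch (note that the EPUB container spec wants `mimetype` as the first entry, stored uncompressed, which a plain `zip -r` does not guarantee):

```
import zipfile
from pathlib import Path

def write_epub(src_dir: Path, epub: Path) -> None:
    with zipfile.ZipFile(epub, "w") as zf:
        # EPUB (OCF) spec: "mimetype" must come first, stored uncompressed.
        zf.write(src_dir / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for f in sorted(src_dir.rglob("*")):
            if f.is_file() and f != src_dir / "mimetype":
                zf.write(f, f.relative_to(src_dir),
                         compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)
```

As a bonus, `zipfile` doesn't write the extra attribute fields in the first place, so no `-X` equivalent is needed.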
* lossless
  * `jpegtran -copy none -optimize -progressive -maxmemory 100M -outfile`
  * `optipng -fix -clobber -strip all -o{level} -out`
* lossy
  * `jpegoptim`
  * `pngoptim`
* `7z a -mx9 ../y.epub`
* oxipng, a newer optipng: https://github.com/shssoichiro/oxipng