https://github.com/openzim/nautilus
Turns a collection of documents into a browsable ZIM file
- Host: GitHub
- URL: https://github.com/openzim/nautilus
- Owner: openzim
- License: gpl-3.0
- Created: 2020-01-27T09:56:05.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-05-14T09:38:31.000Z (about 2 years ago)
- Last Synced: 2024-05-14T13:37:24.685Z (about 2 years ago)
- Topics: scraper, zim
- Language: Python
- Size: 280 KB
- Stars: 18
- Watchers: 9
- Forks: 14
- Open Issues: 9
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
# Nautilus
`nautilus` turns a collection of documents into a browsable [ZIM file](https://openzim.org).
It downloads the video (webm or mp4 format, optionally recompressed to a lower-quality, smaller file), the thumbnails, the subtitles and the authors' profile pictures; it then builds a folder of static HTML files from them and creates a ZIM file from that folder.
# Preparing the archive
To be used with nautilus, your archive should be a ZIP file.
* It doesn't need to be structured, but it can be.
* It doesn't need to be compressed. Storing the files uncompressed is usually recommended.
* It should contain a `collection.json` file, but that file can also be provided separately (see below).
* It should contain only the files to be included, since no filtering is done.
* Audio and video files should be in ogg format with an `.ogg`/`.ogv` extension to be supported on all platforms (`mp3`/`mp4` would work only on platforms with native support).
```
cd content/path
zip -r -0 -T ../content_name.zip *
```
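For cases where the `zip` command is not available, the same kind of archive can be sketched with Python's standard `zipfile` module. This is only an illustration, not part of nautilus: the function name `build_archive` and the paths in the usage note are hypothetical, and unlike the shell glob `*` it also picks up hidden files.

```python
import zipfile
from pathlib import Path


def build_archive(content_dir: Path, zip_path: Path) -> list[str]:
    """Store every file under content_dir in zip_path without compression.

    Hypothetical helper for illustration; returns the member names written.
    """
    members = []
    # ZIP_STORED means "no compression", like the -0 flag of the zip command.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(content_dir.rglob("*")):
            if path.is_file():
                # Member names are relative to the content folder.
                arcname = path.relative_to(content_dir).as_posix()
                zf.write(path, arcname)
                members.append(arcname)
    # Rough counterpart of the -T flag: re-read the archive and check CRCs.
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
    return members
```

A call such as `build_archive(Path("content/path"), Path("content/content_name.zip"))` mirrors the shell example. Keep the output file outside the content folder so the archive does not try to include itself.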
## JSON collection file
Either inside the archive ZIP as `/collection.json` or elsewhere,
specified via `--collection mycollection.json`, you must supply a JSON file describing your content.
The user interface only gives access to files that are properly referenced in the collection.
At the moment, the JSON file needs to provide the following fields for each item in an array:
```json
[
    {
        "title": "...",
        "description": "...",
        "authors": "...",
        "files": ["relative/path/to/file"]
    },
    {
        "title": "...",
        "description": "...",
        "authors": "...",
        "files": [
            {
                "archive-member": "01 BOOK for printing .pdf", // optional, member name inside archive (same as simpler format)
                "url": "http://books.com/310398120.pdf", // optional, has precedence over `archive-member`, url to download file from
                "filename": "My book.pdf" // optional, filename to use in ZIM, regardless of original one
            }
        ]
    }
]
```
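Because only files referenced in the collection are reachable from the user interface, it can help to check the collection before building. The sketch below is a hypothetical pre-flight check, not nautilus code: `check_collection`, `REQUIRED_FIELDS` and `FILE_KEYS` are names made up for this example, and flagging unknown file keys is an assumption of the sketch (nautilus itself may accept them). Note that the `//` comments in the example above are documentation only. A strict JSON parser such as Python's `json` module rejects them, so they must not appear in a real file.

```python
import json

# Fields and file keys taken from the example above.
REQUIRED_FIELDS = ("title", "description", "authors", "files")
FILE_KEYS = {"archive-member", "url", "filename"}


def check_collection(items, archive_members=None):
    """Return a list of problems found in a parsed collection (hypothetical helper).

    archive_members, if given, is the set of member names inside the ZIP.
    """
    if not isinstance(items, list):
        return ["collection must be a JSON array of items"]
    problems = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index}: must be a JSON object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in item:
                problems.append(f"item {index}: missing '{field}'")
        files = item.get("files", [])
        if not isinstance(files, list):
            problems.append(f"item {index}: 'files' must be an array")
            continue
        for entry in files:
            if isinstance(entry, str):
                member = entry  # simple format: path inside the archive
            elif isinstance(entry, dict):
                unknown = sorted(set(entry) - FILE_KEYS)
                if unknown:
                    # Assumption of this sketch: treat unknown keys as typos.
                    problems.append(f"item {index}: unknown file keys {unknown}")
                # `url` has precedence over `archive-member`.
                member = None if "url" in entry else entry.get("archive-member")
            else:
                problems.append(f"item {index}: file entries must be strings or objects")
                continue
            if member is not None and archive_members is not None:
                if member not in archive_members:
                    problems.append(f"item {index}: '{member}' not found in archive")
    return problems
```

A possible use is to load the file with `json.load`, pass the member names from `zipfile.ZipFile(...).namelist()` as a set, and print whatever the function returns before running nautilus.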
## About page
Either inside the archive ZIP as `/about.html` or elsewhere, specified via `--about myabout.html`, you may supply an about page in HTML format. It will be displayed in a modal popup and will include your *secondary-logo* at its bottom, if provided.
* Use only content tags (no `