https://github.com/pbinkley/iiif-manifest-enhancer
Scripts to help enhance a plain manifest with an OCR-based table of contents
- Host: GitHub
- URL: https://github.com/pbinkley/iiif-manifest-enhancer
- Owner: pbinkley
- Created: 2019-05-06T21:08:25.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-05-06T21:11:35.000Z (over 5 years ago)
- Last Synced: 2024-10-11T19:51:11.271Z (3 months ago)
- Topics: iiif
- Language: Ruby
- Size: 23.4 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# IIIF Manifest Enhancer
In order to make Stanford presidents' reports and other publications more usable, I want to enhance their manifests with tables of contents.
- Start with a bare-bones manifest that just includes the sequence of pages, labelled from 1 up
- Split into multiple manifests if needed (e.g. for cases where several items were bound together and then scanned as a single sequence)
- Generate a TOC via OCR, with titles and page numbers in yaml, allowing nesting
  - e.g.

    ```
    Book
      - Chapter 1: 1
        - Section 1.1: 1
        - Section 1.2: 13
    ```
- Have a script that will read in the manifest and the TOC, taking as a parameter the offset of page one (e.g. for an item with pp. i-viii before page 1, pass in 9 as offset of page 1)
  - the script generates new labels for all pages representing their proper page number
  - the script also generates a TOC sequence and inserts it, so that a proper table of contents will appear in the IIIF client
- out of scope for now:
  - anomalies in the page numbering sequence (e.g. unnumbered pages such as plates); we'll stick to a simple model allowing a sequence of roman-numeral-numbered pages followed by a sequence of arabic-numeral-numbered pages, both starting at one

I think I can just use a single range structure (i.e., I won't attempt to enumerate all the canvases in a chapter):
```
"structures": [
  {
    "@id": "http://example.org/iiif/book1/range/r0",
    "@type": "sc:Range",
    "label": "Table of Contents",
    "viewingHint": "top",
    "members": [
      {
        "@id": "http://example.org/iiif/book1/canvas/cover",
        "@type": "sc:Canvas",
        "label": "Front Cover"
      },
      ... more canvases for the start pages of the chapters ...
    ]
  }
]
```

Manifest with working table of contents: http://iiif.durham.ac.uk/manifests/cat/hunter/DCL_Hunter_100.json
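
A minimal sketch of the relabelling and TOC steps described above, not the actual script. It assumes a IIIF Presentation 2.x manifest already parsed into a Ruby hash, with its canvases under `sequences[0].canvases`, and a TOC already flattened into a list of `{ title:, page: }` entries with arabic page numbers. The helper names (`roman`, `page_label`, `toc_range`, `enhance`) and the flat TOC shape are hypothetical, chosen only for illustration.

```ruby
# Roman numeral glyphs, largest first (lower case, for front-matter pages).
ROMAN = [[1000, 'm'], [900, 'cm'], [500, 'd'], [400, 'cd'], [100, 'c'], [90, 'xc'],
         [50, 'l'], [40, 'xl'], [10, 'x'], [9, 'ix'], [5, 'v'], [4, 'iv'], [1, 'i']].freeze

# Convert a positive integer to a lower-case roman numeral.
def roman(number)
  ROMAN.map do |value, glyph|
    count, number = number.divmod(value)
    glyph * count
  end.join
end

# Label for the canvas at a 1-based position, where offset is the
# position of page 1. Canvases before the offset get roman numerals.
def page_label(position, offset)
  position < offset ? roman(position) : (position - offset + 1).to_s
end

# Build one top-level range whose members are the canvases on which
# each TOC entry starts (no attempt to list every canvas in a chapter).
def toc_range(canvases, toc, offset, range_id)
  members = toc.map do |entry|
    canvas = canvases.fetch(offset + entry[:page] - 2) # 0-based index
    { '@id' => canvas['@id'], '@type' => 'sc:Canvas', 'label' => entry[:title] }
  end
  { '@id' => range_id, '@type' => 'sc:Range', 'label' => 'Table of Contents',
    'viewingHint' => 'top', 'members' => members }
end

# Relabel every canvas and insert the TOC range into the manifest hash.
def enhance(manifest, toc, offset, range_id)
  canvases = manifest['sequences'][0]['canvases']
  canvases.each_with_index do |canvas, i|
    canvas['label'] = page_label(i + 1, offset)
  end
  manifest['structures'] = [toc_range(canvases, toc, offset, range_id)]
  manifest
end
```

With an offset of 9, the first eight canvases would be labelled i to viii and the ninth would become page 1, which matches the pp. i-viii example above. Nested TOC entries would need either flattening first or nested ranges, which this sketch does not attempt.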
Note: I'll have to replace URIs above the canvas level with my own, so as not to conflict with Stanford's or to appear to attribute things to Stanford that are actually my work. I'll also have to add metadata to make the relationship clear.
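
One way the URI replacement might look, as a rough sketch under assumptions: the manifest is a parsed hash, `new_base` is a base URI I control, and the `/manifest`, `/sequence/...` and `/range/...` path suffixes are just an illustrative pattern. The function name `rehome` is hypothetical. Canvas `@id`s and everything below them are left untouched.

```ruby
require 'json'

# Return a copy of the manifest whose manifest-, sequence- and range-level
# @id values live under new_base. Canvases keep their original @ids.
def rehome(manifest, new_base)
  copy = JSON.parse(JSON.generate(manifest)) # deep copy via a JSON round trip
  copy['@id'] = "#{new_base}/manifest"
  Array(copy['sequences']).each_with_index do |sequence, i|
    sequence['@id'] = "#{new_base}/sequence/s#{i}"
  end
  Array(copy['structures']).each_with_index do |range, i|
    range['@id'] = "#{new_base}/range/r#{i}"
  end
  copy
end
```

Because the function works on a copy, the original hash stays available for comparison. The metadata describing the relationship to the source manifest is not covered here.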