The hOCR Embedded OCR Workflow and Output Format
- Host: GitHub
- URL: https://github.com/kba/hocr-spec
- Owner: kba
- Created: 2016-03-03T00:13:01.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2024-08-12T09:42:24.000Z (5 months ago)
- Last Synced: 2024-08-12T11:04:33.185Z (5 months ago)
- Language: HTML
- Homepage: http://kba.github.io/hocr-spec/1.2/
- Size: 582 KB
- Stars: 71
- Watchers: 13
- Forks: 20
- Open Issues: 54
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-ocr - hocr-spec - hOCR 1.2 specification (Software / OCR file formats)
README
hocr-spec
=========

[![Join the chat at https://gitter.im/kba/hocr-spec](https://badges.gitter.im/kba/hocr-spec.svg)](https://gitter.im/kba/hocr-spec?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

The hOCR Embedded OCR Workflow and Output Format
## About
This repository contains the [hOCR](https://en.wikipedia.org/wiki/HOCR) format
specification originally written by [Thomas Breuel](https://github.com/tmbdev).

## Versions
* `1.0` [English](https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview)
  * Google Doc with the original text by @tmbdev
  * Last substantial edit in May 2010
* `1.1` [English](./1.1/spec.md), [中文 (Chinese)](./1.1/spec_zh_CN.md)
  * Port of the Google Doc
  * Cleaning obvious errata (duplicate content)
  * More fine-grained heading structure
  * Table of contents
  * Chinese translation provided by [@littlePP24](https://github.com/littlePP24) and [@wanghaisheng](https://github.com/wanghaisheng)
  * Last substantial edit in September 2016
* `1.2` [English](https://kba.github.io/hocr-spec/1.2/)
  * Create a WHATWG-like spec using [bikeshed](https://github.com/tabatkins/bikeshed)
  * Add issues where appropriate
  * Semantically backwards-compatible with both 1.0 and 1.1

## Contribute
There is no formal body. Feel free to use the [GitHub
issues](https://github.com/kba/hocr-spec/issues) for discussion and questions.
Pull requests are very welcome.

For quick questions you can use the [hocr-spec Gitter
channel](https://gitter.im/kba/hocr-spec).

## Building the spec
To build the spec, you will need the following installed:

* `GNU make`
* One of the following programs:
  * [bikeshed](https://github.com/tabatkins/bikeshed)
  * [docker](https://docker.com)
* Python 3

To install the Python requirements:
```sh
pip3 install --user -r requirements.txt
```

The Makefile first looks for a local bikeshed installation and falls back to Docker,
using the [bikeshed docker container](https://hub.docker.com/kbai/bikeshed)
to build the spec.

To change the spec, adapt
* `/spec.md` to change the body of the spec
* `/spec.before.html` to change
  * the [bikeshed metadata](https://tabatkins.github.io/bikeshed/#metadata)
  * the [references to terms from other specs](https://tabatkins.github.io/bikeshed/#custom-dfns)
* `/spec.after.html` to change
  * JavaScript to run in the generated spec document
* `/defs.yml` to change the definition lists for elements and properties

Then run `make VERSION=` followed by the version name to build that spec.
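
Before running `make`, it can help to check which of the two supported builders is available. The following is a minimal sketch and not part of this repository: the helper name `first_available` is hypothetical, and the Makefile's own detection logic may differ. It only mirrors the documented order (local bikeshed first, docker as the fallback).

```sh
# Hypothetical helper (not from the repository): print the first of the
# given commands that is found on PATH, or return non-zero if none is.
first_available() {
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      return 0
    fi
  done
  return 1
}

# Mirror the documented order: prefer a local bikeshed, then docker.
if builder=$(first_available bikeshed docker); then
  echo "make would be able to build with: $builder"
else
  echo "neither bikeshed nor docker found; install one before running make" >&2
fi
```

If neither command is found, install one of them before building.
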
Examples:
* To build the `1.2` version: `make VERSION=1.2` or simply `make`
* To build the `1.2-zh` version: `make VERSION=1.2-zh`

## Open Tasks
The goal of this project is to make the hOCR specification more accessible and
easier to maintain.

* Cross-reference other specs
* Harmonize style
* Add samples
* [...](https://github.com/kba/hocr-spec/issues)