https://github.com/sachaos/jisui
Convert scanned image PDF file to text annotated PDF file
- Host: GitHub
- URL: https://github.com/sachaos/jisui
- Owner: sachaos
- License: mit
- Created: 2021-01-31T07:18:43.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-01-31T23:28:45.000Z (over 5 years ago)
- Last Synced: 2025-04-11T00:08:44.473Z (about 1 year ago)
- Topics: e-book, gcp-cloud-vision, ocr, ocr-recognition, pdf
- Language: Go
- Homepage:
- Size: 2.07 MB
- Stars: 29
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
Jisui (自炊)
===
This tool is a PoC (Proof of Concept).
Jisui is a helper tool for creating e-books.
Ordinarily, a scanned book contains no text information, so you cannot search for text in the PDF.
Jisui extracts text from a scanned book (PDF) and merges that text into the PDF.
This tool depends on the Google Cloud Vision API to extract text,
so you need a GCP account and your own project.
[Jisui (自炊)](https://ja.wikipedia.org/wiki/%E8%87%AA%E7%82%8A_(%E9%9B%BB%E5%AD%90%E6%9B%B8%E7%B1%8D)) is Japanese slang for scanning a book to make an e-book.
## Prerequisites
* GCS bucket
* GCP credential file
* Font file, e.g. https://moji.or.jp/ipafont/
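
This README does not say how the credential file is handed to the tool. Google's client libraries normally find a service-account key through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, so a plausible setup, assuming Jisui relies on that default lookup, is sketched here. The key path is a hypothetical placeholder.

```shell
# Assumption: jisui uses Google's default credential lookup.
# Hypothetical path; replace it with the location of your own key file.
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/keys/jisui-service-account.json"
```

The variable only needs to be set in the shell session that runs `jisui`.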
## Install
```
$ go install github.com/sachaos/jisui@latest   # Go 1.16 and later
$ go get github.com/sachaos/jisui              # older Go toolchains
```
## Usage
```
$ jisui -bucket [your GCS bucket] -font [Downloaded font] -output result.pdf [scanned PDF file]
```
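
As a sketch of a concrete invocation, the wrapper below checks that the font and the scanned PDF exist locally before calling `jisui` with the flags shown above. The function name `run_jisui` is hypothetical, as are the file and bucket names in the usage note, and this README does not state whether `-bucket` expects a bare bucket name or a `gs://` URL.

```shell
# Hypothetical helper: validate local inputs, then call jisui.
# Usage: run_jisui BUCKET FONT_FILE SCANNED_PDF [OUTPUT_PDF]
run_jisui() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_jisui BUCKET FONT_FILE SCANNED_PDF [OUTPUT_PDF]" >&2
    return 2
  fi
  bucket=$1
  font=$2
  input=$3
  output=${4:-result.pdf}   # default output name taken from the usage line
  for f in "$font" "$input"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  jisui -bucket "$bucket" -font "$font" -output "$output" "$input"
}
```

For example, `run_jisui my-bucket ./ipaexg.ttf ./scanned.pdf` would ask `jisui` to write `result.pdf`, provided the tool is installed and on your `PATH`.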
## Example
Example PDF files are included.
Download them and open them in a PDF viewer.
You can see the difference when you search for text.
* [Scanned image PDF](./example/scanned.pdf)
* [Processed PDF](./example/result.pdf)
