Convert scanned image PDF file to text annotated PDF file
https://github.com/sachaos/jisui

Jisui (自炊)
===

This tool is a PoC (proof of concept).

Jisui is a helper tool for creating e-books.
A scanned book ordinarily has no text information, so you cannot search for text in the PDF.
Jisui extracts text from a scanned book (PDF) and merges that text into the PDF.

This tool depends on the Google Cloud Vision API to extract text,
so you need a GCP account and your own project.
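
To make the recognized text searchable at the right spot on the page, each OCR bounding box has to be mapped into PDF page coordinates. The sketch below shows one way this mapping could work. It is not taken from Jisui's source: `NormalizedVertex`, `Rect`, and `ToPDFRect` are hypothetical names, and it assumes the OCR result gives box corners as fractions of the page size with a top-left origin, while the target is raw PDF user space (points, bottom-left origin).

```go
package main

import (
	"fmt"
	"math"
)

// NormalizedVertex is a hypothetical type: one corner of an OCR bounding box,
// given as fractions (0..1) of page width and height, origin at the top-left.
type NormalizedVertex struct{ X, Y float64 }

// Rect is a hypothetical axis-aligned box in PDF points (1/72 inch),
// where (X, Y) is its bottom-left corner measured from the page's bottom-left.
type Rect struct{ X, Y, W, H float64 }

// ToPDFRect converts the corners of a bounding box into a PDF-space rectangle.
// It uses the min/max of the corners, so a rotated box is approximated by
// its axis-aligned envelope.
func ToPDFRect(corners []NormalizedVertex, pageW, pageH float64) Rect {
	if len(corners) == 0 {
		return Rect{}
	}
	minX, maxX := corners[0].X, corners[0].X
	minY, maxY := corners[0].Y, corners[0].Y
	for _, c := range corners[1:] {
		minX, maxX = math.Min(minX, c.X), math.Max(maxX, c.X)
		minY, maxY = math.Min(minY, c.Y), math.Max(maxY, c.Y)
	}
	return Rect{
		X: minX * pageW,
		Y: (1 - maxY) * pageH, // flip the vertical axis: top-left origin to bottom-left origin
		W: (maxX - minX) * pageW,
		H: (maxY - minY) * pageH,
	}
}

func main() {
	// Page size in points is an example value, not a real paper format.
	box := []NormalizedVertex{{0.25, 0.25}, {0.75, 0.25}, {0.75, 0.5}, {0.25, 0.5}}
	fmt.Printf("%+v\n", ToPDFRect(box, 600, 800))
}
```

If the PDF library in use already measures from the top-left corner, the vertical flip would not be needed.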

[Jisui (自炊)](https://ja.wikipedia.org/wiki/%E8%87%AA%E7%82%8A_(%E9%9B%BB%E5%AD%90%E6%9B%B8%E7%B1%8D)) is Japanese slang for scanning a book to make an e-book.

## Prerequisites

* GCS bucket
* GCP credential file
* Font file, e.g. https://moji.or.jp/ipafont/

## Install

```
$ go get github.com/sachaos/jisui
```

## Usage

```
$ jisui -bucket [your GCS bucket] -font [Downloaded font] -output result.pdf [scanned PDF file]
```

## Example

Example PDF files are included.

Download them and open them in a PDF viewer.

You can see the difference when you search for text.

* [Scanned image PDF](./example/scanned.pdf)
* [Processed PDF](./example/result.pdf)

![image](./image/example.png)