https://github.com/sachaos/jisui

Convert scanned image PDF file to text annotated PDF file
e-book gcp-cloud-vision ocr ocr-recognition pdf

Host: GitHub
URL: https://github.com/sachaos/jisui
Owner: sachaos
License: mit
Created: 2021-01-31T07:18:43.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2021-01-31T23:28:45.000Z (over 5 years ago)
Last Synced: 2025-04-11T00:08:44.473Z (about 1 year ago)
Topics: e-book, gcp-cloud-vision, ocr, ocr-recognition, pdf
Language: Go
Homepage:
Size: 2.07 MB
Stars: 29
Watchers: 3
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Jisui (自炊)
===

This tool is a PoC (proof of concept).

Jisui is a helper tool for creating e-books.  

Ordinarily, a scanned book contains no text information, so you cannot search for text in the PDF.  

Jisui extracts the text from a scanned book (PDF) and merges it back into the PDF.

This tool depends on the Google Cloud Vision API to extract the text.  

So you need a GCP account and your own project.

[Jisui (自炊)](https://ja.wikipedia.org/wiki/%E8%87%AA%E7%82%8A_(%E9%9B%BB%E5%AD%90%E6%9B%B8%E7%B1%8D)) is Japanese slang for scanning a book to make an e-book.

## Prerequisites

* GCS bucket

* GCP credential file

* Font file e.g. https://moji.or.jp/ipafont/
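
The GCS bucket requirement is consistent with how Cloud Vision handles PDFs: its asynchronous file annotation endpoint (`files:asyncBatchAnnotate`) reads the input from Cloud Storage and writes JSON results back to Cloud Storage. The Go sketch below builds such a request body with the standard library only. It is an illustration, not Jisui's actual code: the type names, `buildOCRRequest`, the `ocr-output/` prefix, and the batch size are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The types below mirror the JSON shape of a Cloud Vision
// files:asyncBatchAnnotate request. The names are hypothetical.
type gcsLocation struct {
	URI string `json:"uri"`
}

type inputConfig struct {
	GCSSource gcsLocation `json:"gcsSource"`
	MimeType  string      `json:"mimeType"`
}

type outputConfig struct {
	GCSDestination gcsLocation `json:"gcsDestination"`
	BatchSize      int         `json:"batchSize"`
}

type feature struct {
	Type string `json:"type"`
}

type fileRequest struct {
	InputConfig  inputConfig  `json:"inputConfig"`
	Features     []feature    `json:"features"`
	OutputConfig outputConfig `json:"outputConfig"`
}

type batchRequest struct {
	Requests []fileRequest `json:"requests"`
}

// buildOCRRequest describes an OCR job for a PDF already uploaded to
// gs://<bucket>/<object>. Results go under an assumed "ocr-output/" prefix.
func buildOCRRequest(bucket, object string) batchRequest {
	return batchRequest{Requests: []fileRequest{{
		InputConfig: inputConfig{
			GCSSource: gcsLocation{URI: fmt.Sprintf("gs://%s/%s", bucket, object)},
			MimeType:  "application/pdf",
		},
		Features: []feature{{Type: "DOCUMENT_TEXT_DETECTION"}},
		OutputConfig: outputConfig{
			GCSDestination: gcsLocation{URI: fmt.Sprintf("gs://%s/ocr-output/", bucket)},
			BatchSize:      2, // pages per output JSON file (assumed value)
		},
	}}}
}

func main() {
	body, err := json.MarshalIndent(buildOCRRequest("my-bucket", "scanned.pdf"), "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

Sending this body requires an authenticated request, which is where the GCP credential file comes in.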

## Install

```
$ go get github.com/sachaos/jisui
```

## Usage

```
$ jisui -bucket [your GCS bucket] -font [Downloaded font] -output result.pdf [scanned PDF file]
```
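
To make the shape of that command line concrete, here is a minimal sketch of how flags like these could be parsed with Go's standard `flag` package. It is not taken from Jisui's source: the `options` struct, `parseArgs`, and the rule that all three flags are required are assumptions made for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the values from the usage line. The struct is hypothetical.
type options struct {
	Bucket string // GCS bucket name
	Font   string // path to the downloaded font file
	Output string // path of the PDF to write
	Input  string // path of the scanned PDF (positional argument)
}

// parseArgs parses arguments of the form
// -bucket B -font F -output O scanned.pdf
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("jisui", flag.ContinueOnError)
	fs.StringVar(&o.Bucket, "bucket", "", "GCS bucket")
	fs.StringVar(&o.Font, "font", "", "font file")
	fs.StringVar(&o.Output, "output", "", "output PDF file")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if fs.NArg() != 1 {
		return o, errors.New("expected exactly one scanned PDF file")
	}
	o.Input = fs.Arg(0)
	if o.Bucket == "" || o.Font == "" || o.Output == "" {
		return o, errors.New("-bucket, -font and -output are required")
	}
	return o, nil
}

func main() {
	// Fixed example arguments; a real CLI would pass os.Args[1:] instead.
	args := []string{"-bucket", "my-bucket", "-font", "ipaexm.ttf", "-output", "result.pdf", "scanned.pdf"}
	opts, err := parseArgs(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

Note that the flags come before the positional PDF path: Go's `flag` package stops parsing at the first non-flag argument.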

## Example

Example PDF files are included in this repository.

Please download them and open them in a PDF viewer.

You can see the difference when you search for text.

* [Scanned image PDF](./example/scanned.pdf)

* [Processed PDF](./example/result.pdf)

![image](./image/example.png)
