https://github.com/zapolnoch/node-tesseract-ocr
A Node.js wrapper for the Tesseract OCR API
image-to-text ocr tesseract text-recognition
- Host: GitHub
- URL: https://github.com/zapolnoch/node-tesseract-ocr
- Owner: zapolnoch
- License: mit
- Created: 2018-10-09T22:20:51.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-07-13T18:17:01.000Z (over 2 years ago)
- Last Synced: 2025-05-16T18:03:26.981Z (5 months ago)
- Topics: image-to-text, ocr, tesseract, text-recognition
- Language: JavaScript
- Size: 516 KB
- Stars: 311
- Watchers: 4
- Forks: 38
- Open Issues: 24
Metadata Files:
- Readme: readme.md
- License: LICENSE
README
# Tesseract OCR for Node.js
## Installation
First, you need to install the Tesseract project. Instructions for installing Tesseract for all platforms can be found on [the project site](https://github.com/tesseract-ocr/tessdoc/blob/master/Installation.md). On Debian/Ubuntu:
```bash
apt-get install tesseract-ocr
```

After installing Tesseract, install the npm package:
```bash
npm install node-tesseract-ocr
```

## Usage
```js
const tesseract = require("node-tesseract-ocr")

const config = {
  lang: "eng", // default
  oem: 3,
  psm: 3,
}

async function main() {
  try {
    const text = await tesseract.recognize("image.jpg", config)
    console.log("Result:", text)
  } catch (error) {
    console.log(error.message)
  }
}

main()
```

You can also pass a URL:
```js
const img = "https://tesseract.projectnaptha.com/img/eng_bw.png"
const text = await tesseract.recognize(img)
```

or a Buffer:
```js
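const tesseract = require("node-tesseract-ocr")
const fs = require("fs/promises")

async function main() {
  // Read the image into memory and hand the Buffer to the wrapper
  const img = await fs.readFile("image.jpg")
  const text = await tesseract.recognize(img)

  console.log("Result:", text)
}

main().catch((error) => console.log(error.message))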
```

To process multiple images in a single run, pass an array:
```js
const images = ["./samples/file1.png", "./samples/file2.png"]
const text = await tesseract.recognize(images)
```

In the config object you can pass any [OCR options](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#options). You can also pass any [control parameters](https://tesseract-ocr.github.io/tessdoc/tess3/ControlParams) or use ready-made [config files](https://github.com/tesseract-ocr/tesseract/tree/master/tessdata/configs) (like `hocr`) through `presets`:
```js
await tesseract.recognize("image.jpg", {
  load_system_dawg: 0,
  tessedit_char_whitelist: "0123456789",
  presets: ["tsv"],
})
```

## Alternatives
If you want to use Tesseract in the browser, use the [Tesseract.js](https://github.com/naptha/tesseract.js) package, which compiles the original Tesseract from C++ to WebAssembly. It also runs in Node.js, but the performance may not be as good as with native Tesseract.
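
Returning to the config object from the Usage section: the sketch below shows one way such an object could be turned into arguments for the `tesseract` command line. It is an illustration only, not this package's source code. The helper name `buildTesseractArgs` and the `CLI_OPTIONS` list are hypothetical, and the mapping is an assumption based on the Tesseract manual page: `lang` becomes `-l`, documented OCR options become `--name value`, everything else is passed as a control parameter with `-c name=value`, and `presets` are appended as config file names.

```javascript
// Hypothetical helper for illustration; not part of node-tesseract-ocr's API.
// OCR options that the tesseract CLI accepts in the form "--name value".
const CLI_OPTIONS = ["tessdata-dir", "user-words", "user-patterns", "dpi", "oem", "psm"]

function buildTesseractArgs(input, config = {}) {
  const { lang, presets = [], ...rest } = config

  // Using "stdout" as the output base makes tesseract print the result
  // instead of writing it to a file.
  const args = [input, "stdout"]

  if (lang) args.push("-l", lang)

  for (const [key, value] of Object.entries(rest)) {
    if (CLI_OPTIONS.includes(key)) {
      args.push(`--${key}`, String(value))
    } else {
      // Assumption: any other key is a Tesseract control parameter.
      args.push("-c", `${key}=${value}`)
    }
  }

  // Config file names (such as "tsv" or "hocr") come last on the command line.
  return args.concat(presets)
}

const args = buildTesseractArgs("image.jpg", {
  lang: "eng",
  psm: 3,
  tessedit_char_whitelist: "0123456789",
  presets: ["tsv"],
})

console.log(args.join(" "))
// prints: image.jpg stdout -l eng --psm 3 -c tessedit_char_whitelist=0123456789 tsv
```

Returning an array rather than a single string means the arguments could be handed to `child_process.execFile` without shell quoting concerns, which matters for values such as character whitelists that may contain spaces or quotes.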