https://github.com/xcrap-cloud/image-text-extractor

Xcrap Image Text Extractor is a package of the Xcrap framework that abstracts the extraction of texts from images using the node-tesseract-ocr library.

extractor image javascript nodejs scraping tesseract text typescript web xcrap

Host: GitHub
URL: https://github.com/xcrap-cloud/image-text-extractor
Owner: Xcrap-Cloud
License: mit
Created: 2025-04-10T16:11:27.000Z (about 1 year ago)
Default Branch: master
Last Pushed: 2025-04-10T16:38:48.000Z (about 1 year ago)
Last Synced: 2025-04-10T18:00:16.316Z (about 1 year ago)
Topics: extractor, image, javascript, nodejs, scraping, tesseract, text, typescript, web, xcrap
Language: TypeScript
Homepage: https://www.npmjs.com/package/@xcrap/image-text-extractor
Size: 83 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 🕷️ Xcrap Image Text Extractor

**Xcrap Image Text Extractor** is a package of the Xcrap framework that abstracts the extraction of texts from images using the [node-tesseract-ocr](https://www.npmjs.com/package/node-tesseract-ocr) library.

## 📦 Installation

Installation is straightforward: use your preferred dependency manager. Here is an example using NPM:

```cmd
npm i @xcrap/image-text-extractor
```

## 🚀 Usage

**Xcrap Image Text Extractor** provides an *async extractor* that can be used in an HTML parsing model just like any extractor:

```ts
import { extractImageText } from "@xcrap/image-text-extractor"
import { HtmlParsingModel } from "@xcrap/parser"

const parsingModel = new HtmlParsingModel({
	imageTexts: {
		// Select every <img> element and run OCR on each one
		query: "img",
		multiple: true,
		extractor: extractImageText({ lang: "eng" })
	}
})
```

If you need to transform the `src` of the images before extraction, for example to resolve relative paths, pass the `transformSrc` option:

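```ts
import { extractImageText } from "@xcrap/image-text-extractor"
import { HtmlParsingModel } from "@xcrap/parser"

const parsingModel = new HtmlParsingModel({
    imageTexts: {
        query: "img",
        multiple: true,
        extractor: extractImageText({
            lang: "eng",
            // Receives the original `src` of each image
            transformSrc: (originalSrc) => {...}
        })
    }
})
```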
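As a minimal sketch of what a `transformSrc` callback might do, the helper below resolves relative image paths against the URL of the scraped page using the standard `URL` constructor. The names `resolveSrc` and `pageUrl` are hypothetical, and the sketch assumes that `transformSrc` receives the raw `src` string and returns the string to use in its place, which the README does not state explicitly.

```ts
// Hypothetical URL of the page the images were scraped from.
const pageUrl = "https://example.com/articles/post.html"

// Resolve a possibly relative `src` against a base URL.
// Absolute URLs pass through unchanged because the base is only
// applied when `originalSrc` is relative.
function resolveSrc(originalSrc: string, baseUrl: string = pageUrl): string {
    return new URL(originalSrc, baseUrl).href
}

console.log(resolveSrc("/images/banner.png"))            // https://example.com/images/banner.png
console.log(resolveSrc("../img/a.png"))                  // https://example.com/img/a.png
console.log(resolveSrc("https://cdn.example.com/x.png")) // https://cdn.example.com/x.png
```

Under that assumption, it could be wired in as `transformSrc: (originalSrc) => resolveSrc(originalSrc)`.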

> Check out more options at [node-tesseract-ocr](https://www.npmjs.com/package/node-tesseract-ocr).
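For illustration, here is a sketch of an options object that uses option names documented by node-tesseract-ocr (`lang`, `oem`, `psm`). The values are examples only, and it is an assumption that `extractImageText` forwards these options unchanged to the underlying library.

```ts
// Option names follow the node-tesseract-ocr README; the values are illustrative.
const ocrOptions = {
    lang: "eng", // Tesseract language code(s), e.g. "eng" or "eng+por"
    oem: 1,      // OCR engine mode: 1 selects the LSTM neural-net engine only
    psm: 6,      // page segmentation mode: 6 assumes a single uniform block of text
}

// Assumption: the object would be passed as extractImageText(ocrOptions).
console.log(ocrOptions)
```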

## 🤝 Contributing

Want to contribute? Follow these steps:

- Fork the repository.
- Create a new branch (`git checkout -b feature-new`).
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to the branch (`git push origin feature-new`).
- Open a Pull Request.

## 📝 License

This project is licensed under the MIT License.
