https://github.com/githubharald/WordDetector
Detect handwritten words (classic image processing based method).
- Host: GitHub
- URL: https://github.com/githubharald/WordDetector
- Owner: githubharald
- License: mit
- Created: 2018-08-21T10:50:40.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-05-05T12:42:19.000Z (over 1 year ago)
- Last Synced: 2024-08-01T11:08:40.981Z (3 months ago)
- Topics: detector, handwriting-recognition, ocr, segmentation, text-detection
- Language: Python
- Homepage:
- Size: 2.27 MB
- Stars: 260
- Watchers: 4
- Forks: 83
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# Word Segmentation with Scale Space Technique
**Update 2021: installable Python package, added line clustering and word sorting**
Implementation of the scale space technique for word segmentation proposed by
[R. Manmatha and N. Srimal](http://ciir.cs.umass.edu/pubfiles/mm-27.pdf).
Even though the paper is from 1999, the method still achieves good results, is fast, and has a simple implementation.
The algorithm takes an **image containing words as input** and **outputs the detected words**.
Optionally, the words are sorted according to reading order (top to bottom, left to right).

![example](./doc/example.png)
## Installation
* Go to the root level of the repository
* Execute `pip install .`
* Go to `tests/` and execute `pytest` to check that the installation worked

## Usage
This example loads an image of a text line, prepares it for the detector (1), detects words (2),
sorts them (3), and finally shows the cropped words (4).

````python
from word_detector import prepare_img, detect, sort_line
import matplotlib.pyplot as plt
import cv2

# (1) prepare image:
# (1a) convert to grayscale
# (1b) scale to specified height because algorithm is not scale-invariant
img = prepare_img(cv2.imread('data/line/0.png'), 50)

# (2) detect words in image
detections = detect(img,
                    kernel_size=25,
                    sigma=11,
                    theta=7,
                    min_area=100)

# (3) sort words in line
line = sort_line(detections)[0]

# (4) show word images: the input line in the first row, one cropped word per following row
num_rows = len(line) + 1
plt.subplot(num_rows, 1, 1)
plt.imshow(img, cmap='gray')
for i, word in enumerate(line):
    print(word.bbox)
    plt.subplot(num_rows, 1, i + 2)
    plt.imshow(word.img, cmap='gray')
plt.show()
````

The repository contains some examples showing how to use the package:
* Install requirements: `pip install -r requirements.txt`
* Go to `examples/`
* Run `python main.py` to detect words in line images (from the IAM dataset)
* Or run `python main.py --data ../data/page --img_height 1000 --theta 5` to run the detector on an image of a page (also from the IAM dataset)

The package contains the following functions:
* `prepare_img`: prepares the input image for the detector
* `detect`: detects words in an image
* `sort_line`: sorts words in a (single) line
* `sort_multiline`: clusters words into lines, then sorts each line separately

For more details on the functions and their parameters, use `help(function_name)`, e.g. `help(detect)`.
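The sketch below illustrates the idea behind reading-order sorting on plain `(x, y, w, h)` boxes: group boxes into text lines by vertical overlap, order the lines top to bottom, and order the words in each line left to right. It is not the package's implementation (`sort_multiline` may cluster lines differently); the helper names `vertical_overlap` and `sort_multiline_boxes` and the 50% overlap threshold are hypothetical choices for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), origin at the top-left corner


def vertical_overlap(box: Box, top: float, bottom: float) -> float:
    """Fraction of the box's height that lies inside the vertical interval [top, bottom)."""
    box_top, box_bottom = box[1], box[1] + box[3]
    inter = min(box_bottom, bottom) - max(box_top, top)
    return max(0.0, inter) / box[3]


def sort_multiline_boxes(boxes: List[Box], min_overlap: float = 0.5) -> List[List[Box]]:
    """Hypothetical helper: cluster boxes into lines, then sort lines and words."""
    lines = []  # each entry: [top, bottom, boxes_in_line]
    # Visit boxes by vertical center so each line grows from its own words.
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2):
        for line in lines:
            if vertical_overlap(box, line[0], line[1]) >= min_overlap:
                # Join this line and widen its vertical extent.
                line[0] = min(line[0], box[1])
                line[1] = max(line[1], box[1] + box[3])
                line[2].append(box)
                break
        else:
            # No existing line overlaps enough: start a new one.
            lines.append([box[1], box[1] + box[3], [box]])
    lines.sort(key=lambda line: line[0])  # top to bottom
    return [sorted(line[2], key=lambda b: b[0]) for line in lines]  # left to right


boxes = [(120, 12, 40, 20), (10, 60, 50, 22), (15, 10, 60, 25), (80, 62, 30, 18)]
for words in sort_multiline_boxes(boxes):
    print(words)
```

The greedy overlap rule is simple but can merge lines when words are strongly skewed or lines are tightly spaced; a clustering method with a tunable distance threshold is a common alternative.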
## Algorithm
The illustration below shows how the algorithm works:
* top left: input image
* top right: apply filter to the image
* bottom left: threshold filtered image
* bottom right: compute bounding boxes

![illustration](./doc/illustration.png)
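The four steps above (filter, threshold, bounding boxes) can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the package's `detect`: the kernel is a plain anisotropic Gaussian (standard deviation `sigma` along x and `sigma / theta` along y), the threshold is the mid-range of the filtered image, and `min_area` is interpreted as a minimum pixel count per connected component. The helpers `gaussian_kernel`, `filter_image`, `bounding_boxes`, and `detect_words` are hypothetical names.

```python
from collections import deque

import numpy as np


def gaussian_kernel(kernel_size, sigma, theta):
    """Assumed kernel: anisotropic Gaussian, std sigma along x and sigma / theta along y."""
    assert kernel_size % 2 == 1, "kernel size must be odd"
    half = kernel_size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-xs**2 / (2 * sigma**2) - ys**2 / (2 * (sigma / theta) ** 2))
    return kernel / kernel.sum()


def filter_image(img, kernel):
    """Correlate img with kernel (same as convolution, the kernel is symmetric); edges replicated."""
    kh, kw = kernel.shape
    padded = np.pad(img.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def bounding_boxes(mask, min_area):
    """Label 4-connected foreground components; return (x, y, w, h) for those with >= min_area pixels."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0, x0] or seen[y0, x0]:
                continue
            queue = deque([(y0, x0)])
            seen[y0, x0] = True
            ys, xs = [], []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:
                x_min, y_min = min(xs), min(ys)
                boxes.append((x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1))
    return boxes


def detect_words(img, kernel_size=25, sigma=11, theta=7, min_area=100):
    """Filter -> threshold -> bounding boxes, assuming dark text on a light background."""
    filtered = filter_image(img, gaussian_kernel(kernel_size, sigma, theta))
    threshold = (filtered.min() + filtered.max()) / 2  # assumed mid-range threshold
    return bounding_boxes(filtered < threshold, min_area)


# Synthetic example: two dark "words" on a white 50x200 canvas.
img = np.full((50, 200), 255.0)
img[15:35, 20:60] = 0
img[15:35, 120:170] = 0
print(detect_words(img, kernel_size=11, sigma=4, theta=2, min_area=100))
```

The explicit shift-and-add loop keeps the sketch dependency-free; in practice a library convolution (e.g. from OpenCV) and a data-driven threshold such as Otsu's method would be faster and more robust.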
The filter kernel with size=25, sigma=5 and theta=3 is shown below on the left.
It models the typical shape of a word, with the width larger than the height (in this case by a factor of 3).
On the right, its frequency response is shown (DFT of size 100x100).
The filter is in fact a low-pass filter, with different cut-off frequencies in the x and y directions.
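The low-pass behaviour with direction-dependent cut-off can be checked numerically. The sketch below builds a stand-in kernel with size=25, sigma=5, theta=3, zero-pads it to 100x100, and measures where the DFT magnitude along each frequency axis first drops below 1/sqrt(2) of its DC value. Assumption: the kernel is a plain anisotropic Gaussian (std `sigma` along x, `sigma / theta` along y); the package's kernel may be defined differently, so its exact response can differ. `gaussian_kernel` and `half_power_bin` are hypothetical helpers.

```python
import numpy as np


def gaussian_kernel(kernel_size, sigma, theta):
    """Assumed kernel: anisotropic Gaussian, std sigma along x and sigma / theta along y."""
    assert kernel_size % 2 == 1, "kernel size must be odd"
    half = kernel_size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-xs**2 / (2 * sigma**2) - ys**2 / (2 * (sigma / theta) ** 2))
    return kernel / kernel.sum()


def half_power_bin(profile):
    """Index of the first frequency bin whose magnitude falls below DC / sqrt(2)."""
    return int(np.argmax(profile < profile[0] / np.sqrt(2)))


kernel = gaussian_kernel(25, sigma=5, theta=3)
# Zero-pad to 100x100; rows index the y-frequency, columns index the x-frequency.
response = np.abs(np.fft.fft2(kernel, s=(100, 100)))
cutoff_x = half_power_bin(response[0, :51])  # along f_x with f_y = 0
cutoff_y = half_power_bin(response[:51, 0])  # along f_y with f_x = 0
print(f"DC gain {response[0, 0]:.3f}, cut-off bin x={cutoff_x}, y={cutoff_y}")
```

Because the kernel is wider along x (a word is wider than tall), its spectrum is narrower along x, so the cut-off bin along x comes out lower than along y, and the magnitude near the Nyquist corner is close to zero.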
![kernel](./doc/kernel.png)

## How to select parameters
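Because the algorithm is not scale-invariant, images are rescaled before detection (step 1b of the usage example does this for a line image via `prepare_img`, and the example script rescales page images to `--img_height 1000`). The sketch below shows that preparation step for both cases under stated assumptions: grayscale by averaging channels, nearest-neighbour resizing in NumPy, and a hypothetical `scale_for_word_height` helper that moves an estimated word height into the 25-50 pixel range from the guidelines below. None of these helpers are part of the package.

```python
import numpy as np


def to_grayscale(img):
    """Average the color channels (assumption; other luminance weightings are common)."""
    return img.mean(axis=2) if img.ndim == 3 else img.astype(float)


def resize_nearest(img, scale):
    """Nearest-neighbour resize of a 2-D image by a uniform scale factor."""
    h, w = img.shape
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return img[rows[:, None], cols]


def prepare_line(img, target_height=50):
    """Hypothetical helper for line images: grayscale, then scale to target_height pixels."""
    gray = to_grayscale(img)
    return resize_nearest(gray, target_height / gray.shape[0])


def scale_for_word_height(estimated_word_height, target=(25, 50)):
    """Hypothetical helper for pages: factor that brings word height into the target range."""
    low, high = target
    if low <= estimated_word_height <= high:
        return 1.0
    return (low + high) / 2 / estimated_word_height  # aim for the middle of the range


line_img = np.zeros((100, 400, 3))
print(prepare_line(line_img).shape)
print(scale_for_word_height(80))
```

For pages, the word height has to be estimated (for example from a few manually measured words); the resulting factor is then applied with `resize_nearest` or a higher-quality interpolation.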
* The algorithm is **not scale-invariant**
* The default parameters give good results for a text height of 25-50 pixels
* If working with lines, resize the image to a height of 50 pixels
* If working with pages, resize the image so that the words have a height of 25-50 pixels
* The sigma parameter controls the width (standard deviation) of the Gaussian function along the x-direction. Small
values might lead to multiple detections per word (over-segmentation), while large values might lead to a single detection
containing multiple words (under-segmentation)
* The kernel size depends on the sigma parameter and should be chosen large enough to contain as many of the non-zero
kernel values as possible
* The average aspect ratio (width/height) of the words to be detected is a good initial guess for the theta parameter

The best way to find the optimal parameters is to use a dataset (e.g. IAM) and optimize the parameters w.r.t. some
evaluation metric (e.g. intersection over union).

## Results
This algorithm gives good results on datasets with large inter-word distances and small intra-word distances, like IAM.
However, for historical datasets like Bentham or Ratsprotokolle, the results are not very good, and more complex approaches
should be preferred (e.g., a neural-network-based approach as implemented in
the [WordDetectorNN](https://github.com/githubharald/WordDetectorNN) repository).
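To compare datasets or parameter settings quantitatively, as the parameter-selection section suggests, detections can be scored against ground-truth boxes with intersection over union (IoU). A minimal sketch, assuming axis-aligned `(x, y, w, h)` boxes and a greedy one-to-one matching at an IoU threshold of 0.5 (common choices, not something this repository prescribes); `iou` and `match_score` are hypothetical helper names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def match_score(detections: List[Box], ground_truth: List[Box], threshold: float = 0.5):
    """Greedy one-to-one matching by descending IoU; returns (precision, recall, f1)."""
    pairs = sorted(
        ((iou(d, g), i, j) for i, d in enumerate(detections) for j, g in enumerate(ground_truth)),
        reverse=True,
    )
    used_det, used_gt = set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little
        if i not in used_det and j not in used_gt:
            used_det.add(i)
            used_gt.add(j)
    tp = len(used_det)
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


ground_truth = [(0, 0, 10, 10), (20, 0, 10, 10)]
detections = [(1, 0, 10, 10), (40, 0, 10, 10)]
print(match_score(detections, ground_truth))
```

Averaging such a score over a dataset gives a single number that a grid search over `sigma`, `theta`, and `kernel_size` can maximize.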