{"id":13542834,"url":"https://github.com/githubharald/WordDetector","last_synced_at":"2025-04-02T12:30:36.718Z","repository":{"id":37439861,"uuid":"145550762","full_name":"githubharald/WordDetector","owner":"githubharald","description":"Detect handwritten words (classic image processing based method).","archived":false,"fork":false,"pushed_at":"2023-05-05T12:42:19.000Z","size":2378,"stargazers_count":260,"open_issues_count":0,"forks_count":83,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-08-01T11:08:40.981Z","etag":null,"topics":["detector","handwriting-recognition","ocr","segmentation","text-detection"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/githubharald.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-08-21T10:50:40.000Z","updated_at":"2024-07-27T18:41:55.000Z","dependencies_parsed_at":"2024-01-15T23:37:22.467Z","dependency_job_id":null,"html_url":"https://github.com/githubharald/WordDetector","commit_stats":null,"previous_names":["githubharald/wordsegmentation"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2FWordDetector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2FWordDetector/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2FWordDetector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/githubharald%2FWordDetector/manifests","owner_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/owners/githubharald","download_url":"https://codeload.github.com/githubharald/WordDetector/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246815248,"owners_count":20838410,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["detector","handwriting-recognition","ocr","segmentation","text-detection"],"created_at":"2024-08-01T11:00:18.486Z","updated_at":"2025-04-02T12:30:36.697Z","avatar_url":"https://github.com/githubharald.png","language":"Python","funding_links":[],"categories":["Segmentation"],"sub_categories":["Word Segmentation"],"readme":"# Word Segmentation with Scale Space Technique\n\n**Update 2021: installable Python package, added line clustering and word sorting**\n\nImplementation of the scale space technique for word segmentation proposed by \n[R. Manmatha and N. Srimal](http://ciir.cs.umass.edu/pubfiles/mm-27.pdf). \nEven though the paper is from 1999, the method still achieves good results, is fast, and has a simple implementation. 
\nThe algorithm takes an **image containing words as input** and **outputs the detected words**.\nOptionally, the words are sorted according to reading order (top to bottom, left to right).\n\n![example](./doc/example.png)\n\n## Installation\n\n* Go to the root level of the repository\n* Execute `pip install .`\n* Go to `tests/` and execute `pytest` to check that the installation worked\n\n## Usage\n\nThis example loads an image of a text line, prepares it for the detector (1), detects words (2), \nsorts them (3), and finally shows the cropped words (4).\n\n````python\nfrom word_detector import prepare_img, detect, sort_line\nimport matplotlib.pyplot as plt\nimport cv2\n\n# (1) prepare image:\n# (1a) convert to grayscale\n# (1b) scale to specified height because algorithm is not scale-invariant\nimg = prepare_img(cv2.imread('data/line/0.png'), 50)\n\n# (2) detect words in image\ndetections = detect(img,\n                    kernel_size=25,\n                    sigma=11,\n                    theta=7,\n                    min_area=100)\n\n# (3) sort words in line\nline = sort_line(detections)[0]\n\n# (4) show word images\n# one row for the input image plus one row per detected word\nnum_rows = len(line) + 1\nplt.subplot(num_rows, 1, 1)\nplt.imshow(img, cmap='gray')\nfor i, word in enumerate(line):\n    print(word.bbox)\n    plt.subplot(num_rows, 1, i + 2)\n    plt.imshow(word.img, cmap='gray')\nplt.show()\n````\n\nThe repository contains some examples showing how to use the package:\n* Install requirements: `pip install -r requirements.txt`\n* Go to `examples/`\n* Run `python main.py` to detect words in line images (IAM dataset)\n* Or, run `python main.py --data ../data/page --img_height 1000 --theta 5` to run the detector on an image of a page (also from IAM dataset)\n\n\nThe package contains the following functions:\n* `prepare_img`: prepares the input image for the detector\n* `detect`: detects words in an image\n* `sort_line`: sorts words in a (single) line\n* `sort_multiline`: clusters words into lines, then sorts each line separately\n\nFor more details on the functions 
and their parameters, use `help(function_name)`, e.g. `help(detect)`.\n\n\n## Algorithm\n\nThe illustration below shows how the algorithm works:\n\n* top left: input image\n* top right: apply filter to the image\n* bottom left: threshold filtered image\n* bottom right: compute bounding boxes\n\n![illustration](./doc/illustration.png)\n\nThe filter kernel with size=25, sigma=5 and theta=3 is shown below on the left. \nIt models the typical shape of a word, with the width larger than the height (in this case by a factor of 3). \nOn the right the frequency response is shown (DFT of size 100x100). \nThe filter is in fact a low-pass filter, with different cut-off frequencies in the x and y directions.\n![kernel](./doc/kernel.png)\n\n\n## How to select parameters\n\n* The algorithm is **not scale-invariant**\n    * The default parameters give good results for a text height of 25-50 pixels\n    * If working with lines, resize the image to a height of 50 pixels\n    * If working with pages, resize the image so that the words have a height of 25-50 pixels\n* The sigma parameter controls the width of the Gaussian function (standard deviation) along the x-direction. Small\n  values might lead to multiple detections per word (over-segmentation), while large values might lead to a detection\n  containing multiple words (under-segmentation)\n* The kernel size depends on the sigma parameter and should be chosen large enough to contain as many of the non-zero\n  kernel values as possible\n* The average aspect ratio (width/height) of the words to be detected is a good initial guess for the theta parameter\n\nThe best way to find the optimal parameters is to use a dataset (e.g. IAM) and optimize the parameters w.r.t. some\nevaluation metric (e.g. 
intersection over union).\n\n## Results\n\nThis algorithm gives good results on datasets with large inter-word-distances and small intra-word-distances like IAM.\nHowever, for historical datasets like Bentham or Ratsprotokolle results are not very good and more complex approaches\nshould be preferred (e.g., a neural network based approach as implemented in\nthe [WordDetectorNN](https://github.com/githubharald/WordDetectorNN) repository).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithubharald%2FWordDetector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgithubharald%2FWordDetector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgithubharald%2FWordDetector/lists"}