{"id":17110144,"url":"https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision","last_synced_at":"2025-06-25T05:07:25.431Z","repository":{"id":28580955,"uuid":"117417506","full_name":"sshniro/line-segmentation-algorithm-to-gcp-vision","owner":"sshniro","description":"Line segmentation algorithm for Google Vision API.","archived":false,"fork":false,"pushed_at":"2022-11-08T08:16:50.000Z","size":2890,"stargazers_count":96,"open_issues_count":16,"forks_count":37,"subscribers_count":11,"default_branch":"master","last_synced_at":"2025-06-13T23:39:11.093Z","etag":null,"topics":["data-extraction","google-vision","invoice","proposed-algorithm","segmentation"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sshniro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-01-14T09:26:48.000Z","updated_at":"2025-06-06T06:24:10.000Z","dependencies_parsed_at":"2023-01-14T09:06:15.655Z","dependency_job_id":null,"html_url":"https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sshniro/line-segmentation-algorithm-to-gcp-vision","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshniro%2Fline-segmentation-algorithm-to-gcp-vision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshniro%2Fline-segmentation-algorithm-to-gcp-vision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshniro%2Fline-segmentation-algorithm-to-gcp-vision/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshniro%2Fline-segmentation-algorithm-to-gcp-vision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sshniro","download_url":"https://codeload.github.com/sshniro/line-segmentation-algorithm-to-gcp-vision/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sshniro%2Fline-segmentation-algorithm-to-gcp-vision/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259790461,"owners_count":22911549,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-extraction","google-vision","invoice","proposed-algorithm","segmentation"],"created_at":"2024-10-14T16:25:33.376Z","updated_at":"2025-06-25T05:07:25.407Z","avatar_url":"https://github.com/sshniro.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)](https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision/blob/master/LICENSE)\n[![Build Status](https://travis-ci.org/sshniro/line-segmentation-algorithm-to-gcp-vision.svg?branch=master)](https://travis-ci.com/sshniro/line-segmentation-algorithm-to-gcp-vision)\n# Introduction\n\nGoogle Vision provides 2 options for optical character recognition(OCR).\n\n````\n- Option 1: TEXT_DETECTION - Words with coordinates\n- Option 2: DOCUMENT_TEXT_DETECTION - OCR on dense text to extract lines and paragraph information\n````\n\nThe second option is suitable for data extraction from articles (Dense Text such as News Papers/Books). 
This option has an \nintelligent segmentation method that merges nearby words to form lines and paragraphs.\n \nThis feature is not desirable for images with sparse text content such as retail invoices, where the data belonging to the same line\nsits at two opposite ends (with a large gap of whitespace between the product name and the price). For these images the OCR segments the \nlines in a different order. If two words on the same line are too far apart, Google Vision identifies \nthem as two separate paragraphs/lines. \n\nThe image below shows the sample output for a typical invoice from Google Vision.\n\n\u003cimg width=\"1198\" alt=\"screen shot 2018-01-15 at 3 55 59 pm\" src=\"https://user-images.githubusercontent.com/13045528/34937970-9f2e93b8-fa0c-11e7-9521-0fc6ad191e0d.png\"\u003e\n\nThis behaviour creates a problem in information extraction scenarios. For example, to extract the price of a product from a \nretail invoice, the system needs a way to match the words on the same line. The algorithm proposed below performs \nline segmentation for data extraction based on the polygon coordinates of the characters.\n\n## Usage Guide\n\nUsage instructions for each programming language are located in the ReadMe files inside the relevant folders.\n\n### Proposed Algorithm\n\nThe implemented algorithm runs in two stages:\n\n- Stage 1 - Groups nearby words to generate a longer strip of line\n- Stage 2 - Connects words which are far apart using the bounding polygon approach\n\n\u003cimg width=\"437\" alt=\"screen shot 2018-01-15 at 4 50 31 pm\" src=\"https://user-images.githubusercontent.com/13045528/34940084-415cf57e-fa14-11e7-8099-ffa7fbce1b21.png\"\u003e\n\n\n## Explanation.\n\nStage one helps to reduce the computations needed for the second phase of the algorithm. In the first phase the algorithm\ntries to merge words/characters which are very near. 
Stage 1 is needed because price-related text such as $3.40 is returned as 2 words by \nGoogle Vision (word 1: `$3.` word 2:`,40`). The first stage concatenates nearby characters to form a text-block/word. \nThis step helps reduce the computation needed for the second phase.\n\nThe stage 2 algorithm draws an imaginary bounding polygon (with a threshold) over the words and computes the \nwords which belong to each line.\n\n## Issues.\n\nThe algorithm works for most slanted and slightly crumpled images, but it fails on highly \ncrumpled or folded images.\n\n## Test \n##### Node JS\n\n- cd nodejs\n- npm install\n- npm test\n\n\n## Future Work\n\nTry to implement the water-flow algorithm for line segmentation and compare its accuracy with the bounding polygon approach. \n\n\u003cimg width=\"211\" alt=\"waterflow\" src=\"https://user-images.githubusercontent.com/13045528/34940259-d6899526-fa14-11e7-9b6c-4b3a2aaa1a75.png\"\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsshniro%2Fline-segmentation-algorithm-to-gcp-vision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsshniro%2Fline-segmentation-algorithm-to-gcp-vision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsshniro%2Fline-segmentation-algorithm-to-gcp-vision/lists"}