{"id":13988856,"url":"https://github.com/prabhakar267/image2text","last_synced_at":"2025-04-05T02:08:46.527Z","repository":{"id":37679908,"uuid":"58192124","full_name":"prabhakar267/image2text","owner":"prabhakar267","description":":clipboard: Python wrapper to grab text from images and save as text files using Tesseract Engine","archived":false,"fork":false,"pushed_at":"2023-01-23T06:59:27.000Z","size":5686,"stargazers_count":406,"open_issues_count":1,"forks_count":141,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-03-29T01:07:49.711Z","etag":null,"topics":["image2text","ocr","optical-character-recognition","python-wrapper","tesseract","tesseract-engine","tesseract-installation","tesseract-ocr"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/prabhakar267.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-05-06T08:19:03.000Z","updated_at":"2025-03-21T09:52:35.000Z","dependencies_parsed_at":"2023-02-12T21:00:31.151Z","dependency_job_id":null,"html_url":"https://github.com/prabhakar267/image2text","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prabhakar267%2Fimage2text","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prabhakar267%2Fimage2text/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prabhakar267%2Fimage2text/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/prabhakar267%2Fimage2text/manifests","owner_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/owners/prabhakar267","download_url":"https://codeload.github.com/prabhakar267/image2text/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247276164,"owners_count":20912288,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["image2text","ocr","optical-character-recognition","python-wrapper","tesseract","tesseract-engine","tesseract-installation","tesseract-ocr"],"created_at":"2024-08-09T13:01:24.214Z","updated_at":"2025-04-05T02:08:46.511Z","avatar_url":"https://github.com/prabhakar267.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Image2Text\n[![Build Status](https://api.travis-ci.org/prabhakar267/image2text.svg?branch=master)](https://travis-ci.org/prabhakar267/image2text)\n\n**Image2Text** is a python wrapper to grab text from images and save as text files using [Google Tesseract Engine](https://github.com/tesseract-ocr/tesseract). Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License, Version 2.0, and development has been sponsored by Google since 2006. 
In 2006, Tesseract was considered one of the most accurate open-source OCR engines then available.\n\n**Quick Links:**\n+ [Usage](#usage)\n    + [Running Tests](#running-tests)\n+ [Tesseract Installation](#tesseract-installation)\n    + [Linux](#linux)\n    + [Windows](#windows)\n    + [Mac](#mac)\n+ [Sample Results](#sample-results)\n    + [Sample Image](#sample-image)\n    + [Text output](#text-output)\n\n\n# Usage\n```shell\npython main.py -i \u003cinput_path\u003e -o \u003coutput_path\u003e\n```\n```\nusage: main.py [-h] -i INPUT [-o OUTPUT] [-d]\n\nrequired arguments:\n  -i INPUT, --input INPUT       Single image file path or images directory path\n\noptional arguments:\n  -o OUTPUT, --output OUTPUT    (Optional) Output directory for converted text\n  -d, --debug                   Enable verbose DEBUG logging\n```\n\n```shell\npython main.py -i sample/\n```\nor\n```shell\npython main.py -i sample/ -o output/\n```\n\n## Running Tests\n```shell\npython -m unittest\n```\n\n# Tesseract Installation\n## Linux\n```shell\n[sudo] apt-get install tesseract-ocr\n```\n## Windows\n1. Install tesseract-ocr using the UB Mannheim installer: [https://github.com/UB-Mannheim/tesseract/wiki](https://github.com/UB-Mannheim/tesseract/wiki)\n2. Add the Tesseract-OCR installation directory to the `PATH` system variable.\n\n## Mac\n```shell\nbrew install tesseract\n```\n\n# Sample Results\n## Sample Image\n**(Wikipedia page for Google | Lang : Simple English)**\n![](/sample/file-page1.jpg?raw=true)\n\n## Text output\n```\nA man signing in at Google’s main aﬁce, Googleplex.\n\nGoogle Inc. is an American multinational corporation\nthat is best known for running one of the largest search\nengines on the World Wide Web (WWW). Every day,\n200 million (200,000,000) people use it. Google’s main\nofﬁce (“Googleplex”) is in Mountain View, California,\nUSA.\n\nWith Google Search, people can also search for pictures,\nUsenet newsgroups, news, and things to buy online. 
By\nJune 2004, Google had 4.28 billion web pages on its\ndatabase, 880 million (880,000,000) pictures and 845\nmillion (845,000,000) Usenet messages — six billion\nthings.\n\n“To google,” as an action word (verb) means “to search\nfor something on Google”. Because Google is so popular\n(more than half of people on the web use it) it has been\nused to mean “to search the web”. Google dislikes this\nuse since the name of the company is a trademark.\n\nAs a public company, Google Inc. trades on the\nNASDAQ under the tickers GOOG and GOOGL.\n\nIn August 2015, Google announced it was being restruc-\ntured under a new holding company called Alphabet Inc.\n\n1 History\n\nGoogle was started in early 1996 by Larry Page and\nSergey Brin, two students at Stanford University, USA.\nIt used to be called Backrub. Later, they made it into a\ncompany, Google Inc., on September 7, 1998 at a friend’s\ngarage in Menlo Park, California. In February 1999, the\ncompany moved to 165 University Ave., Palo Alto, Cal-\nifornia. Later that year, it moved to another place, now\n\ncalled the “Googleplex”.\n\nIn September 2001, Google’s rating system (“PageR-\nank”, for saying which information is more helpful) got a\nUS. Patent. The patent was to Stanford University, with\nLawrence (Larry) Page as the inventor (the person who\nﬁrst had the idea).\n\nGoogle makes an important, though shrinking, percent-\nage of its money through its friends like America Online\nand InterActiveCorp. It has a special group known as the\nPartner Solutions Organization (PSO) which helps make\ncontracts, helps making accounts better, and gives engi-\nneering help.\n\n2 How Google makes money\n\nGoogle makes money by advertising. People or compa-\nnies who want people to buy their product, service, or\nideas give Google money, and Google shows an adver-\ntisement to people Google thinks will click on the adver-\ntisement. 
Google only gets money when people click on\nthe link, so it tries to know as much about people as pos-\nsible to only show the advertisement to the “right people”.\nIt does this with Google Analytics, which sends data back\nto Google whenever someone visits a web site. From this\nand other data, Google makes a proﬁle about the person,\nwhich it then uses to ﬁgure out which advertisements to\nshow.\n\n3 The name “Google”\n\nThe name “Google” is a misspelling of the word\ng00g01.[7][8] Milton Sirotta, nephew of US. mathemati-\ncian Edward Kasner, made this word in 1938, for the\nnumber 1 followed by one hundred zeroes ( 10100 ). It\nis said that the word “googol” was chosen as a name for\nthis number because it sounded like baby talk. Google\nuses this word because the company wants to make lots\nof stuff on the Web easy to ﬁnd and use. Andy Bechtol-\nsheim ﬁrst thought of the name.\n\nThe name for Google’s main ofﬁce, the “Googleplex,” is a\nplay on a different, even bigger number, the \"googolpleX\",\nwhich is 1 followed by one googol of zeroes.\n\n\n```\n\n## Stargazers over time\n\n[![Stargazers over time](https://starchart.cc/prabhakar267/image2text.svg)](https://starchart.cc/prabhakar267/image2text)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprabhakar267%2Fimage2text","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprabhakar267%2Fimage2text","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprabhakar267%2Fimage2text/lists"}