{"id":22814432,"url":"https://github.com/danlock/gogosseract","last_synced_at":"2025-08-09T20:10:02.734Z","repository":{"id":204617748,"uuid":"711027857","full_name":"Danlock/gogosseract","owner":"Danlock","description":"A reimplementation of https://github.com/otiai10/gosseract without CGo, running Tesseract compiled to WASM with Wazero","archived":false,"fork":false,"pushed_at":"2023-11-07T06:05:17.000Z","size":26470,"stargazers_count":141,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-14T11:39:57.175Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Danlock.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-28T02:19:06.000Z","updated_at":"2024-11-07T10:51:07.000Z","dependencies_parsed_at":null,"dependency_job_id":"475fb74e-6fa2-4fa6-ae21-8a49c91da4e0","html_url":"https://github.com/Danlock/gogosseract","commit_stats":null,"previous_names":["danlock/gogosseract"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danlock%2Fgogosseract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danlock%2Fgogosseract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danlock%2Fgogosseract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Danlock%2Fgogosseract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/Danlock","download_url":"https://codeload.github.com/Danlock/gogosseract/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229388361,"owners_count":18065252,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-12T13:08:53.109Z","updated_at":"2025-08-09T20:10:02.704Z","avatar_url":"https://github.com/Danlock.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# gogosseract\n![Coverage](https://img.shields.io/badge/Coverage-70.4%25-brightgreen)\n[![Go Report Card](https://goreportcard.com/badge/github.com/danlock/gogosseract)](https://goreportcard.com/report/github.com/danlock/gogosseract)\n[![Go Reference](https://pkg.go.dev/badge/github.com/danlock/gogosseract.svg)](https://pkg.go.dev/github.com/danlock/gogosseract)\n\n\nA reimplementation of https://github.com/otiai10/gosseract without CGo, running Tesseract compiled to WASM with Emscripten via Wazero.\n\nTesseract is an Optical Character Recognition library written in C++.\n\nThe WASM is generated from my [personal](https://github.com/Danlock/tesseract-wasm) fork of robertknight's well-written tesseract-wasm project.\n\nNote that Tesseract is only compiled with support for the LSTM neural network OCR engine, and not for \"classic\" Tesseract.\n\n\u003e [!CAUTION]\n\u003e This library and its dependent libraries were broken by a backwards-incompatible change in wazero 1.8.0. This library will not be updated. 
If you plan on\n\u003e using this library regardless, make sure your dependencies match the versions in this repo's go.mod file.\n\u003e Also, the CGo-based gosseract was about 6 times faster than this library the last time I checked.\n\n# Training Data\n\nTesseract requires training data in order to accurately recognize text. The official source is [here](https://github.com/tesseract-ocr/tessdata_fast). Ways to supply it include downloading it at runtime or embedding the file in your Go binary at compile time with go:embed.\n\n# Accuracy\n\nTesseract can work better if the input images are preprocessed. See this page for tips.\n\nhttps://tesseract-ocr.github.io/tessdoc/ImproveQuality.html\n\n# Examples\n\nUsing Tesseract to parse text from an image.\n\n```go\n    trainingDataFile, err := os.Open(\"eng.traineddata\")\n    handleErr(err)\n\n    cfg := gogosseract.Config{\n        Language: \"eng\",\n        TrainingData: trainingDataFile,\n    }\n    // While Tesseract's logs are very useful for debugging, you can silence or redirect them\n    cfg.Stderr = io.Discard\n    cfg.Stdout = io.Discard\n    // Compile the Tesseract WASM and run it, loading in the TrainingData and setting any Config Variables provided\n    tess, err := gogosseract.New(ctx, cfg)\n    handleErr(err)\n\n    imageFile, err := os.Open(\"image.png\")\n    handleErr(err)\n\n    err = tess.LoadImage(ctx, imageFile, gogosseract.LoadImageOptions{})\n    handleErr(err)\n\n    text, err := tess.GetText(ctx, func(progress int32) { log.Printf(\"Tesseract parsing is %d%% complete.\", progress) })\n    handleErr(err)\n    // Closing the Tesseract instance will clean up everything used by Tesseract and its WASM module\n    handleErr(tess.Close(ctx))\n```\n\nUsing a Pool of Tesseract workers for thread-safe concurrent image parsing.\n\n```go\n    cfg := gogosseract.Config{\n        Language: \"eng\",\n        TrainingData: trainingDataFile,\n    }\n    // Create 
10 Tesseract instances that can process image requests concurrently.\n    pool, err := gogosseract.NewPool(ctx, 10, gogosseract.PoolConfig{Config: cfg})\n    handleErr(err)\n    // ParseImage loads the image and waits until the Tesseract worker sends back your result.\n    hocr, err := pool.ParseImage(ctx, img, gogosseract.ParseImageOptions{\n        IsHOCR: true,\n    })\n    handleErr(err)\n    // Always remember to Close the pool to release resources\n    handleErr(pool.Close())\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanlock%2Fgogosseract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanlock%2Fgogosseract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanlock%2Fgogosseract/lists"}