Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/pannous/tensorflow-ocr

🖺 OCR using tensorflow with attention

ocr tensorflow

Last synced: 15 Jun 2024

https://github.com/breezedeus/Pix2Text

An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

image-to-markdown latex latex-pdf layout-analysis math-formula math-formula-recognition math-ocr mathpix ocr python pytorch table-ocr

Last synced: 14 Jun 2024

https://github.com/zhoubear/open-paperless

Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)

documents groupware ocr office paperless pdf scanner

Last synced: 14 Jun 2024

https://github.com/YangDai2003/CopilotOCR-Android

Fast OCR, multi-language support, sleek design. Scan, edit, and share text effortlessly. Smart barcode and QR code scanning.

android android-ocr-application java ocr ocr-android ocr-text-reader

Last synced: 14 Jun 2024

https://github.com/tshetrim/image-to-text-ocr-extension-for-chatgpt

Image To Text (OCR) Extension for ChatGPT (Chrome + Firefox)

chatgpt firefox-addon google-extension ocr productivity-tools tesseractjs

Last synced: 14 Jun 2024

https://github.com/amebalabs/TRex

Copy any text on your screen, stop retyping.

macos ocr productivity screenshot swift textrecognition tools

Last synced: 13 Jun 2024

https://github.com/scott0123/Tesseract-macOS

Objective C wrapper for the open source OCR Engine Tesseract (macOS)

mac macos objective-c ocr screenshot tesseract tesseract-mac tesseract-macos xcode

Last synced: 12 Jun 2024

https://github.com/diaomin/crnn-mxnet-chinese-text-recognition

An implementation of CRNN (CNN+LSTM+warpCTC) on MXNet for Chinese text recognition

chinese-text-recognition cnn-lstm-ctc crnn mxnet ocr

Last synced: 11 Jun 2024

https://github.com/chinakook/CTPN.mxnet

Connectionist Text Proposal Network in MXNet

ctpn ocr

Last synced: 11 Jun 2024

https://github.com/xiaofengShi/CHINESE-OCR

[python3.6] Natural-scene text detection implemented in TensorFlow, with Keras/PyTorch implementations of CTPN + CRNN + CTC for variable-length scene text OCR

ctpn keras-crnn lstm-ctc ocr pytorch-crnn tensorflow

Last synced: 11 Jun 2024

https://github.com/smileboywtu/MillionHeroAssistant

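Universal quiz answering assistant for Baiwan (百万) / Chongding (冲顶) / Zhishi (芝士) / UC (more specialized knowledge graph, automatic answer recommendation, automatic screen adaptation for Android phones, emulator support, multiple instances)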

adb android baidu ios ocr python3 xigua

Last synced: 11 Jun 2024

https://github.com/opensemanticsearch/open-semantic-search

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)

annotation faceted-search fulltext-search investigative-journalism journalism named-entity-recognition ocr ontologies osint python research-tool search search-engine search-interface semantic skos text-analysis text-mining thesaurus ui

Last synced: 09 Jun 2024

https://github.com/ciur/papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)

archives django dms document-management ocr paperless pdf scan scanned-documents

Last synced: 09 Jun 2024

https://github.com/pd3f/pd3f

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

extract-text language-model machine-learning ocr parsr pd3f pdf pdf-to-text pipeline python text-extraction

Last synced: 09 Jun 2024

https://github.com/Tony607/keras-image-ocr

How to train a Keras model to recognize variable length text | DLology

deep-learning keras ocr

Last synced: 09 Jun 2024

https://github.com/ReceiptManager/receipt-parser-legacy

A supermarket receipt parser written in Python using tesseract OCR

home-assistant invoice ocr receipt receipt-parser supermarket

Last synced: 09 Jun 2024

https://github.com/eikek/docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.

dms docspell document document-management document-management-system edms elm nlp ocr pdf personal-document-system scala self-hosted spa stanford-corenlp webapp

Last synced: 09 Jun 2024

https://github.com/Lynnesbian/OCRbot

An OCR (Optical Character Recognition) bot for Mastodon (and compatible) instances

mastodon ocr python python3 tesseract

Last synced: 09 Jun 2024

https://github.com/tesseract-ocr/tessdata

Trained models with fast variant of the "best" LSTM models + legacy models

ocr tesseract

Last synced: 08 Jun 2024

https://github.com/dynobo/normcap

OCR powered screen-capture tool to capture information instead of images

multiplatform ocr python screenshot tesserocr tool

Last synced: 08 Jun 2024

https://github.com/EKYCSolutions/khmer-ocr-benchmark-dataset

A standardized benchmark dataset for Khmer Optical Character Recognition (OCR) engine.

dataset khmer ocr

Last synced: 08 Jun 2024

https://github.com/RapidAI/RapidOCR

Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO and PaddlePaddle.

chineseocr crnn dbnet easyocr ocr onnxruntime openvino paddleocr rapidocr

Last synced: 08 Jun 2024

https://github.com/yardstick17/image_text_reader

The module extracts text from images using the Tesseract OCR engine. Text in images is often blurry or of uneven size, so the image is pre-processed to make it easier for the OCR engine to read. The module first draws bounding boxes around the text in the image and then normalizes it to 300 dpi, a resolution suitable for the OCR engine.

image-reader image-to-text ocr ocr-text-reader read-image tesseract-ocr

Last synced: 08 Jun 2024

https://github.com/eragonruan/text-detection-ctpn

Text detection mainly based on the CTPN (Connectionist Text Proposal Network) model in TensorFlow, including ID card detection

ctpn id-card ocr robust-reading tensorflow text-detection

Last synced: 08 Jun 2024

https://github.com/clovaai/CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)

craft curved-text cvpr2019 detection ocr ocr-detection pytorch text-detection

Last synced: 08 Jun 2024

https://github.com/Megvii-CSG/MegReader

A research project for text detection and recognition using PyTorch 1.2.

ctc deep-learning ocr pytorch text-detection text-detection-recognition text-recognition

Last synced: 08 Jun 2024

https://github.com/Sanster/text_renderer

Generate text images for training deep learning OCR models

crnn ocr synthtext

Last synced: 08 Jun 2024

https://github.com/Holmeyoung/crnn-pytorch

PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for OCR in any language.

cnn ctc-loss ocr rnn

Last synced: 08 Jun 2024

https://github.com/faustomorales/keras-ocr

A packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.

keras keras-crnn ocr text-detection

Last synced: 07 Jun 2024

https://github.com/breezedeus/CnOCR

CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. (A Chinese/English OCR Python package based on PyTorch/MXNet.)

chinese-character-recognition english-character-recognition ocr ocr-python pytorch

Last synced: 07 Jun 2024

https://github.com/clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

computer-vision document-ai eccv-2022 multimodal-pre-trained-model nlp ocr

Last synced: 06 Jun 2024

https://github.com/enoch3712/ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.

ai llm nlp ocr openai python

Last synced: 06 Jun 2024

https://github.com/hwalsuklee/awesome-deep-text-detection-recognition

A curated list of resources for text detection/recognition (optical character recognition) with deep learning methods.

awesome-list awesome-lists deep-learning ocr ocr-detection ocr-paper ocr-paper-list ocr-papers ocr-recognition text-detection text-detection-recognition text-recognition

Last synced: 04 Jun 2024

https://github.com/whitelok/image-text-localization-recognition

A general list of resources for image text localization and recognition: a collection of papers and implementations for scene text localization and recognition.

awesome convolutional-neural-networks deep-learning deep-learning-algorithms machine-learning ocr scene-texts text-detection text-extraction text-recognition

Last synced: 04 Jun 2024

https://github.com/LodestoneHQ/lodestone

Personal Document Archiving (DMS, EDMS for Personal/Home Office use)

dms document-management edms filemanager lodestone ocr personal-document-system

Last synced: 02 Jun 2024

https://github.com/hertzg/tesseract-server

A small lightweight HTTP server that converts photos, images and scanned documents to text using optical character recognition by utilizing the power of Google Tesseract.

api container containers docker docker-compose docker-image hacktoberfest http-server image-processing ocr rest-api tesseract tesseract-server typescript

Last synced: 02 Jun 2024

https://github.com/scambier/obsidian-text-extractor

A (companion) plugin to facilitate the extraction of text from images (OCR) and PDFs.

obsidian obsidian-plugin ocr pdf

Last synced: 02 Jun 2024

https://github.com/VikParuchuri/texify

Math OCR model that outputs LaTeX and markdown

deep-learning latex markdown ocr

Last synced: 01 Jun 2024

https://github.com/andrealenzi11/py-poppleract

Python library and Web service based on Poppler Pdftotext utility and Tesseract OCR for extracting text from PDF documents

ocr optical-character-recognition pdf-reader pdf-splitting pdf-to-text pdf2text pdftotext poppler poppleract py-poppleract tesseract tesseract-ocr text-extraction

Last synced: 31 May 2024

https://github.com/cyanfish/naps2

Scan documents to PDF and more, as simply as possible.

csharp dotnet escl linux macos ocr pdf sane scanner twain wia windows

Last synced: 31 May 2024

https://github.com/lifeparticle/Bengali-Alphabet

✍️ Bengali alphabet (বাংলা বর্ণমালা)

awesome bangla bangla-nlp bangla-ocr bengali bengali-alphabet machine-learning nlp ocr unicode

Last synced: 31 May 2024

https://github.com/menon92/BanglaText2Image

Synthetic data generation for bangla OCR

bangla-ocr data java ocr synthetic

Last synced: 31 May 2024

https://github.com/AlejandroAkbal/Image-to-Text-OCR

Image to Text is a web tool to extract text from any image using OCR

image nuxt ocr ocr-recognition vite vue web

Last synced: 31 May 2024

https://github.com/xushengfeng/xlinkote

Infinite canvas, whiteboard notes, knowledge management

draw geogebra knowledgebase latex markdown notebook ocr tikz todo whiteboard

Last synced: 31 May 2024

https://github.com/InkTimeRecord/TTime

🚀 Screenshots, word marking, OCR, AI, translation software

ai exe macos ocr pc screenshots software translation ttime windows

Last synced: 31 May 2024

https://github.com/hiroi-sora/Umi-OCR

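Open-source, free, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, excluding watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multi-language libraries.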

ocr ocr-python paddleocr

Last synced: 31 May 2024

https://github.com/hiroi-sora/Umi-OCR_v2

An end and a new beginning

ocr ocr-python paddleocr qml qt

Last synced: 31 May 2024

https://github.com/PaddlePaddle/Paddle.js

Paddle.js is a web project for Baidu PaddlePaddle, which is an open source deep learning framework running in the browser. Paddle.js can either load a pre-trained model or transform a model from paddle-hub with the model transforming tools provided by Paddle.js. It can run in any browser that supports WebGL/WebGPU/WebAssembly. It can also run in Baidu Smartprogram and WX miniprogram.

deep-learning inference-engine model ocr paddlepaddle webassembly webgl webgpu

Last synced: 31 May 2024

https://github.com/Dadangdut33/Screen-Translate

A screen translator/OCR translator made using Python and Tesseract; the user interface is built with Tkinter. All code is written in Python.

ocr opencv-python python tesseract-ocr tkinter translate

Last synced: 29 May 2024

https://github.com/xushengfeng/eSearch

Screenshot, offline OCR, search and translate, reverse image search, pin images to the screen, screen recording, scrolling screenshot

clipboard color-picker cross-platform electron image-editing image-editor live-text ocr paddleocr screen-capture screen-recorder screenshot search search-photos

Last synced: 28 May 2024

https://github.com/sismics/docs

Lightweight document management system packed with all the features you can expect from big expensive solutions

cloud dms docker document enterprise file-sharing java javascript ocr open-source self-hosting sharing workflow

Last synced: 28 May 2024

https://github.com/axa-group/Parsr

Transforms PDF, Documents and Images into Enriched Structured Data

data document extraction hacktoberfest images nlp ocr parsr pdf python typescript

Last synced: 28 May 2024

https://github.com/devmaxxing/videocr-app

Desktop application for extracting text/hard-coded subtitles from videos.

ocr paddleocr pyqt5 python subtitles

Last synced: 28 May 2024

https://github.com/TheJoeFin/Text-Grab

Use OCR in Windows quickly and easily with Text Grab. With optional background process and notifications.

dotnet msix ocr window-10 windows windows-11 wpf

Last synced: 28 May 2024

https://github.com/devmaxxing/videocr-PaddleOCR

Extract hardcoded subtitles from videos using machine learning

machine-learning ocr paddleocr paddlepaddle subtitles

Last synced: 28 May 2024

https://github.com/clovaai/deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019

crnn deep-learning grcnn iccv2019 ocr ocr-recognition r2am rare recognition rosetta scene-text scene-text-recognition star-net text-recognition

Last synced: 27 May 2024

https://github.com/YaoFANGUK/video-subtitle-extractor

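Extracts hard-coded subtitles from videos and generates srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files.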

deep-learning extract hardsub ocr ripper srt subrip subtitles

Last synced: 27 May 2024

https://github.com/sml2h3/ddddocr

ddddocr (带带弟弟): general-purpose CAPTCHA recognition OCR, PyPI version

captcha ddddocr ocr

Last synced: 27 May 2024

https://github.com/DayBreak-u/chineseocr_lite

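Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M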

ncnn ocr pytorch

Last synced: 27 May 2024

https://github.com/robertknight/tesseract-wasm

JS/WebAssembly build of the Tesseract OCR engine for use in browsers and Node

js ocr webassembly

Last synced: 26 May 2024

https://github.com/Chandler-Lu/alfred-ocr

OCR and translation using multiple interfaces, for multiple platforms.

alfred cnocr ocr python quicker zxing

Last synced: 26 May 2024

https://github.com/dam-cav/img-to-pgs-sup

PGS SUP subtitle generator (Image sequences), useful companion of VideoSubFinder

blueray ocr subtitle videosubfinder

Last synced: 24 May 2024

https://github.com/obgnail/video-subtitle-extractor

Extracts hard-coded subtitles from videos. Uses PaddleOCR.

extractor ocr opencv python subtitles video

Last synced: 24 May 2024

https://github.com/different-ai/file-organizer-2000

Never Organize your Obsidian Vault again

gpt obsidian ocr

Last synced: 22 May 2024

https://github.com/RhetTbull/textinator

Simple macOS status bar / menu bar app to automatically detect text in screenshots

macos menubar menubar-app menubarapp ocr osx screenshot utility

Last synced: 21 May 2024

https://github.com/lucasrla/remarks

Extract annotations (highlights and scribbles) from PDF, EPUB, and notebooks marked with reMarkable tablets. Export to Markdown, PDF, PNG, SVG

annotations epub highlighting markdown obsidian ocr ocrmypdf pdf pdf-converter pymupdf remarkable-tablet roamresearch svg-images zotero

Last synced: 20 May 2024

https://github.com/the-black-knight-01/Tabulo

Table Detection and Extraction Using Deep Learning (built in Python, using Luminoth, TensorFlow<2.0 and Sonnet)

deep-learning detection faster-r-cnn luminoth ocr pdf-table-extraction python sonnet ssd table-data-extraction table-detection table-detection-using-deep-learning table-recognition tabulo tensorflow tesseract

Last synced: 19 May 2024

https://github.com/veryfi/veryfi-go

Go module for communicating with the Veryfi OCR API.

api go go-library go-sdk golang-library invoice ocr receipt sdk sdk-go veryfi veryfi-api

Last synced: 18 May 2024

https://github.com/junhoyeo/betterocr

🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.

ai chatgpt chatgpt-api easyocr llm ocr openai openai-api tesseract tesseract-ocr

Last synced: 17 May 2024

https://github.com/R0Wi-DEV/workflow_ocr

This is a Nextcloud Workflow App which enables you to process files via OCR on the server side.

nextcloud nextcloud-workflow-ocr ocr pdf-files

Last synced: 15 May 2024

https://github.com/PaddlePaddle/PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded and IoT devices)

chineseocr crnn db ocr ocrlite

Last synced: 15 May 2024

https://github.com/mathewthe2/Game2Text

Complete toolbox for gamifying language learning

anki language-learning languages ocr yomichan

Last synced: 15 May 2024

https://github.com/matt-m-o/YomiNinja

Open-source OCR and dictionary tool.

dictonary language-learning languages ocr overlay

Last synced: 15 May 2024

https://github.com/kha-white/mokuro

Read Japanese manga inside browser with selectable text.

comics comics-reader japanese manga manga-reader ocr

Last synced: 15 May 2024

https://github.com/blueaxis/Poricom

Optical character recognition in manga images. Manga OCR desktop application

manga-ocr manga-reader ocr pyqt5-gui

Last synced: 15 May 2024

https://github.com/blueaxis/Cloe

Manga OCR snipping application for desktop

manga-ocr ocr ocr-python pyqt5 snipping-tool

Last synced: 15 May 2024

https://github.com/Monogramm/erpnext_ocr

🐍 ⚗️ Optical Character Recognition using tesseract within Frappe.

erpnext frappe ocr python tesseract

Last synced: 14 May 2024

https://github.com/InternLM/HuixiangDou

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

application assistance chatbot dsl lark llm multimodal ocr pipeline rag robot wechat

Last synced: 14 May 2024

https://github.com/jrodal98/screenshot-actions

Dunst actions for screenshots (OCR, upload to 0x0.st, delete, rename, move to/from clipboard)

context-actions dunst flameshot hacktoberfest maim ocr screenshot scrot

Last synced: 14 May 2024

https://github.com/Andrewthe13th/Inventory_Kamera

Scans Genshin Impact characters, artifacts, and weapons from the game window into a JSON file.

artifacts game genshin genshin-impact ocr scanner weapons

Last synced: 14 May 2024

https://github.com/Tencent/TNN

TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, Pitu, etc. Contributions are welcome; collaborate with us to make TNN a better framework.

coreml deep-learning face-detection hairsegmentaion inference mnn ncnn ocr openvino pytorch tengine tensorflow tensorrt

Last synced: 13 May 2024