Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/coderofsalvation/imagegrep-bash
grep word in pdf or image based on OCR
- Host: GitHub
- URL: https://github.com/coderofsalvation/imagegrep-bash
- Owner: coderofsalvation
- License: agpl-3.0
- Created: 2015-04-09T09:44:48.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-05-28T19:08:08.000Z (over 4 years ago)
- Last Synced: 2023-03-24T12:55:53.600Z (over 1 year ago)
- Language: Shell
- Size: 21.5 KB
- Stars: 6
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
README
imagegrep-bash
==============
unix 'grep' a word inside pdf or image based on OCR

# Usage

    ./imagegrep foo.pdf invoice eng && echo "grab your wallet!"
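
The three arguments in the usage line are the file, the word to look for, and the OCR language; the `&&` relies on the exit status being zero on a match. As a rough illustration of how an OCR-based grep can be put together from the two required tools, here is a minimal sketch. It is not the actual imagegrep source: the function name `ocr_grep`, the 300 dpi density, and the first-page-only behaviour are assumptions made for the example, and ImageMagick 7 installs may call the binary `magick` instead of `convert`.

```shell
#!/bin/sh
# Hypothetical sketch of an OCR-based grep; NOT the real imagegrep script.
# usage: ocr_grep <file> <word> [tesseract-language]
# returns 0 if the word is found, 1 if not, 2 if conversion failed
ocr_grep() {
  local file="$1" word="$2" lang="${3:-eng}"
  local tmp status
  tmp=$(mktemp -d) || return 2
  # Rasterize the first page (or the image itself) to PNG with ImageMagick.
  if convert -density 300 "${file}[0]" "$tmp/page.png"; then
    # With "stdout" as output base, tesseract prints the recognized text.
    if tesseract "$tmp/page.png" stdout -l "$lang" 2>/dev/null |
       grep -qiF -- "$word"; then
      status=0
    else
      status=1
    fi
  else
    status=2
  fi
  rm -rf "$tmp"
  return "$status"
}
```

Matching is case-insensitive and literal (`grep -iF`) because OCR output often differs in case from the word you type.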

no repo is complete without a catgif!

# Install

    wget https://raw.githubusercontent.com/coderofsalvation/imagegrep-bash/master/imagegrep
    chmod 755 imagegrep
    ./imagegrep foo.pdf invoice eng

# Requirements

* tesseract-ocr
* imagemagick

These packages can be installed using apt-get or yum.

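To check up front whether both requirements are present, something like the following can be run before the first use. It is only a sketch: the helper name `missing_deps` is made up for this example, and it assumes the packages provide binaries named `tesseract` and `convert` (newer ImageMagick releases may ship `magick` instead).

```shell
#!/bin/sh
# Hypothetical helper: report every command in the argument list that is
# not on PATH; returns non-zero if at least one is missing.
missing_deps() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed binary names for the tesseract-ocr and imagemagick packages.
if ! missing_deps tesseract convert; then
  echo "try e.g.: sudo apt-get install tesseract-ocr imagemagick" >&2
fi
```
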
# Why
To automate categorizing files and choosing their destination folder.
OCR fails in many cases, but sometimes knowing one word (and its length) is enough.
I use imagegrep like this to scrape gmail and copy invoice attachments to a preferred folder on my harddrive:

    # not covered here: gmail to local maildir using 'offlineimap'
    # not covered here: use mu ('maildir-utils' package) to extract pdf attachments
    find mailbox/latest/*.pdf | while IFS= read -r file; do
      ./imagegrep "$file" invoice eng &&\
        echo "grab your wallet!" &&\
        mv "$file" ~/admin/invoices/.
    done
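
The same loop can be wrapped in a small function so that several keyword-to-folder rules can share it. This is a sketch under assumptions: `file_by_keyword` and the `IMAGEGREP` override variable are hypothetical names introduced here, the folder names in the comments are placeholders, and it relies only on imagegrep exiting with status 0 on a match, as the `&&` in the usage line suggests.

```shell
#!/bin/sh
# Hypothetical helper: move every PDF in <src> whose OCR text contains
# <word> into <dest>.
# usage: file_by_keyword <src> <word> <dest> [language]
# IMAGEGREP may point at another command; it defaults to ./imagegrep.
file_by_keyword() {
  local src="$1" word="$2" dest="$3" lang="${4:-eng}" file
  mkdir -p "$dest" || return 1
  for file in "$src"/*.pdf; do
    [ -e "$file" ] || continue   # the glob matched nothing
    if "${IMAGEGREP:-./imagegrep}" "$file" "$word" "$lang"; then
      mv -- "$file" "$dest"/
    fi
  done
}

# Example rules (placeholder folders):
#   file_by_keyword mailbox/latest invoice ~/admin/invoices
#   file_by_keyword mailbox/latest receipt ~/admin/receipts
```

Files that match no rule stay where they are, so a failed OCR pass never loses an attachment.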