Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/beebus/osra-iterate
Bash Script to iterate through .TIF Images in a folder and run the OSRA program to attempt to convert the TIF images into ChemDraw files (.CDXML).
- Host: GitHub
- URL: https://github.com/beebus/osra-iterate
- Owner: beebus
- License: mit
- Created: 2017-09-29T19:45:45.000Z (about 7 years ago)
- Default Branch: main
- Last Pushed: 2021-03-20T19:01:06.000Z (over 3 years ago)
- Last Synced: 2024-10-04T06:41:07.226Z (about 1 month ago)
- Topics: bash, bash-script, bash-scripting, chemical-structures, cheminformatics, chemistry, image-processing, image-recognition, jpg, linux, molecular-structures, molecule, molecules, ocr, optical-recognition, organic-chemistry, osra, pdf, reactions, tif-images
- Language: Shell
- Homepage:
- Size: 3.91 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# osra-iterate
Bash Script to Iterate through TIF Images in Folder and Run OSRA

This shows the command line usage of the OSRA open source software.
Run osra_iterate.sh from a Linux terminal, with the folder that contains the script as the current working directory, passing the input and output folders as arguments (adjust the paths as needed):
./osra_iterate.sh ~/Share/input/ ~/Share/output/
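For readers who want to see the general shape of such a loop, the following is a minimal sketch and not the repository's actual osra_iterate.sh. It assumes that OSRA's `-f` (output format) and `-w` (write to file) options are available and that the installed build accepts `cdxml` as a format name through Open Babel. The function name `osra_iterate` and the variables `OSRA_BIN` and `OSRA_FORMAT` are hypothetical names introduced for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's script.
# Runs OSRA once per TIF image in input_dir and writes one result file
# per image into output_dir.
osra_iterate() {
  local input_dir="${1%/}" output_dir="${2%/}"
  local osra_bin="${OSRA_BIN:-osra}"      # assumption: osra is on PATH unless overridden
  local format="${OSRA_FORMAT:-cdxml}"    # assumption: format name accepted by -f
  local img base failed=0

  [ -d "$input_dir" ] || { echo "input folder not found: $input_dir" >&2; return 1; }
  mkdir -p "$output_dir" || return 1

  for img in "$input_dir"/*; do
    [ -f "$img" ] || continue
    # Keep only TIF/TIFF files, regardless of extension case.
    case "${img##*.}" in
      tif|TIF|tiff|TIFF) ;;
      *) continue ;;
    esac
    base="$(basename "$img")"
    base="${base%.*}"
    # One OSRA call per image; a failure is reported but does not stop the loop.
    if ! "$osra_bin" -f "$format" -w "$output_dir/$base.$format" "$img"; then
      echo "OSRA failed on $img" >&2
      failed=$((failed + 1))
    fi
  done

  # Return non-zero if any image failed.
  return "$((failed > 0))"
}
```

With the function loaded in the current shell, `osra_iterate ~/Share/input/ ~/Share/output/` mirrors the command above. Continuing past individual failures is a design choice for batch runs, where one unreadable scan should not block the rest.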
## What is OSRA?
OSRA (Optical Structure Recognition Application) is a utility designed to convert graphical representations of chemical structures and reactions, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES or MOL files – a computer recognizable molecular structure format. OSRA can read a document in any of the over 90 graphical formats parseable by GraphicsMagick (https://sourceforge.net/p/osra/wiki/Dependencies#GraphicsMagick) – including GIF, JPEG, PNG, TIFF, PDF, PS etc., and generate the SMILES or MOL representation of the molecular structure images encountered within that document, or RSMI/RXN for reactions.
Note that any software designed for optical recognition is unlikely to be perfect, and the output produced might, and probably will, contain errors, so curation by a human knowledgeable in chemical structures is highly recommended.
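As a small illustration of the conversion described above, the sketch below runs OSRA on a single image and prints the recognized structures to standard output. It assumes the `-f` option accepts `can` (canonical SMILES) and `sdf` (MOL/SD records). The helper name `osra_recognize` and the `OSRA_BIN` variable are hypothetical, introduced only for this example.

```shell
# Hypothetical helper: print OSRA's recognition result for one image.
# The second argument selects the output format and defaults to SMILES.
osra_recognize() {
  local image="$1" format="${2:-can}"   # assumption: "can" and "sdf" are valid -f values
  [ -f "$image" ] || { echo "no such image: $image" >&2; return 1; }
  "${OSRA_BIN:-osra}" -f "$format" "$image"
}
```

For example, `osra_recognize scan.png > scan.smi` would save SMILES strings, and `osra_recognize scan.png sdf > scan.sdf` would save MOL records. Either output should still be reviewed by someone who knows the chemistry.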
OSRA can process the following types of images:
* Computer-generated 2D structures, such as found on the PubChem website (http://pubchem.ncbi.nlm.nih.gov/), black-and-white and color.
* Black-and-white PDF and PostScript files, including multi-page ones.
* Scanned images – black-and-white, a resolution of 300 dpi is recommended, though 150 dpi can also produce fair results. Please make sure the scanned image is of reasonable quality – an input that's too noisy will only generate garbage output.
* Reactions and polymers.

You can download a free version (https://sourceforge.net/p/osra/wiki/Download/) of the source code or support OSRA development by purchasing binary installation executables for Windows (https://store.payproglobal.com/checkout?products[1][id]=38760) and Linux (https://store.payproglobal.com/checkout?products[1][id]=38761).