{"id":28562975,"url":"https://github.com/bytefer/macos-vision-ocr","last_synced_at":"2025-06-10T12:42:01.817Z","repository":{"id":265080338,"uuid":"894199258","full_name":"bytefer/macos-vision-ocr","owner":"bytefer","description":"A powerful command-line OCR tool built with Apple's Vision framework, supporting single image and batch processing with detailed positional information output.","archived":false,"fork":false,"pushed_at":"2025-02-14T10:02:27.000Z","size":2204,"stargazers_count":53,"open_issues_count":1,"forks_count":8,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-02-14T11:22:32.234Z","etag":null,"topics":["apple-vision-framework","macos-ocr","macos-vision-ocr","ocr"],"latest_commit_sha":null,"homepage":"","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bytefer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-25T23:43:24.000Z","updated_at":"2025-02-14T10:02:30.000Z","dependencies_parsed_at":"2024-11-27T15:33:32.694Z","dependency_job_id":null,"html_url":"https://github.com/bytefer/macos-vision-ocr","commit_stats":null,"previous_names":["bytefer/macos-vision-ocr"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytefer%2Fmacos-vision-ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytefer%2Fmacos-vision-ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytefer%2Fmacos-vision-ocr/releases","manifests_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/bytefer%2Fmacos-vision-ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bytefer","download_url":"https://codeload.github.com/bytefer/macos-vision-ocr/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bytefer%2Fmacos-vision-ocr/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259078846,"owners_count":22802208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apple-vision-framework","macos-ocr","macos-vision-ocr","ocr"],"created_at":"2025-06-10T12:41:49.604Z","updated_at":"2025-06-10T12:42:01.669Z","avatar_url":"https://github.com/bytefer.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MacOS Vision OCR\n\nA powerful command-line OCR tool built with Apple's Vision framework, supporting single image and batch processing with detailed positional information output.\n\n## Features\n\n- Support for multiple image formats (JPG, JPEG, PNG, WEBP)\n- Single image and batch processing modes\n- Multi-language recognition (supporting 16 languages including English, Chinese, Japanese, Korean, and European languages)\n- Detailed JSON output with text positions and confidence scores\n- Debug mode with visual bounding boxes\n- Support for both arm64 and x86_64 architectures\n\n## System Requirements\n\n- macOS 10.15 or later\n- Support for arm64 (Apple Silicon) or x86_64 (Intel) architecture\n\n\u003e It is recommended to use macOS 13 or later for the best 
OCR recognition.\n\n## Installation\n\n### Build from Source\n\n1. Ensure Xcode and the Command Line Tools are installed\n\n2. Clone the repository:\n\n```bash\ngit clone https://github.com/bytefer/macos-vision-ocr.git\ncd macos-vision-ocr\n```\n\n3. Build for your architecture:\n\nFor Apple Silicon (arm64):\n\n```bash\nswift build -c release --arch arm64\n```\n\nFor Intel (x86_64):\n\n```bash\nswift build -c release --arch x86_64\n```\n\n## Usage\n\n### Single Image Processing\n\nProcess a single image and print the result to the console:\n\n```bash\n./macos-vision-ocr --img ./images/handwriting.webp\n```\n\nProcess with a custom output directory:\n\n```bash\n./macos-vision-ocr --img ./images/handwriting.webp --output ./images\n```\n\n### Set Recognition Languages\n\nRecognition languages can be specified as a comma-separated list using the `--rec-langs` option. For example:\n\n```bash\n./macos-vision-ocr --img ./images/handwriting.webp --rec-langs \"zh-Hans, zh-Hant, en-US\"\n```\n\n### Batch Processing\n\nProcess multiple images in a directory:\n\n```bash\n./macos-vision-ocr --img-dir ./images --output-dir ./output\n```\n\nMerge all results into a single file:\n\n```bash\n./macos-vision-ocr --img-dir ./images --output-dir ./output --merge\n```\n\n### Debug Mode\n\nEnable debug mode to visualize text detection:\n\n```bash\n./macos-vision-ocr --img ./images/handwriting.webp --debug\n```\n\n![handwriting_boxes.png](./images/handwriting_boxes.png)\n\n### Command Line Options\n\n```\nOptions:\n  --img \u003cpath\u003e          Path to a single image file\n  --output \u003cpath\u003e       Output directory for single image mode\n  --img-dir \u003cpath\u003e      Directory containing images for batch mode\n  --output-dir \u003cpath\u003e   Output directory for batch mode\n  --rec-langs \u003clangs\u003e   Comma-separated recognition languages (e.g. \"zh-Hans, zh-Hant, en-US\")\n  --merge               Merge all text outputs into a single file in batch mode\n  --debug               Debug mode: Draw bounding boxes on the image\n  --lang                Show supported recognition languages\n  --help                Show 
help information\n```\n\n## Output Format\n\nThe tool outputs JSON with the following structure:\n\n```json\n{\n  \"texts\": \"The Llama 3.2-Vision Collection of multimodal large langyage model5 （LLMS） is a\\ncollection of instruction-tuned image reasoning generative models in l1B and 90B\\nsizes （text + images in / text ovt）. The Llama 3.2-Vision instruction-tuned models\\nare optimized for visval recognittion, iage reasoning, captioning, and answering\\ngeneral qvestions about an iage. The models outperform many of the available\\nopen Source and Closed multimodal models on common industry benchmarKs.\",\n  \"info\": {\n    \"filepath\": \"./images/handwriting.webp\",\n    \"width\": 1600,\n    \"filename\": \"handwriting.webp\",\n    \"height\": 720\n  },\n  \"observations\": [\n    {\n      \"text\": \"The Llama 3.2-Vision Collection of multimodal large langyage model5 （LLMS） is a\",\n      \"confidence\": 0.5,\n      \"quad\": {\n        \"topLeft\": {\n          \"y\": 0.28333333395755611,\n          \"x\": 0.09011629800287288\n        },\n        \"topRight\": {\n          \"x\": 0.87936045388666206,\n          \"y\": 0.28333333395755611\n        },\n        \"bottomLeft\": {\n          \"x\": 0.09011629800287288,\n          \"y\": 0.35483871098527953\n        },\n        \"bottomRight\": {\n          \"x\": 0.87936045388666206,\n          \"y\": 0.35483871098527953\n        }\n      }\n    }\n  ]\n}\n```\n\nThe `quad` coordinates are fractions of the image width and height. In the sample above, `x` = 0.0901 corresponds to about 144 px of the 1600 px image width.\n\n## Debug Output\n\nWhen using `--debug`, the tool will:\n\n1. Create a new image with a \"\\_boxes.png\" suffix\n2. Draw red bounding boxes around detected text\n3. 
Save the debug image in the same directory as the input image\n\n## Supported Languages\n\n- English (en-US)\n- French (fr-FR)\n- Italian (it-IT)\n- German (de-DE)\n- Spanish (es-ES)\n- Portuguese (Brazil) (pt-BR)\n- Simplified Chinese (zh-Hans)\n- Traditional Chinese (zh-Hant)\n- Simplified Cantonese (yue-Hans)\n- Traditional Cantonese (yue-Hant)\n- Korean (ko-KR)\n- Japanese (ja-JP)\n- Russian (ru-RU)\n- Ukrainian (uk-UA)\n- Thai (th-TH)\n- Vietnamese (vi-VT)\n\nThe available languages depend on the macOS version; run `./macos-vision-ocr --lang` to list the languages supported on your system.\n\n## Node.js Integration Example\n\nHere's an example of how to use `macos-vision-ocr` in a Node.js application:\n\n```javascript\nconst { execFile } = require(\"child_process\");\nconst util = require(\"util\");\nconst execFilePromise = util.promisify(execFile);\n\nasync function performOCR(imagePath, outputDir = null) {\n  try {\n    // Build the argument list; execFile runs the binary without a shell,\n    // so paths containing spaces or quotes need no escaping\n    const args = [\"--img\", imagePath];\n    if (outputDir) {\n      args.push(\"--output\", outputDir);\n    }\n\n    // Execute the OCR command (the promise rejects on a non-zero exit code)\n    const { stdout, stderr } = await execFilePromise(\"./macos-vision-ocr\", args);\n\n    // Log any diagnostics, but still attempt to parse the result\n    if (stderr) {\n      console.warn(\"stderr:\", stderr);\n    }\n\n    // Parse the JSON output (throws, and is caught below, if stdout is not JSON)\n    return JSON.parse(stdout);\n  } catch (error) {\n    console.error(\"OCR processing failed:\", error);\n    return null;\n  }\n}\n\n// Example usage\nasync function example() {\n  const result = await performOCR(\"./images/handwriting.webp\");\n  if (result) {\n    console.log(\"Extracted text:\", result.texts);\n    console.log(\"Text positions:\", result.observations);\n  }\n}\n\nexample();\n```\n\n## Common Issues\n\n1. **Image Loading Fails**\n\n   - Ensure the image path is correct\n   - Verify the image format is supported (JPG, JPEG, PNG, WEBP)\n   - Check file permissions\n\n2. 
**No Text Detected**\n\n   - Ensure the image contains clear, readable text\n   - Make sure the text is not too small (the minimum text height is 1% of the image height)\n   - Verify that the text language is supported\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\nBuilt with:\n\n- Apple Vision Framework\n- Swift Argument Parser\n- macOS Native APIs\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytefer%2Fmacos-vision-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbytefer%2Fmacos-vision-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbytefer%2Fmacos-vision-ocr/lists"}