{"id":22913954,"url":"https://github.com/manumishra12/visionread","last_synced_at":"2026-04-17T07:31:45.021Z","repository":{"id":267976872,"uuid":"902947671","full_name":"manumishra12/VisionRead","owner":"manumishra12","description":"VisionRead the application that leverages the power of Llama 3.2 Vision to perform Optical Character Recognition (OCR).","archived":false,"fork":false,"pushed_at":"2025-02-12T13:20:09.000Z","size":30604,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-01T11:52:03.718Z","etag":null,"topics":["llama","material-ui","ocr","ollama","python3","reactjs","streamlit"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/manumishra12.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-13T15:46:57.000Z","updated_at":"2025-02-12T13:20:22.000Z","dependencies_parsed_at":"2024-12-13T17:31:07.320Z","dependency_job_id":null,"html_url":"https://github.com/manumishra12/VisionRead","commit_stats":null,"previous_names":["manumishra12/visionread"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/manumishra12/VisionRead","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manumishra12%2FVisionRead","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manumishra12%2FVisionRead/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manumishra12%2FVisionRead/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/manumishra12%2FVisionRead/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/manumishra12","download_url":"https://codeload.github.com/manumishra12/VisionRead/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manumishra12%2FVisionRead/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31919896,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-16T18:22:33.417Z","status":"online","status_checked_at":"2026-04-17T02:00:06.879Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llama","material-ui","ocr","ollama","python3","reactjs","streamlit"],"created_at":"2024-12-14T05:12:37.886Z","updated_at":"2026-04-17T07:31:44.979Z","avatar_url":"https://github.com/manumishra12.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VisionRead 👀\nVisionRead application built using Python (with Streamlit) and React.js to perform Optical Character Recognition (OCR) leveraging the Llama 3.2 Vision model. 
These applications can extract text from both printed and handwritten images with exceptional accuracy, providing advanced features like file history for revisiting uploaded images.\n\u003cbr\u003e\n\n\n### React Application:\n\u003cimg src=\"https://github.com/manumishra12/VisionRead/blob/main/VisionRead.gif\" alt=\"Alt text\" width=\"1000\" height=\"550\"\u003e\n\n\u003cbr\u003e\n\u003cbr\u003e\n\n### Streamlit Application:\n\u003cimg src=\"https://github.com/manumishra12/VisionRead/blob/main/screen-capture.gif\" alt=\"Alt text\" width=\"1000\" height=\"500\"\u003e\n\n---\n\n\u003cbr\u003e\n\n## About Llama 3.2 Vision and Ollama\n\n**Llama 3.2 Vision** is a state-of-the-art multimodal AI model developed by **Meta**, designed to process and understand both textual and visual inputs. With its advanced capabilities, Llama 3.2 Vision excels in extracting and interpreting information from images, including printed and handwritten text, making it a perfect fit for OCR tasks.\n\n**Ollama** provides a robust platform to deploy and interact with models like Llama 3.2 Vision. 
It offers a CLI and libraries for Python and JavaScript, making integration simple and efficient for developers.\n\nLearn more about Llama 3.2 Vision on the [Ollama blog](https://ollama.com/blog/llama3.2-vision).\n\n\u003cbr\u003e\n\n\u003cimg src=\"https://github.com/manumishra12/VisionRead/blob/main/image.png\" alt=\"Alt text\" width=\"200\" height=\"200\"\u003e\n\n\n---\n\n## Key Features\n\n- **Text Extraction:** Extracts text from printed and handwritten images with exceptional accuracy.\n- **File History:** Automatically saves uploaded images and their results, allowing users to revisit previous files and outputs.\n- **Handwriting Recognition:** Converts handwritten notes into editable text, enhancing accessibility and productivity.\n- **Simple Interface:** User-friendly Streamlit interface for easy interaction.\n- **Multi-format Support:** Accepts common image formats (e.g., PNG, JPG) for OCR processing.\n- **Efficient Backend:** Utilizes the Llama 3.2 Vision model for fast and reliable text extraction.\n- **Scalable Design:** Designed to handle multiple uploads and maintain a smooth user experience.\n\n---\n\n## Requirements\n\n- Python 3.7+\n- Ollama CLI and Llama 3.2 Vision model\n- Streamlit\n- ReactJS\n- MaterialUI\n\n---\n\n## Installation for Streamlit\n\n1. Clone this repository:\n   ```bash\n   git clone https://github.com/your-username/ocr-streamlit-app.git\n   cd ocr-streamlit-app\n   ```\n\n2. Install the required Python libraries:\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Pull the Llama 3.2 Vision model using Ollama:\n   ```bash\n   ollama pull llama3.2-vision\n   ```\n\n---\n\n\u003cbr\u003e\n\n## React.js Implementation\n\n### Requirements\n- Node.js and npm\n- Ollama CLI and Llama 3.2 Vision model\n- React.js\n\n\n### Installation\n\n1. Clone this repository:\n   ```bash\n   git clone https://github.com/your-username/ocr-app.git\n   cd ocr-app/react\n   ```\n\n2. 
Install the required dependencies:\n   ```bash\n   npm install\n   ```\n\n3. Pull the Llama 3.2 Vision model using Ollama:\n   ```bash\n   ollama pull llama3.2-vision\n   ```\n\n4. Start the React app:\n   ```bash\n   npm start\n   ```\n\nUpload an image to extract text. The app will display the extracted text in a structured format, and users can navigate through the file history to revisit previous uploads.\n\n### Key React.js Features\n\n- **Responsive Design:** Ensures a seamless user experience across devices.\n- **Interactive Dashboard:** Provides a dynamic interface for managing uploads and viewing results.\n- **API Integration:** Leverages the Ollama API to interact with the Llama 3.2 Vision model efficiently.\n- **Customization Options:** Easily adaptable for additional features like language translation or text-to-speech.\n- **Real-time Feedback:** Displays processing status to keep users informed.\n\n\u003cbr\u003e\n\n\u003cimg src=\"https://github.com/manumishra12/VisionRead/blob/main/1.png\" alt=\"Alt text\" width=\"600\" height=\"600\"\u003e\n\u003cbr\u003e\n\u003cimg src=\"https://github.com/manumishra12/VisionRead/blob/main/2.png\" alt=\"Alt text\" width=\"600\" height=\"600\"\u003e\n\u003cbr\u003e\n\u003cimg src=\"https://github.com/manumishra12/VisionRead/blob/main/new.png\" alt=\"Alt text\" width=\"800\" height=\"600\"\u003e\n\n---\n\n## Usage\n\n### Running the Application\n\n1. Start the Streamlit app:\n   ```bash\n   streamlit run app.py\n   ```\n\n2. Upload an image to extract text. 
The app will display the extracted text along with a history of previous uploads.\n\n---\n\n## Example Code: Using Llama 3.2 Vision\n\n### 💡 Python Library\nTo use Llama 3.2 Vision with the Ollama Python library:\n\n```python\nimport ollama\n\n# Send a prompt together with a local image file to the vision model\nresponse = ollama.chat(\n    model='llama3.2-vision',\n    messages=[{\n        'role': 'user',\n        'content': 'What is in this image?',\n        'images': ['image.jpg']\n    }]\n)\n\nprint(response)\n```\n\n### 💡 JavaScript Library\nTo use Llama 3.2 Vision with the Ollama JavaScript library:\n\n```javascript\nimport ollama from 'ollama';\n\n// Send a prompt together with a local image file to the vision model\nconst response = await ollama.chat({\n  model: 'llama3.2-vision',\n  messages: [{\n    role: 'user',\n    content: 'What is in this image?',\n    images: ['image.jpg']\n  }]\n});\n\nconsole.log(response);\n```\n\n### 💡 cURL\nTo call the model through the local Ollama REST API with cURL:\n\n```bash\ncurl http://localhost:11434/api/chat -d '{\n  \"model\": \"llama3.2-vision\",\n  \"messages\": [\n    {\n      \"role\": \"user\",\n      \"content\": \"what is in this image?\",\n      \"images\": [\"\u003cbase64-encoded image data\u003e\"]\n    }\n  ]\n}'\n```\n\n---\n\n## Acknowledgments\n\n- Ollama Team: For providing the platform used to run the Llama 3.2 Vision model.\n- Streamlit: For the easy-to-use framework enabling rapid development of data applications.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanumishra12%2Fvisionread","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanumishra12%2Fvisionread","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanumishra12%2Fvisionread/lists"}