{"id":23167785,"url":"https://github.com/patelvivekdev/llm-ocr","last_synced_at":"2025-04-04T22:20:40.422Z","repository":{"id":267868855,"uuid":"901054300","full_name":"patelvivekdev/llm-ocr","owner":"patelvivekdev","description":null,"archived":false,"fork":false,"pushed_at":"2025-01-02T18:14:25.000Z","size":247,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-10T06:44:45.834Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/patelvivekdev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-10T00:40:10.000Z","updated_at":"2025-01-02T18:14:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"bda447fa-b84a-4893-bb78-341d83767a9e","html_url":"https://github.com/patelvivekdev/llm-ocr","commit_stats":null,"previous_names":["patelvivekdev/llm-ocr"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patelvivekdev%2Fllm-ocr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patelvivekdev%2Fllm-ocr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patelvivekdev%2Fllm-ocr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/patelvivekdev%2Fllm-ocr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/patelvivekdev","download_url":"https://codeload.github.com/patelv
ivekdev/llm-ocr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247256658,"owners_count":20909336,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-18T02:34:56.815Z","updated_at":"2025-04-04T22:20:40.405Z","avatar_url":"https://github.com/patelvivekdev.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LLM-OCR\n\nA simple OCR SDK that uses AI models to extract text from images and return formatted Markdown.\n\n## Features\n\n- [x] Support for multiple AI providers (Google Gemini, Mistral)\n- [x] Local and remote image processing\n- [x] Streaming and non-streaming responses\n- [x] Base64 image encoding\n- [x] Markdown-formatted output\n- [ ] Additional provider support with models\n- [ ] Additional output formats (JSON)\n- [ ] Support for PDF files\n- [ ] Support for multi-page PDF files\n\n## Installation\n\n```bash\nnpm install llm-ocr\n# or\nyarn add llm-ocr\n# or\npnpm add llm-ocr\n# or\nbun add llm-ocr\n```\n\n## Environment Variables\n\nCreate a `.env` file and add your API keys:\n\n```env\nGOOGLE_API_KEY=your_google_api_key\nMISTRAL_API_KEY=your_mistral_api_key\n```\n\n## Usage\n\n### Basic Example\n\n```typescript\nimport { ocr } from 'llm-ocr';\n\n// For a local image\nconst result = await ocr({\n  filePath: './path/to/image.jpg',\n  modelID: 'gemini-1.5-flash',\n  provider: 'google',\n  stream: false,\n  // systemPrompt: 'What is the text in the image?', // Optional\n});\n\n// For a remote image (separate name: `result` is already declared above)\nconst remoteResult = await ocr({\n  filePath: 
'https://example.com/image.jpg',\n  modelID: 'pixtral-large-latest',\n  provider: 'mistral',\n  stream: false,\n});\n```\n\n### Available Models\n\nGoogle Models:\n\n- gemini-1.5-flash `fast but less accurate`\n- gemini-1.5-flash-8b `fast but less accurate`\n- gemini-1.5-pro `accurate but slow`\n\nMistral Models:\n\n- pixtral-12b-2409 `fast but less accurate`\n- pixtral-large-latest `accurate but slow`\n\n### Utility Functions\n\n```typescript\nimport { encodeImage, isRemoteFile, downloadImageAndEncode } from 'llm-ocr';\n\n// Encode local image to base64\nconst base64Image = encodeImage('./path/to/image.jpg');\n\n// Check if file is remote\nconst isRemote = isRemoteFile('https://example.com/image.jpg');\n\n// Download and encode remote image\nconst encodedRemoteImage = await downloadImageAndEncode(\n  'https://example.com/image.jpg',\n);\n```\n\n## License\n\nMIT © [Vivek Patel](https://github.com/patelvivekdev)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatelvivekdev%2Fllm-ocr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpatelvivekdev%2Fllm-ocr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpatelvivekdev%2Fllm-ocr/lists"}