https://github.com/patelvivekdev/llm-ocr

Last synced: 9 months ago
JSON representation

Host: GitHub
URL: https://github.com/patelvivekdev/llm-ocr
Owner: patelvivekdev
License: mit
Created: 2024-12-10T00:40:10.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-01-02T18:14:25.000Z (about 1 year ago)
Last Synced: 2025-02-10T06:44:45.834Z (11 months ago)
Language: TypeScript
Size: 241 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

Awesome Lists containing this project

README

          # LLM-OCR

A simple OCR SDK that uses AI models to extract text from images and return formatted markdown.

## Features

- [x] Support for multiple AI providers (Google Gemini, Mistral)

- [x] Local and remote image processing

- [x] Streaming and non-streaming responses

- [x] Base64 image encoding

- [x] Markdown formatted output

- [ ] additional provider support with models

- [ ] additional output formats (JSON)

- [ ] support for pdf files

- [ ] support for Multi-page PDF files

## Installation

```bash

npm install llm-ocr

# or

yarn add llm-ocr

# or

pnpm add llm-ocr

# or

bun add llm-ocr

```

## Environment Variables

Create a `.env` file and add your API keys:

```env

GOOGLE_API_KEY=your_google_api_key

MISTRAL_API_KEY=your_mistral_api_key

```

## Usage

### Basic Example

```typescript

import { ocr } from 'llm-ocr';

// For local image

const result = await ocr({

  filePath: './path/to/image.jpg',

  modelID: 'gemini-1.5-flash',

  provider: 'google',

  stream: false,

  // systemPrompt: 'What is the text in the image?', // Optional

});

// For remote image

const result = await ocr({

  filePath: 'https://example.com/image.jpg',

  modelID: 'pixtral-large-latest',

  provider: 'mistral',

  stream: false,

});

```

### Available Models

Google Models:

- gemini-1.5-flash `fast but less accurate`

- gemini-1.5-flash-8b `fast but less accurate`

- gemini-1.5-pro `accurate but slow`

Mistral Models:

- pixtral-12b-2409 `fast but less accurate`

- pixtral-large-latest `accurate but slow`

### Utility Functions

```typescript

import { encodeImage, isRemoteFile, downloadImageAndEncode } from 'llm-ocr';

// Encode local image to base64

const base64Image = encodeImage('./path/to/image.jpg');

// Check if file is remote

const isRemote = isRemoteFile('https://example.com/image.jpg');

// Download and encode remote image

const encodedRemoteImage = await downloadImageAndEncode(

  'https://example.com/image.jpg',

);

```

## License

MIT © [Vivek Patel](https://github.com/patelvivekdev)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/patelvivekdev/llm-ocr

Awesome Lists containing this project

README