{"id":40872376,"url":"https://github.com/connectaman/deepseek-ocr-multigpu-infer","last_synced_at":"2026-01-22T00:41:50.069Z","repository":{"id":321199635,"uuid":"1084884575","full_name":"connectaman/deepseek-ocr-multigpu-infer","owner":"connectaman","description":"Efficient multi-GPU OCR inference framework leveraging parallel processes for accelerated token throughput and faster batch processing. Designed for scalable, high-performance optical character recognition workloads using PyTorch. Supports dynamic GPU assignment, optimized resource utilization, and easy integration for large-scale image datasets.","archived":false,"fork":false,"pushed_at":"2025-10-28T10:58:58.000Z","size":126,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-28T12:08:17.725Z","etag":null,"topics":["agentic-extraction","data","deepseek","document-parser","extraction","extractor","gpu","image-parser","llm","multigpu","nvidia","ocr","parallel-computing","parser","pdf-parser","vlm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/connectaman.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-28T09:48:47.000Z","updated_at":"2025-10-28T10:59:02.000Z","dependencies_parsed_at":"2025-10-28T12:08:21.761Z","dependency_job_id":"bad3369d-c3f6-41ee-9ee6-72f282068fc6","html_url":"https://github.com/connectaman/deepseek-ocr-multi
gpu-infer","commit_stats":null,"previous_names":["connectaman/deepseek-ocr-multigpu-infer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/connectaman/deepseek-ocr-multigpu-infer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/connectaman%2Fdeepseek-ocr-multigpu-infer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/connectaman%2Fdeepseek-ocr-multigpu-infer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/connectaman%2Fdeepseek-ocr-multigpu-infer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/connectaman%2Fdeepseek-ocr-multigpu-infer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/connectaman","download_url":"https://codeload.github.com/connectaman/deepseek-ocr-multigpu-infer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/connectaman%2Fdeepseek-ocr-multigpu-infer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28648460,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-21T21:29:11.980Z","status":"ssl_error","status_checked_at":"2026-01-21T21:24:31.872Z","response_time":86,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-extraction","data","deepseek","document-parser","extraction","extractor","gpu","image-parser","llm","multigpu","nvidia","ocr","parallel-computing","parser","pdf-parser","vlm"],"created_at":"2026-01-22T00:41:49.390Z","updated_at":"2026-01-22T00:41:50.053Z","avatar_url":"https://github.com/connectaman.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DeepSeek-OCR Inference Scripts\n\n[![GitHub](https://img.shields.io/badge/GitHub-Repository-blue?logo=github)](https://github.com/connectaman/deepseek-ocr-multigpu-infer)\n[![License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n[![Python](https://img.shields.io/badge/Python-3.8+-blue?logo=python)](https://www.python.org/)\n\nProfessional, production-ready Python scripts for running DeepSeek-OCR inference. This repository provides both single GPU and multi-GPU inference options to suit different hardware configurations and use cases.\n\n**Repository**: [https://github.com/connectaman/deepseek-ocr-multigpu-infer](https://github.com/connectaman/deepseek-ocr-multigpu-infer)\n\n\n## Scripts Available\n\n### 1. 
Single GPU Inference (`deepseek_ocr_inference.py`)\n- 🎯 **Single GPU**: Optimized for single GPU setups\n- ⚡ **Fast Setup**: Quick model loading and processing\n- 🔧 **Model Presets**: Built-in presets for different model sizes\n- 📝 **Crop Mode**: Optional crop mode for better performance\n- 🔄 **Multi-Process**: Support for 1-2 processes per GPU for maximum utilization\n\n### 2. Multi-GPU Inference (`deepseek_ocr_multigpu_inference.py`)\n- 🚀 **Multi-GPU Support**: Automatically detects and utilizes all available CUDA GPUs\n- 📁 **Parallel Processing**: Processes entire folders of images in parallel\n- ⚖️ **Load Balancing**: Efficiently distributes work across GPUs\n- 📊 **Scalable**: Scales with your hardware\n- 🔄 **Multi-Process**: Support for 1-2 processes per GPU for maximum utilization\n\n## Common Features\n\n- 📁 **Batch Processing**: Processes entire folders of images\n- 🔧 **Configurable**: Customizable prompts, image sizes, and processing parameters\n- 📊 **Progress Tracking**: Real-time logging and progress monitoring\n- 📈 **Results Export**: Excel export of processing results and statistics\n- 🛡️ **Error Handling**: Robust error handling with detailed logging\n- 📝 **Professional Logging**: Clean, informative logging without experimental metrics\n\n## Requirements\n\n- Python 3.8+\n- CUDA-compatible GPU(s)\n- NVIDIA drivers and CUDA toolkit\n\n## 🖥️ GPU Requirements\n\n### Minimum Requirements\n- **GPU Memory**: 8GB VRAM minimum\n- **CUDA Compute Capability**: 7.0+ (RTX 20 series or newer)\n- **CUDA Version**: 11.8 or higher\n- **Driver Version**: 525.60.13 or newer\n\n### Recommended Configurations\n\n#### Single GPU Setups\n- **RTX 4090** (24GB) - Best for single GPU multi-process\n- **RTX 4080** (16GB) - Good for single GPU single process\n- **RTX 4070** (12GB) - Minimum for single GPU multi-process\n- **A100** (40GB) - Enterprise single GPU setups\n\n#### Multi-GPU Setups\n- **2x RTX 4090** (24GB each) - Maximum performance\n- **2x RTX 4080** (16GB each) - 
High performance\n- **4x RTX 4070** (12GB each) - Cost-effective multi-GPU\n\n### AWS Instance Recommendations\n\n#### Tested AWS Instances\n\n| Instance Type | GPU | VRAM | Use Case | Performance |\n|---------------|-----|------|----------|-------------|\n| **g5.xlarge** | 1x NVIDIA A10G | 24GB | Single GPU testing | 1x baseline |\n| **g5.12xlarge** | 4x NVIDIA A10G | 24GB each | Multi-GPU production | 3.5-4x speedup |\n\n#### AWS Instance Details\n\n**g5.xlarge**\n- **GPU**: 1x NVIDIA A10G\n- **VRAM**: 24GB\n- **vCPUs**: 4\n- **Memory**: 16GB RAM\n- **Best For**: Single GPU testing, development, small batch processing\n- **Approximate Cost**: ~$1.1/hour\n\n**g5.12xlarge**\n- **GPU**: 4x NVIDIA A10G\n- **VRAM**: 24GB per GPU (96GB total)\n- **vCPUs**: 48\n- **Memory**: 192GB RAM\n- **Best For**: Multi-GPU production, large batch processing, maximum throughput\n- **Approximate Cost**: ~$5.6/hour\n\n### Performance Benchmarking\n\n#### Test Environment\n- **AWS Instance**: g5.xlarge and g5.12xlarge\n- **Model**: deepseek-ai/DeepSeek-OCR\n- **Image Size**: 1024x1024\n- **Batch Size**: 100 images\n- **Test Images**: Mixed document types (PDFs, screenshots, handwritten notes)\n\n#### Benchmark Results\n\n| Approach | Instance | GPUs | Processes | Images/min | Speedup |\n|----------|----------|------|-----------|------------|---------|\n| Single GPU - Single Process | g5.xlarge | 1 | 1 | 12-15 | 1x |\n| Single GPU - Multi Process | g5.xlarge | 1 | 2 | 18-22 | 1.5x |\n| Multi-GPU - Single Process | g5.12xlarge | 4 | 4 | 45-55 | 3.5x |\n| Multi-GPU - Multi Process | g5.12xlarge | 4 | 8 | 65-80 | 5x |\n\n#### Memory Usage Patterns\n\n| Configuration | GPU Memory Usage | Peak Memory | Notes |\n|---------------|------------------|-------------|-------|\n| Single Process | 8-10GB | 12GB | Stable memory usage |\n| Multi Process (2x) | 6-8GB per process | 16GB | Shared model loading |\n| Multi-GPU (4x) | 8-10GB per GPU | 12GB per GPU | Independent GPU memory |\n\n\n## 
Installation\n\n1. **Clone or download this repository**\n   ```bash\n   git clone https://github.com/connectaman/deepseek-ocr-multigpu-infer.git\n   cd deepseek-ocr-multigpu-infer\n   ```\n\n2. **Create a virtual environment (recommended)**\n   ```bash\n   python -m venv venv\n   source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n   ```\n\n3. **Install dependencies**\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n## 🚀 Inference Approaches\n\nThis repository supports **4 different inference approaches** to maximize GPU utilization and processing speed:\n\n### 1. Single GPU - Single Process\n- **Use Case**: Basic single GPU setups, testing, or when you want simple processing\n- **Command**: `python deepseek_ocr_inference.py input_folder output_folder`\n- **Processes**: 1 process on 1 GPU\n- **Best For**: Simple setups, testing, or when you have limited GPU memory\n\n### 2. Single GPU - Multi Process\n- **Use Case**: Maximum utilization of a single powerful GPU\n- **Command**: `python deepseek_ocr_inference.py input_folder output_folder --num-processes 2`\n- **Processes**: 2 processes on 1 GPU\n- **Best For**: High-end single GPU setups (RTX 4090, A100, etc.)\n\n### 3. Multi-GPU - Single Process per GPU\n- **Use Case**: Multiple GPUs with standard utilization\n- **Command**: `python deepseek_ocr_multigpu_inference.py input_folder output_folder`\n- **Processes**: 1 process per GPU\n- **Best For**: Multi-GPU setups with moderate processing needs\n\n### 4. 
Multi-GPU - Multi Process per GPU\n- **Use Case**: Maximum utilization across multiple GPUs\n- **Command**: `python deepseek_ocr_multigpu_inference.py input_folder output_folder --num-processes-per-gpu 2`\n- **Processes**: 2 processes per GPU (e.g., 4 processes on 2 GPUs)\n- **Best For**: High-performance multi-GPU setups for maximum throughput\n\n## Usage\n\n### Single GPU Inference\n\n#### Basic Usage\n```bash\npython deepseek_ocr_inference.py input_folder output_folder\n```\n\n#### Advanced Usage\n```bash\npython deepseek_ocr_inference.py ./images ./results \\\n    --prompt \"Convert this document to markdown\" \\\n    --base-size 1024 \\\n    --image-size 640 \\\n    --crop-mode \\\n    --gpu-id 0 \\\n    --num-processes 2 \\\n    --results-file my_results.xlsx\n```\n\n#### Multi-Process Usage (Maximum GPU Utilization)\n```bash\npython deepseek_ocr_inference.py ./images ./results --num-processes 2\n```\n\n#### Model Size Presets\n```bash\n# Tiny model (fastest, least accurate)\npython deepseek_ocr_inference.py input output --base-size 512 --image-size 512 --no-crop-mode\n\n# Small model\npython deepseek_ocr_inference.py input output --base-size 640 --image-size 640 --no-crop-mode\n\n# Base model (default)\npython deepseek_ocr_inference.py input output --base-size 1024 --image-size 1024 --no-crop-mode\n\n# Large model (most accurate, slowest)\npython deepseek_ocr_inference.py input output --base-size 1280 --image-size 1280 --no-crop-mode\n\n# Gundam model (balanced)\npython deepseek_ocr_inference.py input output --base-size 1024 --image-size 640 --crop-mode\n```\n\n### Multi-GPU Inference\n\n#### Basic Usage\n```bash\npython deepseek_ocr_multigpu_inference.py input_folder output_folder\n```\n\n#### Advanced Usage\n```bash\npython deepseek_ocr_multigpu_inference.py ./images ./results \\\n    --prompt \"Convert this document to markdown\" \\\n    --base-size 1024 \\\n    --image-size 1280 \\\n    --num-processes-per-gpu 2 \\\n    --results-file 
multigpu_results.xlsx\n```\n\n#### Multi-Process per GPU (Maximum Utilization)\n```bash\npython deepseek_ocr_multigpu_inference.py ./images ./results --num-processes-per-gpu 2\n```\n\n### Command Line Arguments\n\n#### Single GPU Script (`deepseek_ocr_inference.py`)\n\n| Argument | Required | Default | Description |\n|----------|----------|---------|-------------|\n| `input_folder` | ✅ | - | Path to folder containing input images |\n| `output_folder` | ✅ | - | Path to folder for output markdown files |\n| `--prompt` | ❌ | `\"\u003cimage\u003e\\n\u003c|grounding|\u003eConvert the document to markdown. \"` | Custom prompt for OCR model |\n| `--base-size` | ❌ | `1024` | Base size parameter for model |\n| `--image-size` | ❌ | `640` | Image size parameter for model |\n| `--crop-mode` | ❌ | `False` | Enable crop mode for processing |\n| `--gpu-id` | ❌ | `0` | GPU device ID to use |\n| `--num-processes` | ❌ | `1` | Number of processes on GPU (1-2) |\n| `--results-file` | ❌ | `single_gpu_inference_results.xlsx` | Excel file for processing results |\n\n#### Multi-GPU Script (`deepseek_ocr_multigpu_inference.py`)\n\n| Argument | Required | Default | Description |\n|----------|----------|---------|-------------|\n| `input_folder` | ✅ | - | Path to folder containing input images |\n| `output_folder` | ✅ | - | Path to folder for output markdown files |\n| `--prompt` | ❌ | `\"\u003cimage\u003e\\n\u003c|grounding|\u003eConvert the document to markdown. 
\"` | Custom prompt for OCR model |\n| `--base-size` | ❌ | `1024` | Base size parameter for model |\n| `--image-size` | ❌ | `1280` | Image size parameter for model |\n| `--num-processes-per-gpu` | ❌ | `1` | Number of processes per GPU (1-2) |\n| `--results-file` | ❌ | `multigpu_inference_results.xlsx` | Excel file for processing results |\n\n### Supported Image Formats\n\n- JPEG (.jpg, .jpeg)\n- PNG (.png)\n- BMP (.bmp)\n- TIFF (.tiff, .tif)\n- WebP (.webp)\n\n## GPU Monitoring\n\n### Install GPU Monitoring Tool\n\n```bash\npip install nvitop\n```\n\n### Monitor GPU Usage\n\n```bash\nnvitop\n```\n\nThis will show real-time GPU utilization, memory usage, and temperature for all available GPUs.\n\n#### GPU Monitoring Screenshot\n\n![GPU Monitoring with nvitop](screenshot/gpu1.png)\n![GPU Monitoring with nvitop](screenshot/gpu2.png)\n\n*Example of nvitop showing GPU utilization during DeepSeek-OCR inference across multiple GPUs*\n\n## Example Workflow\n\n### Single GPU Workflow\n\n1. **Prepare your images**\n   ```bash\n   mkdir input_images\n   # Copy your images to input_images/\n   ```\n\n2. **Run single GPU inference**\n   ```bash\n   python deepseek_ocr_inference.py input_images output_markdowns\n   ```\n\n3. **Monitor progress**\n   - Watch the console output for real-time progress\n   - Use `nvitop` in another terminal to monitor GPU usage\n\n4. **Check results**\n   - Markdown files will be saved in `output_markdowns/`\n   - Processing results will be saved in `single_gpu_inference_results.xlsx`\n\n### Multi-GPU Workflow\n\n1. **Prepare your images**\n   ```bash\n   mkdir input_images\n   # Copy your images to input_images/\n   ```\n\n2. **Run multi-GPU inference**\n   ```bash\n   python deepseek_ocr_multigpu_inference.py input_images output_markdowns\n   ```\n\n3. **Monitor progress**\n   - Watch the console output for real-time progress\n   - Use `nvitop` in another terminal to monitor GPU usage across all GPUs\n\n4. 
**Check results**\n   - Markdown files will be saved in `output_markdowns/`\n   - Processing results will be saved in `multigpu_inference_results.xlsx`\n\n## Output Structure\n\n### Markdown Files\nEach input image generates a corresponding markdown file:\n```\ninput_images/\n├── document1.jpg\n├── document2.png\n└── document3.tiff\n\noutput_markdowns/\n├── document1.md\n├── document2.md\n└── document3.md\n```\n\n### Results Excel File\nThe Excel file contains processing metadata:\n- `filename`: Original image filename\n- `markdown_filename`: Generated markdown filename\n- `gpu_id`: GPU that processed the image\n- `gpu_name`: Name of the GPU used\n- `status`: Processing status (success/error)\n- `error`: Error message (if applicable)\n\n## 📊 Performance Comparison\n\n### Processing Speed (AWS Tested)\n| Approach | Instance | GPUs | Processes | Images/min | Speedup | Cost/hr |\n|----------|----------|------|-----------|------------|---------|---------|\n| Single GPU - Single Process | g5.xlarge | 1 | 1 | 12-15 | 1x | $1.00 |\n| Single GPU - Multi Process | g5.xlarge | 1 | 2 | 18-22 | 1.5x | $1.00 |\n| Multi-GPU - Single Process | g5.12xlarge | 4 | 4 | 45-55 | 3.5x | ~$5.60 |\n| Multi-GPU - Multi Process | g5.12xlarge | 4 | 8 | 65-80 | 5x | ~$5.60 |\n\n### Local Hardware (Estimated)\n| Approach | GPU Setup | Processes | Use Case | Speed |\n|----------|-----------|-----------|----------|-------|\n| Single GPU - Single Process | 1 GPU | 1 | Basic processing | 1x |\n| Single GPU - Multi Process | 1 GPU | 2 | High-end single GPU | 1.5-1.8x |\n| Multi-GPU - Single Process | 2+ GPUs | 1 per GPU | Standard multi-GPU | 2x (2 GPUs) |\n| Multi-GPU - Multi Process | 2+ GPUs | 2 per GPU | Maximum throughput | 3-3.5x (2 GPUs) |\n\n### Memory Requirements\n- **Single Process**: ~8-12GB GPU memory per process\n- **Multi Process**: ~6-8GB GPU memory per process (due to shared model loading)\n- **Recommended**: RTX 4090 (24GB) or A100 (40GB) for multi-process setups\n\n### When to Use 
Each Approach\n1. **Single GPU - Single Process**: Testing, development, or limited GPU memory\n2. **Single GPU - Multi Process**: High-end single GPU with plenty of memory\n3. **Multi-GPU - Single Process**: Multiple GPUs with standard processing needs\n4. **Multi-GPU - Multi Process**: Production environments requiring maximum throughput\n\n## Performance Tips\n\n### Single GPU Optimization\n1. **Model Size**: Choose appropriate model size based on your accuracy vs speed requirements\n2. **Crop Mode**: Enable crop mode for better performance on smaller images\n3. **GPU Selection**: Use `--gpu-id` to select the most powerful GPU if you have multiple\n4. **Memory Management**: Monitor GPU memory usage with `nvitop`\n\n### Multi-GPU Optimization\n1. **Load Balancing**: The script automatically distributes work evenly across GPUs\n2. **GPU Memory**: Ensure sufficient GPU memory on all GPUs for your batch size\n3. **Image Size**: Larger images require more memory but may provide better OCR results\n4. **Monitoring**: Use `nvitop` to monitor GPU utilization across all GPUs\n\n### General Tips\n1. **Batch Processing**: Process images in batches to optimize memory usage\n2. **Image Formats**: Use compressed formats (JPEG) for faster loading\n3. **Storage**: Use SSD storage for faster image loading\n\n## Troubleshooting\n\n### Common Issues\n\n1. **CUDA Out of Memory**\n   - Reduce `--image-size` parameter\n   - Process fewer images simultaneously\n   - Check available GPU memory with `nvidia-smi`\n\n2. **No Images Found**\n   - Verify input folder path\n   - Check supported image formats\n   - Ensure images are not in subdirectories\n\n3. 
**Model Loading Errors**\n   - Verify internet connection for model download\n   - Check CUDA installation\n   - Ensure sufficient disk space for model cache\n\n### Debug Mode\n\nFor detailed debugging, you can modify the logging level in the script:\n```python\nlogging.basicConfig(level=logging.DEBUG, ...)\n```\n\n## License\n\nMIT License - see LICENSE file for details.\n\n## 👨‍💻 Author\n\n**Aman Ulla**\n- 📫 Contact: [connectamanulla@gmail.com](mailto:connectamanulla@gmail.com)\n- 🌐 Portfolio: [amanulla.in](http://www.amanulla.in)\n- 🔗 [GitHub](https://github.com/connectaman) • [LinkedIn](https://linkedin.com/in/connectaman) • [Twitter](https://twitter.com/connectaman1)\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## Support\n\nFor issues and questions:\n1. Check the troubleshooting section above\n2. Review the console output for error messages\n3. Open an issue on the repository\n\n---\n\n**Note**: This script requires CUDA-compatible GPUs and the DeepSeek-OCR model. Make sure your system meets the hardware requirements before running.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconnectaman%2Fdeepseek-ocr-multigpu-infer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fconnectaman%2Fdeepseek-ocr-multigpu-infer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fconnectaman%2Fdeepseek-ocr-multigpu-infer/lists"}