https://github.com/carlosacchi/captiocr
CaptiOCR - A real-time screen text extraction tool using Tesseract OCR. Capture, recognize, and log on-screen text dynamically. Future updates will include on-demand language installation, resizable selection areas, and live text overlays.
captions live live-caption live-captioning live-captions live-transcript logging ocr ocr-python ocr-recognition saver transcription
- Host: GitHub
- URL: https://github.com/carlosacchi/captiocr
- Owner: carlosacchi
- License: mit
- Created: 2025-02-07T14:10:44.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-02-17T20:39:59.000Z (about 1 month ago)
- Last Synced: 2026-02-18T01:35:06.079Z (about 1 month ago)
- Topics: captions, live, live-caption, live-captioning, live-captions, live-transcript, logging, ocr, ocr-python, ocr-recognition, saver, transcription
- Language: Python
- Homepage: https://www.captiocr.com
- Size: 809 KB
- Stars: 6
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🖥️ CaptiOCR - Real-Time Screen Text Extraction
[Latest release](https://github.com/CarloSacchi/CaptiOCR/releases/latest)
[CodeQL workflow](https://github.com/carlosacchi/captiocr/actions/workflows/github-code-scanning/codeql)
**CaptiOCR** is an open-source **real-time screen text extraction tool** designed to capture and transcribe captions (subtitles) from video conferencing applications like **Microsoft Teams**, **Zoom**, and **Google Meet**. With an intuitive interface and powerful OCR capabilities, you can select any screen area and extract text continuously in real-time.
---
## ✨ Key Features
- ✅ **Real-time OCR processing** using [Tesseract OCR](https://github.com/tesseract-ocr/tesseract)
- ✅ **Multi-language support** (English, Italian, French, German, Portuguese)
- ✅ **Multi-monitor support** with DPI awareness
- ✅ **Dynamic area selection** - drag, resize, and move capture areas during operation
- ✅ **Text processing** - automatic duplicate removal and text cleaning
- ✅ **Profile management** - save and load different configurations
- ✅ **Hotkey support** - `Ctrl+Q` to stop capture
- ✅ **Export options** - save captured text with custom naming
- ✅ **Debug logging** for troubleshooting
- ✅ **Modular architecture** - clean, maintainable codebase
---
## 🛠️ Prerequisites
Before installation, ensure you have:
- ✅ **Python 3.9+** installed
- ✅ **Tesseract OCR** installed ([Download here](https://github.com/tesseract-ocr/tesseract))
- ✅ **Windows OS** (primary support)
---
## 📦 Installation
### **1️⃣ Clone the Repository**
```bash
git clone https://github.com/CarloSacchi/CaptiOCR.git
cd CaptiOCR
```
### **2️⃣ Install Python Dependencies**
```bash
pip install -r requirements.txt
```
### **3️⃣ Install Tesseract OCR**
**Windows users:**
Download and install Tesseract from the [official releases](https://github.com/tesseract-ocr/tesseract/releases).
The application will automatically detect standard installation paths.
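A minimal sketch of what such path detection could look like is given here. It is not CaptiOCR's actual code: `find_tesseract` and `DEFAULT_CANDIDATES` are hypothetical names, and the two Windows paths are assumed to be the usual installer defaults.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

# Assumed default installer locations on Windows (not taken from CaptiOCR).
DEFAULT_CANDIDATES = (
    Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe"),
    Path(r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"),
)


def find_tesseract(candidates: Iterable[Path] = DEFAULT_CANDIDATES) -> Optional[Path]:
    """Return the first usable Tesseract executable, or None if none is found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return Path(on_path)
    # Otherwise fall back to the known installation folders.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

If the lookup returns `None`, installing Tesseract or adding its folder to `PATH` is the likely fix.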
---
## 🚀 Quick Start
Run the application:
```bash
python CaptiOCR.py
```
### **Basic Usage:**
1️⃣ **Select Language** - Choose your OCR language from the dropdown
2️⃣ **Click "Start (Select Area)"** - Open the area selection tool
3️⃣ **Drag to Select** - Draw a rectangle around the text area you want to capture
4️⃣ **Press ENTER** - Begin real-time text extraction
5️⃣ **Press Ctrl+Q or STOP** - End the capture session
6️⃣ **Name Your Capture** - Save with a custom filename
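Conceptually, steps 4 and 5 amount to a loop that grabs the selected region, runs OCR on it, and repeats until a stop signal arrives. The sketch below illustrates that idea only; `capture_loop` and its parameters are hypothetical, and the screen grabber and OCR function are passed in so the loop itself has no external dependencies.

```python
import time
from typing import Callable, List, Tuple

BBox = Tuple[int, int, int, int]  # left, top, right, bottom


def capture_loop(grab: Callable[[BBox], object],
                 ocr: Callable[[object], str],
                 should_stop: Callable[[], bool],
                 bbox: BBox,
                 interval: float = 1.0) -> List[str]:
    """Grab the region, run OCR on it, and collect non-empty text until stopped."""
    lines: List[str] = []
    while not should_stop():
        text = ocr(grab(bbox)).strip()
        if text:
            lines.append(text)
        time.sleep(interval)  # throttle to limit CPU usage
    return lines


# Possible wiring, assuming Pillow and pytesseract are installed and
# stop_event is a threading.Event set by the stop button or hotkey:
#   from PIL import ImageGrab
#   import pytesseract
#   capture_loop(lambda b: ImageGrab.grab(bbox=b),
#                lambda img: pytesseract.image_to_string(img, lang="eng"),
#                should_stop=stop_event.is_set, bbox=(0, 900, 1920, 1080))
```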
**Output:** Captured text is saved in the `captures/` folder as timestamped `.txt` files.
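As an illustration of timestamped naming, the sketch below builds an output path from a user-supplied name. The `<name>_<YYYYMMDD_HHMMSS>.txt` pattern and the `build_capture_path` helper are assumptions for this example; the exact format CaptiOCR uses may differ.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_capture_path(name: str,
                       base_dir: Path = Path("captures"),
                       now: Optional[datetime] = None) -> Path:
    """Build a path like captures/<name>_<YYYYMMDD_HHMMSS>.txt."""
    now = now or datetime.now()
    # Replace characters that are not allowed in Windows filenames.
    safe = re.sub(r'[<>:"/\\|?*]+', "_", name).strip(" ._") or "capture"
    return base_dir / f"{safe}_{now:%Y%m%d_%H%M%S}.txt"
```

Passing `now` explicitly keeps the function easy to test; in normal use it defaults to the current time.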
---
## 🎯 Advanced Features
### **Multi-Monitor Support**
- **Automatic detection** of all connected monitors
- **DPI awareness** for high-resolution displays
- **Cross-monitor selection** - capture areas spanning multiple screens
- **Monitor-specific positioning** for consistent setups
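To illustrate cross-monitor selection, the sketch below works out which monitors a selection rectangle touches, using virtual-desktop coordinates. `Rect` and `monitors_for_selection` are hypothetical helpers, not part of CaptiOCR. On Windows, monitors placed left of or above the primary display have negative coordinates, which the comparison handles without special cases.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one pixel."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def monitors_for_selection(selection: Rect, monitors: List[Rect]) -> List[int]:
    """Return the indices of all monitors the selection overlaps."""
    return [i for i, m in enumerate(monitors) if overlaps(selection, m)]
```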
### **Dynamic Capture Areas**
- **Resizable borders** - adjust capture area during operation
- **Movable windows** - reposition without stopping capture
- **Multiple profiles** - save configurations for different applications
### **Text Processing**
- **Duplicate detection** - automatic removal of repeated text
- **Text cleaning** - remove artifacts and formatting issues
- **Processed output** - clean, readable transcriptions
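Live captions are usually re-read many times while a sentence is still growing, so consecutive OCR frames overlap heavily. The sketch below shows one possible way to collapse them, using `difflib` from the standard library. It is an illustration, not CaptiOCR's actual algorithm; `dedupe_frames` and the `0.9` threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def clean(text: str) -> str:
    """Collapse whitespace runs and trim the result."""
    return " ".join(text.split())


def dedupe_frames(frames: Iterable[str], threshold: float = 0.9) -> List[str]:
    """Drop empty, repeated, and near-identical consecutive OCR frames."""
    kept: List[str] = []
    for raw in frames:
        text = clean(raw)
        if not text:
            continue
        if kept:
            last = kept[-1]
            if text.startswith(last):
                kept[-1] = text  # caption grew: keep the longer version
                continue
            if SequenceMatcher(None, last, text).ratio() >= threshold:
                continue  # near-identical re-read, e.g. a one-character OCR slip
        kept.append(text)
    return kept
```

A lower threshold removes more noise but risks dropping genuinely different lines that happen to look alike.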
### **Profile Management**
- **Save Settings** - store optimized configurations
- **Quick Load** - switch between saved profiles
- **Application-specific** - different settings for Teams, Zoom, Meet
---
## 💡 Tips & Best Practices
### **Optimizing OCR Accuracy**
- **Language Selection**: Choose the correct language model for best results with accents and special characters
- **Capture Area**: Select wide, short rectangles that tightly frame the subtitle region
- **Minimum Size**: Ensure capture areas are at least 50×50 pixels
- **Stable Areas**: Target regions where text appears consistently
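The minimum-size tip can be expressed as a small check on the dragged rectangle. The helpers below are a hypothetical sketch: they normalize the two drag corners first, so a selection drawn from bottom-right to top-left is handled the same way.

```python
from typing import Tuple

MIN_SIDE = 50  # pixels; the minimum from the tip above


def normalize_selection(x0: int, y0: int, x1: int, y1: int) -> Tuple[int, int, int, int]:
    """Convert two drag corners into (left, top, width, height)."""
    left, top = min(x0, x1), min(y0, y1)
    return left, top, abs(x1 - x0), abs(y1 - y0)


def is_large_enough(width: int, height: int, min_side: int = MIN_SIDE) -> bool:
    """True when both sides meet the minimum size."""
    return width >= min_side and height >= min_side
```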
### **Performance Optimization**
- **Close unnecessary applications** to reduce system load
- **Use specific language models** rather than auto-detection
- **Regular cleanup** of old capture files and logs
- **Monitor system resources** during extended capture sessions
---
## Project Structure
```
CaptiOCR/
├── CaptiOCR.py          # Main application entry point
├── captiocr/            # Core application modules
│   ├── config/          # Settings and constants
│   ├── core/            # OCR and capture logic
│   ├── models/          # Data models
│   ├── ui/              # User interface components
│   └── utils/           # Utilities and helpers
├── captures/            # Saved text outputs
├── config/              # User preferences
├── tessdata/            # OCR language files
├── logs/                # Application logs
└── resources/           # Icons and assets
```
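Tesseract stores each language model as a `<code>.traineddata` file, for example `eng`, `ita`, `fra`, `deu`, and `por` for the five languages listed earlier. Assuming `tessdata/` follows that convention, the installed languages can be listed with a sketch like this (`installed_languages` is a hypothetical helper):

```python
from pathlib import Path
from typing import List


def installed_languages(tessdata_dir: Path) -> List[str]:
    """Return the language codes found as <code>.traineddata files."""
    if not tessdata_dir.is_dir():
        return []
    return sorted(p.stem for p in tessdata_dir.glob("*.traineddata"))
```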
---
## 🔧 Configuration
The application uses JSON configuration files stored in `config/`:
- **User preferences** - UI settings, language choices
- **Language data** - Available OCR models
- **Capture profiles** - Saved area configurations
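A minimal sketch of reading such a JSON file with fallback defaults follows. The keys in `DEFAULTS` and the `load_preferences` helper are hypothetical; the real schema used by CaptiOCR may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical defaults; the real keys used by CaptiOCR may differ.
DEFAULTS: Dict[str, Any] = {"language": "eng", "debug_logging": False}


def load_preferences(path: Path, defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Load preferences from JSON, falling back to defaults if the file is missing or invalid."""
    merged = dict(defaults)
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or malformed JSON
        return merged
    if isinstance(data, dict):
        merged.update(data)  # stored values override the defaults
    return merged
```

Merging over defaults means a file written by an older version, with fewer keys, still loads cleanly.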
---
## System Requirements
- **OS**: Windows 10/11 (primary), Linux/macOS (experimental)
- **RAM**: 4GB minimum, 8GB recommended
- **CPU**: Multi-core processor recommended for real-time processing
- **Display**: Support for multiple monitors with varying DPI
- **Storage**: 100MB+ for application and language files
---
## Troubleshooting
### **Common Issues:**
- **OCR not working**: Verify Tesseract installation and PATH
- **Text not detected**: Check language selection and capture area size
- **Performance issues**: Close other applications, check system resources
- **Multi-monitor problems**: Update display drivers, check DPI settings
### **Debug Logging:**
Enable debug logging in the application settings to capture detailed operation information for troubleshooting.
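For reference, a debug toggle of this kind is commonly implemented with Python's standard `logging` module. The sketch below is an assumption about how that could look, not the application's actual setup; the `captiocr` logger name and the `captiocr.log` filename are hypothetical.

```python
import logging
from pathlib import Path


def configure_logging(debug: bool, log_dir: Path = Path("logs")) -> logging.Logger:
    """Write log records to <log_dir>/captiocr.log, at DEBUG level when requested."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("captiocr")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_dir / "captiocr.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```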
---
## 🗺️ Roadmap
### **Upcoming Features**
- **Live translation** integration
- **Cloud storage** synchronization
- **Export formats** (PDF, HTML, Word)
- **API integration** for external applications
- **Dark mode** and theme customization
- **Batch processing** capabilities
---
## 🤝 Contributing
We welcome contributions! Here's how to get started:
1. **Fork** the repository
2. **Create** a feature branch (`git checkout -b feature/amazing-feature`)
3. **Follow** the coding guidelines in `CLAUDE.md`
4. **Commit** your changes (`git commit -m 'Add amazing feature'`)
5. **Push** to the branch (`git push origin feature/amazing-feature`)
6. **Open** a Pull Request
### **Development Guidelines**
- Follow **PEP 8** Python style guide
- Use **type hints** and **docstrings**
- Maintain **modular architecture**
- Add **comprehensive logging**
- Update **version numbers** for functional changes
---
## License
This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.
---
## 👤 Author & Support
**Author:** Carlo Sacchi
**Website:** [https://www.captiocr.com](https://www.captiocr.com)
For support, feature requests, or bug reports, please open an issue on GitHub.
---
**⭐ If CaptiOCR helps you, please consider giving it a star on GitHub!**