https://github.com/scanf-s/single_character_extract_in_image
Korean character extraction using PaddleOCR and OpenCV
korean opencv-python paddleocr python
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/scanf-s/single_character_extract_in_image
- Owner: Scanf-s
- Created: 2025-03-17T15:48:15.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-05T02:49:40.000Z (about 1 year ago)
- Last Synced: 2025-06-10T06:50:59.751Z (12 months ago)
- Topics: korean, opencv-python, paddleocr, python
- Language: Jupyter Notebook
- Homepage:
- Size: 13.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
# Korean Single Character Crop with PaddleOCR and OpenCV
## 📖 Overview
This project extracts individual Korean characters from images using **PaddleOCR** and **OpenCV**. It processes images by detecting bounding boxes, cropping characters, and refining them for further use, such as Optical Character Recognition (OCR) or dataset preparation.
### Key Features:
- **Bounding Box Detection**: Detects text regions using PaddleOCR.
- **Character Cropping**: Crops individual characters from detected bounding boxes.
- **Projection-Based Segmentation**: Separates characters within a bounding box using vertical projection.
- **Size Filtering**: Discards segmented characters that are too small to be usable.
- **Final Character Extraction**: Extracts and saves individual Korean characters with high confidence.
---
## 🛠️ Installation
### Prerequisites
- Python 3.8 or higher
- CUDA-enabled GPU (for PaddleOCR GPU version)
### Install Dependencies
1. Clone the repository:
```bash
git clone https://github.com/scanf-s/single_character_extract_in_image.git korean-letter-crop
cd korean-letter-crop
```
2. Install required Python packages:
```bash
conda create -n ENVNAME python=3.12.9
conda activate ENVNAME
pip install -r requirements.txt
```
---
## 🚀 Usage
1. Run the program

   To process an input image and extract Korean characters, run `python run.py` with the arguments below.

2. Arguments

   - `-i`: Path to the input image (default: `./inputs/image.png`)
   - `-o`: Path to the output directory for extracted characters (default: `./final/`)

3. Output

   - Cropped images: saved in the `./cropped/` directory.
   - Projection images: saved in the `./projection/` directory.
   - Filtered characters: saved in the `./best_size/` directory.
   - Final characters: saved in the `./final/` directory, with filenames corresponding to the recognized characters.
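
The two options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual contents of `run.py`: the function name `parse_args` and the long option names `--input` and `--output` are assumptions added for illustration, while the short flags and defaults come from the list above.

```python
import argparse


def parse_args(argv=None):
    """Parse the -i and -o options described in this README."""
    parser = argparse.ArgumentParser(
        description="Extract individual Korean characters from an image."
    )
    # The long names (--input, --output) are assumptions; only -i and -o
    # are documented.
    parser.add_argument(
        "-i", "--input",
        default="./inputs/image.png",
        help="Path to the input image",
    )
    parser.add_argument(
        "-o", "--output",
        default="./final/",
        help="Output directory for extracted characters",
    )
    # Passing argv=None makes argparse read sys.argv[1:].
    return parser.parse_args(argv)
```

Calling `parse_args([])` returns the documented defaults, and `parse_args(["-i", "a.png", "-o", "out"])` overrides both.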
---
## 📂 Project Structure
```text
korean-letter-crop/
├── inputs/ # Input images
├── cropped/ # Cropped bounding box images
├── projection/ # Projection-based segmented images
├── best_size/ # Filtered images based on size
├── final/ # Final extracted characters
├── run.py # Main script
├── requirements.txt # Python dependencies
└── Readme.md # Project documentation
```
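
If the intermediate directories do not exist yet, they could be created up front with `pathlib`. The sketch below is hypothetical: the helper name `ensure_stage_dirs` is not part of the project, and whether `run.py` creates these directories itself is not stated here. Only the directory names are taken from the tree above.

```python
from pathlib import Path

# Directory names come from the project structure; creating them in
# advance is an assumption about how the pipeline is run.
STAGE_DIRS = ("cropped", "projection", "best_size", "final")


def ensure_stage_dirs(root="."):
    """Create each pipeline stage directory under root if it is missing."""
    root_path = Path(root)
    created = []
    for name in STAGE_DIRS:
        stage = root_path / name
        # exist_ok=True makes repeated calls harmless.
        stage.mkdir(parents=True, exist_ok=True)
        created.append(stage)
    return created
```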
---
## ⚙️ How It Works
1. **Bounding Box Detection**: Uses PaddleOCR to detect text regions and generate bounding boxes.
2. **Cropping**: Crops the detected bounding boxes and saves them as individual images.
3. **Projection-Based Segmentation**: Uses vertical projection to separate characters within a bounding box.
4. **Size Filtering**: Discards segmented characters that are too small to be usable.
5. **Final Character Extraction**: Uses PaddleOCR to recognize and save individual Korean characters with high confidence.
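
Steps 3 and 4 can be illustrated with a small, dependency-free sketch. It is not the code in `run.py`: the function names, the `threshold` and `min_width` parameters, and the use of nested lists instead of OpenCV/NumPy arrays are all assumptions made to keep the example self-contained. The idea is to count foreground pixels per column, treat runs of non-empty columns as character candidates, and drop runs that are too narrow.

```python
def vertical_projection(binary):
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not binary:
        return []
    width = len(binary[0])
    return [sum(row[x] for row in binary) for x in range(width)]


def split_columns(projection, threshold=0):
    """Return (start, end) column spans, end exclusive, above the threshold."""
    spans = []
    start = None
    for x, count in enumerate(projection):
        if count > threshold and start is None:
            start = x  # a new run of foreground columns begins
        elif count <= threshold and start is not None:
            spans.append((start, x))  # the run ended at the previous column
            start = None
    if start is not None:
        spans.append((start, len(projection)))  # run touches the right edge
    return spans


def filter_by_width(spans, min_width=2):
    """Keep only spans at least min_width columns wide (hypothetical cutoff)."""
    return [(s, e) for s, e in spans if e - s >= min_width]


# Toy 3x10 binary image: two wide blobs and one single-column speck.
image = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 0, 1, 1, 1],
]
projection = vertical_projection(image)
spans = split_columns(projection)
kept = filter_by_width(spans)
print(projection)  # [0, 3, 2, 0, 0, 1, 0, 3, 2, 3]
print(spans)       # [(1, 3), (5, 6), (7, 10)]
print(kept)        # [(1, 3), (7, 10)]
```

With real images the same projection is typically a column-wise sum over a thresholded array. Note that a Hangul syllable whose components are separated by a vertical gap can be split by this approach, so a merge step or a larger gap tolerance may be needed in practice.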
---
## 🧪 Testing
To test the program with sample images:
1. Place your test images in the ./inputs/ directory.
2. Run the program:
```bash
python run.py -i INPUT_IMAGE_PATH -o OUTPUT_DIR_PATH
```
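
To try several sample images in one go, the command above can be generated once per file. The following is a sketch under stated assumptions: the helper `build_commands`, the set of accepted file extensions, and the choice of one output subdirectory per image are all hypothetical and not part of the project.

```python
import sys
from pathlib import Path

# Assumed set of image formats; adjust to whatever the inputs contain.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(input_dir="./inputs", output_root="./final"):
    """Build one `python run.py -i ... -o ...` command per input image."""
    commands = []
    for image in sorted(Path(input_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files and directories
        # One output directory per image is an assumption, chosen so that
        # results from different images do not overwrite each other.
        out_dir = Path(output_root) / image.stem
        commands.append(
            [sys.executable, "run.py", "-i", str(image), "-o", str(out_dir)]
        )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.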