https://github.com/pythonicshariful/phone-number-extractor
A Python script that extracts phone numbers from images using Tesseract OCR and Regex. Automatically organizes processed images into success and failed folders, and saves results to a CSV file.
automation dataextraction ocr-python phonenumberextraction pytesseract python regex
- Host: GitHub
- URL: https://github.com/pythonicshariful/phone-number-extractor
- Owner: pythonicshariful
- Created: 2025-09-30T08:04:31.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-09-30T08:09:08.000Z (8 months ago)
- Last Synced: 2025-09-30T10:09:45.788Z (8 months ago)
- Topics: automation, dataextraction, ocr-python, phonenumberextraction, pytesseract, python, regex
- Language: Python
- Homepage:
- Size: 84 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Phone Number Extractor from Images
This Python script uses **Tesseract OCR** and **Regex** to extract phone numbers from images.
It processes all images inside an `images` folder and saves results into a CSV file.
- ✅ If phone numbers are found → image is moved to `success/`
- ❌ If no phone number is found or an error occurs → image is moved to `failed/`
- 📊 A `numbers.csv` file is generated with extracted numbers and source filenames.
- 🔄 The main `images` folder will be emptied after processing.
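
The README does not show the regex that `main.py` uses, so the following is only a minimal sketch of the extraction step under assumed rules. The pattern `PHONE_RE` and the helper `extract_phone_numbers` are hypothetical names: the pattern accepts an optional `+` followed by 8 to 15 digits, with up to two separator characters (space, dot, hyphen, parenthesis) between digits, and the helper normalizes each match to digits only.

```python
import re
from typing import List

# Hypothetical pattern, NOT taken from main.py: optional "+", then 8-15 digits
# with at most two separator characters between consecutive digits.
PHONE_RE = re.compile(r"\+?\d(?:[\s.\-()]{0,2}\d){7,14}")


def extract_phone_numbers(text: str) -> List[str]:
    """Return unique phone-like numbers found in OCR text, in order of appearance."""
    numbers: List[str] = []
    for match in PHONE_RE.finditer(text):
        raw = match.group()
        digits = re.sub(r"\D", "", raw)  # drop spaces, dots, hyphens, parentheses
        normalized = ("+" if raw.startswith("+") else "") + digits
        if normalized not in numbers:  # skip duplicates within the same image
            numbers.append(normalized)
    return numbers
```

An empty result from a helper like this would correspond to the `failed/` case above, and a non-empty one to `success/`. OCR output is noisy, so the real pattern may well be stricter or country-specific.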
---
## 📂 Folder Structure
```
project/
│── main.py
│── README.md
│── numbers.csv (generated automatically)
│── images/ # put your input images here
│── success/ # created automatically
│── failed/ # created automatically
```
---
## ⚙️ Installation
1. Clone the repository:
```bash
git clone https://github.com/pythonicshariful/phone-number-extractor.git
cd phone-number-extractor
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Install **Tesseract OCR**:
- Windows: [Download here](https://github.com/tesseract-ocr/tesseract/wiki)
- Linux (Ubuntu/Debian):
```bash
sudo apt install tesseract-ocr
```
- macOS (Homebrew):
```bash
brew install tesseract
```
4. On Windows, update the path to `tesseract.exe` inside `main.py` if needed:
```python
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```
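
As an alternative to editing the path by hand, the lookup could be automated. This is a rough sketch, not part of the repository: `find_tesseract` is a hypothetical helper that first checks the system `PATH` (the usual case after an `apt` or Homebrew install) and only then falls back to the Windows default location shown above.

```python
import shutil
from pathlib import Path
from typing import Optional

# Same default Windows install path as in the snippet above.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback: str = WINDOWS_DEFAULT) -> Optional[str]:
    """Return a path to the Tesseract binary, or None if it cannot be located."""
    on_path = shutil.which("tesseract")  # works on Linux, macOS, and Windows
    if on_path:
        return on_path
    if Path(fallback).is_file():
        return fallback
    return None


# Possible use inside main.py, where pytesseract is already imported:
#     cmd = find_tesseract()
#     if cmd is not None:
#         pytesseract.pytesseract.tesseract_cmd = cmd
```

If the helper returns `None`, Tesseract is most likely not installed, or it is installed in a non-default location that still has to be set manually.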
---
## ▶️ Usage
1. Place your images inside the `images/` folder.
Supported formats: `.png`, `.jpg`, `.jpeg`
2. Run the script:
```bash
python main.py
```
3. Results:
- Extracted phone numbers → `numbers.csv`
- Successfully processed images → `success/`
- Failed images → `failed/`
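
The flow described in these steps can be sketched roughly as follows. This is an illustration under assumptions, not the code in `main.py`: `process_folder`, the `find_numbers` callback, and the CSV column names are all hypothetical. The OCR step is passed in as a function so the sketch runs without Tesseract installed.

```python
import csv
import shutil
from pathlib import Path
from typing import Callable, List

SUPPORTED = {".png", ".jpg", ".jpeg"}  # formats listed in step 1


def process_folder(base: Path, find_numbers: Callable[[Path], List[str]]) -> int:
    """Sort images into success/ or failed/ and write numbers.csv; return rows written."""
    images, success, failed = base / "images", base / "success", base / "failed"
    success.mkdir(exist_ok=True)
    failed.mkdir(exist_ok=True)
    rows = []
    for path in sorted(images.iterdir()):
        if path.suffix.lower() not in SUPPORTED:
            continue  # assumption: unsupported files are left where they are
        try:
            numbers = find_numbers(path)
        except Exception:
            numbers = []  # an OCR error is treated like "no number found"
        rows.extend((number, path.name) for number in numbers)
        target = success if numbers else failed
        shutil.move(str(path), str(target / path.name))
    with open(base / "numbers.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["phone_number", "source_file"])  # assumed column names
        writer.writerows(rows)
    return len(rows)
```

In practice, `find_numbers` would open the image with Pillow, pass it to `pytesseract.image_to_string`, and apply the phone-number regex to the returned text.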
---
## 🛠 Requirements
- Python 3.8+
- Tesseract OCR
- Python libraries:
```txt
pillow
pytesseract
tqdm
```
Install them with:
```bash
pip install pillow pytesseract tqdm
```
---
## 📜 License
MIT License