https://github.com/abhirooptalasila/rc-text-detection

Detect text on a processed image of a RC using AWS Rekognition

aws clahe ocr regex rekognition text-detection text-processing


Host: GitHub
URL: https://github.com/abhirooptalasila/rc-text-detection
Owner: abhirooptalasila
Created: 2020-05-01T10:29:38.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-05-01T10:30:43.000Z (about 5 years ago)
Last Synced: 2025-01-20T06:42:09.012Z (5 months ago)
Topics: aws, clahe, ocr, regex, rekognition, text-detection, text-processing
Language: Jupyter Notebook
Size: 243 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Text Detection on a Registration Certificate

A registration certificate (RC) is an official document stating that a vehicle is registered with the Indian government. I've used [AWS Rekognition](https://aws.amazon.com/rekognition/) to detect the text in a picture of an RC after performing some preprocessing steps.

The task is to recognize the following fields from a picture of an RC:
- License plate number (Regn. number)
- VIN or chassis number (typically 17 characters long)
- Name
- Engine number
- Registration date
- Mfg. date

### Unprocessed and Processed Picture

### Output

```python
{'license_plate': 'DL2CAU7997',
'reg_date': '02/05/2015',
'chass_num': 'MA3FLEB1S00309631',
'eng_num': 'D13A2554860',
'name': 'MANJEET SINGH'}
```

---
[```pre_process.py```](pre_process.py) contains the script to process images where I first convert the image into LAB color space and then apply the **Contrast Limited Adaptive Histogram Equalization (CLAHE)** method on it.

The LAB color space defines colors differently from RGB. Whereas RGB describes a color as a combination of red, green, and blue intensities, LAB uses three other channels: L (lightness), A (the green-to-red axis), and B (the blue-to-yellow axis), hence the name L-A-B. Because brightness lives in its own channel, contrast can be adjusted without shifting the colors.
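To make the channel split concrete, here is a minimal sketch of the standard sRGB to CIELAB conversion (D65 white point) in plain Python. It is not taken from `pre_process.py`; the function name `srgb_to_lab` is hypothetical, and image libraries such as OpenCV rescale these values when storing 8-bit LAB images, so their numbers will differ.

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB values to CIELAB (L, a, b) using the D65 white point."""

    def to_linear(v):
        # Undo the sRGB gamma curve.
        c = v / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)

    # Linear RGB -> CIE XYZ (sRGB primaries, D65).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with a linear segment near zero.
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


dark_gray = srgb_to_lab(64, 64, 64)
light_gray = srgb_to_lab(192, 192, 192)
red = srgb_to_lab(255, 0, 0)

print("dark gray :", dark_gray)
print("light gray:", light_gray)
print("red       :", red)
```

The two grays differ only in the L value, while their A and B values stay close to zero; the red pixel is the one with large A and B values.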

Adaptive histogram equalization (AHE) is a computer image processing technique used to improve contrast in images. It differs from ordinary histogram equalization in the respect that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving the local contrast and enhancing the definitions of edges in each region of an image.

However, **AHE has a tendency to overamplify noise** in relatively homogeneous regions of an image. A variant of adaptive histogram equalization called contrast limited adaptive histogram equalization **(CLAHE) prevents this by limiting the amplification**.
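The effect of the clip limit can be illustrated with a small sketch. The code below equalizes a single tile of gray levels, with and without clipping the histogram. It is a simplified illustration, not the code in `pre_process.py`: the function name `equalize_tile`, the sample tile, and the clip limit of 4 are all assumptions, and a full CLAHE implementation (for example OpenCV's `cv2.createCLAHE`) also interpolates between neighbouring tiles.

```python
def equalize_tile(pixels, clip_limit=None, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels).

    If clip_limit is given, each histogram bin is capped at that count and
    the excess is spread evenly over all bins (the "contrast limited" step).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    if clip_limit is not None:
        excess = sum(max(h - clip_limit, 0) for h in hist)
        bonus = excess / levels
        hist = [min(h, clip_limit) + bonus for h in hist]

    # Map each gray level through the normalized cumulative histogram.
    total = sum(hist)
    mapping = []
    running = 0.0
    for h in hist:
        running += h
        mapping.append(round((levels - 1) * running / total))
    return [mapping[p] for p in pixels]


# A nearly homogeneous tile: two gray levels that differ by 1 (think sensor noise).
tile = [100] * 32 + [101] * 32

plain = equalize_tile(tile)                  # ordinary equalization
limited = equalize_tile(tile, clip_limit=4)  # contrast-limited equalization

print("input spread  :", max(tile) - min(tile))
print("plain spread  :", max(plain) - min(plain))
print("limited spread:", max(limited) - min(limited))
```

Without the clip limit, a one-level difference in the input is stretched across roughly half of the output range; with it, the stretch stays much smaller.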

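The processed images are then uploaded to AWS. The command below invokes a Lambda function named `mynewtext` with `data.json` as its payload and writes the response to `output.json`.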

```bash
aws lambda invoke --function-name "mynewtext" --log-type Tail --payload file://~/Downloads/RC/data.json ~/Downloads/RC/output.json
```

[```Get Details.ipynb```](Get%20Details.ipynb) uses **Regex** to extract the required fields from the JSON file and stores them in a dictionary as key-value pairs.
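As a rough idea of what this step can look like, here is a minimal sketch covering three of the fields. It is not the notebook's code. It assumes the JSON holds a Rekognition `DetectText`-style response (a `TextDetections` list whose entries have `DetectedText` and `Type`); the label texts in `sample`, the function name `extract_fields`, and the patterns themselves are hypothetical and would need tuning against real RC scans.

```python
import re

# Assumed patterns; real RCs may need looser or stricter rules.
PATTERNS = {
    # State code, district code, series letters, four digits, e.g. DL2CAU7997.
    "license_plate": re.compile(r"\b[A-Z]{2}\d{1,2}[A-Z]{1,3}\d{4}\b"),
    # 17 characters; VINs do not use the letters I, O, or Q.
    "chass_num": re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b"),
    # dd/mm/yyyy
    "reg_date": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}


def extract_fields(response):
    """Return the first match for each pattern across the detected text lines."""
    lines = [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d.get("Type") == "LINE"
    ]
    text = "\n".join(lines)

    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(0)
    return fields


# Hypothetical response fragment built from the values shown in the Output section.
sample = {
    "TextDetections": [
        {"DetectedText": "REGN. NO: DL2CAU7997", "Type": "LINE"},
        {"DetectedText": "REG. DT: 02/05/2015", "Type": "LINE"},
        {"DetectedText": "CH. NO: MA3FLEB1S00309631", "Type": "LINE"},
        {"DetectedText": "DL2CAU7997", "Type": "WORD"},
    ]
}

print(extract_fields(sample))
```

The word boundaries (`\b`) matter here: without them, the license plate pattern would also match a fragment inside the chassis number. Fields such as the name and engine number have no fixed shape, so they would more likely be located relative to their printed labels.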
