# 🪪 PAN Number Validation Project using MySQL

This project is designed to **validate Indian PAN numbers** using **MySQL Workbench**.
It includes SQL scripts to clean, validate, and categorize PAN numbers as **Valid** or **Invalid** based on multiple business rules and regex patterns.

---

## 📌 Project Overview
The **Permanent Account Number (PAN)** is a 10-character alphanumeric identifier issued by the Indian Income Tax Department.
To ensure correctness of data, this project:
- Cleans raw PAN number datasets
- Identifies missing, duplicate, or incorrectly formatted PANs
- Uses **custom MySQL functions** to detect invalid patterns (e.g., sequential/repeated characters)
- Classifies PAN numbers into **Valid** and **Invalid**
- Generates a **summary report** for quick insights

---

## 🛠️ Features
- Data cleaning (trimming, uppercase conversion, duplicate removal)
- Validation rules (regex-based format checks)
- Custom MySQL functions for:
  - Detecting **adjacent character repetition**
  - Detecting **sequential characters**
- Creation of a **validation view** for quick results
- **Summary report** with counts of valid, invalid, and missing PANs

---

## 📂 Dataset
The project assumes an input dataset with one column:

| Column Name | Description |
|--------------|---------------------------|
| `pan_number` | Raw PAN numbers (string) |

Example:

| pan_number |
|-------------|
| ABCDE1234F |
| xyz 1234p |
| AA1111AA1A |
| (NULL) |

---

## 🧹 Data Cleaning Steps
1. Handle missing data → Remove NULL or empty PAN numbers
2. Check duplicates → Identify duplicate PANs
3. Trim spaces → Remove leading/trailing spaces
4. Correct case → Convert to UPPERCASE
5. Create cleaned table → Store cleaned dataset
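
The cleaning steps above can be sketched outside MySQL for illustration. The snippet below is a minimal Python sketch, not the project's SQL script: `clean_pans` is a hypothetical helper name, and dropping duplicates after normalization (keeping the first occurrence) is an assumption about how the duplicate step is applied.

```python
from typing import Iterable, List, Optional, Set


def clean_pans(raw_values: Iterable[Optional[str]]) -> List[str]:
    """Return trimmed, upper-cased, de-duplicated PAN strings (hypothetical helper)."""
    cleaned: List[str] = []
    seen: Set[str] = set()
    for value in raw_values:
        if value is None:
            continue  # step 1: skip NULL values
        pan = value.strip().upper()  # steps 3 and 4: trim, then uppercase
        if not pan:
            continue  # step 1: skip empty or whitespace-only values
        if pan in seen:
            continue  # step 2: assumed handling, keep only the first occurrence
        seen.add(pan)
        cleaned.append(pan)  # step 5: collect the cleaned dataset
    return cleaned


if __name__ == "__main__":
    rows = ["ABCDE1234F", " xyz1234p ", "AA1111AA1A", None, "", "abcde1234f"]
    print(clean_pans(rows))  # ['ABCDE1234F', 'XYZ1234P', 'AA1111AA1A']
```

Only leading and trailing whitespace is removed here, so a value with a space in the middle stays as it is and would be caught later by the format rule.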

---

## 🔎 Validation Logic
### 1️⃣ Regex Format Rule
A PAN must match the pattern below: five uppercase letters, four digits, then one uppercase letter.

`^[A-Z]{5}[0-9]{4}[A-Z]$`
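
To try the format rule outside MySQL, the same pattern can be applied in Python. This is only an illustrative sketch; `matches_pan_format` is a hypothetical name, and `fullmatch` stands in for the `^` and `$` anchors so that a trailing newline is not accepted.

```python
import re

# Five uppercase letters, four digits, one uppercase letter.
PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def matches_pan_format(pan: str) -> bool:
    """Return True when the whole string has the PAN shape (hypothetical helper)."""
    return PAN_FORMAT.fullmatch(pan) is not None


if __name__ == "__main__":
    for candidate in ("AHZPK2759L", "XYZ1234P", "ahzpk2759l"):
        print(candidate, matches_pan_format(candidate))
```

The lowercase input fails here because the check is case-sensitive, which is one reason the cleaning stage converts values to uppercase first.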

### 2️⃣ Custom Functions
- **fn_check_adjacent_repetition()** → Ensures no two adjacent characters are the same
- **fn_check_sequence()** → Detects sequential patterns like `ABCDE`, `1234`
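
The logic behind these two checks can be sketched in Python. The function names below are hypothetical stand-ins, not the project's MySQL functions, and the definition of "sequential" used here (every character is exactly one code point above the one before it) is an assumption about what `fn_check_sequence()` tests.

```python
def has_adjacent_repetition(text: str) -> bool:
    """Return True if any two neighbouring characters are identical."""
    return any(a == b for a, b in zip(text, text[1:]))


def is_sequential(text: str) -> bool:
    """Return True if each character is one code point above its predecessor.

    Assumed reading of "sequential"; strings shorter than two characters
    are treated as not sequential.
    """
    if len(text) < 2:
        return False
    return all(ord(b) - ord(a) == 1 for a, b in zip(text, text[1:]))


if __name__ == "__main__":
    print(has_adjacent_repetition("AA1111AA1A"))  # True
    print(is_sequential("ABCDE"))  # True
    print(is_sequential("1234"))  # True
    print(is_sequential("AHZPK"))  # False
```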

### 3️⃣ Final Categorization
- **Valid PAN** → Matches regex + no repetition + no sequences
- **Invalid PAN** → Fails any rule
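
A compact Python sketch of how the three rules might combine into one label is shown below. `classify_pan` is a hypothetical name, and two details are assumptions rather than facts about the project: the repetition check runs over the whole string, and the sequence check runs separately on the five-letter block and the four-digit block.

```python
import re

PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def classify_pan(pan: str) -> str:
    """Label a cleaned PAN as 'Valid PAN' or 'Invalid PAN' (hypothetical helper)."""
    # Rule 1: overall shape.
    if PAN_FORMAT.fullmatch(pan) is None:
        return "Invalid PAN"
    # Rule 2: no two neighbouring characters may be equal.
    if any(a == b for a, b in zip(pan, pan[1:])):
        return "Invalid PAN"
    # Rule 3 (assumed scope): neither the letter block nor the digit block
    # may be a run where each character is one above the previous one.
    for block in (pan[:5], pan[5:9]):
        if all(ord(b) - ord(a) == 1 for a, b in zip(block, block[1:])):
            return "Invalid PAN"
    return "Valid PAN"


if __name__ == "__main__":
    for pan in ("AAAAA1111A", "XYZ1234P", "AHZPK2759L"):
        print(pan, classify_pan(pan))
```

Checking the format first means the slicing in the third rule always operates on a string of the expected length.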

---

## 📊 Outputs
### View: `vw_valid_invalid_pans`
| pan_number | status |
|-------------|-------------|
| ABCDE1234F | Valid PAN |
| AAAAA1111A | Invalid PAN |
| XYZ1234P | Invalid PAN |

---

### 📈 Summary Report Example
| total_processed_records | total_valid_pans | total_invalid_pans | missing_incomplete_pans |
|--------------------------|------------------|--------------------|--------------------------|
| 10 | 7 | 2 | 1 |
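
The arithmetic behind a row like this can be sketched in Python. The README does not state how `missing_incomplete_pans` is derived, so this sketch assumes it is whatever remains after subtracting valid and invalid records from the total, which agrees with the example row (10 - 7 - 2 = 1). `summarize` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize(total_records: int, statuses: Iterable[str]) -> Dict[str, int]:
    """Build the summary columns from a raw record count and the view's status labels."""
    counts = Counter(statuses)
    valid = counts["Valid PAN"]
    invalid = counts["Invalid PAN"]
    return {
        "total_processed_records": total_records,
        "total_valid_pans": valid,
        "total_invalid_pans": invalid,
        # Assumption: records that never reached classification count as missing.
        "missing_incomplete_pans": total_records - valid - invalid,
    }


if __name__ == "__main__":
    labels = ["Valid PAN"] * 7 + ["Invalid PAN"] * 2
    print(summarize(10, labels))
```

Under this assumption, any record dropped during cleaning (for example a duplicate) would also land in the last column, so the column name may overstate what it measures.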

---

## 🚀 Usage Guide
1. Clone this repository:
```bash
git clone https://github.com/MasterMindRomii/PAN-Number-Validation-Project-using-MySQL.git
```

## 🤝 Contribution

Pull requests are welcome! If you find issues or want to add new validation rules, feel free to fork and contribute.

## 📜 License

This project is licensed under the MIT License.

## 👨‍💻 Author

Romi Gupta
💼 Data Analyst | SQL | Power BI | Python | Excel
📧 romigupta1875@gmail.com