https://github.com/mastermindromii/pan-number-validation-project-using-mysql
This project is designed to validate Indian PAN numbers using MySQL Workbench. It includes SQL scripts to clean, validate, and categorize PAN numbers as Valid or Invalid based on multiple business rules and regex patterns.
- Host: GitHub
- URL: https://github.com/mastermindromii/pan-number-validation-project-using-mysql
- Owner: MasterMindRomii
- License: mit
- Created: 2025-08-23T18:35:52.000Z
- Default Branch: main
- Last Pushed: 2025-08-23T18:52:51.000Z
- Last Synced: 2025-08-24T07:23:11.222Z
- Topics: create-view, cte, function-sql, mysql, regex
- Homepage:
- Size: 111 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# 🪪 PAN Number Validation Project using MySQL
This project is designed to **validate Indian PAN numbers** using **MySQL Workbench**.
It includes SQL scripts to clean, validate, and categorize PAN numbers as **Valid** or **Invalid** based on multiple business rules and regex patterns.
---
## Project Overview
The **Permanent Account Number (PAN)** is a 10-character alphanumeric identifier issued by the Indian Income Tax Department.
To ensure the correctness of the data, this project:
- Cleans raw PAN number datasets
- Identifies missing, duplicate, or incorrectly formatted PANs
- Uses **custom MySQL functions** to detect invalid patterns (e.g., sequential/repeated characters)
- Classifies PAN numbers into **Valid** and **Invalid**
- Generates a **summary report** for quick insights
---
## 🛠️ Features
- Data cleaning (trimming, uppercase conversion, duplicate removal)
- Validation rules (regex-based format checks)
- Custom MySQL functions for:
  - Detecting **adjacent character repetition**
  - Detecting **sequential characters**
- Creation of a **validation view** for quick results
- **Summary report** with counts of valid, invalid, and missing PANs
---
## Dataset
The project assumes an input dataset with one column:
| Column Name | Description |
|--------------|---------------------------|
| `pan_number` | Raw PAN numbers (string) |
Example:
| pan_number |
|-------------|
| ABCDE1234F |
| xyz 1234p |
| AA1111AA1A |
| (NULL) |
---
## 🧹 Data Cleaning Steps
1. Handle missing data → Remove NULL or empty PAN numbers
2. Check duplicates → Identify duplicate PANs
3. Trim spaces → Remove leading/trailing spaces
4. Correct case → Convert to UPPERCASE
5. Create cleaned table → Store cleaned dataset
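
A minimal Python sketch of the cleaning steps above, for prototyping the rules outside MySQL. It is not the repository's SQL: `clean_pans` is a hypothetical name and Python's `None` stands in for SQL `NULL`. One design choice to note is that the sketch trims and uppercases *before* deduplicating, so `" abcde1234f "` and `ABCDE1234F` count as the same PAN.

```python
from typing import Iterable, List, Optional, Set


def clean_pans(raw: Iterable[Optional[str]]) -> List[str]:
    """Drop missing values, trim, uppercase, and deduplicate PAN strings."""
    cleaned: List[str] = []
    seen: Set[str] = set()
    for value in raw:
        # Step 1: skip NULL (None) values and strings that are empty after trimming.
        if value is None or value.strip() == "":
            continue
        # Steps 3-4: remove leading/trailing whitespace and convert to uppercase.
        pan = value.strip().upper()
        # Step 2: keep only the first occurrence of each normalized PAN.
        if pan in seen:
            continue
        seen.add(pan)
        cleaned.append(pan)
    # Step 5: the returned list plays the role of the cleaned table.
    return cleaned


print(clean_pans(["ABCDE1234F", " abcde1234f ", "xyz 1234p", None, "   "]))
# ['ABCDE1234F', 'XYZ 1234P']
```

The internal space in `XYZ 1234P` survives because trimming only touches leading and trailing whitespace; the format rule rejects that value later.
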
---
## Validation Logic
### 1️⃣ Regex Format Rule
PAN must match `^[A-Z]{5}[0-9]{4}[A-Z]$`, i.e., five uppercase letters, four digits, and one uppercase letter (10 characters in total).
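
The same format rule expressed with Python's `re` module, as a hedged illustration rather than the project's SQL (`matches_pan_format` is a hypothetical helper). `re.fullmatch` anchors the pattern at both ends, so `^` and `$` are dropped; this also avoids Python's `$` accepting a trailing newline.

```python
import re

# Five uppercase letters, four digits, one uppercase letter.
# fullmatch() requires the whole string to match, so no ^/$ anchors are needed.
PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def matches_pan_format(pan: str) -> bool:
    """Return True if the entire string has the PAN shape AAAAA9999A."""
    return PAN_FORMAT.fullmatch(pan) is not None


for candidate in ["ABCDE1234F", "XYZ 1234P", "abcde1234f", "ABCDE1234F\n"]:
    # repr() makes the trailing newline in the last sample visible.
    print(repr(candidate), matches_pan_format(candidate))
```

Python's `re` is case-sensitive by default, so `abcde1234f` fails here. In MySQL 8.0, case sensitivity of `REGEXP`/`REGEXP_LIKE()` follows the collation, so with a case-insensitive collation lowercase input can pass `[A-Z]` unless it is uppercased first (as the cleaning step does) or the `'c'` match type is passed to `REGEXP_LIKE()`.
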
### 2️⃣ Custom Functions
- **fn_check_adjacent_repetition()** → Detects whether any two adjacent characters are the same (e.g., `AABCD`)
- **fn_check_sequence()** → Detects sequential patterns like `ABCDE`, `1234`
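
The README describes `fn_check_adjacent_repetition()` and `fn_check_sequence()` without showing their SQL bodies. Below is a Python sketch of what the two checks could look like. `has_adjacent_repetition` and `is_sequence` are hypothetical names, and treating a string as sequential only when *every* character is one code point above the previous one (so `ABCDX` is not a sequence) is an assumption.

```python
def has_adjacent_repetition(text: str) -> bool:
    """Return True if any two neighbouring characters are identical."""
    # zip(text, text[1:]) yields each adjacent pair, e.g. ("A", "A"), ("A", "B").
    return any(a == b for a, b in zip(text, text[1:]))


def is_sequence(text: str) -> bool:
    """Return True if each character is exactly one code point above the previous one."""
    # Assumption: only a fully consecutive string counts, so "ABCDX" returns False.
    if len(text) < 2:
        return False
    return all(ord(b) - ord(a) == 1 for a, b in zip(text, text[1:]))


for sample in ["AABCD", "ABCDE", "ABCDX", "1234", "1276"]:
    print(sample, has_adjacent_repetition(sample), is_sequence(sample))
```

Because letters and digits are both consecutive in ASCII, the same `ord()` difference works for `ABCDE` and `1234`.
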
### 3️⃣ Final Categorization
- **Valid PAN** → Matches regex + no repetition + no sequences
- **Invalid PAN** → Fails any rule
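
Putting the rules together, a self-contained Python sketch of the categorization. It assumes (the README does not say) that the repetition and sequence checks run separately on the five-letter block and the four-digit block. `categorize_pan` and the sample value `AHGVE1276F` are made up for illustration. Under these assumptions `ABCDE1234F` is categorized as invalid, because both `ABCDE` and `1234` are sequences.

```python
import re

PAN_FORMAT = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def has_adjacent_repetition(text: str) -> bool:
    # Any two neighbouring characters identical, e.g. "AABCD" or "1123".
    return any(a == b for a, b in zip(text, text[1:]))


def is_sequence(text: str) -> bool:
    # Whole block ascends by one code point, e.g. "ABCDE" or "1234".
    return len(text) > 1 and all(ord(b) - ord(a) == 1 for a, b in zip(text, text[1:]))


def categorize_pan(pan: str) -> str:
    """Return 'Valid PAN' only if the format matches and no block repeats or forms a sequence."""
    if PAN_FORMAT.fullmatch(pan) is None:
        return "Invalid PAN"
    # Assumption: letters (positions 1-5) and digits (positions 6-9) are checked separately.
    blocks = (pan[:5], pan[5:9])
    if any(has_adjacent_repetition(b) or is_sequence(b) for b in blocks):
        return "Invalid PAN"
    return "Valid PAN"


for sample in ["ABCDE1234F", "AAAAA1111A", "XYZ1234P", "AHGVE1276F"]:
    print(sample, categorize_pan(sample))
# ABCDE1234F Invalid PAN
# AAAAA1111A Invalid PAN
# XYZ1234P Invalid PAN
# AHGVE1276F Valid PAN
```
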
---
## Outputs
### View: `vw_valid_invalid_pans`
| pan_number | status |
|-------------|-------------|
| ABCDE1234F | Invalid PAN |
| AAAAA1111A | Invalid PAN |
| XYZ1234P | Invalid PAN |
---
### Summary Report Example
| total_processed_records | total_valid_pans | total_invalid_pans | missing_incomplete_pans |
|--------------------------|------------------|--------------------|--------------------------|
| 10 | 7 | 2 | 1 |
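
A Python sketch of how the summary counts could be derived. Assumptions: `total_processed_records` counts every input row, valid and invalid are counted over distinct cleaned PANs, and `missing_incomplete_pans` is the remainder, so the four columns add up the way they do in the example above (10 = 7 + 2 + 1). `summarize` and `format_ok` are hypothetical; the validator is passed in so the summary logic stays independent of the rule set.

```python
import re
from typing import Callable, Dict, Iterable, Optional


def summarize(raw: Iterable[Optional[str]],
              is_valid: Callable[[str], bool]) -> Dict[str, int]:
    rows = list(raw)
    # Drop NULL/blank values, trim, uppercase; the set removes duplicates.
    cleaned = {r.strip().upper() for r in rows if r is not None and r.strip()}
    valid = sum(1 for pan in cleaned if is_valid(pan))
    invalid = len(cleaned) - valid
    return {
        "total_processed_records": len(rows),
        "total_valid_pans": valid,
        "total_invalid_pans": invalid,
        # Assumption: remainder (blanks, NULLs, duplicates) so the columns sum to the total.
        "missing_incomplete_pans": len(rows) - valid - invalid,
    }


def format_ok(pan: str) -> bool:
    # Demo validator: format rule only; the project also applies repetition/sequence checks.
    return re.fullmatch(r"[A-Z]{5}[0-9]{4}[A-Z]", pan) is not None


print(summarize(["ABCDE1234F", "xyz 1234p", "AA1111AA1A", None], format_ok))
# {'total_processed_records': 4, 'total_valid_pans': 1, 'total_invalid_pans': 2, 'missing_incomplete_pans': 1}
```
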
---
## Usage Guide
1. Clone this repository:
```bash
git clone https://github.com/MasterMindRomii/PAN-Number-Validation-Project-using-MySQL.git
```
## Contribution
Pull requests are welcome! If you find issues or want to add new validation rules, feel free to fork and contribute.
## License
This project is licensed under the MIT License.
## 👨‍💻 Author
Romi Gupta
💼 Data Analyst | SQL | Power BI | Python | Excel
📧 romigupta1875@gmail.com