https://github.com/sssairohit/enm
Excel Name Matching is a Python-based automation tool that standardizes names in an Excel file using fuzzy matching. Consistent names make exact-match operations such as VLOOKUP more reliable.
- Host: GitHub
- URL: https://github.com/sssairohit/enm
- Owner: sssairohit
- License: MIT
- Created: 2025-04-03T13:41:13.000Z
- Default Branch: main
- Last Pushed: 2025-04-03T13:46:34.000Z
- Last Synced: 2025-06-04T07:50:04.769Z
- Topics: data-cleaning, data-standardization, excel-automation, fuzzy-matching, openpyxl, pandas, python, record-linkage
- Language: Python
- Homepage:
- Size: 35.2 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## Excel Name Matching
### Project Overview
This project standardizes names in an Excel file using fuzzy matching techniques.
### How It Works
- Reads `input.xlsx`, which contains two sheets:
  - **Sheet1**: Names that need correction.
  - **Sheet2**: The correct reference names.
- Matches each name in **Sheet1** to the closest correct name in **Sheet2**.
- Replaces names in **Sheet1** with the matched names from **Sheet2**.
- Saves the updated names to `output.xlsx`.
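The steps above can be sketched roughly as follows. This is not the code in `main.py`: the function names `similarity`, `standardize`, and `process_workbook` are hypothetical, and the default scorer uses the standard library's `difflib` as a stand-in for `rapidfuzz` so the matching logic runs without extra packages. The wrapper also assumes that names sit in the first column of each sheet with no header row, which this README does not state.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (stdlib stand-in scorer)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def standardize(names, reference, threshold=80, scorer=similarity):
    """Replace each name with its best-scoring reference name, if close enough."""
    result = []
    for name in names:
        best = max(reference, key=lambda ref: scorer(name, ref), default=None)
        if best is not None and scorer(name, best) > threshold:
            result.append(best)
        else:
            result.append(name)  # no close match: keep the original name
    return result


def process_workbook(src="input.xlsx", dst="output.xlsx"):
    """Hypothetical wrapper: assumes names are in the first column, no header row."""
    import pandas as pd  # imported lazily so the matching code runs without pandas

    to_fix = pd.read_excel(src, sheet_name="Sheet1", header=None)
    correct = pd.read_excel(src, sheet_name="Sheet2", header=None)
    to_fix[0] = standardize(to_fix[0].astype(str).tolist(),
                            correct[0].astype(str).tolist())
    # Writes only the corrected sheet; whether the real script also copies
    # Sheet2 to the output is not specified here.
    to_fix.to_excel(dst, sheet_name="Sheet1", index=False, header=False)
```

Because the scorer is a parameter, a `rapidfuzz` scorer such as `rapidfuzz.fuzz.WRatio`, which also returns a 0-100 score, could be passed in place of the `difflib` one.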
### Installation
1. Clone the repository or download the script.
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Ensure `input.xlsx` is placed in the working directory.
### Usage
Run the script with:
```bash
python main.py
```
After execution, check `output.xlsx` for updated names.
### File Structure
```
/excel_name_matching/
├── input.xlsx         # Raw input file (Sheet1 & Sheet2)
├── output.xlsx        # Processed file (after name correction)
├── main.py            # Python script for name matching
├── requirements.txt   # Dependencies list
└── README.md          # Project documentation
```
### Configuration
- The script uses `rapidfuzz` for fuzzy name matching.
- Names are replaced **only if similarity is above 80%**.
- Modify this threshold in `main.py` if necessary.
- The script retains the original name if no close match is found.
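To illustrate how the threshold changes the outcome, here is a small sketch. It is an assumption-laden illustration, not the script's code: `score` and `best_match` are hypothetical helpers, and the plain character ratio from the standard library's `difflib` stands in for whichever `rapidfuzz` scorer `main.py` actually uses.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Plain character-level similarity on a 0-100 scale (stand-in scorer)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(name, reference, threshold):
    """Return the closest reference name, or the original if below threshold."""
    best = max(reference, key=lambda ref: score(name, ref))
    return best if score(name, best) > threshold else name


reference = ["2pi Systems", "Apple Inc", "Microsoft Corp"]

print(round(score("Apple", "Apple Inc"), 1))          # 71.4
print(best_match("Apple", reference, threshold=80))   # Apple (kept as-is)
print(best_match("Apple", reference, threshold=70))   # Apple Inc (replaced)
```

With this plain ratio, "Apple" falls short of the 80% cutoff and is only replaced once the threshold is lowered. The score for a given pair depends on the scorer, so a scorer that rewards partial matches may rate the same pair higher; tune the threshold against the scorer in use.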
### Example
#### Input
**Sheet1 (Names to Fix):**
```
2pi System Private Limited
Apple
Microsoft
```
**Sheet2 (Correct Names):**
```
2pi Systems
Apple Inc
Microsoft Corp
```
#### Output (`output.xlsx`)
```
2pi Systems
Apple Inc
Microsoft Corp
```
### Dependencies
This project requires:
```
pandas
openpyxl
rapidfuzz
```
Install them with:
```bash
pip install -r requirements.txt
```
### Contributions
Contributions are welcome. Modify and improve as needed.
### License
This project is licensed under the MIT License. See the `LICENSE` file for details.