https://github.com/sssairohit/enm

Excel Name Matching is a Python-based automation tool that standardizes names in an Excel file using fuzzy matching. Consistent names make downstream processing more reliable, particularly exact-match lookups such as VLOOKUP.



Host: GitHub
URL: https://github.com/sssairohit/enm
Owner: sssairohit
License: mit
Created: 2025-04-03T13:41:13.000Z
Default Branch: main
Last Pushed: 2025-04-03T13:46:34.000Z
Last Synced: 2025-06-04T07:50:04.769Z
Topics: data-cleaning, data-standardization, excel-automation, fuzzy-matching, openpyxl, pandas, python, record-linkage
Language: Python
Homepage:
Size: 35.2 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


## Excel Name Matching

### Project Overview
This project standardizes names in an Excel file using fuzzy matching techniques.

### How It Works
- Reads `input.xlsx`, which contains two sheets:
  - **Sheet1**: Names that need correction.
  - **Sheet2**: The correct reference names.
- Matches each name in **Sheet1** to the closest reference name in **Sheet2**.
- Replaces a name in **Sheet1** with its match only when the similarity clears the configured threshold; otherwise the original name is kept.
- Saves the updated names to `output.xlsx`.

### Installation
1. Clone the repository or download the script.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Ensure `input.xlsx` is placed in the working directory.
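Before running the script, a quick pre-flight check can confirm that the workbook exists and contains the two sheets the workflow expects. The helper `check_input` below is hypothetical and assumes the default sheet names `Sheet1` and `Sheet2`; it only inspects sheet names and does not validate the names inside them.

```python
import os

import pandas as pd

REQUIRED_SHEETS = ("Sheet1", "Sheet2")  # assumed default sheet names


def check_input(path: str = "input.xlsx") -> list:
    """Hypothetical pre-flight check for the input workbook.

    Raises FileNotFoundError if the file is absent and ValueError if a
    required sheet is missing; otherwise returns the workbook's sheet names.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found in {os.getcwd()}")
    with pd.ExcelFile(path) as workbook:
        sheet_names = list(workbook.sheet_names)
    missing = [name for name in REQUIRED_SHEETS if name not in sheet_names]
    if missing:
        raise ValueError(f"{path} is missing required sheet(s): {', '.join(missing)}")
    return sheet_names
```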

### Usage
Run the script with:
```bash
python main.py
```
After execution, check `output.xlsx` for updated names.

### File Structure
```
/excel_name_matching/
├── input.xlsx        # Raw input file (Sheet1 & Sheet2)
├── output.xlsx       # Processed file (after name correction)
├── main.py           # Python script for name matching
├── requirements.txt  # Dependencies list
└── README.md         # Project documentation
```

### Configuration
- The script uses `rapidfuzz` for fuzzy name matching.
- Names are replaced **only if similarity is above 80%**.
- Modify this threshold in `main.py` if necessary.
- The script retains the original name if no close match is found.

### Example
#### Input
**Sheet1 (Names to Fix):**
```
2pi System Private Limited
Apple
Microsoft
```
**Sheet2 (Correct Names):**
```
2pi Systems
Apple Inc
Microsoft Corp
```

#### Output (`output.xlsx`)
```
2pi Systems
Apple Inc
Microsoft Corp
```

### Dependencies
This project requires:
```
pandas
openpyxl
rapidfuzz
```
Install them with:
```bash
pip install -r requirements.txt
```

### Contributions
Contributions are welcome. Modify and improve as needed.

### License
This project is licensed under the MIT License. See the `LICENSE` file for details.
