https://github.com/kressety/truefuzzymatch
TrueFuzzyMatch is a powerful tool for fuzzy matching material names between two Excel tables (A and B) using text embeddings generated by the Ollama API.
- Host: GitHub
- URL: https://github.com/kressety/truefuzzymatch
- Owner: kressety
- License: mit
- Created: 2025-03-03T11:41:31.000Z (about 2 months ago)
- Default Branch: master
- Last Pushed: 2025-03-03T12:34:00.000Z (about 2 months ago)
- Last Synced: 2025-03-03T13:21:56.826Z (about 2 months ago)
- Topics: embeddings, excel, ollama, openpyxl
- Language: Python
- Homepage:
- Size: 41 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# TrueFuzzyMatch




**TrueFuzzyMatch** is a tool for fuzzy matching material names between two Excel tables (A and B) using text embeddings generated by the Ollama API. It computes embeddings for the material names in Table A, matches the material names in Table B against them by cosine similarity, and writes the results with the matched codes and similarity scores. The project is cross-platform and provides pre-built executables for Windows, macOS, and Ubuntu.
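The similarity score is presumably the standard cosine similarity between two embedding vectors. Assuming each Table B name is paired with the Table A entry that scores highest (the README does not state a threshold or tie-breaking rule), the matching rule can be written as follows, where $\mathbf{a}_j$ is the embedding of the $j$-th Table A name and $\mathbf{b}$ is the embedding of a Table B name:

```math
\operatorname{sim}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert},
\qquad
j^{*} = \arg\max_{j} \operatorname{sim}(\mathbf{a}_j, \mathbf{b})
```

The score lies in $[-1, 1]$, with 1 meaning the two vectors point in the same direction.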
## Features
- **Embedding Generation**: Compute text embeddings for material names in Table A using Ollama models.
- **Fuzzy Matching**: Match material names in Table B to Table A with cosine similarity.
- **Cross-Platform**: Pre-built binaries for Windows, macOS, and Ubuntu via GitHub Releases.
- **Multithreaded**: Embedding and similarity computations run in parallel threads.
- **Rich CLI**: Interactive terminal interface built with `rich` for a polished user experience.

## Prerequisites
To run this tool, you need:
1. **Ollama Service**: A running Ollama instance (default: `http://localhost:11434`). Install it from [Ollama's official site](https://ollama.ai/).
2. **Excel Files**: Prepare your `.xlsx` files (Table A and Table B) containing the material names.

For development or building from source, see the [Development](#development) section.
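The tool needs two things from the Ollama service: the list of installed models (presumably what the model-selection prompt shows) and embedding vectors. The sketch below is an illustration, not the project's code. It uses only the standard library, assumes a recent Ollama that exposes `GET /api/tags` and `POST /api/embed`, and its function names (`list_models`, `embed_one`, `embed_all`, `build_embed_request`) are hypothetical. Requests are fanned out with `concurrent.futures.ThreadPoolExecutor`, in the spirit of the multithreading the project describes.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

OLLAMA_URL = "http://localhost:11434"  # default Ollama address (see Prerequisites)


def list_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Return the names of locally installed models via GET /api/tags."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


def build_embed_request(text: str, model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST /api/embed request for a single text (assumed endpoint)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_one(text: str, model: str, base_url: str = OLLAMA_URL) -> list[float]:
    """Send one embedding request and return its single vector."""
    with urllib.request.urlopen(build_embed_request(text, model, base_url)) as resp:
        data = json.load(resp)
    return data["embeddings"][0]  # /api/embed returns a list of vectors


def embed_all(
    texts: Sequence[str],
    embed: Callable[[str], list[float]],
    max_workers: int = 8,  # arbitrary default; tune to the server's capacity
) -> list[list[float]]:
    """Embed texts concurrently; executor.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed, texts))
```

With Ollama running, a call might look like `model = list_models()[0]` followed by `vectors = embed_all(names, lambda t: embed_one(t, model))`, where `names` is a list of material names and the chosen model must support embeddings.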
## Installation
### Option 1: Download Pre-Built Executables
1. Visit the [Releases page](https://github.com/kressety/TrueFuzzyMatch/releases).
2. Download the latest version for your operating system:
   - `EmbeddingTask-vX.X.X-windows.exe` for Windows
   - `EmbeddingTask-vX.X.X-macos` for macOS
   - `EmbeddingTask-vX.X.X-ubuntu` for Ubuntu
3. Place the executable in the same folder as your `.xlsx` files.

### Option 2: Run from Source
1. Clone the repository:
```bash
git clone https://github.com/kressety/TrueFuzzyMatch.git
cd TrueFuzzyMatch
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the script:
```bash
python main.py
```

## Usage
1. **Place Excel Files**: Make sure your `.xlsx` files (e.g., `A_table.xlsx` and `B_table.xlsx`) are in the same directory as the executable or script.
2. **Run the Tool**:
   - **Pre-built**: Double-click the executable (Windows) or run it in a terminal:
     ```bash
     ./EmbeddingTask-vX.X.X-
     ```
   - **Source**: `python main.py`
3. **Follow the Prompts**:
   - Select an Ollama model from the list.
   - Choose a task:
     1. **Generate A Table Embeddings**: Compute and save embeddings for Table A (`A_table_embeddings_*.npy`).
     2. **Match B Table Materials**: Match Table B to Table A and save the results to `O-B_table.xlsx`.
     3. **Exit**.
   - Select files and columns as prompted.

### Example Workflow
- **Table A**: Contains material names and optional codes (e.g., `物料编码` (material code) and `计量单位编码` (unit-of-measure code)).
- **Table B**: Contains the material names to match against Table A.
- **Output**: `O-B_table.xlsx` with the added columns `MaterialCode`, `UnitCode`, `MatchedMaterialName`, and `Similarity`.

## Development
### Requirements
- Python 3.12
- Dependencies: `numpy`, `pandas`, `requests`, `rich`, `tqdm`, and `pyinstaller` (for building)

Install them with:
```bash
pip install -r requirements.txt
```

### Building Executables
Use PyInstaller to build locally:
```bash
pyinstaller -i Embeddings.png -n EmbeddingTask --optimize 2 -F main.py
```

For automated builds, the GitHub Actions workflow compiles executables for all platforms whenever a tag is pushed (e.g., `v1.0.6`).
## GitHub Actions
The project uses a CI/CD pipeline to build and release executables:
- **Trigger**: Push a tag (e.g., `git tag v1.0.7 && git push --tags`).
- **Platforms**: Ubuntu, Windows, macOS.
- **Output**: Uploaded to GitHub Releases.

See [build-and-release.yml](.github/workflows/build-and-release.yml) for details.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Contributing
Contributions are welcome! Please:
1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/xyz`).
3. Commit changes (`git commit -m "Add xyz"`).
4. Push to the branch (`git push origin feature/xyz`).
5. Open a Pull Request.

## Acknowledgments
- Built with [Ollama](https://ollama.ai/) for embedding generation.
- UI powered by [Rich](https://github.com/Textualize/rich).
- Multithreading with Python's `concurrent.futures`.