https://github.com/mapado/passjoin
Python implementation of the Pass-join algorithm
- Host: GitHub
- URL: https://github.com/mapado/passjoin
- Owner: mapado
- License: mit
- Created: 2020-02-03T10:27:07.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2023-09-15T07:53:27.000Z (over 2 years ago)
- Last Synced: 2025-09-30T17:54:06.242Z (9 months ago)
- Language: Python
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Passjoin
Python implementation of the Pass-join index.
This index lets you efficiently retrieve the words of a corpus that lie within a given edit-distance threshold of a query word.
The implementation is based on this [paper](http://people.csail.mit.edu/dongdeng/papers/vldb2012-passjoin.pdf) and on the existing JavaScript implementation in the mnemonist package ([link](https://github.com/Yomguithereal/mnemonist)).
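The idea behind Pass-join can be illustrated with a small sketch. If an indexed word is split into `max_edit_distance + 1` segments, then by the pigeonhole principle any word within that edit distance must contain at least one of those segments unchanged. Segments can therefore be used to filter candidates before computing real distances. The code below is an illustration only, not this package's implementation: the helper names `partition` and `shares_segment` are hypothetical, and the even-partition scheme (leftover characters go to the last segments) is an assumption about how the paper splits words.

```python
from typing import List, Tuple


def partition(word: str, tau: int) -> List[Tuple[int, str]]:
    """Split `word` into tau + 1 contiguous segments of near-equal length.

    Returns (start_position, segment) pairs. Assumption: leftover characters
    are assigned to the last segments.
    """
    n_segments = tau + 1
    base, extra = divmod(len(word), n_segments)
    segments = []
    start = 0
    for i in range(n_segments):
        # The last `extra` segments are one character longer.
        length = base + (1 if i >= n_segments - extra else 0)
        segments.append((start, word[start:start + length]))
        start += length
    return segments


def shares_segment(indexed_word: str, query: str, tau: int) -> bool:
    """Candidate filter: does `query` contain any segment of `indexed_word`?

    A True result only means the pair is worth verifying with a real distance
    function. Words shorter than tau + 1 produce empty segments and therefore
    always pass the filter.
    """
    return any(segment in query for _, segment in partition(indexed_word, tau))


print(partition("jeanne", 1))                   # [(0, 'jea'), (3, 'nne')]
print(shares_segment("jeanne", "jeanine", 1))   # True: 'jea' occurs in 'jeanine'
print(shares_segment("pierre", "jean", 1))      # False: cannot be within distance 1
```

A real index additionally restricts where in the query each segment may match, which prunes many more candidates than this plain substring check.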
## Installation
```bash
$ pip install passjoin
```
## Usage
### Index creation
```python
from passjoin import Passjoin
from Levenshtein import distance # or any string distance function
max_edit_distance = 1 # maximum edit distance for retrieval
corpus = ['pierre', 'pierr', 'jean', 'jeanne']
passjoin_index = Passjoin(corpus, max_edit_distance, distance)
```
### Index querying
```python
passjoin_index.get_word_variations('pierre')
# -> {'pierre', 'pierr'}
passjoin_index.get_word_variations('jeann')
# -> {'jean', 'jeanne'}
passjoin_index.get_word_variations('jeanine')
# -> {'jeanne'}
```
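To sanity-check query results, it can help to compare them against a brute-force scan of the corpus. The sketch below does not use the `passjoin` package: `levenshtein` and `brute_force_variations` are hypothetical helper names, and the scan simply computes the distance from the query to every word. It should return the same sets as the examples above, only more slowly on a large corpus.

```python
from typing import Iterable, Set


def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming Levenshtein distance (pure Python)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def brute_force_variations(corpus: Iterable[str], query: str,
                           max_edit_distance: int) -> Set[str]:
    """Return every corpus word within `max_edit_distance` of `query`."""
    return {word for word in corpus
            if levenshtein(word, query) <= max_edit_distance}


corpus = ['pierre', 'pierr', 'jean', 'jeanne']
print(brute_force_variations(corpus, 'jeann', 1))
```

Since the index accepts "any string distance function", a two-argument function such as `levenshtein` above could presumably be passed as the third argument to `Passjoin` in place of `Levenshtein.distance`. That is an assumption based on the creation example, not something confirmed here.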
## Contributing
Clone the project.
Install [pipenv](https://github.com/pypa/pipenv).
Run `pipenv install --dev` to install the development dependencies.
Run the tests with `pipenv run pytest`.