https://github.com/roddar92/russian_soundex
Russian/English/Estonian/Finnish/Swedish phonetic algorithm based on Soundex and Metaphone
- Host: GitHub
- URL: https://github.com/roddar92/russian_soundex
- Owner: roddar92
- License: mit
- Created: 2018-12-24T14:22:50.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2025-03-01T18:46:09.000Z (over 1 year ago)
- Last Synced: 2025-09-28T00:23:19.700Z (8 months ago)
- Topics: computational-linguistics, metaphone, phonetic-algorithms, phonetics, phonological-rules, russian-nlp, russian-phonetic-algorithm, soundex
- Language: Python
- Homepage:
- Size: 83 KB
- Stars: 52
- Watchers: 4
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Fonetika
Phonetic algorithms for Russian, English, Swedish, Estonian and Finnish, based on Soundex and Metaphone.
The package implements both the transformation of phonemes into a letter-number sequence and a distance engine for comparing phonetic sequences (based on Levenshtein and Hamming distances).
Furthermore, both Russian phonetic algorithms support preprocessing for specific phoneme cases.
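To make the two distance measures concrete, here is a minimal, self-contained sketch of Hamming and Levenshtein distance applied to phonetic codes. The function names `hamming` and `levenshtein` are illustrative and are not taken from the package. How the package itself handles codes of unequal length is not stated here, so this sketch simply raises an error in the Hamming case.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(hamming('J070530', 'J070540'))     # 1
print(levenshtein('J070530', 'J07053'))  # 1
```

Hamming distance suits truncated, fixed-length codes, while Levenshtein distance also works when the codes differ in length.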
### Quick start
1. Install the package via `pip`:
```shell
pip install fonetika
```
2. Import the Soundex algorithm.
The package offers several options: you can truncate the resulting sequence (as in the original Soundex) or also encode vowels.
```python
from fonetika.soundex import RussianSoundex

# Default coding: vowels are not encoded
soundex = RussianSoundex(delete_first_letter=True)
print(soundex.transform('ёлочка'))  # J070530

# With code_vowels=True, vowels are encoded as well
soundex = RussianSoundex(delete_first_letter=True, code_vowels=True)
print(soundex.transform('ёлочка'))  # JA7A53A
```
> The library's structure is extensible: the `RussianSoundex` class inherits from the base class `Soundex` (the original algorithm for English). To extend the algorithm, inherit your own class from `Soundex` and override its methods.
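The inherit-and-override pattern described in the note can be illustrated with a self-contained sketch of classic American Soundex. Everything here is hypothetical: `SimpleSoundex`, `DigraphSoundex`, the `normalize` hook and the `length` parameter are made-up names for illustration and do not reflect the package's actual `Soundex` class or its method names.

```python
class SimpleSoundex:
    """Classic American Soundex (illustrative; not fonetika's `Soundex`)."""

    # Consonant groups of the classic algorithm; vowels and Y get no code.
    table = {
        **dict.fromkeys('BFPV', '1'),
        **dict.fromkeys('CGJKQSXZ', '2'),
        **dict.fromkeys('DT', '3'),
        'L': '4',
        **dict.fromkeys('MN', '5'),
        'R': '6',
    }
    ignored = 'HW'  # skipped without separating equal neighbouring codes

    def __init__(self, length=4):
        self.length = length  # None keeps the full, untruncated code

    def normalize(self, word):
        """Hook for language-specific preprocessing; subclasses may override."""
        return word.upper()

    def transform(self, word):
        word = self.normalize(word)
        if not word:
            return ''
        digits = []
        previous = self.table.get(word[0], '')
        for char in word[1:]:
            if char in self.ignored:
                continue
            code = self.table.get(char, '')
            if code and code != previous:
                digits.append(code)
            previous = code  # a vowel resets this, so a repeated code counts again
        result = word[0] + ''.join(digits)
        if self.length is None:
            return result
        return result[:self.length].ljust(self.length, '0')


class DigraphSoundex(SimpleSoundex):
    """Adds one preprocessing rule by overriding a single method."""

    def normalize(self, word):
        return super().normalize(word).replace('PH', 'F')


print(SimpleSoundex().transform('Robert'))    # R163
print(DigraphSoundex().transform('Phillip'))  # F410
print(DigraphSoundex().transform('Fillip'))   # F410
```

Keeping the preprocessing in its own overridable method means a language-specific subclass only has to supply its own rules and coding table, while the coding loop stays shared.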
3. Import the Soundex distance to compare strings:
```python
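from fonetika.distance import PhoneticsInnerLanguageDistance
from fonetika.soundex import RussianSoundex

soundex = RussianSoundex(delete_first_letter=True)
phon_distance = PhoneticsInnerLanguageDistance(soundex)
# Both spellings produce the same phonetic code, so the distance is zero
print(phon_distance.distance('ёлочка', 'йолочка'))  # 0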
```
4. You can also calculate the distance between words from two different languages. This is useful when working with languages from the same family.
```python
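from fonetika.distance import PhoneticsBetweenLanguagesDistance
# Assumed module path: the original snippet omits this import
from fonetika.metaphone import EstonianMetaphone, FinnishMetaphone

m1 = FinnishMetaphone(reduce_word=False)
m2 = EstonianMetaphone(reduce_word=False)
phon_distance = PhoneticsBetweenLanguagesDistance(m1, m2)
print(phon_distance.distance('yö', 'öö'))  # 1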
```
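The idea behind a between-languages distance can be sketched as follows: each word is encoded with its own language's encoder, and the resulting codes are then compared with a single metric. This is a toy illustration, not the package's implementation. The helper names (`make_encoder`, `mismatch_count`, `cross_language_distance`) and the one-entry letter tables are assumptions made for the example. The only linguistic fact it relies on is that Finnish `y` and Estonian `ü` denote the same vowel.

```python
from typing import Callable, Dict

Encoder = Callable[[str], str]


def make_encoder(table: Dict[str, str]) -> Encoder:
    """Build a toy encoder that maps letters through `table` and uppercases."""
    def encode(word: str) -> str:
        return ''.join(table.get(char, char) for char in word.lower()).upper()
    return encode


# Toy tables (illustrative only): both letters are mapped to one shared symbol.
encode_finnish = make_encoder({'y': 'y'})
encode_estonian = make_encoder({'ü': 'y'})


def mismatch_count(a: str, b: str) -> int:
    """Hamming-style count of differing positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def cross_language_distance(word_a: str, word_b: str,
                            encoder_a: Encoder, encoder_b: Encoder) -> int:
    """Encode each word with its own language's encoder, then compare codes."""
    return mismatch_count(encoder_a(word_a), encoder_b(word_b))


# Finnish 'syli' and Estonian 'süli' differ in spelling but share one code.
print(cross_language_distance('syli', 'süli', encode_finnish, encode_estonian))  # 0
print(cross_language_distance('yö', 'öö', encode_finnish, encode_estonian))      # 1
```

Because each encoder projects its own orthography onto a shared symbol set, spelling differences that reflect the same sound do not contribute to the distance.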