https://github.com/sagorbrur/itranslit
Transliteration for Indic languages
- Host: GitHub
- URL: https://github.com/sagorbrur/itranslit
- Owner: sagorbrur
- License: MIT
- Created: 2021-04-29T11:48:13.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-05-29T18:51:55.000Z (about 4 years ago)
- Last Synced: 2025-03-24T17:11:28.800Z (2 months ago)
- Topics: bengali-transliteration, deep-learning, gujarati-transliteration, hindi-transliteration, indic-languages, indic-transliteration, malayalam-transliteration, nlp, punjabi-transliteration, pytorch, tamil-transliteration, transliteration, urdu-transliteration
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: license
README
# iTRANSLIT
__iTRANSLIT__ is a deep learning based transliteration package for Indic languages.

## Installation
```
pip install itranslit
```

## Dependency
- `pytorch 1.7.0` or later

NB: No GPU needed; the package runs on CPU.
## Supported Language and Language Code
| __Language Name__ | __Language Code__ |
| --- | --- |
| Bangla | bn |
| Gujarati| gu |
| Hindi | hi |
| Punjabi | pa |
| Sindhi | sd |
| Urdu | ur |
| Malayalam | ml |
| Tamil | ta |

## API
```py
from itranslit import Translit

translit = Translit('bn')
word = "aami"
output = translit.predict(word, topk=10)
print(output)
```
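
The `predict` call returns the top-k candidate transliterations for the input romanized word. The same interface should work with any code from the table above; a minimal sketch, assuming `Translit` and `predict` behave as in the example (the input word `"namaste"` is just an illustration):

```py
from itranslit import Translit

# Load a Hindi model instead of Bengali (codes from the table above)
translit = Translit('hi')

# topk controls how many candidate transliterations are returned
candidates = translit.predict("namaste", topk=5)
print(candidates)
```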
## Datasets and Training Details
- We used the [Google Dakshina Dataset](https://github.com/google-research-datasets/dakshina)
- Thanks to [AI4Bharat](https://github.com/AI4Bharat/IndianNLP-Transliteration) for providing a training notebook with a detailed explanation
- We trained on the Google Dakshina lexicon training sets for 10 epochs with batch size 128, learning rate 1e-3, embedding dim 300, and hidden dim 512, using an LSTM encoder-decoder with attention (see the sketch after this list)
- We evaluated the trained models on the Google Dakshina lexicon test sets using the [AI4Bharat evaluation script](https://raw.githubusercontent.com/AI4Bharat/IndianNLP-Transliteration/jgeob-dev/tools/accuracy_reporter/accuracy_news.py)
- You can find the evaluation summary [here](docs/evaluations)
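
For reference, the configuration above corresponds roughly to the following model shape. This is a minimal PyTorch sketch, not the package's actual training code (the real implementation follows the AI4Bharat notebook); the character-vocabulary sizes and the dot-product attention variant are illustrative assumptions:

```py
import torch
import torch.nn as nn

# Hyperparameters taken from the training description above
EMB_DIM, HID_DIM = 300, 512
BATCH_SIZE, LR = 128, 1e-3
SRC_VOCAB, TGT_VOCAB = 64, 80  # hypothetical character-vocabulary sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) -> outputs: (batch, src_len, HID_DIM), state
        return self.lstm(self.emb(src))

class AttnDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM * 2, TGT_VOCAB)

    def forward(self, tgt, enc_out, state):
        dec_out, state = self.lstm(self.emb(tgt), state)
        # Dot-product attention over the encoder states
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))
        ctx = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, ctx], dim=-1)), state

# Smoke test with random character ids
encoder, decoder = Encoder(), AttnDecoder()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=LR)
src = torch.randint(0, SRC_VOCAB, (BATCH_SIZE, 12))
tgt = torch.randint(0, TGT_VOCAB, (BATCH_SIZE, 10))
enc_out, state = encoder(src)
logits, _ = decoder(tgt, enc_out, state)
print(logits.shape)  # torch.Size([128, 10, 80])
```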