https://github.com/sagorbrur/itranslit
Transliteration for Indic languages
- Host: GitHub
- URL: https://github.com/sagorbrur/itranslit
- Owner: sagorbrur
- License: MIT
- Created: 2021-04-29T11:48:13.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-05-29T18:51:55.000Z (about 4 years ago)
- Last Synced: 2025-03-24T17:11:28.800Z (2 months ago)
- Topics: bengali-transliteration, deep-learning, gujarati-transliteration, hindi-transliteration, indic-languages, indic-transliteration, malayalam-transliteration, nlp, punjabi-transliteration, pytorch, tamil-transliteration, transliteration, urdu-transliteration
- Language: Python
- Homepage:
- Size: 37.1 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: license
README
# iTRANSLIT
__iTRANSLIT__ is a deep learning based transliteration package for Indic languages.

## Installation
```
pip install itranslit
```

## Dependency
- `pytorch 1.7.0` or later

NB: No GPU needed; the package runs on CPU.
## Supported Language and Language Code
| __Language Name__ | __Language Code__ |
| --- | --- |
| Bangla | bn |
| Gujarati| gu |
| Hindi | hi |
| Punjabi | pa |
| Sindhi | sd |
| Urdu | ur |
| Malayalam | ml |
| Tamil | ta |

## API
```py
from itranslit import Translit

translit = Translit('bn')
word = "aami"
output = translit.predict(word, topk=10)
print(output)
```
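
The `predict` call returns the top-k candidate transliterations for the input romanized word. The same interface should work with any code from the table above; a minimal sketch, assuming `Translit` and `predict` behave as in the example (the input word `"namaste"` is just an illustration):

```py
from itranslit import Translit

# Load a Hindi model instead of Bengali (codes from the table above)
translit = Translit('hi')

# topk controls how many candidate transliterations are returned
candidates = translit.predict("namaste", topk=5)
print(candidates)
```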
## Datasets and Training Details
- We used the [Google Dakshina Dataset](https://github.com/google-research-datasets/dakshina)
- Thanks to [AI4Bharat](https://github.com/AI4Bharat/IndianNLP-Transliteration) for providing a training notebook with a detailed explanation
- We trained on the Google Dakshina lexicon training sets for 10 epochs with batch size 128, learning rate 1e-3, embedding dim 300, and hidden dim 512, using an LSTM encoder-decoder with attention (see the sketch after this list)
- We evaluated the trained models on the Google Dakshina lexicon test sets using the [AI4Bharat evaluation script](https://raw.githubusercontent.com/AI4Bharat/IndianNLP-Transliteration/jgeob-dev/tools/accuracy_reporter/accuracy_news.py)
- You can find the evaluation summary [here](docs/evaluations)
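
For reference, the configuration above corresponds roughly to the following model shape. This is a minimal PyTorch sketch, not the package's actual training code (the real implementation follows the AI4Bharat notebook); the character-vocabulary sizes and the dot-product attention variant are illustrative assumptions:

```py
import torch
import torch.nn as nn

# Hyperparameters taken from the training description above
EMB_DIM, HID_DIM = 300, 512
BATCH_SIZE, LR = 128, 1e-3
SRC_VOCAB, TGT_VOCAB = 64, 80  # hypothetical character-vocabulary sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(SRC_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) -> outputs: (batch, src_len, HID_DIM), state
        return self.lstm(self.emb(src))

class AttnDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(TGT_VOCAB, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.out = nn.Linear(HID_DIM * 2, TGT_VOCAB)

    def forward(self, tgt, enc_out, state):
        dec_out, state = self.lstm(self.emb(tgt), state)
        # Dot-product attention over the encoder states
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))
        ctx = torch.bmm(torch.softmax(scores, dim=-1), enc_out)
        return self.out(torch.cat([dec_out, ctx], dim=-1)), state

# Smoke test with random character ids
encoder, decoder = Encoder(), AttnDecoder()
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=LR)
src = torch.randint(0, SRC_VOCAB, (BATCH_SIZE, 12))
tgt = torch.randint(0, TGT_VOCAB, (BATCH_SIZE, 10))
enc_out, state = encoder(src)
logits, _ = decoder(tgt, enc_out, state)
print(logits.shape)  # torch.Size([128, 10, 80])
```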