https://github.com/aryamanarora/schwa-deletion
Code for the ACL 2020 Paper on Schwa Deletion in Hindi and Punjabi
- Host: GitHub
- URL: https://github.com/aryamanarora/schwa-deletion
- Owner: aryamanarora
- Created: 2019-08-06T18:50:44.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2023-10-30T00:47:04.000Z (almost 2 years ago)
- Last Synced: 2025-03-26T10:48:10.910Z (7 months ago)
- Topics: acl2020, hindi, indic-languages, nlp-machine-learning, punjabi, schwa-deletion, transliterate-hindi, transliterate-punjabi
- Language: Python
- Homepage:
- Size: 84.7 MB
- Stars: 17
- Watchers: 4
- Forks: 5
- Open Issues: 6
Metadata Files:
- Readme: README.md
# schwa-deletion
Machine learning models for [schwa deletion](https://en.wikipedia.org/wiki/Schwa_deletion_in_Indo-Aryan_languages) in Hindi and Punjabi.
Pretrained models built with scikit-learn's `MLPClassifier` and `LogisticRegression` and with XGBoost's `XGBClassifier`, which achieve state-of-the-art performance, are included in the `models` subfolder of each language's directory.
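Each of these classifiers decides, for an orthographic schwa, whether it is pronounced or deleted. The following is a rough, hypothetical sketch of the kind of input such a classifier works on. The helper names (`schwa_positions`, `window_features`), the Unicode ranges used, and the character-window feature scheme are assumptions for illustration, not this repository's actual feature extraction. It finds the consonants in a Devanagari word that carry an inherent schwa and builds a small window of neighboring characters for each:

```python
# Hypothetical sketch (not this repo's code): locate inherent schwas in a
# Devanagari word and build character-window features for each one.

VIRAMA = "\u094d"  # explicit vowel killer (halant): no schwa follows
NUKTA = "\u093c"   # combining dot below that modifies the preceding consonant


def is_consonant(ch: str) -> bool:
    """True for Devanagari consonant letters, including precomposed nukta forms."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def is_vowel_sign(ch: str) -> bool:
    """True for dependent vowel signs (matras), which replace the inherent schwa."""
    cp = ord(ch)
    return (
        0x093A <= cp <= 0x093B
        or 0x093E <= cp <= 0x094C
        or 0x094E <= cp <= 0x094F
        or 0x0962 <= cp <= 0x0963
    )


def schwa_positions(word: str) -> list[int]:
    """Indices of consonants not followed by a matra or virama, i.e. carrying a schwa."""
    positions = []
    for i, ch in enumerate(word):
        if not is_consonant(ch):
            continue
        j = i + 1
        if j < len(word) and word[j] == NUKTA:
            j += 1  # skip a combining nukta attached to this consonant
        if j >= len(word) or (word[j] != VIRAMA and not is_vowel_sign(word[j])):
            positions.append(i)
    return positions


def window_features(word: str, pos: int, k: int = 2) -> dict[str, str]:
    """Characters from pos-k to pos+k, padded with '_' beyond the word edges."""
    padded = "_" * k + word + "_" * k
    center = pos + k
    return {f"c{off:+d}": padded[center + off] for off in range(-k, k + 1)}


if __name__ == "__main__":
    word = "कमल"  # kamal 'lotus': three written schwas, the final one is deleted
    for pos in schwa_positions(word):
        print(pos, window_features(word, pos, k=1))
```

Feature dictionaries of this shape could, for example, be vectorized with scikit-learn's `DictVectorizer` before fitting one of the classifiers named above, with one binary keep/delete label per schwa. Punjabi, written in Gurmukhi, would need its own character ranges.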
The results of this research are presented in the paper below:
> "Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi", Aryaman Arora, Luke Gessler, and Nathan Schneider (2020). In *Proceedings of ACL*. Preprint:
## Usage
Make sure you are using the most recent version of Python 3.

Clone the repository and install the requirements:
```bash
git clone https://github.com/aryamanarora/schwa-deletion.git
cd schwa-deletion
pip install -r requirements.txt
```

Testing the pretrained Hindi XGBoost model:
```bash
cd hindi
python test.py
```

See `test.py` for an example of how to use the `main.py` script as a module.