https://github.com/therealprohacker/dmg
An application that helps orientalists, arabists and Islamic scholars by providing automatic transliteration.
arabic arabic-nlp digital-humanities flask ijmes nlp python transliteration
- Host: GitHub
- URL: https://github.com/therealprohacker/dmg
- Owner: theRealProHacker
- Created: 2024-01-16T14:13:21.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-31T16:56:39.000Z (over 1 year ago)
- Last Synced: 2025-01-31T17:40:46.270Z (over 1 year ago)
- Topics: arabic, arabic-nlp, digital-humanities, flask, ijmes, nlp, python, transliteration
- Language: Python
- Homepage: https://transliteration.eu.pythonanywhere.com/
- Size: 7.91 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
README
# DMG
DMG is an application that aims to make work easier for orientalists, Arabists and Islamic scholars in the German-speaking area by providing automatic transliteration.
The app is currently deployed on PythonAnywhere (https://transliteration.eu.pythonanywhere.com). Try it out now!
# Local deployment
```shell
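git clone https://github.com/theRealProHacker/dmg.git
# change into the cloned repository so requirements.txt and app.py are found
cd dmg
pip install -r requirements.txt
# `py` is the Windows launcher; on systems without it, use `python app.py`
py app.py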
```
Now navigate to the localhost URL printed in the terminal (http://localhost:5000).
# Features
- General letter replacement
- Hyphenation of particles wa-, fa-, sa-, bi-, li-, ka-, al-
- Sun letter assimilation الشَمس: aš-šams
- Detection of idafah and transliteration of ta marbutah as either "t" or "h"/"" accordingly:
  - مَكتَبَةُ كَبيرَة: maktaba kabīra or maktabah kabīrah
  - but مَكتَبَةُ الأُستاذِ: maktabat al-ustāḏ
- Consideration of hamzatul wasl
  - انْكَسَرَ: inkasara
  - الَّذينَ: allaḏīna
  - اخرُج: uḫruǧ
  - فَانتَقَلَ: fa-ntaqala
  - هُم الكُتّاب: hum ul-kuttāb
  - عَن الْكِتابُ: ʿan il-kitāb
- Insertion of missing alifs: هَذَا: hāḏā
- Removal of silent letters: قَلوا: qalū
- Nisba as ī: al-ʿarabī
- Many different (boolean) settings
  - whether the text should be transliterated in **pause** or not
  - whether **ta marbutah** should be transliterated as h or not
  - whether **diphthongs** should be transliterated as ai/au or ay/aw
  - whether **geminated half vowels** should be transliterated iyy/uww or īy/ūw
  - whether iyy/uww at the end should always become **ī/ū**
  - whether a **hamza in the first position** should be transliterated
  - whether the personal suffixes **-hu and -hi** should be transliterated as they are pronounced
- Automatic vocalization
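
To make two of the rules above concrete, here is a minimal sketch of sun letter assimilation and of the ta marbutah choice. It is not taken from the DMG code base: the function names `add_article` and `ta_marbuta_ending`, the `SUN_LETTERS` set and the parameters `in_idafa` and `as_h` are all hypothetical, and the sketch works on already romanized words rather than on Arabic script.

```python
import unicodedata

# Assumed list of sun letters in DMG romanization. NFC normalization makes
# sure every letter is a single code point before the set is built.
SUN_LETTERS = frozenset(unicodedata.normalize("NFC", "tṯdḏrzsšṣḍṭẓln"))


def add_article(noun: str) -> str:
    """Prefix the definite article to a romanized noun (hypothetical helper).

    Before a sun letter the l of the article assimilates (šams -> aš-šams);
    otherwise the article stays al- (kitāb -> al-kitāb).
    """
    noun = unicodedata.normalize("NFC", noun)
    if not noun:
        return noun
    first = noun[0].lower()
    if first in SUN_LETTERS:
        return f"a{first}-{noun}"
    return f"al-{noun}"


def ta_marbuta_ending(stem: str, in_idafa: bool, as_h: bool = False) -> str:
    """Render a final ta marbutah on a romanized stem ending in -a.

    In an idafah the ta marbutah is written "t"; elsewhere it is written
    "h" or left out, depending on the (assumed) boolean setting `as_h`.
    """
    if in_idafa:
        return stem + "t"
    return stem + ("h" if as_h else "")


if __name__ == "__main__":
    print(add_article("šams"))    # aš-šams
    print(add_article("kitāb"))   # al-kitāb
    print(ta_marbuta_ending("maktaba", in_idafa=True), add_article("ustāḏ"))  # maktabat al-ustāḏ
    print(ta_marbuta_ending("maktaba", in_idafa=False, as_h=True))            # maktabah
```

Detecting whether a word actually stands in an idafah is the hard part and is not covered here; the sketch only shows how the result would be rendered once that decision has been made.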
# Contributing
Please don't hesitate to raise an issue or to make a pull request. More information on contributing can be found [here](CONTRIBUTING.md).
# License
The software is licensed under CC BY-NC, except for the directories `arab_tools` and `data`, which are licensed under GPL v3.
The `data/ner-gold-standard` directory is covered by its own license, which is included in that directory.
# Reception
> "The program seems almost perfect"
> "Great work, congratulations!"
> "I want to congratulate you and thank you. Many scientists and students of Islamic(-theologic) studies will profit greatly from your work"
# Bachelor thesis
This project is part of my bachelor thesis, which will be published soon.