Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/joom/dilacar
A rule-based machine translation system from Ottoman Turkish to Modern Turkish.
- Host: GitHub
- URL: https://github.com/joom/dilacar
- Owner: joom
- License: mit
- Created: 2019-02-08T22:57:24.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-07-08T03:15:35.000Z (over 4 years ago)
- Last Synced: 2024-08-03T18:19:53.974Z (4 months ago)
- Topics: computational-linguistics, historical-linguistics, machine-translation, ottoman, rule-based, turkish, turkish-language
- Language: TeX
- Homepage:
- Size: 313 KB
- Stars: 20
- Watchers: 5
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-turkish-nlp - joom/dilacar - A rule-based machine translation system from Ottoman Turkish to Modern Turkish. (Libraries / Haskell)
README
# dilacar
A machine translation system from Ottoman Turkish to Modern Turkish. It began as the final project for COS401 Introduction to Machine Translation, taught by Srinivas Bangalore at Princeton University. The report for the project can be found [here](http://www.cs.princeton.edu/~ckorkut/papers/ottoman.pdf).
The project is named after the Turkish-Armenian linguist Agop Dilâçar (born Martayan), who is known for his work on the Turkish language and who was given the surname Dilâçar (literally "language opener") by Mustafa Kemal Atatürk.
## Installation
I'm currently having a problem building the executable on my machine, but running the REPL works:
```
stack ghci
```

And then you can run the program as such:
```
λ> :set +s
λ> run "ياپراقلر باخچه\8204لردن دوشمش."
yapraklar
bahçelerden
düşmüş, duşmuş, dövüşmüş, döşmüş
.
(2.61 secs, 15,223,379,696 bytes)
```

## Accuracy
This is still a work in progress. Many Turkish suffixes are missing from the program, but the mechanism for adding them is in place. Turkish has a very large number of suffixes, especially derivational ones. With more time put into the project, its accuracy should go up.
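
To give a rough idea of what a single suffix rule has to encode, here is a minimal sketch of vowel harmony and consonant assimilation for two Modern Turkish suffixes, the plural `-lAr` and the ablative `-DAn`. It is not taken from this repository. The function names and the `A`/`D` placeholder notation are assumptions made for the example, and it only handles lowercase stems.

```haskell
-- Illustrative sketch only; not taken from the dilacar source.
-- All names and the A/D placeholder notation are hypothetical.

backVowels, frontVowels, voiceless :: String
backVowels  = "aıou"
frontVowels = "eiöü"
voiceless   = "çfhkpsşt"

-- | The last vowel of a word, if it has one.
lastVowel :: String -> Maybe Char
lastVowel w = case filter (`elem` (backVowels ++ frontVowels)) (reverse w) of
  (v:_) -> Just v
  []    -> Nothing

-- | Two-way vowel harmony: 'a' after a back vowel, 'e' otherwise.
harmonicA :: String -> Char
harmonicA stem = case lastVowel stem of
  Just v | v `elem` backVowels -> 'a'
  _                            -> 'e'

-- | Consonant assimilation: 't' after a voiceless consonant, 'd' otherwise.
harmonicD :: String -> Char
harmonicD stem = case reverse stem of
  (c:_) | c `elem` voiceless -> 't'
  _                          -> 'd'

-- | Attach a suffix template to a stem, resolving the placeholders
-- 'A' (a/e) and 'D' (d/t) against the stem.
attach :: String -> String -> String
attach stem = (stem ++) . map resolve
  where
    resolve 'A' = harmonicA stem
    resolve 'D' = harmonicD stem
    resolve c   = c
```

Loaded in GHCi, `attach "yaprak" "lAr"` evaluates to the string yapraklar, and `attach (attach "bahçe" "lAr") "DAn"` to bahçelerden, matching the first two words of the sample output above. A real system needs many more such templates, plus the reverse direction (recognizing suffixes in Ottoman script), which this sketch does not attempt.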