https://github.com/linuxscout/miknaaz
Generate an Arabic gold-standard corpus for morphology and stemming
- Host: GitHub
- URL: https://github.com/linuxscout/miknaaz
- Owner: linuxscout
- License: gpl-3.0
- Created: 2018-08-25T19:06:22.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2023-01-12T15:43:48.000Z (over 2 years ago)
- Last Synced: 2023-03-11T10:12:32.564Z (about 2 years ago)
- Language: Python
- Size: 86.9 KB
- Stars: 12
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Miknaaz مكناز
## Description
Generate an Arabic gold-standard corpus for morphology and stemming.
### Citation
If you cite this project in academic work, please use the following citation:
Taha Zerrouki, Miknaaz, http://github.com/linuxscout/miknaaz, 2023
or, in BibTeX format:
@misc{zerrouki2018miknaaz,
title={Miknaaz: Generate arabic golden standard},
author={Zerrouki, Taha},
url={http://github.com/linuxscout/miknaaz},
year={2018}
}

## Usage
* Build word features for constructing a linguistic corpus
```python
from miknaaz.corpus_builder import CorpusBuilder
text = u"إلى البيت"
lemmer = CorpusBuilder()
words = lemmer.tokenize(text)
for word in words:
    result = lemmer.morph_suggestions(word, True)
    print(result)
```

* Extract separate features
```python
from miknaaz.corpus_builder import CorpusBuilder
text = u"إلى البيت"
lemmer = CorpusBuilder()
words = lemmer.tokenize(text)
# get the candidate lemmas of each word
for word in words:
    result = lemmer.get_lemmas(word)
    # the result contains objects
    print(result)
# get the candidate roots of each word
for word in words:
    result = lemmer.get_roots(word)
    # the result contains objects
    print(result)
# get the candidate word types of each word
for word in words:
    result = lemmer.get_word_type(word)
    # the result contains objects
    print(result)
# get the candidate wazns (morphological patterns) of each word
for word in words:
    result = lemmer.get_wazns(word)
    # the result contains objects
    print(result)
```
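
* Collect the features into a table (illustrative sketch)

The separate calls above can be combined into one row per word, which is a convenient shape for a corpus that will be reviewed by hand. The sketch below is not part of Miknaaz: `build_rows`, `rows_to_csv`, `as_list` and `StubBuilder` are hypothetical names introduced here. `StubBuilder` only imitates the method names used in this README and returns made-up values so that the example runs without the library installed. It also assumes that each `get_*` call returns either a single value or a collection of candidates, which the README does not specify.

```python
import csv
import io


class StubBuilder:
    """Hypothetical stand-in with the same method names as CorpusBuilder.

    The return values are invented for illustration only; the real
    library may return richer objects.
    """

    def tokenize(self, text):
        return text.split()

    def get_lemmas(self, word):
        return [word]

    def get_roots(self, word):
        return []

    def get_word_type(self, word):
        return ["unknown"]

    def get_wazns(self, word):
        return []


def as_list(value):
    """Wrap a single value in a list; copy collections into a list."""
    if isinstance(value, (list, tuple, set)):
        return list(value)
    return [value]


def build_rows(builder, text):
    """Collect one row of candidate features per token."""
    rows = []
    for word in builder.tokenize(text):
        rows.append({
            "word": word,
            "lemmas": as_list(builder.get_lemmas(word)),
            "roots": as_list(builder.get_roots(word)),
            "types": as_list(builder.get_word_type(word)),
            "wazns": as_list(builder.get_wazns(word)),
        })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, joining each candidate list with ';'."""
    fields = ["word", "lemmas", "roots", "types", "wazns"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = {"word": row["word"]}
        for key in fields[1:]:
            flat[key] = ";".join(str(item) for item in row[key])
        writer.writerow(flat)
    return buffer.getvalue()


rows = build_rows(StubBuilder(), "إلى البيت")
print(rows_to_csv(rows))
```

To try this with the real library, pass `CorpusBuilder()` in place of `StubBuilder()`. If the returned objects do not have a useful string form, the `str(item)` call is the place to adapt.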