https://github.com/mhmd-azeez/GendeStemmer
A grammatically incorrect Stemmer for Sorani Kurdish
- Host: GitHub
- URL: https://github.com/mhmd-azeez/GendeStemmer
- Owner: mhmd-azeez
- Created: 2021-10-12T19:07:13.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2021-10-12T19:14:18.000Z (about 3 years ago)
- Last Synced: 2024-11-08T12:53:12.160Z (about 1 month ago)
- Language: C#
- Size: 10.7 KB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-kurdish - Gende Stemmer
README
# Gende Stemmer - گەندە ستێمەر
Gende Stemmer is a grammatically ignorant stemmer for Sorani Kurdish. Our aim is to make its output predictable and consistent, but not necessarily grammatically correct.

Why is this useful? It helps with full-text search indexing and other Information Retrieval (IR) tasks, where it matters that both `سێوەکانیشتان` and `سێوێکی` stem back to `سێو`.
Sometimes the output of the stemmer is very ugly, but as long as it is predictable and consistent we don't mind.
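To illustrate why consistency matters more than correctness for search, here is a minimal sketch of an inverted index keyed by stems. It is not part of this project: `StandInStem`, `AddDocument`, and `Search` are hypothetical names, and `StandInStem` is a hard-coded stand-in that only knows the two example words above so the snippet runs without the library. In real use you would presumably call `Stemmer.StemWord` in its place.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Stemmer.StemWord so this sketch is self-contained.
// It only maps the two example words from this README; it is NOT the real algorithm.
static string StandInStem(string word) => word switch
{
    "سێوەکانیشتان" => "سێو",
    "سێوێکی" => "سێو",
    _ => word,
};

// Inverted index: stem -> ids of the documents that contain a word with that stem.
var index = new Dictionary<string, HashSet<int>>();

// Index every whitespace-separated word of a document under its stem.
void AddDocument(int id, string text)
{
    foreach (var word in text.Split(' ', StringSplitOptions.RemoveEmptyEntries))
    {
        var stem = StandInStem(word);
        if (!index.TryGetValue(stem, out var ids))
        {
            ids = new HashSet<int>();
            index[stem] = ids;
        }
        ids.Add(id);
    }
}

// Stem the query word the same way, then look it up.
HashSet<int> Search(string queryWord) =>
    index.TryGetValue(StandInStem(queryWord), out var ids) ? ids : new HashSet<int>();

// A document containing one inflected form is found by a query using another.
AddDocument(1, "سێوەکانیشتان");
Console.WriteLine(Search("سێوێکی").Contains(1)); // True
```

Because documents and queries pass through the same stemmer, the index only needs the stem to be the same every time. Whether the stem is a grammatically valid word does not affect the lookup.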
**NOTE:** This project is only a Proof Of Concept (POC) and doesn't work in every scenario. If you run into edge cases, please open an issue.
## How to use
Stem one word:
```csharp
var word = "بزنەکانیشتان";
var stemmed = Stemmer.StemWord(word);
// stemmed: بزن
```
Stem a body of text:
```csharp
var text = "سڵاومان گەیاندە هەموویان";
var stemmed = Stemmer.Stem(text);
// stemmed: سڵاو گەیاند هەموو
```
## Resources
This Stemmer is inspired by:
- [پوختەی ڕێنووس (A Summary of Orthography) - دیاکۆ هاشمی (Diyako Hashemi)](http://diyako.yageyziman.com/wp-content/uploads/2016/03/Puxtey_Renus_Diyako_2021_09_25.pdf)
- [CKMorph: A Comprehensive Morphological Analyzer for Central Kurdish](https://arxiv.org/ftp/arxiv/papers/2109/2109.08615.pdf)
- [A Formal Description of Sorani Kurdish Morphology](https://arxiv.org/ftp/arxiv/papers/2109/2109.03942.pdf)