Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/gdebasis/BengaliStemmer
A very simple-to-use, rule-based stemmer for Bengali (Bangla). The program takes as input a newline-separated list of words and outputs the stem of each input word on its own line.
- Host: GitHub
- URL: https://github.com/gdebasis/BengaliStemmer
- Owner: gdebasis
- Created: 2014-08-08T17:08:20.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2014-12-28T15:29:02.000Z (over 9 years ago)
- Last Synced: 2024-02-17T12:37:13.498Z (4 months ago)
- Language: C
- Size: 145 KB
- Stars: 13
- Watchers: 5
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
Lists
- awesome-bangla - Bengali Stemmer (Rule Based)
README
This is a very simple, lightweight, rule-based stemmer for Bengali.
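A rule-based stemmer of this kind typically works by stripping inflectional suffixes from the end of a word. The C sketch below illustrates that idea only; the suffix list, `MIN_STEM_BYTES` and `stem_word` are hypothetical and are not the rules implemented in rbs. It assumes UTF-8 input and applies a "longest matching suffix wins" policy with a minimum stem length.

```c
#include <assert.h>
#include <string.h>

/* Bengali code points (U+0980 to U+09FF) occupy 3 bytes each in UTF-8, so
 * 6 bytes means "keep at least two Bengali characters" (an assumed rule). */
#define MIN_STEM_BYTES 6

/* Hypothetical suffix rules (UTF-8), chosen for illustration only; this is
 * NOT the rule table shipped with gdebasis/BengaliStemmer. */
static const char *const SUFFIXES[] = {
    "গুলো", /* plural marker */
    "দের",  /* plural genitive/objective */
    "রা",   /* plural nominative */
    "টা",   /* classifier */
    "টি",   /* classifier */
    "কে",   /* objective case */
    "ের",   /* genitive after a consonant */
};

/* Writes `word` minus its longest matching suffix into `out` (capacity
 * `cap`). A suffix is stripped only if MIN_STEM_BYTES remain afterwards. */
void stem_word(const char *word, char *out, size_t cap) {
    assert(word != NULL && out != NULL && cap > 0);
    size_t len = strlen(word);
    size_t best = 0; /* byte length of the longest matching suffix */
    for (size_t i = 0; i < sizeof SUFFIXES / sizeof SUFFIXES[0]; i++) {
        size_t slen = strlen(SUFFIXES[i]);
        if (slen > best && len >= slen + MIN_STEM_BYTES &&
            memcmp(word + len - slen, SUFFIXES[i], slen) == 0)
            best = slen;
    }
    size_t keep = len - best;
    if (keep >= cap)
        keep = cap - 1; /* truncate instead of overflowing `out` */
    memcpy(out, word, keep);
    out[keep] = '\0';
}
```

For example, stem_word("ছেলেদের", buf, sizeof buf) leaves "ছেলে": both "দের" and "ের" match, and the longer suffix wins. A bare "কে" is left unchanged because stripping it would leave fewer than MIN_STEM_BYTES bytes.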
To build on a Linux system, type make.
You can then invoke the stemmer (the executable name is rbs) by running

./rbs <input file> <output file>

The input file is a newline-separated list of Bengali words, and the output file is also newline-separated: the original word comes first, followed by its stemmed form.
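The input/output contract described above can be sketched in C as follows. This is a minimal sketch, not the code in rbs: `stem_stream`, `stem_fn`, `placeholder_stem` and `MAX_WORD_BYTES` are hypothetical names, the identity placeholder stands in for the real stemming rules, and the sketch assumes each output line holds the word and its stem separated by a single space, a layout the README does not pin down.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_BYTES 256 /* hypothetical per-line buffer size */

/* Signature of a stemming function: writes the stem of `word` into `out`. */
typedef void (*stem_fn)(const char *word, char *out, size_t cap);

/* Hypothetical placeholder stemmer: returns the word unchanged. */
void placeholder_stem(const char *word, char *out, size_t cap) {
    snprintf(out, cap, "%s", word);
}

/* Reads one word per line from `in` and writes "<word> <stem>" per line to
 * `out`. Blank lines are skipped. Returns the number of words processed. */
size_t stem_stream(FILE *in, FILE *out, stem_fn stem) {
    assert(in != NULL && out != NULL && stem != NULL);
    char word[MAX_WORD_BYTES];
    char stemmed[MAX_WORD_BYTES];
    size_t count = 0;
    while (fgets(word, sizeof word, in) != NULL) {
        word[strcspn(word, "\r\n")] = '\0'; /* drop the line terminator */
        if (word[0] == '\0')
            continue; /* skip empty lines */
        stem(word, stemmed, sizeof stemmed);
        fprintf(out, "%s %s\n", word, stemmed);
        count++;
    }
    return count;
}
```

Blank lines and Windows-style "\r\n" endings are tolerated; in this sketch a line longer than MAX_WORD_BYTES - 1 bytes would be split across two fgets calls and treated as two words.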
No corpus preprocessing is required to run this stemmer. I have provided a sample input file.
Just type ./rbs sample.txt sample.stem to see the output.
NOTE: You can provide an optional third argument to control the aggressiveness of the stemmer.
By default, the aggressive mode is turned off. To turn it on, please append a "1" at the end of the argument list.
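The command-line handling described above (two required file arguments plus an optional trailing "1") could look like the following C sketch. `parse_args` and `StemOptions` are hypothetical names, not taken from the rbs source; the README does not say how rbs treats a third argument other than "1", so this sketch simply rejects it as a usage error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical options mirroring "rbs <input file> <output file> [1]". */
typedef struct {
    const char *input_path;
    const char *output_path;
    int aggressive; /* 0 = default (off), 1 = aggressive mode */
} StemOptions;

/* Returns 0 on success, -1 on a usage error (message printed to stderr). */
int parse_args(int argc, char **argv, StemOptions *opt) {
    assert(opt != NULL);
    if (argc < 3 || argc > 4) {
        fprintf(stderr, "usage: %s <input file> <output file> [1]\n",
                argc > 0 ? argv[0] : "rbs");
        return -1;
    }
    opt->input_path = argv[1];
    opt->output_path = argv[2];
    opt->aggressive = 0; /* aggressive mode is off by default */
    if (argc == 4) {
        /* Assumption: only the literal "1" enables aggressive mode. */
        if (strcmp(argv[3], "1") != 0) {
            fprintf(stderr, "third argument must be \"1\" to enable aggressive mode\n");
            return -1;
        }
        opt->aggressive = 1;
    }
    return 0;
}
```

With this sketch, ./rbs sample.txt sample.stem parses with aggressive set to 0, and ./rbs sample.txt sample.stem 1 sets it to 1.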