https://github.com/asraf-patoary/bnltk
BNLTK (Bangla Natural Language Processing Toolkit): a Python package for NLP in Bangla
bangla bangla-corpus bangla-natural-language-processing bangla-nlp bangla-pos-tagging bangla-stemmer bangla-tokenizer natural-language-processing natural-language-processing-bangla python-package
- Host: GitHub
- URL: https://github.com/asraf-patoary/bnltk
- Owner: asraf-patoary
- License: mit
- Created: 2019-06-28T09:03:59.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2025-06-10T09:57:07.000Z (about 1 year ago)
- Last Synced: 2026-05-10T20:27:10.500Z (about 2 months ago)
- Topics: bangla, bangla-corpus, bangla-natural-language-processing, bangla-nlp, bangla-pos-tagging, bangla-stemmer, bangla-tokenizer, natural-language-processing, natural-language-processing-bangla, python-package
- Language: Python
- Homepage: https://ashwoolford.github.io/bnltk/
- Size: 303 KB
- Stars: 25
- Watchers: 1
- Forks: 8
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-bangladeshi-foss - BNLTK - Python Bangla NLP toolkit for tokenization, stemming, and POS tagging. (Developer Tools & Libraries / How to contribute)
README
# BNLTK
[License: MIT](https://opensource.org/licenses/MIT)
[Downloads](https://pepy.tech/project/bnltk)
BNLTK (Bangla Natural Language Processing Toolkit) is an open-source Python package for natural language processing in Bangla. It covers basic NLP tasks: tokenization, stemming, and part-of-speech tagging. BNLTK requires Python 3.6, 3.7, 3.8, 3.9, or 3.10.
Web documentation: [https://ashwoolford.github.io/bnltk/](https://ashwoolford.github.io/bnltk/)
## Installation
```
pip install bnltk
```
**Note**: If you are using version 0.7.6, see the [Version 0.7.6](#version-076) section of this document.
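Because the examples differ between 0.7.6 and 0.7.8, it can help to check which version is installed before picking a section. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of BNLTK, and `importlib.metadata` assumes Python 3.8 or newer.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_version(package="bnltk"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string such as the one pip installed, or None if bnltk is absent.
print(installed_version())
```

On Python 3.6 or 3.7, `pip show bnltk` reports the same information from the command line.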
## Version 0.7.8 (latest)
### Tokenizer
```
from bnltk.tokenize import Tokenizers
t = Tokenizers()
print(t.bn_word_tokenizer('আজ আবহাওয়া খুব ভালো।'))
# ["আজ", "আবহাওয়া", "খুব", "ভালো", "।"]
```
### Stemmer
```
from bnltk.stemmer import BanglaStemmer
bn_stemmer = BanglaStemmer()
print(bn_stemmer.stem('হেসেছিলেন'))
# হাসা
```
### Parts of speech tagger
To use the part-of-speech tagger, first download the pretrained model's weights. Our trained model achieves an accuracy of 96%.
```
from bnltk.bnltk_downloads import DataFiles
DataFiles.download()
```
After successfully downloading the files, you can use this module as follows:
```
from bnltk.pos_tagger import PosTagger
p_tagger = PosTagger()
print(p_tagger.tagger('দুশ্চিন্তার কোন কারণই নাই'))
# [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]
```
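The tagger output shown above is a plain list of `(token, tag)` tuples, so it can be post-processed with ordinary Python. As a minimal sketch, the snippet below groups tokens by tag. It works on the sample output copied from the example rather than calling BNLTK, and `group_by_tag` is a hypothetical helper, not part of the package.

```python
from collections import defaultdict

# Sample tagger output, copied from the example above (no BNLTK call is made here).
tagged = [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]


def group_by_tag(pairs):
    """Collect tokens under their POS tag, preserving sentence order within each tag."""
    groups = defaultdict(list)
    for token, tag in pairs:
        groups[tag].append(token)
    return dict(groups)


groups = group_by_tag(tagged)
print(groups['NC'])  # the two tokens tagged NC, in sentence order
```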
## Version 0.7.6
### Tokenizer
```
from bnltk.tokenize import Tokenizers
t = Tokenizers()
print(t.bn_word_tokenizer('আজ আবহাওয়া খুব ভালো।'))
# ["আজ", "আবহাওয়া", "খুব", "ভালো"]
```
### Stemmer
```
from bnltk.stemmer import BanglaStemmer
bn_stemmer = BanglaStemmer()
print(bn_stemmer.stem('হেসেছিলেন'))
# হাসা
```
### Parts of speech tagger
To use the part-of-speech tagger, first download the pretrained model's weights. Our trained model achieves an accuracy of 96%.
```
from bnltk.bnltk_downloads import DataFiles
DataFiles().download()
```
After successfully downloading the files, you can use this module as follows:
```
from bnltk.pos_tagger import PosTagger
p_tagger = PosTagger()
p_tagger.loader()
print(p_tagger.tagger('দুশ্চিন্তার কোন কারণই নাই'))
# [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]
```
### Description of the POS tag set
| Categories | Types |
|-----------------------|-----------------------|
| Noun (N) | Common (NC) |
| | Proper (NP) |
| | Verbal (NV) |
| | Spatio-temporal (NST) |
| Pronoun (P) | Pronominal (PPR) |
| | Reflexive (PRF) |
| | Reciprocal (PRC) |
| | Relative (PRL) |
| | Wh (PWH) |
| Nominal Modifier (J) | Adjectives (JJ) |
| | Quantifiers (JQ) |
| Demonstratives (D) | Absolutive (DAB) |
| | Relative (DRL) |
| | Wh (DWH) |
| Adverb (A) | Manner (AMN) |
| | Location (ALC) |
| Participle (L) | Relative (LRL) |
| | Verbal (LV) |
| Postposition (PP) | |
| Particles (C) | Coordinating (CCD) |
| | Subordinating (CSB) |
| | Classifier (CCL) |
| | Interjection (CIN) |
| | Others (CX) |
| Punctuations (PU) | |
| Residual (RD) | Foreign Word (RDF) |
| | Symbol (RDS) |
| | Other (RDX) |
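To make tagger output easier to read, the table above can be transcribed into a lookup dictionary. This is a sketch rather than a BNLTK feature: `POS_TAG_DESCRIPTIONS` and `describe` are hypothetical names, and the sample input is the documented tagger output. Tags that are not listed in the table, such as `VM` in the example output, are returned unchanged.

```python
# Tag descriptions transcribed from the table above ("Category: Type").
POS_TAG_DESCRIPTIONS = {
    "NC": "Noun: Common",
    "NP": "Noun: Proper",
    "NV": "Noun: Verbal",
    "NST": "Noun: Spatio-temporal",
    "PPR": "Pronoun: Pronominal",
    "PRF": "Pronoun: Reflexive",
    "PRC": "Pronoun: Reciprocal",
    "PRL": "Pronoun: Relative",
    "PWH": "Pronoun: Wh",
    "JJ": "Nominal Modifier: Adjectives",
    "JQ": "Nominal Modifier: Quantifiers",
    "DAB": "Demonstratives: Absolutive",
    "DRL": "Demonstratives: Relative",
    "DWH": "Demonstratives: Wh",
    "AMN": "Adverb: Manner",
    "ALC": "Adverb: Location",
    "LRL": "Participle: Relative",
    "LV": "Participle: Verbal",
    "PP": "Postposition",
    "CCD": "Particles: Coordinating",
    "CSB": "Particles: Subordinating",
    "CCL": "Particles: Classifier",
    "CIN": "Particles: Interjection",
    "CX": "Particles: Others",
    "PU": "Punctuations",
    "RDF": "Residual: Foreign Word",
    "RDS": "Residual: Symbol",
    "RDX": "Residual: Other",
}


def describe(tag):
    """Return the table description for a tag, or the tag itself if it is not listed."""
    return POS_TAG_DESCRIPTIONS.get(tag, tag)


# Sample tagger output, copied from the POS tagger example in this README.
tagged = [('দুশ্চিন্তার', 'NC'), ('কোন', 'JQ'), ('কারণই', 'NC'), ('নাই', 'VM')]
for token, tag in tagged:
    print(token, "->", describe(tag))
```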