https://github.com/rajspeaks/machine-learning-approach-to-bengali-corpus-tokenization-stemming-pos-tagging-using-bnltk

Machine Learning approach to Bengali Corpus POS Tagging using BNLTK. This is an experimental project under the mentorship of Prof. Sandipan Ganguly, HIT-K.

bengali bengali-dataset bengali-language-processing bengali-natural-language-processing bengali-nlp english machine-learning natural-language-processing natural-language-understanding nlp nlp-library nlp-machine-learning postagger postagging rajdeep-das rajspeaks stemmer stemming tokenizer-parser

Host: GitHub
URL: https://github.com/rajspeaks/machine-learning-approach-to-bengali-corpus-tokenization-stemming-pos-tagging-using-bnltk
Owner: Rajspeaks
License: gpl-3.0
Created: 2022-03-25T11:25:20.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2022-05-02T06:28:13.000Z (almost 3 years ago)
Last Synced: 2025-02-09T22:46:52.762Z (3 months ago)
Topics: bengali, bengali-dataset, bengali-language-processing, bengali-natural-language-processing, bengali-nlp, english, machine-learning, natural-language-processing, natural-language-understanding, nlp, nlp-library, nlp-machine-learning, postagger, postagging, rajdeep-das, rajspeaks, stemmer, stemming, tokenizer-parser
Language: Jupyter Notebook
Homepage:
Size: 78.1 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Machine Learning approach to Bengali Corpus Tokenization | Stemming | POS Tagging using BNLTK

BNLTK (Bengali Natural Language Toolkit) is a library developed by [Asraf Patoary](https://github.com/ashwoolford). With BNLTK, we can tokenize Bengali text, stem Bengali words, and tag each word with its part-of-speech category.

## Installation:

```
pip install bnltk
```
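
In a Colab or Jupyter session it can help to confirm that the install actually succeeded before running the rest of the notebook. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of BNLTK.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether bnltk is available in the current environment.
version = installed_version("bnltk")
print("bnltk version:", version if version else "not installed")
```

If this prints "not installed", re-run the `pip install bnltk` command above in the same environment as the notebook kernel.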

## Methodology

- First, we installed BNLTK.
- We imported Tokenizers from bnltk and tokenized a Bengali sentence by splitting it into individual words, then applied the same tokenizer to a larger Bengali corpus.
- We imported BanglaStemmer() from bnltk to stem Bengali words, and repeated this twice on different words.
- We downloaded the data file from bnltk before continuing with the remaining steps.
- We imported PosTagger from bnltk, applied it to a short Bengali sentence to tag each word with its part-of-speech category, and repeated this twice more on larger Bengali corpora.
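
To make the tokenization step concrete, here is a minimal pure-Python sketch of word-level tokenization for Bengali text. It is not BNLTK's implementation, and BNLTK's Tokenizers may handle punctuation and edge cases differently; the function name `simple_bengali_tokenize` and the regular expression are assumptions made for illustration only.

```python
import re

# Bengali letters, vowel signs, and digits live in the Unicode block
# U+0980-U+09FF; zero-width joiners (U+200C, U+200D) can occur inside words.
# Any other non-whitespace character (for example the danda "।", U+0964)
# is emitted as a single-character token.
_TOKEN_PATTERN = re.compile(r"[\u0980-\u09FF\u200C\u200D]+|[A-Za-z0-9]+|\S")


def simple_bengali_tokenize(text):
    """Split text into word and punctuation tokens (illustrative only)."""
    return _TOKEN_PATTERN.findall(text)


sentence = "আমার সোনার বাংলা।"
tokens = simple_bengali_tokenize(sentence)
print(tokens)  # ['আমার', 'সোনার', 'বাংলা', '।']
```

The explicit Unicode range is used instead of `\w` because Python's `\w` does not match Bengali vowel signs and the virama, which would split words in the middle.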

## Tools & Library requirements:

- Google Colab/Jupyter-Notebook
- BNLTK Library

### Reference:

1. https://ashwoolford.github.io/bnltk-documentation/
2. https://github.com/ashwoolford/bnltk

### Mentor:

Prof. Sandipan Ganguly

### Developer:

Rajdeep Das
