https://github.com/sagorbrur/bnlp
BNLP is a natural language processing toolkit for the Bengali language.
- Host: GitHub
- URL: https://github.com/sagorbrur/bnlp
- Owner: sagorbrur
- License: mit
- Created: 2019-11-22T10:02:15.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-12-06T06:52:51.000Z (over 1 year ago)
- Last Synced: 2025-04-15T00:49:58.392Z (about 1 year ago)
- Topics: bangla, bangla-nlp, bangla-pos-tagging, bangla-word2vec, bengal-pos-tagging, bengali, bengali-fasttext, bengali-language, bengali-ner, bengali-nlp, bengali-nlp-library, bengali-tokenization, bengali-word-embedding, bengali-word2vec, bn-glove, named-entity-recognition, ner, nlp, nltk-tokenizer
- Language: Jupyter Notebook
- Homepage: https://sagorbrur.github.io/bnlp/
- Size: 22.5 MB
- Stars: 290
- Watchers: 5
- Forks: 64
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
Awesome Lists containing this project
- awesome-bangladeshi-foss - BNLP - Bengali NLP toolkit with tokenization, embeddings, POS tagging, and NER. (Developer Tools & Libraries / 🚀 How to contribute)
- indicnlp_catalog - BNLP
- awesome-bangla - Bengali NLP Library(BNLP)
README
# Bengali Natural Language Processing (BNLP)
[PyPI package](https://pypi.org/project/bnlp-toolkit/)
[Downloads](https://pepy.tech/project/bnlp_toolkit)
BNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, perform **Bengali part-of-speech (POS) tagging** and **Bengali named entity recognition (NER)**, and **clean Bangla text** for Bengali NLP tasks.
## Features
- Tokenization
  - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
  - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
  - [SentencePiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
  - [Word2Vec embedding](./docs/README.md#bengali-word2vec)
  - [fastText embedding](./docs/README.md#bengali-fasttext)
  - [GloVe embedding](./docs/README.md#bengali-glove-word-vectors)
  - [Doc2Vec document embedding](./docs/README.md#document-embedding)
- Part-of-speech tagging
  - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named entity recognition
  - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
  - Letters, vowels, punctuation, stopwords
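BNLP provides text cleaning through its own API, documented at the link above. The sketch below is a rough, hypothetical illustration of what such a step typically involves, not BNLP's implementation. It applies Unicode NFC normalization, strips URLs and email addresses, and collapses whitespace. The function name `clean_text` and its regular expressions are assumptions. Normalization matters for Bengali because some letters have more than one encoding. For example, য় can be stored as a single code point (U+09DF) or as য followed by a nukta (U+09AF U+09BC), and NFC maps both forms to the same sequence.

```python
import re
import unicodedata

# Illustrative patterns (assumptions, not BNLP's own cleaning rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str, remove_url: bool = True, remove_email: bool = True) -> str:
    """Hypothetical cleaner: normalize Unicode, drop URLs/emails, tidy whitespace."""
    # NFC gives one canonical encoding, e.g. U+09DF (য়) becomes U+09AF U+09BC.
    text = unicodedata.normalize("NFC", text)
    if remove_url:
        text = URL_RE.sub(" ", text)
    if remove_email:
        text = EMAIL_RE.sub(" ", text)
    # Collapse runs of spaces/newlines left behind by the removals.
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_text("আমি  গান https://example.com গাই\n"))
# আমি গান গাই
```

Real-world cleaners usually expose more switches, such as removing digits, punctuation, or emoji. Each extra switch follows the same pattern: one optional substitution, followed by a final whitespace cleanup.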
## Installation
### PIP installer
```
pip install bnlp_toolkit
```
**Or upgrade:**
```
pip install -U bnlp_toolkit
```
- Supported Python versions: 3.8, 3.9, 3.10, 3.11
- Supported OS: Linux, Windows, macOS
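Before installing, you may want to check two things: whether your interpreter is one of the Python versions listed above, and whether `bnlp_toolkit` is already installed. The sketch below uses only the standard library (`importlib.metadata`, available since Python 3.8). The `SUPPORTED_PYTHON` set simply mirrors the list above, and the helper names are hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional

# Mirrors the versions listed in this README (an assumption if that list changes).
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10), (3, 11)}


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the major.minor version is in the listed set."""
    return (version_info[0], version_info[1]) in SUPPORTED_PYTHON


def installed_bnlp_version() -> Optional[str]:
    """Return the installed bnlp_toolkit version string, or None if absent."""
    try:
        return metadata.version("bnlp_toolkit")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported())
    print("bnlp_toolkit version:", installed_bnlp_version())
```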
### Build from source
```
git clone https://github.com/sagorbrur/bnlp.git
cd bnlp
python setup.py install
```
## Sample Usage
```py
from bnlp import BasicTokenizer
tokenizer = BasicTokenizer()
raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
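The sample output above comes from BNLP's `BasicTokenizer`. The sketch below illustrates the general idea, not its actual implementation. It is a minimal, hypothetical whitespace-and-punctuation tokenizer that first splits on whitespace, then separates punctuation marks into their own tokens, including the Bengali danda `।` (U+0964). The punctuation set and the name `basic_tokenize` are assumptions.

```python
import string
from typing import List

# Assumed punctuation set: Bengali danda/double danda plus ASCII punctuation.
PUNCTUATION = set("।॥") | set(string.punctuation)


def basic_tokenize(text: str) -> List[str]:
    """Split on whitespace, then emit each punctuation mark as its own token."""
    tokens: List[str] = []
    for chunk in text.split():
        current: List[str] = []
        for ch in chunk:
            if ch in PUNCTUATION:
                # Flush the word built so far, then keep the mark separately.
                if current:
                    tokens.append("".join(current))
                    current = []
                tokens.append(ch)
            else:
                current.append(ch)
        if current:
            tokens.append("".join(current))
    return tokens


print(basic_tokenize("আমি বাংলায় গান গাই।"))
# ['আমি', 'বাংলায়', 'গান', 'গাই', '।']
```

Combining marks such as vowel signs and the nukta are not in the punctuation set, so they stay attached to their words.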
## Documentation
Full documentation is available [here](https://sagorbrur.github.io/bnlp/).
If you are using a previous version of **bnlp**, check the documentation [archive](https://sagorbrur.github.io/bnlp/docs/archive).
## Contributor Guide
See the [CONTRIBUTING.md](https://github.com/sagorbrur/bnlp/blob/main/CONTRIBUTING.md) page for details.
## Thanks To
* [Semantics Lab](https://www.facebook.com/lab.semantics/)
* All the developers who are contributing to enrich Bengali NLP.