Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sagorbrur/bnlp
BNLP is a natural language processing toolkit for Bengali Language.
bangla bangla-nlp bangla-pos-tagging bangla-word2vec bengal-pos-tagging bengali bengali-fasttext bengali-language bengali-ner bengali-nlp bengali-nlp-library bengali-tokenization bengali-word-embedding bengali-word2vec bn-glove named-entity-recognition ner nlp nltk-tokenizer
- Host: GitHub
- URL: https://github.com/sagorbrur/bnlp
- Owner: sagorbrur
- License: mit
- Created: 2019-11-22T10:02:15.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2024-12-06T06:52:51.000Z (about 1 month ago)
- Last Synced: 2024-12-30T02:54:11.459Z (11 days ago)
- Topics: bangla, bangla-nlp, bangla-pos-tagging, bangla-word2vec, bengal-pos-tagging, bengali, bengali-fasttext, bengali-language, bengali-ner, bengali-nlp, bengali-nlp-library, bengali-tokenization, bengali-word-embedding, bengali-word2vec, bn-glove, named-entity-recognition, ner, nlp, nltk-tokenizer
- Language: Jupyter Notebook
- Homepage: https://sagorbrur.github.io/bnlp/
- Size: 22.5 MB
- Stars: 283
- Watchers: 7
- Forks: 64
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# Bengali Natural Language Processing (BNLP)
[![PyPI version](https://img.shields.io/pypi/v/bnlp_toolkit)](https://pypi.org/project/bnlp-toolkit/)
[![Downloads](https://static.pepy.tech/badge/bnlp_toolkit)](https://pepy.tech/project/bnlp_toolkit)

BNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, and perform **Bengali POS tagging**, **Bengali named entity recognition**, and **Bangla text cleaning**.
## Features
- Tokenization
  - [Basic Tokenizer](./docs/README.md#basic-tokenizer)
  - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)
  - [Sentencepiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)
- Embeddings
  - [Word2vec embedding](./docs/README.md#bengali-word2vec)
  - [Fasttext embedding](./docs/README.md#bengali-fasttext)
  - [Glove Embedding](./docs/README.md#bengali-glove-word-vectors)
  - [Doc2vec Document embedding](./docs/README.md#document-embedding)
- Part of speech tagging
  - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)
- Named Entity Recognition
  - [CRF-based NER](./docs/README.md#bengali-crf-ner)
- [Text Cleaning](./docs/README.md#text-cleaning)
- [Corpus](./docs/README.md#bengali-corpus-class)
  - Letters, vowels, punctuations, stopwords

## Installation
### PIP installer
```
pip install bnlp_toolkit
```
**or upgrade**

```
pip install -U bnlp_toolkit
```
- Python: 3.8, 3.9, 3.10, 3.11
- OS: Linux, Windows, Mac

### Build from source
```
git clone https://github.com/sagorbrur/bnlp.git
cd bnlp
python setup.py install
```

## Sample Usage
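For intuition about the token boundaries produced by the tokenizer example in this section, here is a dependency-free sketch in plain Python. It is not BNLP's implementation: the function name `sketch_tokenize` and the punctuation set are assumptions made for illustration, and `BasicTokenizer` may handle other inputs differently.

```python
import re

# Hypothetical pattern for this sketch only; BNLP's own rules may differ.
# It matches either a run of characters that are neither whitespace nor
# punctuation, or a single punctuation mark (including the danda "।").
# A negated class is used instead of \w because Bengali vowel signs are
# combining marks, which \w does not match.
TOKEN_PATTERN = re.compile(r"[^\s।,!?;:]+|[।,!?;:]")


def sketch_tokenize(text):
    """Split text on whitespace and emit each punctuation mark as its own token."""
    return TOKEN_PATTERN.findall(text)


raw_text = "আমি বাংলায় গান গাই।"
tokens = sketch_tokenize(raw_text)
print(tokens)  # four words followed by the danda as a separate token
```

The point of the sketch is only that the sentence-final danda becomes its own token rather than staying attached to the last word.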
```py
from bnlp import BasicTokenizer

tokenizer = BasicTokenizer()
raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```

## Documentation
Full documentation is available [here](https://sagorbrur.github.io/bnlp/).

If you are using a previous version of **bnlp**, check the documentation [archive](https://sagorbrur.github.io/bnlp/docs/archive).
## Contributor Guide
See the [CONTRIBUTING.md](https://github.com/sagorbrur/bnlp/blob/master/CONTRIBUTING.md) page for details.
## Thanks To
* [Semantics Lab](https://www.facebook.com/lab.semantics/)
* All the developers who are contributing to enrich Bengali NLP.