Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/sagorbrur/bnlp

BNLP is a natural language processing toolkit for Bengali Language.




Host: GitHub
URL: https://github.com/sagorbrur/bnlp
Owner: sagorbrur
License: mit
Created: 2019-11-22T10:02:15.000Z (about 5 years ago)
Default Branch: main
Last Pushed: 2024-12-06T06:52:51.000Z (2 months ago)
Last Synced: 2025-02-03T06:04:21.829Z (8 days ago)
Topics: bangla, bangla-nlp, bangla-pos-tagging, bangla-word2vec, bengal-pos-tagging, bengali, bengali-fasttext, bengali-language, bengali-ner, bengali-nlp, bengali-nlp-library, bengali-tokenization, bengali-word-embedding, bengali-word2vec, bn-glove, named-entity-recognition, ner, nlp, nltk-tokenizer
Language: Jupyter Notebook
Homepage: https://sagorbrur.github.io/bnlp/
Size: 22.5 MB
Stars: 286
Watchers: 6
Forks: 65
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE


README

# Bengali Natural Language Processing (BNLP)

[![PyPI version](https://img.shields.io/pypi/v/bnlp_toolkit)](https://pypi.org/project/bnlp-toolkit/)

[![Downloads](https://static.pepy.tech/badge/bnlp_toolkit)](https://pepy.tech/project/bnlp_toolkit)

BNLP is a natural language processing toolkit for the Bengali language. It helps you **tokenize Bengali text**, **embed Bengali words**, **embed Bengali documents**, **tag Bengali parts of speech**, **recognize Bengali named entities**, and **clean Bangla text** for Bengali NLP purposes.

## Features

- Tokenization

   - [Basic Tokenizer](./docs/README.md#basic-tokenizer)

   - [NLTK Tokenizer](./docs/README.md#nltk-tokenization)

   - [Sentencepiece Tokenizer](./docs/README.md#bengali-sentencepiece-tokenization)

- Embeddings

   - [Word2vec embedding](./docs/README.md#bengali-word2vec)

   - [Fasttext embedding](./docs/README.md#bengali-fasttext)

   - [Glove Embedding](./docs/README.md#bengali-glove-word-vectors)

   - [Doc2vec Document embedding](./docs/README.md#document-embedding)

- Part of speech tagging

   - [CRF-based POS tagging](./docs/README.md#bengali-crf-pos-tagging)

- Named Entity Recognition

   - [CRF-based NER](./docs/README.md#bengali-crf-ner)

- [Text Cleaning](./docs/README.md#text-cleaning)

- [Corpus](./docs/README.md#bengali-corpus-class)

   - Letters, vowels, punctuation, stopwords
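To give a feel for what the text-cleaning feature listed above involves, here is a minimal standard-library sketch. It is not BNLP's own cleaner: the function name `clean_text`, its `replace_with` parameter, and the two regular expressions are assumptions made for illustration only. See the linked docs for the toolkit's actual API.

```python
import re
import unicodedata

# Illustrative patterns only (assumptions, not BNLP's own rules).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")


def clean_text(text, replace_with=""):
    """Normalize Unicode, strip URLs and emails, and collapse whitespace."""
    # NFC gives each character sequence one canonical form, so the same
    # Bengali word always compares equal regardless of how it was typed.
    text = unicodedata.normalize("NFC", text)
    text = URL_RE.sub(replace_with, text)
    text = EMAIL_RE.sub(replace_with, text)
    # split() with no arguments drops runs of any whitespace.
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text("আমি  বাংলায় গান গাই। https://example.com"))
```

Normalizing before matching is a deliberate choice here: Bengali text from different sources can encode the same letter in more than one way, and a cleaner that skips this step may treat identical words as different.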

## Installation

### PIP installer

```
pip install bnlp_toolkit
```

**or upgrade**

```
pip install -U bnlp_toolkit
```

  - Python: 3.8, 3.9, 3.10, 3.11

  - OS: Linux, Windows, Mac
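If you want to confirm an environment matches the requirements above before using the toolkit, a small check like the following may help. It is only a sketch: the helper names `is_supported_python` and `installed_bnlp_version` are hypothetical, and the version set simply mirrors the list in this README, so interpreters outside it are untested rather than known to fail.

```python
import sys
from importlib import metadata

# Python versions listed in this README (copied from the list above).
SUPPORTED_PYTHONS = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is in the listed set."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHONS


def installed_bnlp_version():
    """Return the installed bnlp_toolkit version string, or None if absent."""
    try:
        return metadata.version("bnlp_toolkit")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python listed as supported:", is_supported_python())
    print("bnlp_toolkit version:", installed_bnlp_version())
```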

### Build from source

```
git clone https://github.com/sagorbrur/bnlp.git
cd bnlp
python setup.py install
```

## Sample Usage

```py
from bnlp import BasicTokenizer

tokenizer = BasicTokenizer()
raw_text = "আমি বাংলায় গান গাই।"
tokens = tokenizer(raw_text)
print(tokens)
# output: ["আমি", "বাংলায়", "গান", "গাই", "।"]
```
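The output above shows the sentence split on whitespace, with the danda (`।`) separated into its own token. The standard-library sketch below reproduces that behavior for simple input. It is not how `BasicTokenizer` is implemented: `simple_tokenize` and the `PUNCTUATION` set are assumptions for illustration. It scans character by character instead of using a `\w+` regex, because Python's `\w` does not match Bengali vowel signs and would break words apart.

```python
import string

# Assumption: a small illustrative punctuation set, not BNLP's own list.
PUNCTUATION = set(string.punctuation) | {"।", "॥"}


def simple_tokenize(text):
    """Split on whitespace and emit each punctuation mark as its own token."""
    tokens = []
    current = []
    for ch in text:
        if ch.isspace():
            # Whitespace ends the current word, if any.
            if current:
                tokens.append("".join(current))
                current = []
        elif ch in PUNCTUATION:
            # Flush the word before the mark, then emit the mark itself.
            if current:
                tokens.append("".join(current))
                current = []
            tokens.append(ch)
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    print(simple_tokenize("আমি বাংলায় গান গাই।"))
```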

## Documentation

Full documentation is available [here](https://sagorbrur.github.io/bnlp/).

If you are using a previous version of **bnlp**, check the documentation [archive](https://sagorbrur.github.io/bnlp/docs/archive).

## Contributor Guide

Check the [CONTRIBUTING.md](https://github.com/sagorbrur/bnlp/blob/master/CONTRIBUTING.md) page for details.

## Thanks To

* [Semantics Lab](https://www.facebook.com/lab.semantics/)

* All the developers who are contributing to enrich Bengali NLP.