https://github.com/sagorbrur/bnflair

Bengali flair based model collection
bengali bengali-nlp flair flair-embeddings

Host: GitHub
URL: https://github.com/sagorbrur/bnflair
Owner: sagorbrur
License: mit
Created: 2022-06-02T12:59:04.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2022-06-03T02:49:08.000Z (over 2 years ago)
Last Synced: 2023-03-07T14:40:14.907Z (almost 2 years ago)
Topics: bengali, bengali-nlp, flair, flair-embeddings
Homepage:
Size: 25.7 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: license

README

# BNFLAIR

A collection of [Flair](https://github.com/flairNLP/flair)-based Bengali resources: Bengali Flair embeddings and Flair-trained models for NER, POS tagging, and text classification.

## Installation

```
pip install -r requirements.txt
```

## Embeddings

### Bengali Wiki Flair embeddings

Character-based Flair language models trained on the Bengali Wikipedia dataset.

- [Forward LM](https://github.com/sagorbrur/bnflair/tree/main/models/embeddings/wikipedia)

    - Total Wikipedia articles: 110449

    - Training epochs: 5

    - Validation loss: 1.5366

    - Validation perplexity: 4.6490

    

- [Backward LM](https://github.com/sagorbrur/bnflair/tree/main/models/embeddings/wikipedia)

    - Total Wikipedia articles: 110449

    - Training epochs: 5

    - Validation loss: 1.4717

    - Validation perplexity: 4.3566
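
The loss and perplexity figures above appear to be related by `perplexity = exp(loss)`. The following sketch checks that relationship against the reported numbers. It assumes the validation loss is a mean per-character cross-entropy in nats; the README does not state this explicitly, and `perplexity_from_loss` is a helper name introduced here for illustration.

```python
import math

# Figures copied from the lists above. Assumption: the loss is a mean
# cross-entropy measured in nats, so perplexity = exp(loss).
reported = {
    "forward": {"loss": 1.5366, "perplexity": 4.6490},
    "backward": {"loss": 1.4717, "perplexity": 4.3566},
}


def perplexity_from_loss(loss):
    """Return the perplexity implied by a mean cross-entropy loss in nats."""
    return math.exp(loss)


for name, stats in reported.items():
    computed = perplexity_from_loss(stats["loss"])
    print(f"{name}: computed {computed:.4f}, reported {stats['perplexity']:.4f}")
```

Under that assumption, a lower loss for the backward model directly implies its lower perplexity.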

## Bengali NER Model

### Wikiann Model

A Bengali NER model trained on the Bengali portion of the [wikiann](https://huggingface.co/datasets/wikiann) NER dataset.

- Total wikiann train data: 1000

- Total wikiann validation data: 100

- Total wikiann test data: 100

- Training epochs: 70

- Scores on the test data:

    - F-score (micro) 0.7751

    - F-score (macro) 0.775

    - Accuracy 0.7364

- For the detailed training log, see [here](https://github.com/sagorbrur/bnflair/tree/main/models/ner)
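
The scores above include both a micro and a macro F-score. As a rough illustration of how the two differ, the sketch below computes both from per-class counts. The counts are hypothetical and are not taken from the bnflair training logs; `Counts`, `f1`, and `micro_macro_f1` are names introduced here for illustration only.

```python
from collections import namedtuple

# True positives, false positives, false negatives for one entity class.
Counts = namedtuple("Counts", ["tp", "fp", "fn"])


def f1(tp, fp, fn):
    """F1 score from raw counts; returns 0.0 when nothing was predicted or expected."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_class):
    """Return (micro F1, macro F1) for a dict mapping class name to Counts."""
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(
        sum(c.tp for c in per_class.values()),
        sum(c.fp for c in per_class.values()),
        sum(c.fn for c in per_class.values()),
    )
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(*c) for c in per_class.values()) / len(per_class)
    return micro, macro


# Hypothetical counts, NOT results from this repository.
example = {
    "PER": Counts(tp=40, fp=10, fn=10),
    "LOC": Counts(tp=30, fp=5, fn=15),
    "ORG": Counts(tp=10, fp=10, fn=10),
}
micro, macro = micro_macro_f1(example)
print(f"micro F1: {micro:.4f}, macro F1: {macro:.4f}")
```

Micro averaging weights every entity equally, while macro averaging weights every class equally, so the two values move closer together as performance becomes more even across classes.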

## Usage

### Embeddings

- Generate Flair embeddings for any Bengali text:

```py
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

sentence = Sentence('রামপ্রসাদ সেন জন্মগ্রহণ করেছিলেন গাঙ্গেয় পশ্চিমবঙ্গের এক তান্ত্রিক বৈদ্যব্রাহ্মণ পরিবারে।')

# init embeddings from your trained LM
char_lm_embeddings = FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt')

# embed sentence
char_lm_embeddings.embed(sentence)
```

- Stack the forward and backward embeddings to train a Flair-based NER, POS, or text classification model:

```py
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# forward and backward character LMs trained on Bengali Wikipedia
embedding_types = [
    FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt'),
    FlairEmbeddings('models/embeddings/wikipedia/bnwiki_backward.pt')
]

embeddings = StackedEmbeddings(embeddings=embedding_types)
```

### NER

- Run the NER model on Bengali text:

```py
from flair.data import Sentence
from flair.models import SequenceTagger

text = "কবিরঞ্জন রামপ্রসাদ সেন (১৭১৮ বা ১৭২৩ – ১৭৭৫) ছিলেন অষ্টাদশ শতাব্দীর এক বিশিষ্ট বাঙালি শাক্ত কবি ও সাধক।"

# load the trained tagger from the repository checkout
ner_model_path = "models/ner/wikiann.pt"
ner_model = SequenceTagger.load(ner_model_path)

# tag the sentence in place
sentence = Sentence(text)
ner_model.predict(sentence)

# print each predicted entity span
entities = sentence.get_spans('ner')
for entity in entities:
    print(entity)

# output: Span[0:3]: "কবিরঞ্জন রামপ্রসাদ সেন" → PER (0.5903)
```
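
To use the predictions programmatically rather than printing them, the spans can be turned into plain dictionaries. The sketch below assumes a Flair version in which each span exposes `.text` and `get_label(label_type)` returns a label with `.value` and `.score` (this matches the span format shown in the output above, but older Flair releases used different attributes). `spans_to_rows` is a hypothetical helper, not part of bnflair or Flair.

```python
def spans_to_rows(sentence, label_type="ner"):
    """Convert the predicted spans of a tagged sentence into a list of dicts.

    Assumes `sentence` has already been passed through `predict(...)` and that
    spans expose `.text` and `.get_label(label_type)` with `.value`/`.score`.
    """
    rows = []
    for span in sentence.get_spans(label_type):
        label = span.get_label(label_type)
        rows.append(
            {
                "text": span.text,               # surface text of the entity
                "label": label.value,            # predicted tag, e.g. PER
                "score": round(label.score, 4),  # model confidence
            }
        )
    return rows
```

With the `sentence` from the example above, `spans_to_rows(sentence)` would return one dictionary per detected entity, which is convenient for filtering by score or serializing to JSON.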