Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/tanmoyio/sahajbert
- Host: GitHub
- URL: https://github.com/tanmoyio/sahajbert
- Owner: tanmoyio
- Created: 2021-09-02T08:41:11.000Z (about 3 years ago)
- Default Branch: master
- Last Pushed: 2021-12-28T21:39:31.000Z (almost 3 years ago)
- Last Synced: 2024-05-01T16:29:46.012Z (5 months ago)
- Language: Python
- Size: 630 KB
- Stars: 12
- Watchers: 2
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# sahajBERT
## Downstream evaluation
We have two downstream tasks: `NER` (named entity recognition) and `NCC` (news category classification).
The datasets used here are:
- NER: `wikiann`, config `bn`
- NCC: `indic_glue`, config `sna.bn`
To read more about the datasets, visit [WikiANN](https://huggingface.co/datasets/wikiann) and [IndicGLUE](https://huggingface.co/datasets/indic_glue).
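For a quick look at the data, both datasets can be loaded with the 🤗 `datasets` library (a minimal sketch; the config names follow the list above):
```
from datasets import load_dataset

# Bengali split of WikiANN, used for the NER task
ner_data = load_dataset("wikiann", "bn")

# Soham news article subset of IndicGLUE, used for the NCC task
ncc_data = load_dataset("indic_glue", "sna.bn")

print(ner_data["train"][0])   # tokens + NER tags
print(ncc_data["train"][0])   # text + label
```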
Model link - [sahajBERT-xlarge](https://huggingface.co/Upload/sahajbert2)
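The checkpoint can be pulled straight from the Hub with `transformers` (a minimal sketch, assuming the repo hosts a standard tokenizer and model config):
```
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Upload/sahajbert2")
model = AutoModelForMaskedLM.from_pretrained("Upload/sahajbert2")

# Forward pass on a short Bengali sentence as a sanity check
inputs = tokenizer("আমি বাংলায় গান গাই", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence length, vocab size)
```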
### NER
##### 1. Clone the sahajBERT repo and prepare the environment by installing the requirements
```
git clone https://github.com/tanmoyio/sahajBERT.git
cd sahajBERT
pip install -r requirements.txt
pip install -q https://github.com/learning-at-home/hivemind/archive/sahaj2.zip
pip install seqeval
```
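`seqeval` is installed presumably for entity-level NER metrics. A minimal sketch of how it scores IOB-tagged predictions (the example sequences here are made up):
```
from seqeval.metrics import classification_report, f1_score

# Gold and predicted tag sequences, one inner list per sentence (IOB2 scheme)
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

print(f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```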
##### 2. Run the following command
```
python train_ner.py \
--model_name_or_path Upload/sahajbert2 --output_dir sahajbert/ner \
--learning_rate 3e-5 --max_seq_length 256 --num_train_epochs 20 \
--per_device_train_batch_size 8 --per_device_eval_batch_size 8 --gradient_accumulation_steps 8 \
--early_stopping_patience 3 --early_stopping_threshold 0.01
```
This will give you a prompt asking for your Hugging Face username and password. (We don't store the password; it is only used so that your score can be reflected in the leaderboard.) **Leaderboard link - [sahajBERT2-xlarge-ner](https://wandb.ai/tanmoyio/sahajBERT2-xlarge-ner?workspace=user-tanmoyio)**
If you are fine-tuning on a GPU (for example a Colab GPU), you may need to adjust `per_device_train_batch_size` and `per_device_eval_batch_size` to fit the available memory; if you lower them, you can raise `--gradient_accumulation_steps` so the effective batch size (per-device batch size × accumulation steps) stays the same.
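Once training finishes, the fine-tuned checkpoint can be tried with a `transformers` pipeline (a minimal sketch; the model path is the `--output_dir` from the command above, and the example sentence is arbitrary):
```
from transformers import pipeline

# Load the fine-tuned NER model from the local output directory
ner = pipeline("token-classification", model="sahajbert/ner", aggregation_strategy="simple")

print(ner("মহাত্মা গান্ধী ভারতের স্বাধীনতা আন্দোলনের নেতা ছিলেন"))
```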
### NCC
With the environment from step 1 above, run the following command:
```
python train_ncc.py \
--model_name_or_path Upload/sahajbert2 --output_dir sahajbert/ncc \
--learning_rate 1e-5 --max_seq_length 128 --num_train_epochs 20 \
--per_device_train_batch_size 8 --per_device_eval_batch_size 8 --gradient_accumulation_steps 8 \
--early_stopping_patience 3 --early_stopping_threshold 0.01
```
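As with NER, the fine-tuned NCC checkpoint can be tried with a `transformers` pipeline (a minimal sketch; the model path is the `--output_dir` from the command above, and the example sentence is arbitrary):
```
from transformers import pipeline

# Load the fine-tuned news classifier from the local output directory
classifier = pipeline("text-classification", model="sahajbert/ncc")

print(classifier("ভারতীয় ক্রিকেট দল আজ জয়লাভ করেছে"))
```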