https://github.com/rasyosef/amharic-news-category-classification
notebooks to finetune `bert-small-amharic`, `bert-mini-amharic`, and `xlm-roberta-base` models using an Amharic text classification dataset and the transformers library
- Host: GitHub
- URL: https://github.com/rasyosef/amharic-news-category-classification
- Owner: rasyosef
- Created: 2024-05-08T14:36:51.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-05-10T13:11:37.000Z (over 1 year ago)
- Last Synced: 2025-09-14T09:02:19.480Z (2 months ago)
- Topics: amharic, bert, fine-tuning, huggingface, text-classification, transformers, xlm-roberta
- Language: Jupyter Notebook
- Size: 45.9 KB
- Stars: 10
- Watchers: 1
- Forks: 3
- Open Issues: 0
# amharic-news-category-classification
This GitHub repository contains three notebooks that use the [amharic-news-category-classification](https://huggingface.co/datasets/rasyosef/amharic-news-category-classification) dataset to fine-tune the models listed under Models for a text classification task.
Each fine-tuned model classifies a given Amharic news article into one of the following six categories:
- ሀገር አቀፍ ዜና (Local News)
- መዝናኛ (Entertainment)
- ስፖርት (Sports)
- ቢዝነስ (Business)
- ዓለም አቀፍ ዜና (International News)
- ፖለቲካ (Politics)
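
A classification head works with integer class ids, so the six categories need a mapping between ids and names. The sketch below builds one from the list above. The id order (list order) and the names `CATEGORIES`, `id2label`, `label2id`, and `describe` are assumptions for illustration; the dataset and notebooks may number the classes differently.

```python
# Category names are copied from the list above.
# ASSUMPTION: ids follow list order; the dataset may use a different numbering.
CATEGORIES = [
    ("ሀገር አቀፍ ዜና", "Local News"),
    ("መዝናኛ", "Entertainment"),
    ("ስፖርት", "Sports"),
    ("ቢዝነስ", "Business"),
    ("ዓለም አቀፍ ዜና", "International News"),
    ("ፖለቲካ", "Politics"),
]

# Integer id -> Amharic label, and the inverse lookup.
id2label = {i: amharic for i, (amharic, _) in enumerate(CATEGORIES)}
label2id = {amharic: i for i, amharic in id2label.items()}

# Amharic label -> English gloss, for display only.
english = dict(CATEGORIES)


def describe(class_id: int) -> str:
    """Format a predicted class id as 'Amharic (English)'."""
    amharic = id2label[class_id]
    return f"{amharic} ({english[amharic]})"


if __name__ == "__main__":
    for class_id in id2label:
        print(class_id, describe(class_id))
```
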
## Models
* [xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base): a multilingual transformer model with 279M parameters
* [bert-small-amharic](https://huggingface.co/rasyosef/bert-small-amharic): a new Amharic version of the bert-small transformer model with 25.7M parameters, pretrained from scratch on unlabelled Amharic text
* [bert-mini-amharic](https://huggingface.co/rasyosef/bert-mini-amharic): a new Amharic version of the bert-mini transformer model with 9.67M parameters, pretrained from scratch on unlabelled Amharic text
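
As a rough sketch of how any of these checkpoints could be loaded for fine-tuning with the transformers library: the Hub ids below are taken from the links above, while the helper name `load_for_classification` and its defaults are hypothetical and not taken from the notebooks. Calling it downloads the checkpoint and attaches a newly initialised six-way classification head.

```python
# Hub ids taken from the model links above.
MODEL_IDS = {
    "xlm-roberta-base": "FacebookAI/xlm-roberta-base",
    "bert-small-amharic": "rasyosef/bert-small-amharic",
    "bert-mini-amharic": "rasyosef/bert-mini-amharic",
}


def load_for_classification(name: str, num_labels: int = 6):
    """Return (tokenizer, model) for one of the three models.

    Hypothetical helper: the notebooks may load the models differently.
    """
    # Imported lazily so MODEL_IDS can be used without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = MODEL_IDS[name]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The classification head is randomly initialised and still needs fine-tuning.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, num_labels=num_labels
    )
    return tokenizer, model
```

For example, `tokenizer, model = load_for_classification("bert-mini-amharic")` would fetch the smallest of the three models; network access to the Hugging Face Hub is required.
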
### Fine-tuned Model Performance
Since this is a multi-class classification task, the reported precision, recall, and F1 scores are macro averages.
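
To make the averaging concrete: a macro average computes each metric per class and then takes the unweighted mean, so a rare category counts as much as a frequent one. The following is a minimal pure-Python sketch with made-up toy labels; the function name `macro_prf` is hypothetical, and the notebooks presumably rely on a library routine instead.

```python
from collections import Counter


def macro_prf(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Undefined ratios (no predictions or no true samples) are treated as 0.
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy data, not from the dataset: three classes with unequal frequencies.
    y_true = [0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 0, 1, 0, 2]
    precision, recall, f1 = macro_prf(y_true, y_pred)
    print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In the toy data, accuracy is 6/7, yet macro recall is lower because the single error halves the recall of the small class 1. This is why macro scores can sit below accuracy when the categories are imbalanced.
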
|Model|Size (# params)|Accuracy|Precision|Recall|F1|
|-----|----|--------|---------|------|--|
|xlm-roberta-base|279M|0.90|0.88|0.88|0.88|
|bert-small-amharic|25.7M|0.89|0.86|0.87|0.86|
|bert-mini-amharic|9.67M|0.87|0.83|0.83|0.83|
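
The table can also be read as a size versus quality trade-off. The short sketch below only redoes arithmetic on the numbers reported above, with `xlm-roberta-base` as the baseline; the names `results` and `compare` are illustrative, and no new measurements are involved.

```python
# Values copied from the table above (parameters in millions).
results = {
    "xlm-roberta-base": {"params_m": 279.0, "accuracy": 0.90, "f1": 0.88},
    "bert-small-amharic": {"params_m": 25.7, "accuracy": 0.89, "f1": 0.86},
    "bert-mini-amharic": {"params_m": 9.67, "accuracy": 0.87, "f1": 0.83},
}

baseline = results["xlm-roberta-base"]


def compare(name: str) -> dict:
    """Compare one model against xlm-roberta-base."""
    r = results[name]
    return {
        # How many times smaller the model is than the baseline.
        "size_ratio": baseline["params_m"] / r["params_m"],
        # Absolute drop in accuracy and macro F1 relative to the baseline.
        "accuracy_gap": round(baseline["accuracy"] - r["accuracy"], 2),
        "f1_gap": round(baseline["f1"] - r["f1"], 2),
    }


if __name__ == "__main__":
    for name in ("bert-small-amharic", "bert-mini-amharic"):
        c = compare(name)
        print(
            f"{name}: {c['size_ratio']:.1f}x smaller, "
            f"accuracy -{c['accuracy_gap']:.2f}, F1 -{c['f1_gap']:.2f}"
        )
```

On these figures, `bert-small-amharic` is roughly 11 times smaller than `xlm-roberta-base` while giving up 0.01 accuracy and 0.02 macro F1, and `bert-mini-amharic` is roughly 29 times smaller while giving up 0.03 accuracy and 0.05 macro F1.
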