https://github.com/abdouaziz/wolof
Wolof is a library for NLP tasks in the Wolof language, e.g. text classification, neural machine translation (NMT), and automatic speech recognition (ASR).
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/abdouaziz/wolof
- Owner: abdouaziz
- License: apache-2.0
- Created: 2021-11-23T14:12:25.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-28T13:07:31.000Z (over 1 year ago)
- Last Synced: 2025-01-12T20:26:53.893Z (4 months ago)
- Language: Python
- Size: 489 KB
- Stars: 28
- Watchers: 5
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
A library, built on PyTorch and Transformers, for developing state-of-the-art deep learning models for a wide variety of linguistic tasks in Wolof.
## Your Library for Wolof Language
**Wolof** is a language spoken in Senegal and neighboring countries. Many works are written in Wolof, hence the need for a tool that helps us understand this language better.
The **Wolof library** supports several specific tasks in the Wolof language, such as text classification, translation, and automatic speech recognition.
### Why the Wolof library?
- simple and easy to use
- customizable
- clean code
## Installation
### Requirements
- Python >= 3.6
- Torch
- Transformers
### With pip
wolof can be installed using pip as follows:
```
pip install wolof
```
### From source
```
pip install git+https://github.com/abdouaziz/wolof.git
```
## Usage
```python
from wolof import Speech2Text
asr = Speech2Text(model_name="abdouaziiz/wav2vec2-xls-r-300m-wolof")
audio_file = "audio.wav"
prediction = asr(audio_file)
```
# Pipeline
Pipelines are an easy way to use models for inference. They are objects that abstract away most of the library's complex code and offer a simple API dedicated to tasks such as masked language modeling and sentiment analysis.
**bert-base-wolof** is a BERT-base model pretrained on the Wolof language.
**sora-wolof** is a RoBERTa model pretrained on the Wolof language.
## Models in Wolof library
| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :------: | :---: | :---: | :---: | :---: |
| `bert-base-wolof` | 6 | 12 | 514 | 56931622 (≈ 57 M) |
| `soraberta-base` | 6 | 12 | 514 | 83 M |
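To check a figure like those in the Total Parameters column, the weights of a loaded model can be counted directly. The following is a minimal sketch, assuming the model behaves like a PyTorch `torch.nn.Module` (it exposes `.parameters()`, and each parameter has `.numel()` and `.requires_grad`). The helper names `count_parameters` and `format_millions` are hypothetical and not part of the Wolof library.

```python
def count_parameters(model, trainable_only=False):
    """Return the number of scalar parameters in a PyTorch-style model.

    Assumption: `model.parameters()` yields tensors that provide
    `.numel()` and `.requires_grad`, as `torch.nn.Module` does.
    """
    return sum(
        p.numel()
        for p in model.parameters()
        if p.requires_grad or not trainable_only
    )


def format_millions(n):
    """Format a raw parameter count in millions, e.g. 56931622 -> '56.9 M'."""
    return f"{n / 1_000_000:.1f} M"
```

For example, `format_millions(count_parameters(model))` gives a compact figure that can be compared with the table.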
## Using Soraberta or BERT-base-wolof
Let's use ***`fill-mask`*** for masked language modeling. We mask a word in the input text with the ***`[MASK]`*** token, and the unmasker predicts the words most likely to fill its place.
```python
>>> from wolof import Pipeline
>>> unmasker = Pipeline(task='fill-mask', model_name='abdouaziiz/bert-base-wolof')
>>> unmasker("kuy yoot du [MASK].")
[{'sequence': '[CLS] kuy yoot du seqet. [SEP]',
'score': 0.09505125880241394,
'token': 13578},
{'sequence': '[CLS] kuy yoot du daw. [SEP]',
'score': 0.08882280439138412,
'token': 679},
{'sequence': '[CLS] kuy yoot du yoot. [SEP]',
'score': 0.057790059596300125,
'token': 5117},
{'sequence': '[CLS] kuy yoot du seqat. [SEP]',
'score': 0.05671025067567825,
'token': 4992},
{'sequence': '[CLS] kuy yoot du yaqu. [SEP]',
'score': 0.0469999685883522,
'token': 1735}]
```
# Machine Translation in Wolof
...for ***`task`***, the supported values are `'fill-mask'` and `'sentiment-analysis'`.
You can check out the examples in `examples/`.
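The `fill-mask` pipeline shown earlier returns a list of dictionaries with `sequence`, `score`, and `token` keys. Below is a minimal sketch of how that list could be post-processed, assuming the output format matches the sample in this README. The helper `top_predictions` is hypothetical and not part of the library.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (sentence, score) pairs.

    The [CLS] and [SEP] markers are stripped from each sequence.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    cleaned = []
    for r in ranked[:k]:
        sentence = r["sequence"].replace("[CLS]", "").replace("[SEP]", "").strip()
        cleaned.append((sentence, r["score"]))
    return cleaned


# Two entries copied from the sample fill-mask output in this README.
sample = [
    {"sequence": "[CLS] kuy yoot du daw. [SEP]",
     "score": 0.08882280439138412, "token": 679},
    {"sequence": "[CLS] kuy yoot du seqet. [SEP]",
     "score": 0.09505125880241394, "token": 13578},
]

best_sentence, best_score = top_predictions(sample, k=1)[0]
print(best_sentence)  # kuy yoot du seqet.
```

Sorting explicitly by `score` avoids relying on the order in which the pipeline returns its candidates.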
## Author
- Abdou Aziz DIOP @abdouaziz
- email : [email protected]
- linkedin : https://www.linkedin.com/in/abdouaziiz/