https://github.com/abdouaziz/wolof

Wolof is a library for NLP tasks in the Wolof language, e.g. text classification, neural machine translation (NMT), and automatic speech recognition (ASR).

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/abdouaziz/wolof
Owner: abdouaziz
License: apache-2.0
Created: 2021-11-23T14:12:25.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-11-28T13:07:31.000Z (over 1 year ago)
Last Synced: 2025-01-12T20:26:53.893Z (4 months ago)
Language: Python
Size: 489 KB
Stars: 28
Watchers: 5
Forks: 7
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

A library, built on PyTorch and Transformers, for developing state-of-the-art deep learning models for a wide variety of linguistic tasks in Wolof.


## Your Library for Wolof Language

**Wolof** is a language spoken in Senegal and neighboring countries. Many works are written in Wolof, and there is a need for tools that help us understand and process the language better.

The **Wolof library** lets you perform several specific tasks in the Wolof language, such as text classification, machine translation, and automatic speech recognition.

### Why the Wolof library?

- simple and easy to use

- customizable 

- clean code

 

## Installation

### Requirements

- Python >= 3.6 

- PyTorch (`torch`)

- Transformers 
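Before installing, you can check that these requirements are met. The following is a small sketch using only the standard library; `check_environment` is a hypothetical helper written for illustration, not part of the `wolof` package.

```python
import importlib.util
import sys

# Hypothetical helper (not part of the wolof package) that checks the
# requirements listed above: Python >= 3.6, PyTorch (`torch`) and Transformers.
REQUIRED_PYTHON = (3, 6)
REQUIRED_PACKAGES = ("torch", "transformers")


def check_environment():
    """Return a dict mapping each requirement to True (satisfied) or False."""
    status = {"python>=3.6": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name in REQUIRED_PACKAGES:
        # find_spec returns None when a top-level package is not installed.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(f"{requirement}: {'OK' if ok else 'missing'}")
```

If `torch` or `transformers` is reported as missing, install it before installing `wolof`.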

### With pip

wolof can be installed using pip as follows:

```bash
pip install wolof
```

### From source

```bash
pip install git+https://github.com/abdouaziz/wolof.git
```

## Usage

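```python
from wolof import Speech2Text

# Load the Wolof automatic speech recognition (ASR) model
asr = Speech2Text(model_name="abdouaziiz/wav2vec2-xls-r-300m-wolof")

# Path to the audio file to transcribe
audio_file = "audio.wav"

# Run inference to get the transcription
prediction = asr(audio_file)
```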
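Speech models are sensitive to the format of their input audio. The sketch below checks a WAV file before it is passed to `asr` in the usage example above. It assumes, like other wav2vec2 XLS-R checkpoints, that the Wolof model expects 16 kHz mono audio; this is an assumption, so confirm it against the model card. `check_wav` is a hypothetical helper, not part of the `wolof` API.

```python
import wave

# Assumption: like other wav2vec2 XLS-R checkpoints, the Wolof ASR model
# expects 16 kHz mono audio. Adjust EXPECTED_RATE if the model card differs.
EXPECTED_RATE = 16_000


def check_wav(path, expected_rate=EXPECTED_RATE):
    """Hypothetical helper: return a list of problems (empty if the file looks fine)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"audio has {channels} channels, expected mono")
    return problems
```

If `check_wav("audio.wav")` returns a non-empty list, resample or downmix the file before calling `asr`.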

# Pipeline

Pipelines are a simple way to use models for inference. They are objects that abstract away most of the complex code in the library, offering a simple API dedicated to several tasks: masked language modeling and sentiment analysis.

**bert-base-wolof** is a BERT-base model pretrained on Wolof.

**sora-wolof** is a RoBERTa model pretrained on Wolof.

	

## Models in Wolof library

| Model name | Number of layers | Attention Heads | Embedding Dimension | Total Parameters |
| :---: | :---: | :---: | :---: | :---: |
| `bert-base-wolof` | 6 | 12 | 514 | 56.9 M |
| `soraberta-base` | 6 | 12 | 514 | 83 M |
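Parameter totals like those in the table are usually obtained in PyTorch by summing the number of elements in every parameter tensor of the model. The sketch below assumes the table was computed that way; `count_parameters` and `to_millions` are hypothetical helpers for illustration, not part of the `wolof` library.

```python
# Hypothetical helpers (not part of the wolof library). Assumption: the
# "Total Parameters" column is the sum of elements over all parameter tensors.
def count_parameters(model):
    """Total number of elements across model.parameters() (e.g. a torch.nn.Module)."""
    return sum(p.numel() for p in model.parameters())


def to_millions(n):
    """Format a raw parameter count in millions, as in a README table."""
    return f"{n / 1e6:.1f} M"


if __name__ == "__main__":
    # Raw count reported for bert-base-wolof: 56,931,622 parameters
    print(to_millions(56931622))
```

With PyTorch and Transformers installed, you could pass a model loaded with `AutoModel.from_pretrained("abdouaziiz/bert-base-wolof")` to `count_parameters`; the total may differ slightly depending on whether task-specific heads are loaded.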

	 

## Using Soraberta or BERT-base-wolof

Let's use the ***`fill-mask`*** task for masked language modeling. We replace a word in the input text with the ***`[MASK]`*** token, and the unmasker predicts the words most likely to fill the ***`[MASK]`*** position.

```python
>>> from wolof import Pipeline
>>> unmasker = Pipeline(task='fill-mask', model_name='abdouaziiz/bert-base-wolof')
>>> unmasker("kuy yoot du [MASK].")
[{'sequence': '[CLS] kuy yoot du seqet. [SEP]',
  'score': 0.09505125880241394,
  'token': 13578},
 {'sequence': '[CLS] kuy yoot du daw. [SEP]',
  'score': 0.08882280439138412,
  'token': 679},
 {'sequence': '[CLS] kuy yoot du yoot. [SEP]',
  'score': 0.057790059596300125,
  'token': 5117},
 {'sequence': '[CLS] kuy yoot du seqat. [SEP]',
  'score': 0.05671025067567825,
  'token': 4992},
 {'sequence': '[CLS] kuy yoot du yaqu. [SEP]',
  'score': 0.0469999685883522,
  'token': 1735}]
```
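The raw output above includes the `[CLS]` and `[SEP]` special tokens and one dict per candidate. A minimal sketch for turning it into a short ranked list of clean sentences follows; it assumes the pipeline always returns dicts with `'sequence'`, `'score'` and `'token'` keys, as in the example output. `top_predictions` is a hypothetical helper, not part of the `wolof` API.

```python
import re

# Hypothetical helper (not part of the wolof API). Assumption: the fill-mask
# output is a list of dicts with 'sequence', 'score' and 'token' keys.
SPECIAL_TOKENS = re.compile(r"\[(CLS|SEP)\]")


def top_predictions(results, k=3):
    """Return the k best (sentence, score) pairs, with [CLS]/[SEP] removed."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [
        (SPECIAL_TOKENS.sub("", r["sequence"]).strip(), r["score"])
        for r in ranked[:k]
    ]
```

For example, `top_predictions(unmasker("kuy yoot du [MASK]."), k=2)` would keep only the two highest-scoring sentences, starting with `"kuy yoot du seqet."` for the output shown above.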

# Machine Translation in Wolof

...

The ***`task`*** argument of `Pipeline` accepts the following values: `'fill-mask'` and `'sentiment-analysis'`.
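Because the `task` string must match one of these values exactly (for example `'fill-mask'` with a hyphen, not `'fill_mask'`), a small guard can catch typos before a model is downloaded. This is a hypothetical helper for illustration, not part of the `wolof` API, and the set of supported tasks is taken from the list above; newer versions of the library may support more.

```python
# Hypothetical helper (not part of the wolof API). Assumption: these are the
# only task names Pipeline accepts, as documented in this README.
SUPPORTED_TASKS = ("fill-mask", "sentiment-analysis")


def validate_task(task):
    """Return `task` unchanged, or raise ValueError if it is not documented."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of: {', '.join(SUPPORTED_TASKS)}"
        )
    return task
```

Usage: `Pipeline(task=validate_task("fill-mask"), model_name="abdouaziiz/bert-base-wolof")`.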

You can check out more examples in the `examples/` directory.



## Author

- Abdou Aziz DIOP @abdouaziz

- email : [email protected]

- linkedin : https://www.linkedin.com/in/abdouaziiz/
