# Our Stopwords

* Auto-generated stopwords for South African Languages

## Introduction

* We present a list of stopwords that were auto-translated from English and adapted to native [South African Bantu Languages](https://pubs.cs.uct.ac.za/id/eprint/1334/1/icadl_2019_banturecognition.pdf).


## Usage

- The data is provided in [JSON Lines](https://jsonlines.org/) format. Here is an example of using the stopwords in Python:

### Python Example

```python
import json

# Load the stop words from the JSON Lines file (one JSON object per line)
stop_words = []
with open('za_stopwords.main.jsonl', 'r', encoding='utf-8') as file:
    for line in file:
        line = line.strip()
        if line:  # skip blank lines
            stop_words.append(json.loads(line))

# Example: print each English stop word alongside its Zulu counterpart
for word in stop_words:
    print(f"English: {word['eng']}, Zulu: {word['zul']}")
```

> Refer to [`training_example.py`](./training_example.py) for a full working example of how to use these stopwords in your model training scripts.
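
A common next step is to turn the records into a per-language set and drop those words from tokenized text. The following is a minimal sketch, not the repository's own code: `load_stopword_set` and `remove_stopwords` are hypothetical helper names, the sample records are placeholders rather than real dataset entries, and it assumes each record maps a language code (such as `eng` or `zul`) to a single string.

```python
import json
from typing import Iterable, List, Set


def load_stopword_set(lines: Iterable[str], lang: str) -> Set[str]:
    """Collect the lowercased stopwords found under one language code."""
    words: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSON Lines input
        record = json.loads(line)
        value = record.get(lang)
        # Assumption: each language field holds a single string.
        if isinstance(value, str) and value.strip():
            words.add(value.strip().lower())
    return words


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Return the tokens that are not in the stopword set (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopwords]


# Placeholder records in the assumed shape; these are NOT real dataset entries.
sample_lines = [
    '{"eng": "the", "zul": "placeholder_a"}',
    '{"eng": "and", "zul": "placeholder_b"}',
]

eng_stopwords = load_stopword_set(sample_lines, "eng")
filtered = remove_stopwords("The cat and the dog".split(), eng_stopwords)
print(filtered)  # ['cat', 'dog']

# With the real file, pass the open file handle instead of sample_lines:
# with open('za_stopwords.main.jsonl', encoding='utf-8') as file:
#     zul_stopwords = load_stopword_set(file, 'zul')
```

Using a set keeps each membership check constant-time, which matters when filtering large corpora.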

## Supported Languages

* ven - Tshivenda
* tso - Xitsonga
* sot - Southern Sotho
* nso - Northern Sotho
* tsn - Setswana
* zul - IsiZulu
* xho - IsiXhosa
* ...
* Coming soon:
  * nbl
  * ssw
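
To check how many entries a given language actually has, one option is to count the non-empty values per language code. This is only a sketch under assumptions: `language_coverage` is a hypothetical helper, the sample records are placeholders, and it assumes every language above appears as a key named by its code, which the usage example confirms only for `eng` and `zul`.

```python
import json
from typing import Dict, Iterable, Sequence

# Language codes taken from the list above.
SUPPORTED_CODES = ("ven", "tso", "sot", "nso", "tsn", "zul", "xho")


def language_coverage(
    lines: Iterable[str], codes: Sequence[str] = SUPPORTED_CODES
) -> Dict[str, int]:
    """Count, per language code, the records with a non-empty string value."""
    counts = {code: 0 for code in codes}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for code in codes:
            value = record.get(code)
            if isinstance(value, str) and value.strip():
                counts[code] += 1
    return counts


# Placeholder records; the real file's keys and values may differ.
sample_lines = [
    '{"eng": "the", "zul": "placeholder_a", "ven": "placeholder_b"}',
    '{"eng": "and", "zul": "placeholder_c", "ven": ""}',
]

coverage = language_coverage(sample_lines)
print(coverage["zul"], coverage["ven"], coverage["xho"])  # 2 1 0
```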

## Contributions

* Feel free to create a pull request if you have a suggestion

## License

* This work is published under the CC BY-NC 4.0 license

## Citation

```bibtex
@misc{multilingual_stop_words,
  author = {Ndamulelo Nemakhavhani},
  title = {Autogenerated Stop Words for South African Bantu Languages},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ndamulelonemakh/our-stopwords}}
}
```

## Contact

* If you have any questions or suggestions, please open an issue or [contact us](mailto:endeesa@yahoo.com).