https://github.com/ruanchaves/napolab

A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models.
https://github.com/ruanchaves/napolab

benchmarks catalan datasets english galician hate-speech huggingface huggingface-transformers large-language-models nlp portuguese python question-answering semantic-similarity spanish text-simplification textual-entailment transformers

Last synced: about 1 month ago
JSON representation

A Natural Portuguese Language Benchmark (Napolab) for the evaluation of language models.

Host: GitHub
URL: https://github.com/ruanchaves/napolab
Owner: ruanchaves
License: mit
Created: 2023-03-29T12:10:14.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2025-03-04T13:21:22.000Z (3 months ago)
Last Synced: 2025-04-02T03:43:12.949Z (about 2 months ago)
Topics: benchmarks, catalan, datasets, english, galician, hate-speech, huggingface, huggingface-transformers, large-language-models, nlp, portuguese, python, question-answering, semantic-similarity, spanish, text-simplification, textual-entailment, transformers
Language: Python
Homepage:
Size: 227 KB
Stars: 67
Watchers: 7
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Citation: CITATION.cff

Awesome Lists containing this project

README

        # 🌎 Natural Portuguese Language Benchmark (Napolab)

The [**Napolab**](https://huggingface.co/datasets/ruanchaves/napolab) is your go-to collection of Portuguese datasets for the evaluation of Large Language Models.

## 📊 Napolab for Large Language Models (LLMs)

A format of Napolab specifically designed for researchers experimenting with Large Language Models (LLMs) is now available. This format includes two main fields:

* **Prompt**: The input prompt to be fed into the LLM.

* **Answer**: The expected classification output label from the LLM, which is always a number between 0 and 5.

The dataset in this format can be accessed at [https://huggingface.co/datasets/ruanchaves/napolab](https://huggingface.co/datasets/ruanchaves/napolab). If you’ve used Napolab for LLM evaluations, please share your findings with us!

## Leaderboards 

The [Open PT LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard) incorporates datasets from Napolab. 

The Master's thesis [Lessons Learned from the Evaluation of Portuguese Language Models](https://www.um.edu.mt/library/oar/handle/123456789/120557) features an extensive evaluation of Transformer models on Napolab.

## Guidelines

Napolab adopts the following guidelines for the inclusion of datasets:

* 🌿 **Natural**: As much as possible, datasets consist of natural Portuguese text or professionally translated text.

* ✅ **Reliable**: Metrics correlate reliably with human judgments (accuracy, F1 score, Pearson correlation, etc.).

* 🌐 **Public**: Every dataset is available through a public link.

* 👩‍🔧 **Human**: Expert human annotations only. No automatic or unreliable annotations.

* 🎓 **General**: No domain-specific knowledge or advanced preparation is needed to solve dataset tasks.

[Napolab](https://huggingface.co/datasets/ruanchaves/napolab) currently includes the following datasets:

| | | |

| :---: |  :---:  |  :---: |

|[assin](https://huggingface.co/datasets/assin) | [assin2](https://huggingface.co/datasets/assin2) | [rerelem](https://huggingface.co/datasets/ruanchaves/rerelem)|

|[hatebr](https://huggingface.co/datasets/ruanchaves/hatebr)| [reli-sa](https://huggingface.co/datasets/ruanchaves/reli-sa) | [faquad-nli](https://huggingface.co/datasets/ruanchaves/faquad-nli) |

|[porsimplessent](https://huggingface.co/datasets/ruanchaves/porsimplessent) | | |

**💡 Contribute**: We're open to expanding Napolab! Suggest additions in the issues. For more information, read our [CONTRIBUTING.md](CONTRIBUTING.md).

🌍 For broader accessibility, all datasets have translations in **Catalan, English, Galician and Spanish** using the `facebook/nllb-200-1.3B model` via [Easy-Translate](https://github.com/ikergarcia1996/Easy-Translate).

## 🤖 Models

We've made several models, fine-tuned on this benchmark, available on Hugging Face Hub:

| Datasets                     | mDeBERTa v3                                                                                                    | BERT Large                                                                                                    | BERT Base                                                                                                     |

|:----------------------------:|:--------------------------------------------------------------------------------------------------------------:|:-------------------------------------------------------------------------------------------------------------:|:--------------------------------------------------------------------------------------------------------------:|

| **ASSIN 2 - STS**            | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-assin2-similarity)                                   | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-assin2-similarity)                       | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin2-similarity)                       |

| **ASSIN 2 - RTE**            | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-assin2-entailment)                                  | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-assin2-entailment)                       | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin2-entailment)                       |

| **ASSIN - STS**              | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-assin-similarity)                                   | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-assin-similarity)                        | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-similarity)                        |

| **ASSIN - RTE**              | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-assin-entailment)                                   | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-assin-entailment)                        | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-entailment)                        |

| **HateBR**                   | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-hatebr)                                             | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-hatebr)                                 | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-hatebr)                                  |

| **FaQUaD-NLI**               | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-faquad-nli)                                         | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-faquad-nli)                             | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-faquad-nli)                              |

| **PorSimplesSent**           | [Link](https://huggingface.co/ruanchaves/mdeberta-v3-base-porsimplessent)                                     | [Link](https://huggingface.co/ruanchaves/bert-large-portuguese-cased-porsimplessent)                         | [Link](https://huggingface.co/ruanchaves/bert-base-portuguese-cased-porsimplessent)                          |

For model fine-tuning details and benchmark results, visit [EVALUATION.md](EVALUATION.md). 

## Usage

To reproduce the Napolab benchmark available on the Hugging Face Hub locally, follow these steps:

1. Clone the repository and install the library:

```bash

git clone https://github.com/ruanchaves/napolab.git

cd napolab

pip install -e .

```

2. Generate the benchmark file:

   

```python

from napolab import export_napolab_benchmark, convert_to_completions_format

input_df = export_napolab_benchmark()

output_df = convert_to_completions_format(input_df)

output_df.reset_index().to_csv("test.csv", index=False)

```

## Citation

If you would like to cite our work or models, please reference the Master's thesis [Lessons Learned from the Evaluation of Portuguese Language Models](https://www.um.edu.mt/library/oar/handle/123456789/120557).

```

@mastersthesis{chaves2023lessons,

  title={Lessons learned from the evaluation of Portuguese language models},

  author={Chaves Rodrigues, Ruan},

  year={2023},

  school={University of Malta},

  url={https://www.um.edu.mt/library/oar/handle/123456789/120557}

}

```

## Disclaimer

The HateBR dataset, including all its components, is provided strictly for academic and research purposes. The use of the HateBR dataset for any commercial or non-academic purpose is expressly prohibited without the prior written consent of [SINCH](https://www.sinch.com/).

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/ruanchaves/napolab

Awesome Lists containing this project

README