https://github.com/jpzinn654/qa-portuguese-v1

A 500,000-row split of a Portuguese dataset from Hugging Face, intended for training NLP models on question answering.
dataset dataset-generation huggingface huggingface-datasets large-language-models llm nlp-datasets portuguese-language questions-and-answers

# QA-Portuguese-small

Preprocessed Portuguese split of the MQA dataset

This repository contains a dataset of question-answer pairs in Portuguese, uploaded to Hugging Face. The dataset consists of 500,000 rows, with each row containing a question and its corresponding answer in Portuguese.


## Usage

```python
from datasets import load_dataset

data = load_dataset("Jpzinn654/qa-portuguese-small")
```

## Overview

The project involves:

- **Loading** a large question-answer dataset from Hugging Face.
- **Selecting** the first 500,000 rows of the dataset.
- **Saving** the dataset in both CSV and JSON formats.
- **Pushing** the processed dataset to the Hugging Face hub for easy access and sharing.

## Dataset Details

- **Name:** qa-portuguese
- **Source:** MQA dataset, hosted on Hugging Face
- **Rows:** 500,000 question-answer pairs
- **Languages:** Portuguese