https://github.com/jpzinn654/qa-portuguese-v1

A 500,000-row split of a Portuguese-language dataset from Hugging Face, intended for training NLP models for question answering.

Host: GitHub
URL: https://github.com/jpzinn654/qa-portuguese-v1
Owner: Jpzinn654
License: mit
Created: 2024-11-29T12:28:38.000Z (10 months ago)
Default Branch: main
Last Pushed: 2024-11-29T13:05:03.000Z (10 months ago)
Last Synced: 2025-03-29T12:47:00.310Z (6 months ago)
Topics: dataset, dataset-generation, huggingface, huggingface-datasets, large-language-models, llm, nlp-datasets, portuguese-language, questions-and-answers
Language: Python
Homepage:
Size: 4.88 KB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

          


# QA-Portuguese-small

A preprocessed Portuguese split of the MQA dataset.

This repository contains a dataset of question-answer pairs in Portuguese, uploaded to Hugging Face. The dataset consists of 500,000 rows, each containing a question and its corresponding answer in Portuguese.





## Usage

```python
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (cached locally after the first call).
data = load_dataset("Jpzinn654/qa-portuguese-small")
```

## Overview

The project involves:

- **Loading** a large question-answer dataset from Hugging Face.

- **Selecting** the first 500,000 rows of the dataset.

- **Saving** the dataset in both CSV and JSON formats.

- **Pushing** the processed dataset to the Hugging Face hub for easy access and sharing.
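
The selecting and saving steps above can be sketched with the standard library alone. This is a minimal illustration, not the repository's actual script: the function names `first_rows` and `save_csv_and_json`, the default file stem, and the choice of JSON Lines for the JSON output are all assumptions made for the example.

```python
import csv
import json
from itertools import islice
from pathlib import Path


def first_rows(rows, n_rows):
    """Return the first n_rows rows of an iterable as a list of dicts."""
    return [dict(row) for row in islice(rows, n_rows)]


def save_csv_and_json(rows, out_dir, stem="qa-portuguese"):
    """Write rows to <stem>.csv and <stem>.json (JSON Lines: one object per line).

    Both function name and default stem are hypothetical, chosen for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    json_path = out_dir / f"{stem}.json"

    # Use the first row's keys as the CSV header; every row is assumed to share them.
    fieldnames = list(rows[0].keys()) if rows else []
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # ensure_ascii=False keeps accented Portuguese characters readable in the file.
    with json_path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    return csv_path, json_path
```

With the `datasets` library, the same steps map onto `Dataset.select(range(500_000))`, `Dataset.to_csv`, `Dataset.to_json`, and `Dataset.push_to_hub`; pushing requires being logged in to a Hugging Face account. Because iterating over a `Dataset` yields one dict per row, `first_rows` could also be applied directly to a loaded split.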

## Dataset Details

- **Name:** qa-portuguese-small (Hugging Face ID: `Jpzinn654/qa-portuguese-small`)

- **Source:** MQA dataset, via Hugging Face

- **Rows:** 500,000 question-answer pairs

- **Language:** Portuguese
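
The details listed above can be checked against a set of rows with a small helper. This is only a sketch: the function name `check_qa_rows` is hypothetical, and the column names `question` and `answer` are assumptions, since the README does not state the dataset's actual field names.

```python
def check_qa_rows(rows, expected_rows=500_000,
                  question_key="question", answer_key="answer"):
    """Return a list of human-readable problems found in rows (empty if none).

    The key names are assumed for illustration; adjust them to the real columns.
    """
    problems = []
    count = 0
    for index, row in enumerate(rows):
        count += 1
        # Each row should carry a non-empty string for both fields.
        for key in (question_key, answer_key):
            value = row.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"row {index}: missing or empty '{key}'")
    if count != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {count}")
    return problems
```

An empty result means the row count matches and every row has both fields filled in; any other result lists what to look at.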
