https://github.com/jpzinn654/qa-portuguese-v1
A 500,000-row split of a Portuguese Hugging Face dataset, intended for training NLP models on question answering.
- Host: GitHub
- URL: https://github.com/jpzinn654/qa-portuguese-v1
- Owner: Jpzinn654
- License: mit
- Created: 2024-11-29T12:28:38.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-11-29T13:05:03.000Z (10 months ago)
- Last Synced: 2025-03-29T12:47:00.310Z (6 months ago)
- Topics: dataset, dataset-generation, huggingface, huggingface-datasets, large-language-models, llm, nlp-datasets, portuguese-language, questions-and-answers
- Language: Python
- Homepage:
- Size: 4.88 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# QA-Portuguese-small
Portuguese preprocessed split from MQA dataset
This repository contains a dataset of question-answer pairs in Portuguese, uploaded to Hugging Face. The dataset consists of 500,000 rows, with each row containing a question and its corresponding answer in Portuguese.

#### Usage
```python
from datasets import load_dataset

data = load_dataset("Jpzinn654/qa-portuguese-small")
```

## Overview
The project involves:
- **Loading** a large question-answer dataset from Hugging Face.
- **Selecting** the first 500,000 rows of the dataset.
- **Saving** the dataset in both CSV and JSON formats.
- **Pushing** the processed dataset to the Hugging Face hub for easy access and sharing.

## Dataset Details
- **Name:** qa-portuguese
- **Source:** Hugging Face
- **Rows:** 500,000 question-answer pairs
- **Languages:** Portuguese
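
The four steps listed in the Overview (load, select, save, push) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: `SOURCE_REPO`, `TARGET_REPO`, the `"train"` split, the output file names, and the helper functions are all assumptions made for the example. Only `load_dataset`, `Dataset.select`, `Dataset.to_csv`, `Dataset.to_json`, and `Dataset.push_to_hub` come from the `datasets` library.

```python
from pathlib import Path

# Hypothetical placeholders: the README does not give the exact source repo id.
SOURCE_REPO = "your-namespace/source-qa-dataset"
TARGET_REPO = "your-username/qa-portuguese-small"
N_ROWS = 500_000  # row count stated in the README


def take_first_rows(dataset, n_rows):
    """Return the first n_rows rows, or every row if the dataset is shorter."""
    return dataset.select(range(min(n_rows, len(dataset))))


def export_csv_and_json(dataset, out_dir, stem):
    """Write the dataset to <out_dir>/<stem>.csv and <out_dir>/<stem>.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    json_path = out_dir / f"{stem}.json"
    dataset.to_csv(str(csv_path))
    dataset.to_json(str(json_path))  # datasets writes JSON Lines by default
    return csv_path, json_path


def main():
    # Imported here so the helpers above can be reused without the library.
    from datasets import load_dataset

    # Assumes the source dataset exposes a "train" split.
    source = load_dataset(SOURCE_REPO, split="train")
    subset = take_first_rows(source, N_ROWS)
    export_csv_and_json(subset, "output", "qa-portuguese")
    # Requires being logged in to the Hugging Face hub with write access.
    subset.push_to_hub(TARGET_REPO)
```

Calling `main()` runs the whole pipeline once the placeholders are replaced with real repository ids. The helpers are kept separate so the selection and export steps can be tried on a small dataset before downloading the full source.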