https://github.com/superbrucejia/gsm8k-consistency
GSM8K-Consistency is a benchmark database for analyzing the consistency of Arithmetic Reasoning on GSM8K.
arithmetic-consistency arithmetic-reasoning factual-consistency foundation-models grade grade-school-math gsm8k large-language-models logical-consistency mathematical-reasoning prompt prompt-engineering prompt-perturbation prompt-toolkit reasoning self-consistency self-consistency-benchmark semantics-consistency semantics-preserving-transformations semantics-similar
- Host: GitHub
- URL: https://github.com/superbrucejia/gsm8k-consistency
- Owner: SuperBruceJia
- License: mit
- Created: 2023-12-15T06:44:18.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-31T03:08:18.000Z (over 1 year ago)
- Last Synced: 2025-01-05T06:25:43.958Z (5 months ago)
- Topics: arithmetic-consistency, arithmetic-reasoning, factual-consistency, foundation-models, grade, grade-school-math, gsm8k, large-language-models, logical-consistency, mathematical-reasoning, prompt, prompt-engineering, prompt-perturbation, prompt-toolkit, reasoning, self-consistency, self-consistency-benchmark, semantics-consistency, semantics-preserving-transformations, semantics-similar
- Homepage: https://huggingface.co/datasets/shuyuej/GSM8K-Consistency
- Size: 9.77 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GSM8K-Consistency Benchmark
**GSM8K-Consistency** is a benchmark database for analyzing the consistency of `Arithmetic Reasoning on GSM8K`.

## 🚀 The dataset is available on 🤗 [Hugging Face](https://huggingface.co/datasets/shuyuej/GSM8K-Consistency)!
This is a benchmark of semantics-preserving perturbations of math problems, useful for evaluating the consistency of arithmetic reasoning.

## 💻 Dataset Usage
Run the following command to load the data:
```python
from datasets import load_dataset

dataset = load_dataset("shuyuej/GSM8K-Consistency")
dataset = dataset['train']
print(dataset)
```

Dataset Description:
```python
Dataset({
features: ['id', 'original_question', 'paraphrased_question', 'answer_detail', 'numerical_answer'],
num_rows: 85225
})
```
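
As a rough illustration of how these fields might be used to measure consistency, the sketch below groups rows by `original_question` and reports the fraction of paraphrases a model answers correctly. It rests on several assumptions that the README does not confirm: that rows sharing an `original_question` are paraphrases of the same problem, and that `numerical_answer` can be parsed as a number. The helper names (`to_number`, `consistency_by_question`, `toy_predict`) and the toy rows are hypothetical and only stand in for a real model and the real data.

```python
from collections import defaultdict


def to_number(value):
    """Parse an answer into a float (assumes values may be strings such as '1,000')."""
    return float(str(value).replace(",", ""))


def consistency_by_question(rows, predict):
    """Return, per original question, the fraction of paraphrases answered correctly.

    rows: iterable of dicts that use the dataset's field names.
    predict: hypothetical callable mapping a question string to a numeric answer.
    """
    hits = defaultdict(list)
    for row in rows:
        expected = to_number(row["numerical_answer"])
        predicted = to_number(predict(row["paraphrased_question"]))
        hits[row["original_question"]].append(abs(predicted - expected) < 1e-6)
    return {question: sum(flags) / len(flags) for question, flags in hits.items()}


# Toy rows invented for illustration only; they are not taken from the dataset.
toy_rows = [
    {"original_question": "Q1", "paraphrased_question": "Q1 variant a", "numerical_answer": "18"},
    {"original_question": "Q1", "paraphrased_question": "Q1 variant b", "numerical_answer": "18"},
    {"original_question": "Q2", "paraphrased_question": "Q2 variant a", "numerical_answer": "3"},
]


def toy_predict(question):
    """Stand-in for a model call: answers everything correctly except 'Q1 variant b'."""
    if question == "Q1 variant b":
        return 17
    return 18 if question.startswith("Q1") else 3


scores = consistency_by_question(toy_rows, toy_predict)
print(scores)
```

To try this on the real data, the `dataset` object from the loading snippet could be passed in place of `toy_rows`, since iterating over it yields one dict per row; `toy_predict` would be replaced by a call to the model under evaluation. A score below 1.0 for a question means the model's answers changed across paraphrases of the same problem.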