https://github.com/xiami2019/HalluQA
Dataset and evaluation script for "Evaluating Hallucinations in Chinese Large Language Models"
- Host: GitHub
- URL: https://github.com/xiami2019/HalluQA
- Owner: OpenMOSS
- Created: 2023-10-04T03:01:40.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-28T12:02:29.000Z (about 1 year ago)
- Last Synced: 2024-05-22T08:07:49.906Z (11 months ago)
- Language: Python
- Homepage: https://arxiv.org/pdf/2310.03368.pdf
- Size: 6.05 MB
- Stars: 101
- Watchers: 5
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
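- awesome-llm-eval - HalluQA - The hard part contains 69 questions and the knowledge part contains 206; each question has, on average, 2.8 annotated correct answers and wrong answers. To make HalluQA easier to use, the authors designed an evaluation method with GPT-4 as the evaluator. Specifically, the criteria for hallucination and the reference correct answers are given to GPT-4 as instructions, and GPT-4 judges whether the model's response contains a hallucination. (2023-11-08) | (Datasets-or-Benchmark / General)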
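
The judge-based evaluation described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the repository's evaluation script: the criteria text, the prompt layout, the function names (`build_judge_prompt`, `parse_verdict`, `non_hallucination_rate`) and the Yes/No answer format are all assumptions made here. The judge is passed in as a plain callable so that any model client can be plugged in.

```python
from typing import Callable, Dict, List

# Hypothetical criteria text; the wording used by the HalluQA authors is not reproduced here.
CRITERIA = (
    "A response is hallucinated if it contradicts the reference answers "
    "or asserts facts that they do not support."
)


def build_judge_prompt(question: str, references: List[str], response: str) -> str:
    """Combine the criteria, the reference answers and the response into one instruction."""
    ref_block = "\n".join(f"{i}. {ref}" for i, ref in enumerate(references, start=1))
    return (
        f"{CRITERIA}\n\n"
        f"Question: {question}\n"
        f"Reference correct answers:\n{ref_block}\n"
        f"Model response: {response}\n\n"
        "Does the model response contain a hallucination? Answer Yes or No."
    )


def parse_verdict(judge_output: str) -> bool:
    """Return True when the judge's answer starts with 'yes' (hallucination found)."""
    return judge_output.strip().lower().startswith("yes")


def non_hallucination_rate(items: List[Dict], judge: Callable[[str], str]) -> float:
    """Fraction of responses that the judge marks as free of hallucination."""
    if not items:
        return 0.0
    clean = sum(
        not parse_verdict(
            judge(build_judge_prompt(item["question"], item["references"], item["response"]))
        )
        for item in items
    )
    return clean / len(items)
```

Here `judge` would wrap a call to GPT-4 or another evaluator model; keeping it as a parameter lets the scoring logic be tested with a stub before any API calls are made.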