https://github.com/jungokasai/IgakuQA


Host: GitHub
URL: https://github.com/jungokasai/IgakuQA
Owner: jungokasai
Created: 2023-03-31T17:22:44.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-03-31T17:29:23.000Z (about 2 years ago)
Last Synced: 2024-02-14T02:13:22.938Z (about 1 year ago)
Language: Python
Size: 1.19 MB
Stars: 32
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-latest-LLM - IgakuQA（Japanese National Medical License Exam）

README

# Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations

## Introduction

We evaluate LLMs (GPT-3, GPT-4, and ChatGPT) on Japanese medical licensing examinations from the past five years (2018-2022) and release the data as the IGAKU QA (医学 QA) benchmark.

## Benchmark Collection

We collect the exam problems and their answers for the past five years (2018 through 2022) from the [official website](https://www.mhlw.go.jp/kouseiroudoushou/shikaku_shiken/ishi/) of the Ministry of Health, Labour and Welfare in Japan.

Note that we do not rely on translations of sources from other languages (e.g., English) or countries; the benchmark comes solely from resources originally written in Japanese.

See our paper for more detail.

## Baselines

See [our scripts](https://github.com/jungokasai/IgakuQA/tree/main/scripts), which we used for the experiments in our paper. Note that you need an OpenAI API key to run these baselines.
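
As a minimal sketch of the API key requirement, the snippet below checks that a key is available before any baseline is launched. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which is the convention of the official OpenAI Python client; whether the scripts in this repository read the key that way is an assumption, and `require_openai_api_key` is a hypothetical helper, not part of the repository.

```python
import os


def require_openai_api_key(env=None):
    """Return the OpenAI API key, or raise if it is missing.

    Hypothetical helper: assumes the key lives in OPENAI_API_KEY,
    which may differ from how the repository's scripts load it.
    """
    # Allow a custom mapping to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the baselines."
        )
    return key
```

Calling `require_openai_api_key()` at the top of a run fails fast with a clear message instead of partway through an evaluation.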

## Citations

### IgakuQA and our evaluations on Japanese medical licensing examinations

```
@misc{jpn-med-exam_gpt4,
  author    = {Jungo Kasai and Yuhei Kasai and Keisuke Sakaguchi and Yutaro Yamada and Dragomir Radev},
  title     = {Evaluating {GPT}-4 and {ChatGPT} on {J}apanese Medical Licensing Examinations},
  year      = {2023},
  url       = {},
}
```
