https://github.com/FreedomIntelligence/HuatuoGPT-o1

Medical o1, Towards medical complex reasoning with LLMs

Host: GitHub
URL: https://github.com/FreedomIntelligence/HuatuoGPT-o1
Owner: FreedomIntelligence
Created: 2024-12-22T03:25:58.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-01-20T02:31:13.000Z (over 1 year ago)
Last Synced: 2025-01-20T03:25:53.216Z (over 1 year ago)
Language: Python
Homepage:
Size: 3.73 MB
Stars: 685
Watchers: 22
Forks: 69
Open Issues: 10
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

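StarryDivineSky - FreedomIntelligence/HuatuoGPT-o1 - A large language model for complex medical reasoning. It uses a medical verifier to guide the model's reasoning and reinforcement learning to improve it. It is available in 7B, 8B, and 70B sizes and supports English and Chinese. The model follows a "think before answering" approach, so its output contains the reasoning process and the final answer. The models, data, and code are open source. (A01_Text Generation_Text Dialogue / Large Dialogue Language Models and Data)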
awesome-llm-reasoning-openai-o1-survey - [Github
awesome-llm-strawberry - CUHK-SZ
awesome-latest-LLM - Huatuo-o1 - o1-72B) | 72B | apache-2.0 | (Model)
Awesome-Scientific-Datasets-and-LLMs - HuatuoGPT-o1 - 3.1, Qwen2.5 | N/A | 2024.12 | ✅ | (🤖 Scientific Models / 🧬 Life Sciences)
awesome-medical-ai-cn - HuatuoGPT-o1 | CUHK-Shenzhen | Complex reasoning in the medical domain, following the Medical o1 approach. | (Chinese Medical LLMs)
awesome-ai-for-science - HuatuoGPT-o1 (2025) - Open-source medical large language model for complex clinical reasoning, extending the o1 long-chain-of-thought paradigm to biomedical question answering and diagnostic inference (FreedomIntelligence, 1.3K+ stars) (🤖 Foundation Models for Science / Domain-Specific Models)
awesome-medical-ai - HuatuoGPT-o1 | CUHK-SZ | Medical complex reasoning with o1-style thinking. | (Medical LLMs & Foundation Models)

README

# HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

📃 Paper ｜ 🤗 HuatuoGPT-o1-7B ｜ 🤗 HuatuoGPT-o1-8B ｜ 🤗 HuatuoGPT-o1-70B ｜ 📚 Data



## ⚡ Introduction

Hello! Welcome to the repository for [HuatuoGPT-o1](https://arxiv.org/pdf/2412.18925)!

**HuatuoGPT-o1** is a medical LLM designed for advanced medical reasoning. It can identify mistakes, explore alternative strategies, and refine its answers. By leveraging verifiable medical problems and a specialized medical verifier, it advances reasoning through:

- Using the verifier to guide the search for a complex reasoning trajectory for fine-tuning LLMs.

- Applying reinforcement learning (PPO) with verifier-based rewards to enhance complex reasoning further.

We open-sourced our models, data, and code here.

## 👨‍⚕️ Model

- **Model Access**

| Model                | Backbone      | Supported Languages | Link                                                                   |
| -------------------- | ------------- | ------------------- | ---------------------------------------------------------------------- |
| **HuatuoGPT-o1-8B**  | LLaMA-3.1-8B  | English             | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B)  |
| **HuatuoGPT-o1-70B** | LLaMA-3.1-70B | English             | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-70B) |
| **HuatuoGPT-o1-7B**  | Qwen2.5-7B    | English & Chinese   | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B)  |
| **HuatuoGPT-o1-72B** | Qwen2.5-72B   | English & Chinese   | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-72B) |

- **Deploy**

HuatuoGPT-o1 can be used just like `Llama-3.1-8B-Instruct`. You can deploy it with tools like [vLLM](https://github.com/vllm-project/vllm) or [SGLang](https://github.com/sgl-project/sglang), or run inference directly with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(
    "FreedomIntelligence/HuatuoGPT-o1-8B", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("FreedomIntelligence/HuatuoGPT-o1-8B")

input_text = "How to stop a cough?"
messages = [{"role": "user", "content": input_text}]

# Render the chat template to a prompt string, then tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate and decode; the decoded text contains the prompt followed by the reply
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

HuatuoGPT-o1 adopts a *thinks-before-it-answers* approach, with outputs formatted as:

```

## Thinking

[Reasoning process]

## Final Response

[Output]

```
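
If you need the reasoning and the answer separately, the two headers can be used as delimiters. The snippet below is a minimal sketch rather than part of this repository: `split_reasoning` is a hypothetical helper, and it assumes the headers appear exactly as `## Thinking` and `## Final Response`.

```python
import re


def split_reasoning(output: str) -> tuple[str, str]:
    """Split a HuatuoGPT-o1 style output into (thinking, final_response)."""
    match = re.search(
        r"##\s*Thinking\s*(?P<thinking>.*?)##\s*Final Response\s*(?P<final>.*)",
        output,
        flags=re.DOTALL,
    )
    if match is None:
        # No recognizable headers: treat the whole output as the answer.
        return "", output.strip()
    return match.group("thinking").strip(), match.group("final").strip()


example = (
    "## Thinking\n\nA dry cough often responds to fluids.\n\n"
    "## Final Response\n\nDrink warm fluids and rest."
)
thinking, final = split_reasoning(example)
print(final)
```

Because the pattern is searched rather than anchored, it also works on decoded text that still contains the prompt in front of the reply.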

## 📚 Data

- **Data Access**

| Data                        | Description                                                                                                               | Link                                                                                      |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| Medical Verifiable Problems | Open-ended medical problems sourced from challenging medical exams, paired with ground-truth answers.                      | [Link](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem) |
| SFT Data in Stage 1         | Fine-tuning data generated using GPT-4o, including complex chains of thought (**Complex CoT**) and output (**Response**). | [Link](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT)      |
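
To illustrate how an SFT record relates to the output format above, here is a small sketch that turns one record into a chat-style training example. It is an assumption-laden illustration, not the code in `SFT_stage1.py`: the field names `Question`, `Complex_CoT`, and `Response` and the helper `build_sft_example` are hypothetical, so check the dataset card for the actual schema.

```python
def build_sft_example(record: dict) -> list[dict]:
    """Format one SFT record as user/assistant chat messages."""
    # Assumed field names; the real dataset schema may differ.
    target = (
        f"## Thinking\n\n{record['Complex_CoT'].strip()}\n\n"
        f"## Final Response\n\n{record['Response'].strip()}"
    )
    return [
        {"role": "user", "content": record["Question"].strip()},
        {"role": "assistant", "content": target},
    ]


record = {
    "Question": "How to stop a cough?",
    "Complex_CoT": "First consider whether the cough is dry or productive.",
    "Response": "Stay hydrated and see a doctor if it persists.",
}
messages = build_sft_example(record)
print(messages[1]["content"])
```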

- **Data Construction**

We provide scripts to construct verifiable problems and to search for reasoning paths.

**1. Constructing Verifiable Problems from Multiple-choice Questions**

```bash
python construct_verifiable_medical_problems.py --data_path data/demo_data.json --filter_data --model_name gpt-4o --api_key [your api key]
```

**2. Searching Complex Reasoning Paths for SFT**

```bash
python search_for_complex_reasoning_path.py --data_path data/demo_data.json --efficient_search True --max_search_attempts 1 --max_search_depth 2 --model_name gpt-4o --api_key [your api key]
```
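
The idea behind the verifier-guided search can be sketched as a loop bounded by the two flags above. This is only an illustration under stated assumptions, not the logic of `search_for_complex_reasoning_path.py`: `propose`, `refine`, and `verify` are hypothetical stand-ins for LLM calls, and the exact meaning of `max_search_attempts` and `max_search_depth` in the script may differ.

```python
from typing import Callable, Optional


def search_reasoning_path(
    question: str,
    ground_truth: str,
    propose: Callable[[str], tuple[str, str]],
    refine: Callable[[str, list[str]], tuple[str, str]],
    verify: Callable[[str, str], bool],
    max_search_attempts: int = 1,
    max_search_depth: int = 2,
) -> Optional[list[str]]:
    """Return the reasoning steps that led to a verified answer, or None."""
    for _ in range(max_search_attempts):
        # Each attempt starts from a fresh initial reasoning step.
        reasoning, answer = propose(question)
        trajectory = [reasoning]
        if verify(answer, ground_truth):
            return trajectory
        # Otherwise revise the reasoning, up to max_search_depth times.
        for _ in range(max_search_depth):
            reasoning, answer = refine(question, trajectory)
            trajectory.append(reasoning)
            if verify(answer, ground_truth):
                return trajectory
    return None


# Toy stand-ins so the sketch runs without any API key.
answers = iter(["viral infection", "bacterial pneumonia"])
path = search_reasoning_path(
    "Which diagnosis is most likely?",
    "Bacterial pneumonia",
    propose=lambda q: ("first attempt", next(answers)),
    refine=lambda q, steps: ("second attempt after backtracking", next(answers)),
    verify=lambda answer, truth: answer.lower() == truth.lower(),
)
print(path)
```

Keeping the failed steps in the returned trajectory is deliberate in this sketch: the point of the search is to produce reasoning that shows mistakes being noticed and corrected.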

## 🚀 Training

- **Stage 1: Supervised Fine-Tuning (SFT)**

Fine-tune the model on an 8-GPU setup:

```bash
accelerate launch --config_file ./configs/deepspeed_zero3.yaml \
    --num_processes 8 \
    --num_machines 1 \
    --machine_rank 0 \
    --deepspeed_multinode_launcher standard SFT_stage1.py \
    --model_path [meta-llama/Llama-3.1-8B-Instruct] \
    --data_path [FreedomIntelligence/medical-o1-reasoning-SFT]
```

- **Stage 2: Reinforcement Learning (RL)**

We provide a simple PPO script using the [trl](https://github.com/huggingface/trl) library. Below is an example for training an 8B model with PPO on an 8-GPU A100 machine. Ensure you first download our [medical verifier](https://huggingface.co/FreedomIntelligence/medical_o1_verifier_3B) as the reward model.

```bash
accelerate launch \
    --num_processes 8 \
    --num_machines 1 \
    --machine_rank 0 \
    --config_file ./configs/deepspeed_zero3.yaml \
    --deepspeed_multinode_launcher standard RL_stage2.py \
    --model_name_or_path [FreedomIntelligence/HuatuoGPT-o1-8B] \
    --reward_model_path [FreedomIntelligence/medical_o1_verifier_3B] \
    --value_model_path [meta-llama/Llama-3.2-3B-Instruct] \
    --dataset_name [FreedomIntelligence/medical-o1-verifiable-problem] \
    --response_length 1300 \
    --temperature 0.5 \
    --local_rollout_forward_batch_size 8 \
    --num_ppo_epochs 3 \
    --num_mini_batches 1 \
    --total_episodes 20000 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --bf16 True \
    --output_dir ./ckpts \
    --save_strategy steps \
    --save_step 20 \
    --save_total_limit 1 \
    --eval_strategy steps \
    --eval_steps 20 \
    --kl_coef 0.03 \
    --learning_rate 5e-7 \
    --warmup_ratio 0.05 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --run_name ppo_medical_o1_8B \
    --num_sample_generations -1 \
    --report_to wandb
```
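
As a rough guide to what these flags imply, the arithmetic below estimates the number of episodes per PPO update and the number of updates in a run. It assumes the batch-size convention commonly used by PPO trainers in `trl` (per-device batch × gradient accumulation × mini-batches × processes); `RL_stage2.py` may compute these differently, so treat the numbers as an estimate.

```python
import math

# Values taken from the command above
num_processes = 8
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_mini_batches = 1
total_episodes = 20000

# Assumed convention: episodes collected per process for one PPO update
local_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_mini_batches
)
# Episodes per update across all processes
batch_size = local_batch_size * num_processes
# Number of PPO updates needed to consume total_episodes
num_updates = math.ceil(total_episodes / batch_size)

print(batch_size, num_updates)  # 128 157
```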

## 🧐 Evaluation

1. First, install [SGLang](https://github.com/sgl-project/sglang). Then deploy the model you want to test with the following command:

```bash
log_num=0
model_name="FreedomIntelligence/HuatuoGPT-o1-8B"  # Path to the model you are deploying
port=28${log_num}35
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --model-path $model_name --port $port --mem-fraction-static 0.8 --dp 1 --tp 1 > sglang${log_num}.log 2>&1 &
```
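
Once the server is up, you can send it a quick request to check that it responds. The sketch below assumes the server exposes SGLang's OpenAI-compatible `/v1/chat/completions` endpoint on localhost; the helpers `server_port`, `build_chat_request`, and `ask` are hypothetical names introduced for illustration, and `evaluation/eval.py` may talk to the server differently.

```python
import json
import urllib.request


def server_port(log_num: int) -> int:
    """Mirror the shell expression port=28${log_num}35."""
    return int(f"28{log_num}35")


def build_chat_request(model_name: str, question: str, port: int,
                       max_tokens: int = 2048) -> urllib.request.Request:
    """Build a chat-completion request for a locally running server."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "FreedomIntelligence/HuatuoGPT-o1-8B", "How to stop a cough?", server_port(0)
)
print(request.full_url)
# print(ask(request))  # uncomment once the server has finished loading
```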

2. Wait for the model to finish deploying, then run the evaluation with the command below. By default we use prompts that let the model respond freely, and we find that answers can still be extracted reliably from these responses. You can also set the `--strict_prompt` option to use stricter prompts for more precise answer extraction.

```bash
python evaluation/eval.py --model_name $model_name --eval_file evaluation/data/eval_data.json --port $port
```
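
To give a sense of what extracting an answer from a free-form response can involve, here is a simplified sketch for multiple-choice questions. It is not the extraction logic in `evaluation/eval.py`; `extract_choice` is a hypothetical helper that first looks for an explicit letter after "answer is" and otherwise falls back to the option text mentioned last.

```python
import re
from typing import Optional


def extract_choice(response: str, options: dict[str, str]) -> Optional[str]:
    """Return the option letter chosen in a free-form response, if any."""
    # Only look at the part after the final-response header, when present.
    final = response.split("## Final Response")[-1]
    letters = "".join(options)
    # 1) Explicit letter, e.g. "the answer is (B)" or "Answer is B."
    explicit = re.findall(
        rf"(?i:answer is)\s*\(?([{letters}])\)?(?![A-Za-z])", final
    )
    if explicit:
        return explicit[-1]
    # 2) Fallback: the option whose text is mentioned last.
    lowered = final.lower()
    positions = {key: lowered.rfind(text.lower()) for key, text in options.items()}
    best = max(positions, key=positions.get)
    return best if positions[best] >= 0 else None


options = {"A": "Ibuprofen", "B": "Amoxicillin"}
reply = "## Thinking\n\nLikely bacterial.\n\n## Final Response\n\nThe answer is (B) Amoxicillin."
print(extract_choice(reply, options))
```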

3. After completing the evaluation, run the following command to stop the SGLang service and release GPU memory.

```bash
bash evaluation/kill_sglang_server.sh
```

The evaluation code above can be used to test most models supported by SGLang.

## 🩺 HuatuoGPT Series 

Explore our HuatuoGPT series:

- [**HuatuoGPT**](https://github.com/FreedomIntelligence/HuatuoGPT): Taming Language Models to Be a Doctor

- [**HuatuoGPT-II**](https://github.com/FreedomIntelligence/HuatuoGPT-II): One-stage Training for Medical Adaptation of LLMs

- [**HuatuoGPT-Vision**](https://github.com/FreedomIntelligence/HuatuoGPT-Vision): Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

- [**CoD (Chain-of-Diagnosis)**](https://github.com/FreedomIntelligence/Chain-of-Diagnosis): Towards an Interpretable Medical Agent using Chain of Diagnosis

- [**HuatuoGPT-o1**](https://github.com/FreedomIntelligence/HuatuoGPT-o1): Towards Medical Complex Reasoning with LLMs

## 📖 Citation

```
@misc{chen2024huatuogpto1medicalcomplexreasoning,
      title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
      author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
      year={2024},
      eprint={2412.18925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.18925},
}
```

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=FreedomIntelligence/HuatuoGPT-o1&type=Date)](https://star-history.com/#FreedomIntelligence/HuatuoGPT-o1&Date)
