https://github.com/FreedomIntelligence/HuatuoGPT-o1
Medical o1, Towards medical complex reasoning with LLMs
https://github.com/FreedomIntelligence/HuatuoGPT-o1
Last synced: 9 months ago
JSON representation
Medical o1, Towards medical complex reasoning with LLMs
- Host: GitHub
- URL: https://github.com/FreedomIntelligence/HuatuoGPT-o1
- Owner: FreedomIntelligence
- Created: 2024-12-22T03:25:58.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-20T02:31:13.000Z (over 1 year ago)
- Last Synced: 2025-01-20T03:25:53.216Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 3.73 MB
- Stars: 685
- Watchers: 22
- Forks: 69
- Open Issues: 10
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- StarryDivineSky - FreedomIntelligence/HuatuoGPT-o1 - o1是一个用于医学复杂推理的大语言模型,它通过医学验证器引导模型进行推理,并使用强化学习提升推理能力,提供了7B、8B和70B等不同规模的模型,支持英文和中文,模型采用“先思考后回答”的方式,输出包含推理过程和最终答案,并开源了模型、数据和代码。 (A01_文本生成_文本对话 / 大语言对话模型及数据)
- awesome-llm-reasoning-openai-o1-survey - [Github
- awesome-llm-strawberry - CUHK-SZ
- awesome-latest-LLM - Huatuo-o1 - o1-72B) | 72B | apache-2.0 | (Model)
- Awesome-Scientific-Datasets-and-LLMs - HuatuoGPT-o1 - 3.1, Qwen2.5 | N/A | 2024.12 | ✅ | (🤖 Scientific Models / 🧬 Life Sciences)
- awesome-medical-ai-cn - HuatuoGPT-o1 - o1?style=flat-square) | 港中文深圳 | 医疗领域复杂推理能力,Medical o1 思路。 | (中文医疗大模型)
- awesome-ai-for-science - HuatuoGPT-o1 (2025) - Open-source medical large language model for complex clinical reasoning, extending the o1 long-chain-of-thought paradigm to biomedical question answering and diagnostic inference (FreedomIntelligence, 1.3K+ stars) (🤖 Foundation Models for Science / Domain-Specific Models)
- awesome-medical-ai - HuatuoGPT-o1 - o1?style=flat-square) | CUHK-SZ | Medical complex reasoning with o1-style thinking. | (Medical LLMs & Foundation Models)
README
# HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
HuatuoGPT-o1
📃 Paper |🤗 HuatuoGPT-o1-7B |🤗 HuatuoGPT-o1-8B | 🤗 HuatuoGPT-o1-70B | 📚 Data
## ⚡ Introduction
Hello! Welcome to the repository for [HuatuoGPT-o1](https://arxiv.org/pdf/2412.18925)!
**HuatuoGPT-o1** is a medical LLM designed for advanced medical reasoning. It can identify mistakes, explore alternative strategies, and refine its answers. By leveraging verifiable medical problems and a specialized medical verifier, it advances reasoning through:
- Using the verifier to guide the search for a complex reasoning trajectory for fine-tuning LLMs.
- Applying reinforcement learning (PPO) with verifier-based rewards to enhance complex reasoning further.
We open-sourced our models, data, and code here.
## 👨⚕️ Model
- **Model Access**
| | Backbone | Supported Languages | Link |
| -------------------- | ------------ | ----- | --------------------------------------------------------------------- |
| **HuatuoGPT-o1-8B** | LLaMA-3.1-8B | English | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-8B) |
| **HuatuoGPT-o1-70B** | LLaMA-3.1-70B | English | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-70B) |
| **HuatuoGPT-o1-7B** | Qwen2.5-7B | English & Chinese | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B) |
| **HuatuoGPT-o1-72B** | Qwen2.5-72B | English & Chinese | [HF Link](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-72B) |
- **Deploy**
HuatuoGPT-o1 can be used just like `Llama-3.1-8B-Instruct`. You can deploy it with tools like [vllm](https://github.com/vllm-project/vllm) or [Sglang](https://github.com/sgl-project/sglang), or perform direct inference:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("FreedomIntelligence/HuatuoGPT-o1-8B",torch_dtype="auto",device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("FreedomIntelligence/HuatuoGPT-o1-8B")
input_text = "How to stop a cough?"
messages = [{"role": "user", "content": input_text}]
inputs = tokenizer(tokenizer.apply_chat_template(messages, tokenize=False,add_generation_prompt=True
), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
HuatuoGPT-o1 adopts a *thinks-before-it-answers* approach, with outputs formatted as:
```
## Thinking
[Reasoning process]
## Final Response
[Output]
```
## 📚 Data
- **Data Access**
| Data | Description | Link |
| -------------------------- | ----------------------------------------------------------------- | --------------------------------------------------------------------------------------------- |
| Medical Verifiable Problems | Open-ended medical problems sourced from challenging medical exams, paired with ground-truth answers. | [Link](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-verifiable-problem) |
| SFT Data in Stage 1 | Fine-tuning data generated using GPT-4o, including complex chains of thought (**Complex CoT**) and output (**Response**). | [Link](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) |
- **Data Construction**
We provide scripts to construct verifiable problems and searching reasoning paths.
**1. Constructing Verifiable Problems from Multi-choice Questions.**
```bash
python construct_verifiable_medical_problems.py --data_path data/demo_data.json --filter_data --model_name gpt-4o --api_key [your api key]
```
**2. Searching Complex Reasoning Paths for SFT**
```bash
python search_for_complex_reasoning_path.py --data_path data/demo_data.json --efficient_search True --max_search_attempts 1 --max_search_depth 2 --model_name gpt-4o --api_key [your api key]
```
## 🚀 Training
- **Stage 1: Supervised Fine-Tuning (SFT)**
Fine-tune the model on an 8-GPU setup:
```bash
accelerate launch --config_file ./configs/deepspeed_zero3.yaml \
--num_processes 8 \
--num_machines 1 \
--machine_rank 0 \
--deepspeed_multinode_launcher standard SFT_stage1.py \
--model_path [meta-llama/Llama-3.1-8B-Instruct] \
--data_path [FreedomIntelligence/medical-o1-reasoning-SFT]
```
- **Stage 2: Reinforcement Learning (RL)**
We provide a simple PPO script using the [trl](https://github.com/huggingface/trl) library. Below is an example for training an 8B model with PPO on an 8-GPU A100 machine. Ensure you first download our [medical verifier](https://huggingface.co/FreedomIntelligence/medical_o1_verifier_3B) as the reward model.
```bash
accelerate launch \
--num_processes 8 \
--num_machines 1 \
--machine_rank 0 \
--config_file ./configs/deepspeed_zero3.yaml \
--deepspeed_multinode_launcher standard RL_stage2.py \
--model_name_or_path [FreedomIntelligence/HuatuoGPT-o1-8B] \
--reward_model_path [FreedomIntelligence/medical_o1_verifier_3B] \
--value_model_path [meta-llama/Llama-3.2-3B-Instruct] \
--dataset_name [FreedomIntelligence/medical-o1-verifiable-problem]\
--response_length 1300 \
--temperature 0.5 \
--local_rollout_forward_batch_size 8 \
--num_ppo_epochs 3 \
--num_mini_batches 1 \
--total_episodes 20000 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--bf16 True \
--output_dir ./ckpts \
--save_strategy steps \
--save_step 20 \
--save_total_limit 1 \
--eval_strategy steps \
--eval_steps 20 \
--kl_coef 0.03 \
--learning_rate 5e-7 \
--warmup_ratio 0.05 \
--gradient_checkpointing True \
--dataloader_num_workers 4 \
--run_name ppo_medical_o1_8B \
--num_sample_generations -1 \
--report_to wandb
```
## 🧐 Evaluation
1. You first need to install [Sglang](https://github.com/sgl-project/sglang). After installation, deploy the model you want to test using Sglang with the following command:
```bash
log_num=0
model_name="FreedomIntelligence/HuatuoGPT-o1-8B" # Path to the model you are deploying
port=28${log_num}35
CUDA_VISIBLE_DEVICES=0 python -m sglang.launch_server --model-path $model_name --port $port --mem-fraction-static 0.8 --dp 1 --tp 1 > sglang${log_num}.log 2>&1 &
```
2. Wait for the model to be deployed. After deployment, you can run the following code for evaluation. We use prompts that allow the model to respond freely. We find that the extracted results are consistently reliable and broadly cover the intended scope. You can also set the `--strict_prompt` option to use stricter prompts for more precise answer extraction.
```bash
python evaluation/eval.py --model_name $model_name --eval_file evaluation/data/eval_data.json --port $port
```
3. After completing the evaluation, run the following code to stop the Sglang service and release GPU memory.
```bash
bash evaluation/kill_sglang_server.sh
```
The evaluation code above can be used to test most models supported by Sglang.
## 🩺 HuatuoGPT Series
Explore our HuatuoGPT series:
- [**HuatuoGPT**](https://github.com/FreedomIntelligence/HuatuoGPT): Taming Language Models to Be a Doctor
- [**HuatuoGPT-II**](https://github.com/FreedomIntelligence/HuatuoGPT-II): One-stage Training for Medical Adaptation of LLMs
- [**HuatuoGPT-Vision**](https://github.com/FreedomIntelligence/HuatuoGPT-Vision): Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
- [**CoD (Chain-of-Diagnosis)**](https://github.com/FreedomIntelligence/Chain-of-Diagnosis): Towards an Interpretable Medical Agent using Chain of Diagnosis
- [**HuatuoGPT-o1**](https://github.com/FreedomIntelligence/HuatuoGPT-o1): Towards Medical Complex Reasoning with LLMs
## 📖 Citation
```
@misc{chen2024huatuogpto1medicalcomplexreasoning,
title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
year={2024},
eprint={2412.18925},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.18925},
}
```
## Star History
[](https://star-history.com/#FreedomIntelligence/HuatuoGPT-o1&Date)