{"id":19243834,"url":"https://github.com/openmoss/say-i-dont-know","last_synced_at":"2025-07-11T17:43:17.067Z","repository":{"id":219882355,"uuid":"744805634","full_name":"OpenMOSS/Say-I-Dont-Know","owner":"OpenMOSS","description":"[ICML'2024] Can AI Assistants Know What They Don't Know?","archived":false,"fork":false,"pushed_at":"2024-02-05T06:39:57.000Z","size":8334,"stargazers_count":79,"open_issues_count":4,"forks_count":8,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-21T09:52:13.743Z","etag":null,"topics":["alignment","large-language-models","truthfulness"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/2401.13275.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenMOSS.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-01-18T03:18:26.000Z","updated_at":"2025-03-27T15:51:04.000Z","dependencies_parsed_at":"2025-04-21T09:43:39.835Z","dependency_job_id":null,"html_url":"https://github.com/OpenMOSS/Say-I-Dont-Know","commit_stats":null,"previous_names":["openmoss/say-i-dont-know"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/OpenMOSS/Say-I-Dont-Know","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FSay-I-Dont-Know","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FSay-I-Dont-Know/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FSay-I-Dont-Know/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/OpenMOSS%2FSay-I-Dont-Know/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenMOSS","download_url":"https://codeload.github.com/OpenMOSS/Say-I-Dont-Know/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenMOSS%2FSay-I-Dont-Know/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264864083,"owners_count":23675290,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","large-language-models","truthfulness"],"created_at":"2024-11-09T17:20:29.925Z","updated_at":"2025-07-11T17:43:16.983Z","avatar_url":"https://github.com/OpenMOSS.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Say-I-Dont-Know\n## Introduction\nThe \"Say-I-Dont-Know\" project primarily investigates whether AI assistants based on large language models can perceive the boundaries of their own knowledge and express this understanding through natural language. 
\nThis repository contains the code, data and model checkpoints for our paper \"[Can AI Assistants Know What They Don't Know?](https://arxiv.org/pdf/2401.13275.pdf)\".\n\n![](figures/Knowledge_quadrants.png)\n\nThe AI assistant’s perception of its own knowledge can be represented through knowledge quadrants.\nThe knowledge quadrants partition knowledge into four categories: Known Knowns, Known Unknowns, Unknown Knowns and Unknown Unknowns, as shown in the figure above.\nIn this project, we develop a model-specific Idk (\"I don't know\") dataset for the AI assistant and use it to align the assistant to refuse to answer questions that it does not know and to answer questions that it knows.\nConsequently, this transforms knowledge from Unknown-Unknowns and Unknown-Knowns to Known-Knowns and Known-Unknowns, thereby enhancing the truthfulness of the AI assistant.\n\n## Idk Dataset and Preference Data\n![](figures/construct_idk_and_preference_data.png)\nThe process of constructing the Idk dataset and preference data is shown in the figure above. We release four Idk datasets, corresponding to Llama-2-7b-chat, Baichuan2-7B-Chat, Mistral-7B-Instruct-v0.1, and Llama-2-70b-chat. 
We also release the preference data for Llama-2-7b-chat when the Ik threshold is set to 1.0.\n\n## Teaching AI Assistants to Say I Don't Know\nOur code is primarily based on [llama-recipes](https://github.com/facebookresearch/llama-recipes), [DPO](https://github.com/eric-mitchell/direct-preference-optimization) and [DeepSpeed-Chat](https://github.com/microsoft/DeepSpeedExamples/tree/master/applications/DeepSpeed-Chat).\n\n### Preparing the Environment\n```bash\ngit clone https://github.com/OpenMOSS/Say-I-Dont-Know.git\ncd Say-I-Dont-Know\npip install -U pip setuptools\npip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .\n```\n\n### Preprocessing the Idk Dataset\nSee [Idk_datasets/README.md](Idk_datasets) for details.\n\n### Idk-Prompting\nIdk-Prompting directly instructs the AI assistant, through prompts, to refuse questions it does not know.\nYou can use the following command to generate responses with Idk-Prompting.\n```bash\npython src/Inference/infer_llama.py \\\n    --model_name meta-llama/Llama-2-7b-chat-hf \\\n    --batch_size 2 \\\n    --save_name outputs/triviaqa_test_llama_2_7b_chat_threshold_1.0_idk_prompt_greedy_infer.json \\\n    --prompt_file Idk_datasets/sft_data/llama-2-7b-chat/triviaqa_test_threshold_1.0_sft_data.json \\\n    --response_num 1 \\\n    --top_k 1 \\\n    --idk_prompt True\n```\n\n### Idk-SFT\nYou can use the following command to train an Idk-SFT model. 
\n```bash\ntorchrun --nproc_per_node=4 \\\n    --nnodes=1 \\\n    src/llama_recipes/finetuning.py \\\n    --enable_fsdp \\\n    --model_name meta-llama/Llama-2-7b-chat-hf \\\n    --dist_checkpoint_root_folder model_checkpoints \\\n    --dist_checkpoint_folder llama_2_7b_chat_Idk_sft \\\n    --fsdp_config.pure_bf16 \\\n    --dataset Triviaqa_llama2_7b_chat_threshold_1_0 \\\n    --gradient_accumulation_steps 2 \\\n    --batch_size_training 4 \\\n    --num_epochs 10 \\\n    --lr 2e-5\n```\n\nAfter training, you need to convert the FSDP checkpoint to a consolidated checkpoint, using the following command:\n```bash\npython -m llama_recipes.inference.checkpoint_converter_fsdp_hf \\\n    --fsdp_checkpoint_path \u003cyour_fsdp_checkpoints_path\u003e \\\n    --consolidated_model_path \u003cyour_consolidated_model_path\u003e \\\n    --HF_model_path_or_name \u003cyour_initial_hf_model_path_or_name\u003e\n```\n\nFinally, you can use the following command to generate responses.\n```bash\npython src/Inference/infer_llama.py \\\n    --model_name \u003cyour_result_model\u003e \\\n    --batch_size 2 \\\n    --save_name outputs/triviaqa_test_llama_2_7b_chat_threshold_1.0_idk_sft_greedy_infer.json \\\n    --prompt_file Idk_datasets/sft_data/llama-2-7b-chat/triviaqa_test_threshold_1.0_sft_data.json \\\n    --response_num 1 \\\n    --top_k 1\n```\n\n\n### Idk-BoN\nTo implement Best-of-N sampling, we first train an Idk-SFT model using half of the Idk dataset. Then, we use the trained model to perform sampling on the other half of the Idk data in order to collect preference data. Finally, we use the Idk-SFT model to initialize a reward model, and train the reward model using the preference data. 
During inference, we use the reward model to perform Best-of-N sampling.\n\n**Train an Idk-SFT model with half of the Idk dataset**\n```bash\ntorchrun --nproc_per_node=4 \\\n    --nnodes=1 \\\n    src/llama_recipes/finetuning.py \\\n    --enable_fsdp \\\n    --model_name meta-llama/Llama-2-7b-chat-hf \\\n    --dist_checkpoint_root_folder model_checkpoints \\\n    --dist_checkpoint_folder llama_2_7b_chat_Idk_sft_half_data \\\n    --fsdp_config.pure_bf16 \\\n    --dataset Triviaqa_llama2_7b_chat_threshold_1_0_half_data \\\n    --gradient_accumulation_steps 2 \\\n    --batch_size_training 4 \\\n    --num_epochs 10 \\\n    --lr 2e-5\n```\n\n**Train a reward model with the preference data**  \nYou first need to process the preference data for reward modeling.\n```bash\ncd Idk_datasets\npython process_preference_data.py\ncd ..\n```\n\nThen you can train the reward model.  \n```bash\ntorchrun --nproc_per_node 4 \\\n    src/llama_recipes/finetuning.py \\\n    --enable_fsdp \\\n    --model_name \u003cIdk_SFT_model\u003e \\\n    --dist_checkpoint_root_folder model_checkpoints \\\n    --dist_checkpoint_folder llama_2_7b_chat_reward_model \\\n    --fsdp_config.pure_bf16 \\\n    --dataset Triviaqa_llama2_7b_chat_threshold_1_0_preference_data \\\n    --batch_size_training 32 \\\n    --batching_strategy padding \\\n    --lr 9e-6 \\\n    --num_epochs 1 \\\n    --train_ppo_reward_model \\\n    --val_batch_size 2\n```\n\nAfter training, you need to convert the FSDP checkpoint to a consolidated checkpoint, using the following command:\n```bash\npython -m llama_recipes.inference.checkpoint_converter_fsdp_hf \\\n    --fsdp_checkpoint_path \u003cyour_fsdp_checkpoints_path\u003e \\\n    --consolidated_model_path \u003cyour_consolidated_model_path\u003e \\\n    --HF_model_path_or_name \u003cyour_initial_hf_model_path_or_name\u003e \\\n    --reward_model True\n```\n\n**Generation using best-of-n sampling**  \nFirst, sample ten candidates for each question.\n```bash\npython 
src/Inference/infer_llama.py \\\n    --model_name meta-llama/Llama-2-7b-chat-hf \\\n    --batch_size 2 \\\n    --save_name outputs/triviaqa_test_llama_2_7b_chat_threshold_1.0_infer_10_candidates.json \\\n    --prompt_file Idk_datasets/sft_data/llama-2-7b-chat/triviaqa_test_threshold_1.0_sft_data.json \\\n    --response_num 10\n```\n\nThen use the reward model to select the best response for each question.\n```bash\npython infer_reward_model.py \\\n    --model_name \u003cyour_reward_model\u003e \\\n    --batch_size 1 \\\n    --save_name \u003coutput_file_path\u003e \\\n    --prompt_file outputs/triviaqa_test_llama_2_7b_chat_threshold_1.0_infer_10_candidates.json \\\n    --reject_sampling True\n```\n\n### Idk-DPO\nTo implement Idk-DPO, we use the same SFT model and preference data as in Idk-BoN. We train Idk-DPO using eight 80 GB A100 GPUs.\n```bash\npython -u src/dpo/train.py model=blank_model \\\n    model.name_or_path=result_model \\\n    model.block_name=LlamaDecoderLayer  \\\n    datasets=[triviaqa] \\\n    loss=dpo \\\n    loss.beta=0.1 \\\n    loss.sft_coef_when_dpo=0.01 \\\n    exp_name=llama2-7b-chat_Idk_DPO \\\n    gradient_accumulation_steps=4 \\\n    batch_size=64 \\\n    eval_batch_size=32 \\\n    trainer=FSDPTrainer \\\n    sample_during_eval=false \\\n    model.fsdp_policy_mp=bfloat16\n```\nThe final weights will be saved to a path like `.cache/YOUR_USERNAME/llama2-7b-chat_Idk_DPO_2024-01-29_22-19-44_117975/step-BEST/policy.pt`. You can generate responses using the same commands as in Idk-SFT.\n\n### Idk-PPO\nYou can conduct Idk-PPO training using the following command.\n```bash\nbash src/deepspeed-chat-ppo/train.sh \u003coutput_dir\u003e \u003cactor_model_path\u003e \u003creward_model_path\u003e \u003cdata_path\u003e\n```\nYou can generate responses using the same commands as in Idk-SFT.\n\n### Idk-HIR\nYou first need to relabel the Idk datasets, as shown in [Idk_datasets](Idk_datasets/README.md). 
Then you can use the following command to train an Idk-HIR model.\n```bash\ntorchrun --nproc_per_node=8 \\\n    --nnodes=1 \\\n    src/llama_recipes/finetuning.py \\\n    --enable_fsdp \\\n    --model_name meta-llama/Llama-2-7b-chat-hf \\\n    --dist_checkpoint_root_folder model_checkpoints \\\n    --dist_checkpoint_folder llama_2_7b_chat_Idk_hir \\\n    --fsdp_config.pure_bf16 \\\n    --dataset Triviaqa_llama2_7b_chat_hir \\\n    --batch_size_training 32 \\\n    --batching_strategy padding \\\n    --num_epochs 3 \\\n    --lr 2e-5\n```\n\n## Evaluation\n![](figures/results.png)\nOur model outputs can be found at [outputs](outputs). You can use the following command to evaluate the results.\n```bash\npython src/evaluation/cal_knowledge_quadrants.py --file_name outputs/triviaqa_test_llama2_7b_chat_idk_sft.json\n```\nYou will get the following results:\n```\nIk-Ik: 38.37\nIk-Idk: 40.59\nIdk-Ik: 11.53\nIdk-Idk: 9.51\nTruthful: 78.96\n```\n\n## Acknowledgements\n- I especially thank Tianxiang Sun, Xiangyang Liu, Wenwei Zhang and other co-authors for their guidance and help. I really enjoy the teamwork with them.\n- Thanks to my advisor, Prof. Xipeng Qiu, for his guidance and support, and for helping me persevere and complete this work.\n- I am also grateful to Xinyang Pu for her support. 
I know we'll both make it through.\n\n## Citation\n```\n@misc{cheng2024ai,\n      title={Can AI Assistants Know What They Don't Know?}, \n      author={Qinyuan Cheng and Tianxiang Sun and Xiangyang Liu and Wenwei Zhang and Zhangyue Yin and Shimin Li and Linyang Li and Zhengfu He and Kai Chen and Xipeng Qiu},\n      year={2024},\n      eprint={2401.13275},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmoss%2Fsay-i-dont-know","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenmoss%2Fsay-i-dont-know","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenmoss%2Fsay-i-dont-know/lists"}