{"id":13604948,"url":"https://github.com/AkariAsai/self-rag","last_synced_at":"2025-04-12T02:32:26.319Z","repository":{"id":201096105,"uuid":"703213640","full_name":"AkariAsai/self-rag","owner":"AkariAsai","description":"This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.","archived":false,"fork":false,"pushed_at":"2024-05-25T11:19:17.000Z","size":2968,"stargazers_count":1806,"open_issues_count":54,"forks_count":167,"subscribers_count":17,"default_branch":"main","last_synced_at":"2024-10-29T15:34:07.274Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://selfrag.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AkariAsai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-10T20:12:05.000Z","updated_at":"2024-10-29T12:44:44.000Z","dependencies_parsed_at":"2023-12-01T01:28:27.959Z","dependency_job_id":"f8e33e91-0281-4d0e-b755-6e50bb68cb81","html_url":"https://github.com/AkariAsai/self-rag","commit_stats":null,"previous_names":["akariasai/self-rag"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkariAsai%2Fself-rag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkariAsai%2Fself-rag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AkariAsai%2Fself-rag/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/AkariAsai%2Fself-rag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AkariAsai","download_url":"https://codeload.github.com/AkariAsai/self-rag/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223489709,"owners_count":17153809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T19:00:52.989Z","updated_at":"2024-11-07T09:31:19.305Z","avatar_url":"https://github.com/AkariAsai.png","language":"Python","funding_links":[],"categories":["English-centric","Python","A01_文本生成_文本对话","2024.10"],"sub_categories":["大语言对话模型及数据","Self-RAG【反思者】"],"readme":"# SELF-RAG: Learning to Retrieve, Generate and Critique through Self-reflection\n\nThis includes the original implementation of [SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection](https://arxiv.org/abs/2310.11511) (ICLR 2024, Oral top 1%) by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.\n\n[Website](https://selfrag.github.io/) | [7B Model](https://huggingface.co/selfrag/selfrag_llama2_7b) | [13B Model](https://huggingface.co/selfrag/selfrag_llama2_13b) | [Paper](https://akariasai.github.io/files/adaptive_retrieval_augmented_lm_arxiv.pdf) | [Training data](https://huggingface.co/datasets/selfrag/selfrag_train_data) | [Twitter summary](https://twitter.com/AkariAsai/status/1715110277077962937) | [Updates](#updates)\n\n**Self-RAG** (Figure right) is a new framework to train an arbitrary LM to learn to retrieve, generate, and critique 
to enhance the factuality and quality of generations, without hurting the versatility of LLMs.\n\nUnlike the widely adopted Retrieval-Augmented Generation (RAG; Figure left) approach, **Self-RAG** retrieves on demand (e.g., it can retrieve multiple times or skip retrieval completely) given diverse queries, and critiques its own generation from multiple fine-grained aspects by predicting **reflection tokens** as an integral part of generation.\nWe conduct a segment-wise beam search to select the output that maximizes the utility for diverse preferences.\n\n\n![](images/teaser_self_rag_v8.png)\n\n\nIf you find our code, data, models, or the paper useful, please cite the paper:\n```\n@inproceedings{\nasai2024selfrag,\nauthor={Asai, Akari and Wu, Zeqiu and Wang, Yizhong and Sil, Avirup and Hajishirzi, Hannaneh},\ntitle={Self-{RAG}: Learning to Retrieve, Generate, and Critique through Self-Reflection},\nbooktitle={The Twelfth International Conference on Learning Representations},\nyear={2024},\nurl={https://openreview.net/forum?id=hSyW5go0v8}\n}\n```\n\n## Updates\n- **2023.10**: Initial release of code, models, and the paper.\n\n## Content\n1. [Installation](#installation)\n2. [Quick Start](#quick-start)\n3. [Retriever setup](#retriever-setup)\n4. [Training](#training)\n5. [Inference](#inference)\n6. [Baselines](#baselines)\n7. [FAQ](#faq)\n8. [Contact](#contact)\n\n\n## Installation\nInstall the required Python libraries by running the command below.\n\n```\npip install -r requirements.txt\n```\nPlease use the latest version of `vllm`, as older versions may not let you set `skip_special_tokens` via `SamplingParams`, which was added in ([this PR](https://github.com/vllm-project/vllm/issues/893)).\n\nYou can also create a conda environment by running the command below.\n\n```\nconda env create -f environment.yml\n```\n\n## Quick start\nYou can download Self-RAG from HuggingFace Hub. 
For inference, we recommend using [vllm](https://vllm.readthedocs.io/en/latest/) as it significantly speeds up inference.\n\n```py\nfrom vllm import LLM, SamplingParams\n\nmodel = LLM(\"selfrag/selfrag_llama2_7b\", download_dir=\"/gscratch/h2lab/akari/model_cache\", dtype=\"half\")\nsampling_params = SamplingParams(temperature=0.0, top_p=1.0, max_tokens=100, skip_special_tokens=False)\n\ndef format_prompt(input, paragraph=None):\n  prompt = \"### Instruction:\\n{0}\\n\\n### Response:\\n\".format(input)\n  if paragraph is not None:\n    prompt += \"[Retrieval]\u003cparagraph\u003e{0}\u003c/paragraph\u003e\".format(paragraph)\n  return prompt\n\nquery_1 = \"Leave odd one out: twitter, instagram, whatsapp.\"\nquery_2 = \"Can you tell me the difference between llamas and alpacas?\"\nqueries = [query_1, query_2]\n\n# for a query that doesn't require retrieval\npreds = model.generate([format_prompt(query) for query in queries], sampling_params)\nfor pred in preds:\n  print(\"Model prediction: {0}\".format(pred.outputs[0].text))\n```\n\nOutput:\n```txt\nModel prediction: Twitter, Instagram, and WhatsApp are all social media platforms. [No Retrieval]WhatsApp is the odd one out because it is a messaging app, while Twitter and # Instagram are primarily used for sharing photos and videos.[Utility:5]\u003c/s\u003e\nModel prediction: Sure![Retrieval]\u003cparagraph\u003e\u003cparagraph\u003e\n```\nAs you can see, Self-RAG answers the first query without retrieval because it does not require retrieval. On the other hand, Self-RAG outputs a `[Retrieval]` token for the second, as this question requires more fine-grained factual grounding.\n\nFor queries that require factual grounding, you can insert a paragraph. 
Self-RAG can retrieve and insert paragraphs anytime while generating, and recognizes them as long as they are surrounded by the context markup special tokens `\u003cparagraph\u003e`, `\u003c/paragraph\u003e`.\n```\n# for a query that needs factual grounding\nprompt = format_prompt(\"Can you tell me the difference between llamas and alpacas?\", \"The alpaca (Lama pacos) is a species of South American camelid mammal. It is similar to, and often confused with, the llama. Alpacas are considerably smaller than llamas, and unlike llamas, they were not bred to be working animals, but were bred specifically for their fiber.\")\npreds = model.generate([prompt], sampling_params)\nprint([pred.outputs[0].text for pred in preds])\n# ['[Relevant]Alpacas are considerably smaller than llamas, and unlike llamas, they were not bred to be working animals, but were bred specifically for their fiber.[Fully supported][Utility:5]\u003c/s\u003e']\n```\nSelf-RAG finds the relevant inserted document and generates answers that are fully supported by the evidence.\n\n\n### Run your evaluation using the online retrieval model\n\nYou can also run retrieval on-demand and use it with Self-RAG. Because running retrieval over the full English Wikipedia requires a large amount of RAM and multiple GPUs, we created a subset of Wikipedia for demo purposes that includes only the intro paragraphs of Wikipedia articles.\n\nFirst, please download the corpus and embeddings (9GB in total).\n\n```\ngit clone git@github.com:AkariAsai/self-rag.git\ncd self-rag/retrieval_lm\nbash download_demo_corpus.sh\n```\nIf the script does not work, you can download the data from [Google Drive](https://drive.google.com/file/d/1IYNAkwawfCDiBL27BlBqGssxFQH9vOux/view?usp=share_link) or [HF dataset](https://huggingface.co/datasets/selfrag/selfrag_train_data).\nThen, you can run the script under `retrieval_lm`. 
We tested the script on one RTX 6000 with 24 GB of GPU memory and 100 GB of RAM (but it should run with much less RAM).\n\n```py\nfrom passage_retrieval import Retriever\n\n# Reuses model, sampling_params, and format_prompt from the Quick start snippet above.\n# Example query, assumed for illustration (the snippet did not define query_3).\nquery_3 = \"When does overfitting occur?\"\n\nretriever = Retriever({})\nretriever.setup_retriever_demo(\"facebook/contriever-msmarco\", \"enwiki_2020_intro_only/enwiki_2020_dec_intro_only.jsonl\", \"enwiki_2020_intro_only/enwiki_dec_2020_contriever_intro/*\",  n_docs=5, save_or_load_index=False)\nretrieved_documents = retriever.search_document_demo(query_3, 5)\nprompts = [format_prompt(query_3, doc[\"title\"] +\"\\n\"+ doc[\"text\"]) for doc in retrieved_documents]\npreds = model.generate(prompts, sampling_params)\ntop_doc = retriever.search_document_demo(query_3, 1)[0]\nprint(\"Reference: {0}\\nModel prediction: {1}\".format(top_doc[\"title\"] + \"\\n\" + top_doc[\"text\"], preds[0].outputs[0].text))\n```\n\nOutput:\n```txt\nReference: Overfitting\n  In statistics, overfitting is \"the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably\". An overfitted model is a statistical model that contains more parameters than can be justified by the data. The essence of overfitting is to have unknowingly extracted some of the residual variation (i.e., the noise) as if that variation represented underlying model structure. Underfitting occurs when a statistical model cannot adequately capture the underlying structure of the data. 
An under-fitted model is a model where some parameters or terms that would appear in a correctly specified model are\nModel prediction: [Relevant]Overfitting occurs when a model has too many parameters relative to the amount of data it has been trained on, leading it to memorize the training data too closely and perform poorly on new, unseen data.[Fully supported][Utility:5]\u003c/s\u003e\n\n```\nThe retriever finds the necessary document, and the model generates fully grounded output.\n\n**Note that this demo uses a smaller corpus and Self-RAG with the full inference algorithm. For a full evaluation, you either need to set up a retriever or download our retrieved results. Please follow the instructions at [Inference](#inference).**\n\n## Retriever Setup\nBy default, we use [Contriever](https://github.com/facebookresearch/contriever) as our retrieval component.\n\n### Download data\nDownload the preprocessed passage data used in DPR and decompress it (the retrieval command below expects `psgs_w100.tsv`).\n```\ncd retrieval_lm\nwget https://dl.fbaipublicfiles.com/dpr/wikipedia_split/psgs_w100.tsv.gz\ngunzip psgs_w100.tsv.gz\n```\n\nThen, download the precomputed passage embeddings, generated with [Contriever-MSMARCO](https://huggingface.co/facebook/contriever-msmarco), and extract the archive before running retrieval.\n```\nwget https://dl.fbaipublicfiles.com/contriever/embeddings/contriever-msmarco/wikipedia_embeddings.tar\n```\n\n### Run retriever\nYou can run passage retrieval by running the command below.\n\n```\ncd retrieval_lm\npython passage_retrieval.py \\\n    --model_name_or_path facebook/contriever-msmarco --passages psgs_w100.tsv \\\n    --passages_embeddings \"wikipedia_embeddings/*\" \\\n    --data YOUR_INPUT_FILE  \\\n    --output_dir YOUR_OUTPUT_FILE \\\n    --n_docs 20\n```\nYour input file should be either a `json` or `jsonl` file. Each instance must contain either `question` or `instruction`, which will be used as a query during retrieval.\n\n### Generate embeddings for your own data\n\nYou can generate embeddings for your own data by running the following command. 
(The script is adapted from the Contriever repository.) Note that generating embeddings from a large-scale corpus (\u003e10M docs) can take time, and we recommend running it on multiple GPUs.\n\n```\ncd retrieval_lm\nfor i in {0..3}; do\n  export CUDA_VISIBLE_DEVICES=${i}\n  python generate_passage_embeddings.py  --model_name_or_path facebook/contriever-msmarco \\\n  --output_dir YOUR_OUTPUT_DIR \\\n  --passages YOUR_PASSAGE_DATA --shard_id ${i}  --num_shards 4 \u003e ./log/nohup.my_embeddings.${i} 2\u003e\u00261 \u0026\ndone\n```\n\n## Training\n**Self-RAG** trains two models, *Critic* and *Generator*, both of which expand token vocabularies with reflection tokens and are trained with the standard next token prediction objective.\n\n- [Step 1: Critic Data Creation](#collect-reflection-tokens): Generating Critic training data with GPT-4.\n- [Step 2: Critic Training](#critic-training):  Training a Critic with new special tokens.\n- [Step 3: Generator Data Creation](#generator-data-creation): Generating Generator training data using Critic and Retriever.\n- [Step 4: Generator Training](#generator-training): Training a Generator with new special tokens.\n\nAlternatively, you can download our training data consisting of 150K instances [here](https://drive.google.com/file/d/10G_FozUV4u27EX0NjwVe-3YMUMeTwuLk/view?usp=share_link).\n\n### Collect reflection tokens\nWe collect training data from GPT-4. 
The scripts to call GPT-4 for each special token type are available at [data_creation/critic](data_creation/critic).\n\nAlternatively, you can download our training data [here](https://drive.google.com/file/d/1IN1XcIOYtRIGWITJ4LKRgfITT-uUwk_W/view?usp=share_link).\n\n### Critic training\nOnce you create or download the training data, run the command below to fine-tune Llama2-7B as the Critic.\n```\ncd data_creation\ntorchrun --nproc_per_node=2 \\\n  --master_port=2568 train_special_tokens.py \\\n  --model_name_or_path meta-llama/Llama-2-7b-hf \\\n  --data_path PATH_TO_TRAIN_DATA_FILE \\\n  --bf16  True \\\n  --output_dir PATH_TO_CRITIC_MODEL \\\n  --num_train_epochs 3  \\\n  --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \\\n  --gradient_accumulation_steps 8 \\\n  --evaluation_strategy \"no\" \\\n  --save_strategy \"steps\" \\\n  --save_steps 300 \\\n  --save_total_limit 1 \\\n  --learning_rate 2e-5 \\\n  --weight_decay 0. \\\n  --warmup_ratio 0.01 \\\n  --lr_scheduler_type \"cosine\" \\\n  --logging_steps 10 \\\n  --fsdp \"full_shard auto_wrap\"\n```\n\n### Generator Data Creation\nThe code to create Generator training data is under [generator_data_creation](data_creation/generator). See the instructions at [README.md](data_creation/generator/README.md).\n\nAlternatively, you can download our training data at [HuggingFace dataset](https://huggingface.co/datasets/selfrag/selfrag_train_data/tree/main) or [here](https://drive.google.com/file/d/10G_FozUV4u27EX0NjwVe-3YMUMeTwuLk/view?usp=share_link)\n\n\n### Generator Training\nFor generator training, we use DeepSpeed to make training more efficient. You can run training by running the script below, after setting the training data path.\n\n```\ncd retrieval_lm\nbash script_finetune_7b.sh\n```\nFor 13B model training, use `training_13b`. We use 8 A100 GPUs with 40 GB of memory for 7B model training and 4 A100 GPUs with 80 GB of memory for 13B training. 
The 7B model should fit on 1-2 A100 GPUs, although training can be slow.\n\n## Inference\nFor the task evaluation conducted in the paper, please download the data [here](https://drive.google.com/file/d/1TLKhWjez63H4uBtgCxyoyJsZi-IMgnDb/view?usp=share_link).\n\nEach file already comes with retrieved documents, so if you don't want to run a retriever as a part of inference, you can simply load the retrieved docs at `contexts`.\n\nBelow, we describe Self-RAG and baselines.\n- [Short-form](#short-form-pubhealth-arc-challenge-triviaqa-popqa): run evaluation for short-form generation.\n- [Long-form](#long-form-asqa-factscore): run evaluations for long-form generations.\n\n### Short-form (PubHealth, ARC-Challenge, TriviaQA, PopQA)\nAs we typically retrieve only once for a short-form generation task, we provide an easy-to-run evaluation script that leverages pre-given documents retrieved by Contriever offline. See the individual commands below.\n\n#### Question Answering\n\n```\npython run_short_form.py \\\n--model_name selfrag/selfrag_llama2_7b \\\n--input_file eval_data/popqa_longtail_w_gs.jsonl \\\n--mode MODE --max_new_tokens 100 \\\n--threshold 0.2 \\\n--output_file YOUR_OUTPUT_FILE \\\n--metric match --ndocs 10 --use_groundness --use_utility --use_seqscore \\\n--dtype half\n```\n\n`mode` specifies the inference-time mode among `['adaptive_retrieval', 'no_retrieval', 'always_retrieve']`.\n\n- `adaptive_retrieval` retrieves given the `threshold` or Self-RAG prediction.\n- `no_retrieval` disables retrieval at inference time.\n- `always_retrieve` always retrieves.\n\nFor 13B, you may hit an out-of-memory (OOM) error if you use a single GPU with 24 GB of memory. 
You can run inference on multiple GPUs by setting `--world_size`.\n\n#### ARC Challenge\n```\npython run_short_form.py \\\n  --model_name selfrag/selfrag_llama2_7b \\\n  --input_file eval_data/arc_challenge_processed.jsonl \\\n  --max_new_tokens 50 --threshold 0.2 \\\n  --output_file OUTPUT_FILE_NAME \\\n  --metric match --ndocs 5 --use_groundness --use_utility --use_seqscore \\\n  --task arc_c\n```\n\n#### PubHealth\n```\npython run_short_form.py \\\n  --model_name selfrag/selfrag_llama2_7b \\\n  --input_file eval_data/health_claims_processed.jsonl \\\n  --max_new_tokens 50 \\\n  --threshold 0.2 --output_file OUTPUT_FILE_NAME \\\n  --metric match --ndocs 5 \\\n  --use_groundness --use_utility --use_seqscore \\\n  --task fever\n```\n\n### Long-form (ASQA, FactScore)\nFor long-form QA, you can either run evaluations with a retrieval model or with pre-given passages.\nCurrently, we are working on reducing run-time memory requirements (DPR / Contriever with the whole English Wikipedia embeddings requires 100 GB RAM), speeding up long-form generation, and releasing the inference code that uses a small set of initially retrieved documents (~20) first.\n\n*Note: Our current implementation is specifically designed for evaluations of target task datasets. We are planning to update our code base to make the interface simpler and easier to use. 
We will announce it when we release another version.*\n\n#### Run inference using pre-retrieved passages\n\nFor ASQA, please run the following command:\n```\npython run_long_form_static.py \\\n  --model_name selfrag/selfrag_llama2_7b \\\n  --ndocs 5 --max_new_tokens 300 --threshold 0.2 \\\n  --use_grounding --use_utility --use_seqscore \\\n  --task asqa --input_file eval_data/asqa_eval_gtr_top100.json \\\n  --output_file YOUR_OUTPUT_FILE_NAME --max_depth 7 --mode always_retrieve\n```\n\nFor FactScore:\n\n```\npython run_long_form_static.py \\\n  --model_name selfrag/selfrag_llama2_7b \\\n  --ndocs 5 --max_new_tokens 300 --threshold 0.2 \\\n  --use_grounding --use_utility --use_seqscore \\\n  --task factscore --input_file eval_data/factscore_unlabeled_alpaca_13b_retrieval.jsonl \\\n  --output_file YOUR_OUTPUT_FILE_NAME --max_depth 7\n```\n\n##### Key parameters for long-form generations\nThere are several key parameters related to the inference of Self-RAG.\n- `w_rel` (default 1.0): `w_rel` controls the emphasis on the `isRel` (a critique token on whether retrieved passages are relevant or not) token probability during beam search.\n- `w_sup` (default 1.0): `w_sup` controls the emphasis on the `isSup` (a critique token on whether the generation is supported by the document or not) token probability during beam search.\n- `w_use` (default 0.5): `w_use` controls the emphasis on the `isUse` (a critique token on overall quality) token probability during beam search.\n- `threshold` (default 0.2): this threshold controls the frequency of adaptive retrieval.\n- `max_depth` (default 6): this corresponds to `T` in the paper, which defines the maximum depth of search.\n- `beam_width` (default 2): this controls the size of the beam in the segment-level beam search.\n\nFor more details, please refer to the details (Section 3.3) and analysis (Section 5) in our paper.\n\n#### Run evaluation\nFor long-form evaluations, set up external libraries or repositories to run 
evaluations.\n\n- `factscore==v0.1.5` (bio)\nPlease follow the instructions at the [FactScore](https://github.com/shmsw25/FActScore) official repository to set up your environment.\n```\npython -m factscore.factscorer --data_path YOUR_OUTPUT_FILE  --model_name retrieval+ChatGPT --cache_dir YOUR_CACHE_DIR --openai_key YOUR_OPEN_AI_KEY --verbose\n```\n\n- [ALCE/ASQA](https://github.com/princeton-nlp/ALCE)\n\nALCE provides a comprehensive evaluation using multiple different metrics for long-form QA. For your first evaluation, install the ALCE repo and download the data.\n```\ngit clone https://github.com/princeton-nlp/ALCE.git\npython3 -m alce_env\ncd ALCE\nbash download_data.sh\n```\n\nFor ASQA, you can run evaluations as follows. Note that ASQA evaluations require T5-XXL (11B)-based NLI module.\n```\npython eval.py --f YOUR_OUTPUT_FILE --citations --qa --mauve\n```\n\n## Baselines\nCode to rerun the baselines is available at [run_baseline_lm.py](https://github.com/AkariAsai/self-rag/blob/main/retrieval_lm/run_baseline_lm.py).\nTo run the retrieval-augmented baselines, make sure to download the task input files with retrieved passages.\n\n\n### Vanilla LM baselines\n\n- Huggingface models\n```\npython run_baseline_lm.py \\\n--model_name meta-llama/Llama-2-7b-hf \\\n--input_file INPUT_FILE_SAME_AS_SELF_RAG \\\n --max_new_tokens 100 --metric match \\\n--result_fp RESULT_FILE_PATH --task qa --prompt_name \"prompt_no_input\"\n```\ne.g., PubHealth\n```\npython run_baseline_lm.py \\\n--model_name meta-llama/Llama-2-7b-hf \\\n--input_file eval_data/health_claims_processed.jsonl \\\n--max_new_tokens 20 \\\n--metric accuracy \\\n--result_fp llama2_7b_pubhealth_results.json \\\n--task fever\n```\n**Note: for PubHealth and ARC, please pass the task names (ARC = `arc_c` and PubHealth = `fever`) to properly set the instruction.**\n- OpenAI APIs\n\nFor OpenAI API models, you also need to set the organization key 
[here](https://github.com/AkariAsai/self-rag/blob/main/retrieval_lm/run_baseline_lm.py#L12). You also need to have a txt file including your OpenAI API key.\n```\npython run_baseline_lm.py \\\n--model_name gpt-3.5-turbo-0301 \\\n--input_file INPUT_FILE_SAME_AS_SELF_RAG \\\n--max_new_tokens 100 --metric match \\\n--result_fp RESULT_FILE_PATH \\\n --task qa \\\n--api_key YOUR_OPEN_AI_API_KEY_FILE \\\n--prompt_name \"prompt_no_input\"\n```\n\n### Retrieval-augmented baselines\n\n- Huggingface models\n\n```\npython run_baseline_refactor.py \\\n--model_name meta-llama/Llama-2-7b-hf \\\n--input_file INPUT_FILE_SAME_AS_SELF_RAG \\\n --max_new_tokens 100 --metric match \\\n--result_fp RESULT_FILE_PATH --task qa \\\n--mode retrieval \\\n--prompt_name \"prompt_no_input_retrieval\"\n```\n- OpenAI APIs\n```\npython run_baseline_lm.py \\\n--model_name gpt-3.5-turbo-0301 \\\n--input_file INPUT_FILE_SAME_AS_SELF_RAG \\\n--max_new_tokens 100 --metric match \\\n--result_fp RESULT_FILE_PATH \\\n --task qa \\\n--api_key YOUR_OPEN_AI_API_KEY_FILE \\\n--mode retrieval \\\n--prompt_name \"prompt_no_input_retrieval\"\n```\n\n## FAQ\n**Q1: How can I train a new pre-trained LM using Self-RAG scheme?** -- If you are using hugging face transformers, you can simply change the `model_name_or_path` and `tokenizer_name` in our training script, [script_finetune_7b.sh](https://github.com/AkariAsai/self-rag/blob/main/retrieval_lm/script_finetune_7b.sh). If you want to use your own fine-tuning script, please make sure to add the special tokens and mask out the paragraph context, as discussed in [this issue](https://github.com/AkariAsai/self-rag/issues/12)\n\n**Q2: Are you planning to release Mistral-7B-based Self-RAG?** -- Right now I have limited bandwidth to do so, but there is a community-trained version of Self-RAG [SciPhi-Self-RAG-Mistral-7B-32k](https://huggingface.co/SciPhi/SciPhi-Self-RAG-Mistral-7B-32k) on top of Mistral-7B. 
We will announce if we can train Self-RAG on Mistral-7B and release the checkpoint.\n\n\n\n## Contact\nIf you have questions, please open an issue mentioning @AkariAsai or send an email to akari[at]cs.washington.edu.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAkariAsai%2Fself-rag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAkariAsai%2Fself-rag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAkariAsai%2Fself-rag/lists"}