{"id":22202448,"url":"https://github.com/abi-antonio/instance-prompt-question-generation","last_synced_at":"2026-05-15T21:04:07.059Z","repository":{"id":265183333,"uuid":"894892532","full_name":"abi-antonio/instance-prompt-question-generation","owner":"abi-antonio","description":"Implementation of Instance-dependent prompting for Question Decomposition","archived":false,"fork":false,"pushed_at":"2024-11-28T05:16:32.000Z","size":417,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-29T07:45:20.432Z","etag":null,"topics":["nlp","prompting","python","pytorch","question-answering","question-generation","transformers"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abi-antonio.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-27T07:31:52.000Z","updated_at":"2024-11-28T05:20:56.000Z","dependencies_parsed_at":"2024-11-28T08:32:14.088Z","dependency_job_id":null,"html_url":"https://github.com/abi-antonio/instance-prompt-question-generation","commit_stats":null,"previous_names":["abi-antonio/instance-prompt-question-generation"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/abi-antonio/instance-prompt-question-generation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abi-antonio%2Finstance-prompt-question-generation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abi-antonio%2Finstance-prompt-question-generation/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abi-antonio%2Finstance-prompt-question-generation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abi-antonio%2Finstance-prompt-question-generation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abi-antonio","download_url":"https://codeload.github.com/abi-antonio/instance-prompt-question-generation/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abi-antonio%2Finstance-prompt-question-generation/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274291327,"owners_count":25258157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-09T02:00:10.223Z","response_time":80,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nlp","prompting","python","pytorch","question-answering","question-generation","transformers"],"created_at":"2024-12-02T16:22:11.744Z","updated_at":"2026-05-15T21:04:02.026Z","avatar_url":"https://github.com/abi-antonio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Instance-Driven Prompt Generation for Improved Multi-hop Question Decomposition\n\nThis repository contains the code and supplementary materials for the research paper titled, \"**Instance-Driven Prompt Generation for Improved 
Multi-hop Question Decomposition**.\" This study investigates reasoning in large language models (LLMs) by employing instance-driven decomposition of complex questions, matching the performance of fine-tuning approaches while requiring only 0.3% additional parameter training.\n\n## Abstract\n\nTo enhance the reasoning ability of LLMs, decomposing complex questions into a series of manageable sub-steps has proven effective. Traditional methods rely on in-context few-shot learning, requiring manually designed exemplars, a process that can be time-consuming and lacks scalability. Alternatively, a question decomposer model can generate sub-questions to enable reasoning through a QA model iterating over each sub-question.\n\nThe **Instance-Driven Prompt (IDP) Generation** method builds on this decomposer approach, generating sub-questions that facilitate logical reasoning and achieve close to fine-tuning performance levels. Empirical evaluations reveal that IDP significantly boosts QA performance compared to in-context learning methods, demonstrating its potential as an efficient and effective strategy for optimizing LLMs in natural language understanding tasks.\n\n## Key Features\n\n- **Instance-Driven Decomposition**: Automatically generates instance-specific sub-questions that enhance LLM reasoning.\n- **Parameter-Efficient**: Achieves near fine-tuning performance with minimal parameter addition (0.3%).\n- **Improved QA Performance**: Empirical evaluation shows enhanced performance over in-context learning approaches.\n\n## Framework\n\u003cimg src=\"img/IDPG-for-QG.png\" width=\"50%\"\u003e\n\u003c!-- ![Question Generation model using instance-based prompting. Red boxes are trained while blue boxes are frozen.](img/IDPG-for-QG.png \"Framework\") --\u003e\n\nThe model is designed to break down complex questions (multi-hop questions, or MHQs) into smaller steps to make it easier to answer them. It uses two main components:\n\n1. 
**Base Language Model (PLM)**: This is the core AI model that processes and generates responses.\n2. **Prompt Generator**: This creates special, instance-specific prompts tailored to the question being asked.\n\nThe prompt generator takes the MHQ as input and generates a set of prompts for each layer in the encoder and decoder of the base PLM; these prompts are concatenated to the input of each layer.\n\nTraining of the prompt generator follows the work of [IDPG: An Instance-Dependent Prompt Generation Method (Wu et al., NAACL 2022)](https://aclanthology.org/2022.naacl-main.403/).\n\n## Training and Evaluation\n\n### Prerequisites\n- Python 3.8+\n- PyTorch\n- Transformers (Hugging Face)\n- Additional requirements are listed in `requirements.txt`\n\n### Installation\n\n1. Install the necessary packages:\n    ```\n    pip install -r requirements.txt\n    ```\n2. Download the [Musique](https://github.com/StonyBrookNLP/musique) repository. The evaluate script provided with the Musique dataset is used for evaluation. Install the [dependencies](https://github.com/StonyBrookNLP/musique#installations). An updated evaluate file also needs to be downloaded so that the evaluate function can be imported in our code.\n    ```\n    !git clone https://github.com/StonyBrookNLP/musique.git\n    !wget https://drive.google.com/file/d/1EpK3p25cYbsNzx4pz_qjzYd5CN1Z15fG/view?usp=drive_link\n    !mv evaluate_v1.py musique/evaluate_v1.py\n    ```\n3. Install the Musique-related packages:\n   ```bash\n   cd musique/\n   pip install -r requirements.txt\n   ```\n4. Download the [paraphrased single hop questions](https://drive.google.com/drive/folders/1of2Iy8DrQ6BDeOZc8zWUkiU0LHRKiLry?usp=sharing) and place them in the `data/` directory.\n\n## Question Generation Training and Evaluation\n\n1. **Training**: \n    To train the IDP model, run:\n   ```bash\n   sh train_qg_idp.sh\n   ```\n\n2. 
**Evaluation**: \n    To generate sub-questions for the musique dataset, run:\n   ```bash\n   sh evaluate_qg_idp.sh # Instance driven prompt generation\n   sh evaluate_qg_ft.sh # Finetuning and Task Specific Prompts\n   ```\n\n3. **Pre-trained Models**:\n    The pretrained models used in our experiments can be accessed via the links below:\n\n    | Model Name  | Description   | Link  |\n    |---------------------------------------|-------------------------|------------------------------------------------|\n    | **Instance-dependent prompts (0.5M)** | IDPG Question Generator | [Link](https://huggingface.co/abiantonio/musique-shqg-idp-500k) |\n    | **Instance-dependent prompts (2.5M)** | IDPG Question Generator | [Link](https://huggingface.co/abiantonio/musique-shqg-idp-3M)   |\n    | **Task-specific prompts**             | Prefix Tuned Question Generator | [Link](https://huggingface.co/abiantonio/musique-shqg-pt)       |\n    | **Fine-Tuning**                       | Finetuned Question Generator    | [Link](https://huggingface.co/abiantonio/musique-shqg-ft)       |\n    | **Single-Hop QA Model**               | QA model trained on single-hop questions | [Link](https://huggingface.co/abiantonio/shqa-ft-p3-unifiedqa-musique) |\n\n\n## Question Answering Training and Evaluation\n1. **Multi-hop QA Evaluation**:  \n    Use the generated question decompositions for multi-hop question answering:\n   ```bash\n   python inference_t5_nhop_qa_viashqg.py --qa_model_checkpoint \u003cpath to trained SHQA model\u003e --qg_model_checkpoint \u003cpath to trained QG model\u003e\n   ```\n\n   To use our trained models\n   ```bash\n   python inference_t5_nhop_qa_viashqg.py --qa_model_checkpoint abiantonio/shqa-ft-p3-unifiedqa-musique --qg_model_checkpoint abiantonio/musique-shqg-idp-3M --use_gold_context --base_path \u003cBASE_PATH\u003e --experiment_name \"test\" --do_predict --eval_all \n   ```\n\n2. 
**Multi-hop QA Evaluation using Meta's Llama2**:  \n    Make sure you have access to Llama2 via Hugging Face and set your Hugging Face credentials:\n    ```bash\n    huggingface-cli login --token $HUGGINGFACE_TOKEN\n    ```\n\n    Invoke Llama2's API to generate answers for each set of question decompositions. This may take a while to finish.\n    ```bash\n    sh inference_llama2_nhop_qa_viaqg.sh\n    ```\n\n## Experiment Results\n\n| Method                        | Params | ROUGE-LSUM | ROUGE-1 | ROUGE-2 | BLEU | F1  |\n|-------------------------------|--------|------------|---------|---------|------|-----|\n| **Discrete prompts**          |        |            |         |         |      |     |\n| few-shot-random               | -      | 62.6       | 73.3    | 50.2    | 43.0 | 65.6|\n| few-shot-diverse              | -      | 61.8       | 72.5    | 49.0    | 39.2 | 64.8|\n| few-shot-similar              | -      | 60.7       | 72.1    | 48.0    | 40.0 | 64.4|\n| few-shot-nhops                | -      | 63.1       | 73.1    | 49.1    | 45.3 | 64.9|\n| few-shot-embed                | -      | 63.0       | 72.8    | 49.2    | 44.5 | 64.9|\n| **Continuous prompts**        |        |            |         |         |      |     |\n| Task-specific prompts         | 0.5M   | 64.5       | 72.1    | 51.2    | 40.0 | 71.3|\n| Instance-dependent prompts    | 0.5M   | 71.7       | 78.8    | 60.1    | 48.6 | 78.1|\n| Instance-dependent prompts    | 2.5M   | 73.2       | 79.8    | 61.8    | 52.5 | 79.2|\n| Finetuning                    | 783M   | 73.3       | 80.1    | 62.0    | 53.0 | 79.4|\n\n**Table:** Comparison of question decomposition performance between in-context few-shot learning methods, finetuning, prefix tuning, and instance-dependent prompting (IDP).\n\n| Method                     | ans-f1 | sup-f1 | sup-recall |\n|----------------------------|--------|--------|------------|\n| gold                       | 61.6   | 54.7   | 93.0       |\n| **Few-shot Methods**       | 
       |        |            |\n| few-shot-random            | 49.2   | 53.2   | 85.3       |\n| few-shot-diverse           | 48.4   | 53.1   | 83.1       |\n| few-shot-similar           | 48.1   | 53.0   | 83.7       |\n| few-shot-nhops             | 48.9   | 53.5   | 87.7       |\n| few-shot-embed             | 50.4   | 52.7   | 86.0       |\n| **Task-Specific Prompts**  | 49.7   | 52.1   | 83.1       |\n| **Fine-tuning**            | 55.2   | 53.7   | 89.5       |\n| **IDPG**                   | 55.3   | 53.5   | 88.6       |\n\n**Table:** Impact of different question decomposition methods on QA performance. The QA model is a fine-tuned UnifiedQA on single-hop questions. The paragraph retriever model is a pre-trained CrossEncoder that selects the top-3 paragraphs based on similarity to the question.\n\n\u003cimg src=\"img/hpo_analysis.png\" width=\"50%\"\u003e\n\n**Figure:** ROUGE Scores for Question Generation Across Various Prompt Length and Hidden Size Configurations. h is hidden size and l is prompt length.\n\n\u003cimg src=\"img/hpo_parameters.png\" width=\"50%\"\u003e\n\n**Figure:** Impact of Prompt Length and Hidden Size Configurations on Model Parameter Size. h is hidden size and l is prompt length.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabi-antonio%2Finstance-prompt-question-generation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabi-antonio%2Finstance-prompt-question-generation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabi-antonio%2Finstance-prompt-question-generation/lists"}