CompAct: Compressing Retrieved Documents Actively for Question Answering
- Host: GitHub
- URL: https://github.com/dmis-lab/compact
- Owner: dmis-lab
- Created: 2024-07-07T14:38:21.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-07-19T08:21:12.000Z (5 months ago)
- Last Synced: 2024-07-19T16:48:03.262Z (5 months ago)
- Topics: compression, question-answering
- Language: Python
- Homepage: https://arxiv.org/abs/2407.09014
- Size: 508 KB
- Stars: 10
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# CompAct: Compressing Retrieved Documents Actively for Question Answering (EMNLP 2024)
We propose [**CompAct** (**Comp**ressing Retrieved Documents **Act**ively for Question Answering)](https://arxiv.org/abs/2407.09014), a novel framework that employs an active strategy for compressing extensive documents. **CompAct** dynamically preserves query-related contexts, focusing on the integration of information across documents. See our [paper](https://arxiv.org/abs/2407.09014) for more details.
![](assets/framework.jpg)
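At a high level, the compressor walks over the retrieved documents in segments, carrying a running summary and a completeness judgment from one step to the next. The sketch below is only a conceptual outline of that loop (the ```compress``` callable stands in for the model call), not code from this repository; see the Quick Usage section for the actual implementation.

```python
# Conceptual sketch of CompAct's active compression loop (not code from this repository).
# `compress` is a stand-in for the compressor LLM call shown in Quick Usage below:
# it takes (question, segment, previous summary, previous evaluation) and returns
# an updated (summary, evaluation).
def compact(question, documents, compress, segment_size=5, max_iterations=6):
    summary, evaluation = "", ""
    for i, start in enumerate(range(0, len(documents), segment_size)):
        if i >= max_iterations:
            break
        segment = documents[start:start + segment_size]
        # compress the new segment together with the running summary
        summary, evaluation = compress(question, segment, summary, evaluation)
        # stop early once the compressor judges the gathered evidence complete
        if "[COMPLETE]" in evaluation:
            break
    return summary
```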
## Updates
[July 16, 2024] We have released the code and data.

## Installation
To ensure compatibility with other libraries, we recommend using the following versions. You can adjust these based on your environment:
* Python 3.10.9
* PyTorch 2.1.2
* CUDA 11.8

We utilize the `alignment-handbook` for training our model. Please install all required packages as per their instructions. More details can be found [here](https://github.com/huggingface/alignment-handbook).
```bash
# Create and activate a new environment
conda create -n compact python==3.10.9
conda activate compact

# Install PyTorch
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

# alignment-handbook
cd ./alignment-handbook/
python -m pip install .
python -m pip install flash-attn --no-build-isolation  # Flash Attention 2

# Install our requirements
pip install -r requirements.txt
```

## Quick Usage
Here's a simple example to get you started:
```python
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from utils import create_prompt, parse_output_without_sentence

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dir = 'cwyoon99/CompAct-7b'
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

example = json.load(open('./data/example.json'))  # example case with retrieved documents
print(f"question: {example['question']}\nanswer: {example['answer']}")

prev_summary = []
prev_eval = []

# actively compress documents until all necessary evidence has been found
for i, iteration in enumerate(example['iterations']):
    segment = iteration['documents_list']
    document_input = "\n".join(segment)

    # load previous output
    prev_summary_temp = prev_summary[-1] if i != 0 else ""
    prev_eval_temp = prev_eval[-1].replace('[INCOMPLETE]', '').strip() if i != 0 else ""

    # create prompt
    input_prompt = create_prompt(example, iteration, i, document_input, prev_summary_temp, prev_eval_temp, tokenizer, eos_token="", add_generation_prompt=True)

    # compress
    with torch.no_grad():
        inputs = tokenizer(input_prompt, return_tensors="pt")
        input_ids = inputs.input_ids.to(device)
        attention_mask = inputs.attention_mask.to(device)
        outputs = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_new_tokens=500, temperature=0, top_p=1.0, pad_token_id=tokenizer.eos_token_id)
    iteration['output'] = tokenizer.decode(outputs[0][input_ids.size(1):], skip_special_tokens=True).strip()

    # parse the generated output into a summary and a completeness evaluation
    parsed_sections = parse_output_without_sentence(iteration['output'])
    prev_summary.append(parsed_sections['summary'])
    prev_eval.append(parsed_sections['eval'])

    # compressing extensive documents into a compact context (under 200 tokens)
    print(f"summary of segment {i}: {parsed_sections['summary']}\ntermination of segment {i}: {parsed_sections['eval']}\n")

    # early termination
    if "[COMPLETE]" in parsed_sections['eval']:
        break
```
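To sanity-check end-to-end QA on the compressed context, a minimal sketch like the following feeds the final summary and the question to a reader model. This is only an illustration: the prompt format and decoding settings are assumptions, not the exact setup of ```run_prompt.py``` (see the Inference section for the full pipeline), and it reuses ```example``` and ```prev_summary``` from the loop above.

```python
# Illustrative only: a simple reader step on top of the compressed context.
# The prompt format and decoding settings are assumptions, not the exact
# configuration used in run_prompt.py.
reader_dir = 'meta-llama/Meta-Llama-3-8B'  # reader model used in the Inference script
reader = AutoModelForCausalLM.from_pretrained(reader_dir, torch_dtype=torch.bfloat16, device_map="auto")
reader_tokenizer = AutoTokenizer.from_pretrained(reader_dir)

context = prev_summary[-1]  # compressed context from the loop above
qa_prompt = f"Context: {context}\n\nQuestion: {example['question']}\nAnswer:"

inputs = reader_tokenizer(qa_prompt, return_tensors="pt").to(reader.device)
with torch.no_grad():
    output = reader.generate(**inputs, max_new_tokens=32, do_sample=False,
                             pad_token_id=reader_tokenizer.eos_token_id)
prediction = reader_tokenizer.decode(output[0][inputs.input_ids.size(1):], skip_special_tokens=True).strip()
print(f"prediction: {prediction}")
```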
## Download

### Model
You can download our model from [Hugging Face](https://huggingface.co/cwyoon99/CompAct-7b).

### Data
We conducted experiments on 5 question-answering benchmark datasets: [HotpotQA](https://github.com/hotpotqa/hotpot), [MuSiQue](https://github.com/StonyBrookNLP/musique), [2WikiMultihopQA](https://github.com/Alab-NII/2wikimultihop) (2WikiMQA), [Natural Questions](https://github.com/google-research-datasets/natural-questions) (NQ), and [TriviaQA](https://github.com/mandarjoshi90/triviaqa) (TQA).

The required data can be downloaded from [this Google Drive](https://drive.google.com/drive/folders/1lTz-hmb2inmU9KswLfkHag5-qRxTVujy?usp=sharing). Place it in the ```./data``` folder; an expected layout is sketched after the list below.
* **retrieval**: instances with retrieved results for each dataset using different retrievers.
* **preprocessed**: 28k preprocessed instances from the HotpotQA train set.
* **demos**: few-shot examples for answering questions.
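Based on the paths used in the Inference example below, the downloaded data is expected to sit roughly in the following layout (exact file names depend on the retriever, dataset, and split you use):

```
data/
├── retrieval/
│   └── contriever-msmarco_HotpotQA/
│       └── dev.json
├── preprocessed/
└── demos/
    └── fshot_HotpotQA.json
```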
## Inference
After setting up your environment and preparing the data, you can compress retrieved documents and check the end QA performance with the following script.

For convenience, you can easily deploy our model from Hugging Face. If you wish to fine-tune the base model, please refer to the [Training](#training) section.
We also support batch decoding options (```--batch_decoding, --batch_size```) to accelerate the inference.
```bash
CUDA_VISIBLE_DEVICES=0
PRE_DIR="[your repository path]"
ret=contriever-msmarco
comp_name=cwyoon99/CompAct-7b
model_name=meta-llama/Meta-Llama-3-8B
cache_dir="[your caching paths]"

task=HotpotQA  # 2wikimultihop, musique, NQ, TQA
if [ "$task" == "TQA" ]; then
    split="test"
else
    split="dev"
fi

iter=6
segment_size=5

CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES python run_prompt.py \
--task $task \
--data_path $PRE_DIR/data/retrieval/$ret"_"$task/$split.json \
--fshot \
--fshot_path $PRE_DIR/data/demos/fshot_$task.json \
--compress_output_dir $PRE_DIR/data/experiments/compress/$ret"_"$task/$split \
--read_output_dir $PRE_DIR/data/experiments/test/$ret"_"$task/$split \
--compressor_name_or_path $comp_name \
--model_name_or_path $model_name \
--cache_dir $cache_dir \
--batch_decoding \
--batch_size 20 \
--read_wo_prev_eval \
--segment_size $segment_size \
--max_iteration $iter
```
> If you want to use your self-trained model, specify the following arguments:
> - ```--compressor_dir``` e.g. $PRE_DIR/data/experiments/train/
> - ```--compressor_name_or_path``` e.g. "[name of trained model]"
> - ```--checkpoint``` e.g. checkpoint-378

## Training
We apply Supervised Fine-Tuning (SFT) using only a subset of [HotpotQA](https://github.com/hotpotqa/hotpot). You may change specific hyperparameters and training arguments in ```./alignment-handbook/recipes/mistral-7b-instruct-v0.2/sft/config_full.yaml```.

For more information about the data and training details, please refer to our paper.
```bash
cd CompAct/alignment-handbook

export CUDA_VISIBLE_DEVICES="[GPU_ID]"  # e.g. 0,1,2,3
export n_processes="[N_GPU]"  # e.g. 4

# ./scripts/run_sft.sh
CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml --num_processes $n_processes scripts/run_sft.py recipes/mistral-7b-instruct-v0.2/sft/config_full.yaml
```
## Citation
```
@article{yoon2024compact,
title={CompAct: Compressing Retrieved Documents Actively for Question Answering},
author={Chanwoong Yoon and Taewhoo Lee and Hyeon Hwang and Minbyul Jeong and Jaewoo Kang},
journal={arXiv preprint arXiv:2407.09014},
year={2024},
url={https://arxiv.org/abs/2407.09014},
}
```

## Contact
For more information or any questions about our work, feel free to contact me (cwyoon99 (at) korea.ac.kr or gmail.com).