Hercules: Attributable and Scalable Opinion Summarization (ACL 2023)
- Host: GitHub
- URL: https://github.com/tomhosking/hercules
- Owner: tomhosking
- License: MIT
- Created: 2022-12-02T15:13:22.000Z
- Default Branch: main
- Last Pushed: 2023-11-08T15:55:10.000Z
- Last Synced: 2024-10-08T13:26:43.624Z
- Topics: nlp, opinion-summarization, summarization, vq-vae
- Language: Python
- Homepage: http://tomho.sk/hercules
- Size: 81.4 MB
- Stars: 17
- Watchers: 1
- Forks: 3
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# [Hercules: Attributable and Scalable Opinion Summarization](https://aclanthology.org/2023.acl-long.473/)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/attributable-and-scalable-opinion/unsupervised-opinion-summarization-on-space)](https://paperswithcode.com/sota/unsupervised-opinion-summarization-on-space?p=attributable-and-scalable-opinion)

Code for the paper "[Attributable and Scalable Opinion Summarization](https://aclanthology.org/2023.acl-long.473/)", Tom Hosking, Hao Tang & Mirella Lapata (ACL 2023).
By representing sentences from reviews as paths through a discrete hierarchy, we can generate abstractive summaries that are informative, attributable and scale to hundreds of input reviews.
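To make the "paths through a discrete hierarchy" idea concrete, here is a toy sketch of encoding an embedding as a path of codebook indices via greedy residual quantization. This is purely illustrative: the codebooks are random, the dimensions are made up, and the trained model's actual quantizer differs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchy: 3 levels, 4 codes per level, 8-dim embeddings (all arbitrary).
depth, codes_per_level, dim = 3, 4, 8
codebooks = rng.normal(size=(depth, codes_per_level, dim))

def encode_path(embedding, codebooks):
    """Greedily pick the nearest code at each level, quantizing the
    residual that earlier levels did not explain."""
    residual = embedding
    path = []
    for level_codes in codebooks:
        dists = np.linalg.norm(level_codes - residual, axis=1)
        idx = int(np.argmin(dists))
        path.append(idx)
        residual = residual - level_codes[idx]
    return path

sentence_embedding = rng.normal(size=dim)
path = encode_path(sentence_embedding, codebooks)
print(path)  # a list of 3 code indices, one per level of the hierarchy
```

Each sentence thus maps to a short discrete path, and summaries can be attributed back to the input sentences that share a path prefix.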
## Setup
Create a fresh environment:
```
conda create -n herculesenv python=3.9
conda activate herculesenv
```
or
```
python3 -m venv herculesenv
source herculesenv/bin/activate
```

Then install dependencies:
```
pip install -r requirements.txt
```

Download data/models:
- Space -> `./data/opagg/`
- AmaSum -> `./data/opagg/`
- [Trained checkpoints](http://tomho.sk/hercules/models/) -> `./models`

Tested with Python 3.9.
## Evaluation with trained models
See [`./examples/Space-Eval.ipynb`](examples/Space-Eval.ipynb)
or
```python
from torchseq.utils.model_loader import model_from_path
from torchseq.metric_hooks.hrq_agg import HRQAggregationMetricHook

model_slug = 'hercules_space'  # Which model to load?
instance = model_from_path('./models/' + model_slug, output_path='./runs/', data_path='./data/', silent=True)
scores, res = HRQAggregationMetricHook.eval_generate_summaries_and_score(instance.config, instance, test=True)
print("Model {:}: Abstractive R2 = {:0.2f}, Extractive R2 = {:0.2f}".format(model_slug, scores['abstractive']['rouge2'], scores['extractive']['rouge2']))
```

## Training on SPACE/AmaSum from scratch
To train on SPACE, download the datasets (as above), then run:
```
torchseq --train --reload_after_train --validate --config ./configs/hercules_space.json
```

## Training on a new dataset (WIP)
You will need to:
- [ ] Install `allennlp==2.10.1` and `allennlp-models==2.10.1` via pip (ignore the warnings about version conflicts)
- [ ] Make a copy of your dataset in a format expected by the script below
- [ ] Run the dataset filtering scripts `./scripts/opagg_filter_space.py` and `./scripts/opagg_filter_space_eval.py`
- [ ] Run the script to generate training pairs `./scripts/generate_opagg_pairs.py`
- [ ] Make a copy of one of the training configs and update to point at your data
- [ ] Finally, train the model!

```
torchseq --train --reload_after_train --validate --config ./configs/{YOUR_CONFIG}.json
```

Please feel free to raise a GitHub issue or email me if you run into any difficulties!
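For context, the abstractive and extractive R2 numbers printed by the evaluation snippet earlier are ROUGE-2 F1 scores. The following is a minimal, dependency-free sketch of bigram-overlap ROUGE-2; the official scorer also applies tokenization and stemming, so its numbers will differ.

```python
from collections import Counter

def rouge2_f1(candidate, reference):
    """Toy ROUGE-2: F1 over the bigram multiset overlap of two strings."""
    def bigrams(text):
        toks = text.lower().split()
        return Counter(zip(toks, toks[1:]))
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge2_f1("the rooms were clean and quiet",
                  "the rooms were very clean")
print(round(score, 3))  # 0.444
```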
## Citation
```
@inproceedings{hosking-etal-2023-attributable,
title = "Attributable and Scalable Opinion Summarization",
author = "Hosking, Tom and
Tang, Hao and
Lapata, Mirella",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.acl-long.473",
pages = "8488--8505",
}
```