Code for unsupervised aspect extraction, using Keras and its Backends
- Host: GitHub
- URL: https://github.com/madrugado/attention-based-aspect-extraction
- Owner: madrugado
- License: apache-2.0
- Created: 2019-02-26T11:32:54.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-11-30T10:28:18.000Z (about 4 years ago)
- Last Synced: 2023-04-04T19:47:19.736Z (over 1 year ago)
- Topics: aspect-extraction, deep-learning, keras, topic-modeling, unsupervised-learning
- Language: Python
- Homepage:
- Size: 126 MB
- Stars: 86
- Watchers: 7
- Forks: 21
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE.md
# Attention-Based Aspect Extraction
This repository is a fork of the [paper authors' repository](https://github.com/ruidan/Unsupervised-Aspect-Extraction) with the following code improvements:
* python 3 compliant
* Keras 2 compliant
* Keras backend independent

In addition, the following functionality has been added:
* seed words
* no need to specify the embedding dimension when using external embeddings

## Dependencies
* keras>=2.0
* tensorflow-gpu>=1.4
* numpy>=1.13

This code has also been tested with CNTK and MXNet. With MXNet there were some issues with Keras internals; hopefully these will be fixed in future versions.
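Backend independence here comes down to writing tensor operations against `keras.backend` rather than calling TensorFlow directly. As a minimal illustration (a sketch, not code from this repository), an attention-style weighted average can be written so that it runs unchanged on TensorFlow, CNTK, or MXNet:

```python
from keras import backend as K

def weighted_average(x, scores):
    """Average the time steps of `x` using attention `scores`.

    x:      tensor of shape (batch, steps, dim)
    scores: tensor of shape (batch, steps)
    """
    att = K.softmax(scores)            # normalize scores into weights
    att = K.expand_dims(att, axis=-1)  # (batch, steps, 1) for broadcasting
    return K.sum(x * att, axis=1)      # weighted sum over steps: (batch, dim)
```

Because only `keras.backend` ops appear, Keras dispatches them to whichever backend is configured.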
## Data and Preprocessing
You can download the original datasets (restaurant and beer domains) here: [[Download]](https://drive.google.com/open?id=1qzbTiJ2IL5ATZYNMp2DRkHvbFYsnOVAQ).
For preprocessing, put the decompressed zip file in the main folder and run:
```bash
python preprocess.py
python word2vec.py
```
from within the `code/` directory. The preprocessed files and trained word embeddings for each domain will be saved in `preprocessed_data/`.

You can also find the pre-processed datasets and the pre-trained word embeddings here: [[Download]](https://drive.google.com/open?id=1L4LRi3BWoCqJt5h45J2GIAW9eP_zjiNc).
The zip file should be decompressed and put in the main folder.
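As a quick sanity check after preprocessing, you can probe the trained embeddings. This sketch assumes `word2vec.py` saves a gensim `Word2Vec` model, which the script name suggests but the text above does not guarantee:

```python
from gensim.models import Word2Vec

# Load the embeddings trained for the restaurant domain.
# Path and gensim format are assumptions based on the layout described above.
model = Word2Vec.load('preprocessed_data/restaurant/w2v_embedding')

# Nearest neighbours of a domain word make a good smoke test.
print(model.wv.most_similar('food', topn=5))
```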
## Train

For training, run the following in the `code/` folder:
```bash
python train.py \
--emb-name ../preprocessed_data/$domain/w2v_embedding \
--domain $domain \
--out-dir ../output
```
where:
* `$domain` is the corresponding domain (`restaurant` or `beer`);
* `--emb-name` is the name of the pre-trained word-embedding file: a bare file name is looked up in `../preprocessed_data/$domain/`, while an absolute path is used as given;
* `--out-dir` is the path of the output directory.

You can find more arguments/hyper-parameters, with the default values used in our experiments, defined in [code/train.py](code/train.py).
After training, two output files will be saved in `../output/$domain/`:
* `aspect.log` contains extracted aspects with top 100 words for each of them.
* `model_param` contains the saved model weights.
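For a first look at the extracted aspects you can simply skim the log; this sketch assumes nothing about its line format beyond plain text:

```python
# Print the head of the extracted-aspects log for the restaurant domain.
with open('output/restaurant/aspect.log') as f:
    for line in f.readlines()[:10]:
        print(line.rstrip())
```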
## Evaluation

For evaluation, run the following in the `code/` folder:
```bash
python evaluation.py \
--domain $domain \
--out-dir ../output
```
Note that you should keep the argument values for evaluation the same as those used for training (except `--emb-name`, which you don't need to specify), as the network architecture needs to be rebuilt before the saved model weights are loaded.
This will write a file `att_weights` to `../output/$domain/` containing the attention weights on all test sentences.
To assign each test sentence a gold aspect label, you need to first manually map each inferred aspect to a gold aspect label according to its top words, and then uncomment the bottom part of `evaluation.py` (lines 136-144) for evaluation using F scores.
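Schematically, the mapping and F-score step might look like the sketch below. The aspect indices, label names, and the use of scikit-learn's `classification_report` are illustrative assumptions rather than a transcription of `evaluation.py`:

```python
from sklearn.metrics import classification_report

# Hypothetical manual mapping from inferred aspect index to gold label,
# chosen by inspecting each aspect's top words in aspect.log.
aspect_map = {0: 'Food', 1: 'Staff', 2: 'Ambience'}

# Toy stand-ins for the real per-sentence predictions and gold labels.
predicted_aspects = [0, 2, 1, 0]
gold_labels = ['Food', 'Ambience', 'Staff', 'Staff']

pred_labels = [aspect_map[a] for a in predicted_aspects]
print(classification_report(gold_labels, pred_labels))
```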
An example of a trained model for the restaurant domain is provided in `pre_trained_model/restaurant/`, and the corresponding aspect mapping is given at the bottom of [code/evaluation.py](code/evaluation.py).
## Cite
If you use the code, please consider citing the original paper:
```tex
@InProceedings{he-EtAl:2017:Long2,
author = {He, Ruidan and Lee, Wee Sun and Ng, Hwee Tou and Dahlmeier, Daniel},
title = {An Unsupervised Neural Attention Model for Aspect Extraction},
booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
month = {July},
year = {2017},
address = {Vancouver, Canada},
publisher = {Association for Computational Linguistics}
}
```