{"id":19296613,"url":"https://github.com/madrugado/attention-based-aspect-extraction","last_synced_at":"2025-10-04T13:09:00.478Z","repository":{"id":37677525,"uuid":"172699239","full_name":"madrugado/Attention-Based-Aspect-Extraction","owner":"madrugado","description":"Code for unsupervised aspect extraction, using Keras and its Backends","archived":false,"fork":false,"pushed_at":"2023-07-06T21:36:14.000Z","size":132540,"stargazers_count":91,"open_issues_count":14,"forks_count":22,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-22T08:43:29.042Z","etag":null,"topics":["aspect-extraction","deep-learning","keras","topic-modeling","unsupervised-learning"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/madrugado.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-02-26T11:32:54.000Z","updated_at":"2025-03-01T00:04:15.000Z","dependencies_parsed_at":"2025-04-22T08:35:53.639Z","dependency_job_id":"c274e63a-28a4-4ead-90aa-ade14e45c4f0","html_url":"https://github.com/madrugado/Attention-Based-Aspect-Extraction","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/madrugado/Attention-Based-Aspect-Extraction","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madrugado%2FAttention-Based-Aspect-Extraction","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madrugado%2FAttention-Based-Aspect-Extraction/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madrugado%2FAttention-Based-Aspect-Extraction/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madrugado%2FAttention-Based-Aspect-Extraction/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/madrugado","download_url":"https://codeload.github.com/madrugado/Attention-Based-Aspect-Extraction/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/madrugado%2FAttention-Based-Aspect-Extraction/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278315967,"owners_count":25966895,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-04T02:00:05.491Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aspect-extraction","deep-learning","keras","topic-modeling","unsupervised-learning"],"created_at":"2024-11-09T22:48:22.053Z","updated_at":"2025-10-04T13:09:00.451Z","avatar_url":"https://github.com/madrugado.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Attention-Based Aspect Extraction\n\nThis repository is a fork of the [paper authors' repository](https://github.com/ruidan/Unsupervised-Aspect-Extraction) with the following code improvements:\n\n* Python 3 compliant\n* Keras 2 compliant\n* Keras backend independent\n\nIn addition, it adds the following functionality:\n\n* seed words\n* no need to specify the embedding dimension when using an external embedding model\n\n## Dependencies\n\n* keras\u003e=2.0\n* tensorflow-gpu\u003e=1.4\n* numpy\u003e=1.13\n\nThis code has also been tested with the CNTK and MXNet backends. With MXNet there were some issues with Keras internals; hopefully these will be fixed in future versions.\n\n## Data and Preprocessing\n\nYou can download the original datasets (for the Restaurant and Beer domains) from [[Download]](https://drive.google.com/open?id=1qzbTiJ2IL5ATZYNMp2DRkHvbFYsnOVAQ).\n\nFor preprocessing, put the decompressed zip file in the main folder and run the following commands, in this order, from the `code/` folder:\n```bash\npython preprocess.py\npython word2vec.py\n```\nThe preprocessed files and trained word embeddings for each domain will be saved in the `preprocessed_data/` folder.\n\nAlternatively, you can download the preprocessed datasets and the pre-trained word embeddings from [[Download]](https://drive.google.com/open?id=1L4LRi3BWoCqJt5h45J2GIAW9eP_zjiNc).\nThe zip file should be decompressed and put in the main folder.\n\n## Train\n\nFor training, run in the `code/` folder:\n\n```bash\npython train.py \\\n--emb-name ../preprocessed_data/$domain/w2v_embedding \\\n--domain $domain \\\n--out-dir ../output\n```\nwhere:\n* `$domain` (`restaurant` or `beer`) is the domain name;\n* `--emb-name` is the path to the pre-trained word embeddings. If it is just a file name, the file is looked up in `../preprocessed_data/$domain/`; otherwise the given path is used directly;\n* `--out-dir` is the path of the output directory.\n\nMore arguments/hyper-parameters, with the default values used in our experiments, are defined in [code/train.py](code/train.py).\n\nAfter training, two output files will be saved in `../output/$domain/`:\n* `aspect.log` contains the extracted aspects, with the top 100 words for each of them.\n* `model_param` contains the saved model weights.\n\n## Evaluation\n\nFor evaluation, run in the `code/` folder:\n\n```bash\npython evaluation.py \\\n--domain $domain \\\n--out-dir ../output\n```\n\nNote that the argument values for evaluation must be the same as those used for training (except `--emb-name`, which is not needed), because the script first rebuilds the network architecture and then loads the saved model weights.\n\nThis will write a file `att_weights` to `../output/$domain/`, containing the attention weights on all test sentences.\n\nTo assign a gold aspect label to each test sentence, first manually map each inferred aspect to a gold aspect label according to its top words, then uncomment the bottom part of `evaluation.py` (lines 136-144) to evaluate using F scores.\n\nAn example trained model for the restaurant domain is provided in `pre_trained_model/restaurant/`, and the corresponding aspect mapping is given at the bottom of [code/evaluation.py](code/evaluation.py).\n\n## Cite\n\nIf you use the code, please consider citing the original paper:\n\n```tex\n@InProceedings{he-EtAl:2017:Long2,\n  author    = {He, Ruidan  and  Lee, Wee Sun  and  Ng, Hwee Tou  and  Dahlmeier, Daniel},\n  title     = {An Unsupervised Neural Attention Model for Aspect Extraction},\n  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},\n  month     = {July},\n  year      = {2017},\n  address   = {Vancouver, Canada},\n  publisher = {Association for Computational Linguistics}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadrugado%2Fattention-based-aspect-extraction","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmadrugado%2Fattention-based-aspect-extraction","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmadrugado%2Fattention-based-aspect-extraction/lists"}