# Deriving Machine Attention from Human Rationales

This repo contains the code and data for the following paper:

**[Deriving Machine Attention from Human Rationales](https://arxiv.org/pdf/1808.09367.pdf).** *Yujia Bao, Shiyu Chang, Mo Yu and Regina Barzilay. EMNLP 2018.*

If you find this work useful in your own research, please cite our paper:
```
@article{bao2018deriving,
  title={Deriving Machine Attention from Human Rationales},
  author={Bao, Yujia and Chang, Shiyu and Yu, Mo and Barzilay, Regina},
  journal={arXiv preprint arXiv:1808.09367},
  year={2018}
}
```

## Overview

The R2A model first learns to map binary rationales to continuous attention scores on the source tasks. The trained R2A model is then used to predict what the attention should look like on the low-resource target task, based on that task's human-annotated rationales. Finally, we train a target classifier under the supervision of both the annotated labels and the R2A-generated attention. The following figure illustrates our learning pipeline.

<p align="center">
<img src="assets/pipeline.png" alt="drawing" width="500"/>
</p>

## Models
**Instructions to run the code are provided within each directory.**
+ Directory [`r2a`](r2a/) contains the source code and pre-trained models for our R2A model.
+ Directory [`rationalization`](rationalization/) contains the code we used for automatic rationale generation.

## Data
### Download
The original raw datasets can be found at: [beer review](https://snap.stanford.edu/data/web-BeerAdvocate.html), [hotel review](http://www.cs.virginia.edu/~hw5x/dataset.html).

The processed data used in all our experiments (together with the machine-generated rationales) is available at [data.zip](https://people.csail.mit.edu/yujia/files/r2a/data.zip). **Important Note:** this data is for research purposes only.

### Usage
1. Unzip [data.zip](https://people.csail.mit.edu/yujia/files/r2a/data.zip) into the root directory of this repo.
2. The `data` directory contains three subdirectories: `source`, `target` and `oracle`.
   + `source` includes all source data files. Each data file is a *tsv* file with the following fields: task name, label, text (tokenized and separated by spaces), and rationale label (a sequence of binary integers separated by spaces).

     | Task | #train (file) | #dev (file) |
     | ----- | ------- | ----- |
     | Beer look | 43,351 (beer0.train) | 10,170 (beer0.dev) |
     | Beer aroma | 39,825 (beer1.train) | 8,772 (beer1.dev) |
     | Beer palate | 30,041 (beer2.train) | 7,152 (beer2.dev) |

   + `oracle` contains the data used to derive the oracle attention. The data format is the same as in `source`.

     | Task | #train (file) | #dev (file) |
     | ----- | ------- | ----- |
     | Beer look | 32,276 (beer0.train) | 6,392 (beer0.dev) |
     | Beer aroma | 28,984 (beer1.train) | 5,720 (beer1.dev) |
     | Beer palate | 25,748 (beer2.train) | 4,994 (beer2.dev) |
     | Hotel location | 14,472 (hotel_Location.train) | 1,813 (hotel_Location.dev) |
     | Hotel cleanliness | 150,098 (hotel_Cleanliness.train) | 18,764 (hotel_Cleanliness.dev) |
     | Hotel service | 101,484 (hotel_Service.train) | 12,689 (hotel_Service.dev) |

   + `target` contains the data for the target tasks.
     + `hotel_unlabeled.train`, `hotel_unlabeled.dev`: unlabeled data files in which each row is a hotel review. They are used to train the domain-invariant encoder of our R2A model.
     + `*.dev`, `*.test`: target development and test sets. The data format is the same as in `source`.

       | Task | #dev (file) | #test (file) |
       | ----- | ------- | ----- |
       | Beer look | 200 (beer0.dev) | 4,014 (beer0.test) |
       | Beer aroma | 200 (beer1.dev) | 4,212 (beer1.test) |
       | Beer palate | 200 (beer2.dev) | 3,804 (beer2.test) |
       | Hotel location | 200 (hotel_Location.dev) | 1,808 (hotel_Location.test) |
       | Hotel cleanliness | 200 (hotel_Cleanliness.dev) | 12,684 (hotel_Cleanliness.test) |
       | Hotel service | 200 (hotel_Service.dev) | 18,762 (hotel_Service.test) |

     + `*.train`: target training sets. Each `*.train` file other than `hotel_unlabeled.train` is a *tsv* file with the following fields: 1) task name, 2) label, 3) text (tokenized and separated by spaces), 4) rationale label (a sequence of binary integers), 5) R2A-generated attention (a sequence of floats), 6) oracle attention (a sequence of floats), and 7) the frequency of each word being highlighted as a rationale (a sequence of floats).
       + `beer0.train` (beer look), `beer1.train` (beer aroma), `beer2.train` (beer palate), `hotel_Location.train`, `hotel_Cleanliness.train`, `hotel_Service.train`: each file consists of 200 labeled examples with human-annotated rationales. The R2A-generated attention and oracle attention entries are all zero.
       + `*.pred_att.gold_att.train`: these files contain the R2A-generated attention and the oracle attention produced by the pre-trained models.

## Dependencies
+ PyTorch 0.4.1
+ numpy 1.15.1
+ torchtext 0.2.1
+ termcolor 1.1.0
+ tqdm 4.24.0
+ scikit-learn 0.19.2
+ spacy 2.0.12
+ colored 1.3.5
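The data description for this repo lists two *tsv* layouts: 4 fields (`source`, `oracle`, and the target `*.dev`/`*.test` files) and 7 fields (the target `*.train` files). Below is a minimal reader sketch for both. It assumes that columns are tab-separated with no header row and that every sequence column is space-separated. The README does not state either point explicitly. The names `Example`, `parse_row`, `load_tsv` and `highlighted_tokens` are hypothetical and are not part of the R2A code base. The sketch needs Python 3.7+ for `dataclasses`.

```python
"""Hypothetical reader for the R2A tsv data files (a sketch, not the repo's loader).

Assumptions: tab-separated columns, no header row, space-separated sequences.
"""
import csv
from dataclasses import dataclass, field
from typing import List


@dataclass
class Example:
    task: str
    label: str            # kept as a raw string; the label encoding is not specified
    tokens: List[str]
    rationale: List[int]  # binary rationale label, one entry per token (assumed)
    # The three fields below only appear in the 7-field target *.train files.
    r2a_att: List[float] = field(default_factory=list)
    oracle_att: List[float] = field(default_factory=list)
    rationale_freq: List[float] = field(default_factory=list)


def _floats(cell: str) -> List[float]:
    """Parse a space-separated sequence of floats."""
    return [float(x) for x in cell.split()]


def parse_row(row: List[str]) -> Example:
    """Parse one tsv row with either 4 or 7 fields."""
    if len(row) not in (4, 7):
        raise ValueError(f"expected 4 or 7 fields, got {len(row)}")
    task, label, text, rationale = row[:4]
    ex = Example(task, label, text.split(), [int(x) for x in rationale.split()])
    if len(row) == 7:
        # Fields 5-7: R2A-generated attention, oracle attention, rationale frequency.
        ex.r2a_att, ex.oracle_att, ex.rationale_freq = (_floats(c) for c in row[4:])
    return ex


def load_tsv(path: str) -> List[Example]:
    """Read every non-empty row of a tsv file into Example objects."""
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE so that quote characters inside review text are kept verbatim.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [parse_row(row) for row in reader if row]


def highlighted_tokens(ex: Example) -> List[str]:
    """Return the tokens whose rationale bit is 1 (zip stops at the shorter list)."""
    return [tok for tok, bit in zip(ex.tokens, ex.rationale) if bit == 1]
```

For example, for a made-up row `["beer0", "1", "great golden color", "0 1 1"]`, `parse_row` returns an `Example` whose `highlighted_tokens` are `["golden", "color"]`. The `hotel_unlabeled.*` files hold one review per row rather than this schema, so they are deliberately not handled here.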