{"id":18967683,"url":"https://github.com/talschuster/tokenmasker","last_synced_at":"2025-04-19T14:44:48.202Z","repository":{"id":89679749,"uuid":"221317695","full_name":"TalSchuster/TokenMasker","owner":"TalSchuster","description":"Masking tokens to modify the predictions of a pretrained sentence classifier","archived":false,"fork":false,"pushed_at":"2020-02-04T18:40:25.000Z","size":331,"stargazers_count":16,"open_issues_count":1,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-29T08:51:09.118Z","etag":null,"topics":["fact-checking","masker","nlp","rational"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TalSchuster.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-12T21:42:12.000Z","updated_at":"2024-02-11T13:18:07.000Z","dependencies_parsed_at":"2024-06-29T10:15:07.581Z","dependency_job_id":null,"html_url":"https://github.com/TalSchuster/TokenMasker","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TalSchuster%2FTokenMasker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TalSchuster%2FTokenMasker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TalSchuster%2FTokenMasker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TalSchuster%2FTokenMasker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/owners/TalSchuster","download_url":"https://codeload.github.com/TalSchuster/TokenMasker/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249213752,"owners_count":21231096,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fact-checking","masker","nlp","rational"],"created_at":"2024-11-08T14:44:38.944Z","updated_at":"2025-04-16T07:34:04.977Z","avatar_url":"https://github.com/TalSchuster.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TokenMasker\nThis repository contains the code for the masker module from the paper [Automatic Fact-guided Sentence Modification](https://arxiv.org/pdf/1909.13838.pdf) (AAAI 2020).\n\nThe multiple-encoder pointer-generator is available [here](https://github.com/darsh10/split_encoder_pointer_summarizer).\n\n## Description\nThe goal of the masker is to find the minimal group of tokens that can be removed from a sentence in order to change its relation to another sentence. For example, given a pair of claim and evidence sentences, it finds the words to delete from the evidence so that the evidence becomes neutral with respect to the claim. 
The neutrality is determined by a pretrained classifier.\n\nFor example ($ marks a masked token):\n\n* Claim: *Eddie Vedder sings.*\n* Evidence: *He is known for his powerful baritone vocals.*\n* Model's output: *He is known for $ powerful $ $.* \n\nIllustration of the model:\n\n![mask gen](mask_gen.png \"mask gen\")\n\n\n## Setup\nYou'll need AllenNLP (version 0.8.3):\n```\npip install -r requirements.txt\n```\n\n## Training\n\n### Neutrality classifier\n\nNote: to train a masker with our pretrained neutrality classifier, skip to the next step (the config file already has the path to our trained model).\n\n```\nallennlp train allen_configs/esim_fever_wmask.jsonnnet -s trained_neutrality_classsifier --include-package masker_allen_pkg\n```\n\n### Masker\n\n```\nallennlp train allen_configs/mask_generator.jsonnet -s trained_mask_generator --include-package masker_allen_pkg\n```\n\n## Extracting masks\n\n### Trained model\n\nTo download the trained masker model and the preprocessed FEVER training data:\n```\nwget https://www.dropbox.com/s/do5jptwmgroencn/model.tar.gz\nwget https://www.dropbox.com/s/o53i6urucny7q03/fever.train_no_nei.tokenized.jsonl\n```\n\n### Command\nTo create masks for the data (add `-c` to use a GPU):\n```\npython model_predictions.py -f model.tar.gz \\\n-i fever.train_no_nei.tokenized.jsonl \\\n-out predictions/fever_train_no_nei.jsonl\n```\n\n## Citation\nIf you find this repository helpful, please cite our paper:\n```\n@inproceedings{shah2020automatic,\n  title={Automatic Fact-guided Sentence Modification},\n  author={Darsh J Shah and Tal Schuster and Regina Barzilay},\n  booktitle={Association for the Advancement of Artificial Intelligence ({AAAI})},\n  year={2020},\n  
url={https://arxiv.org/pdf/1909.13838.pdf}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftalschuster%2Ftokenmasker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftalschuster%2Ftokenmasker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftalschuster%2Ftokenmasker/lists"}