{"id":18950165,"url":"https://github.com/salesforce/disentangled-sum","last_synced_at":"2025-04-16T00:31:30.311Z","repository":{"id":43717620,"uuid":"289088789","full_name":"salesforce/disentangled-sum","owner":"salesforce","description":"Resources for the \"What's New? Summarizing Contributions in Scientific Literature\" paper","archived":false,"fork":false,"pushed_at":"2023-06-12T21:27:06.000Z","size":1465,"stargazers_count":7,"open_issues_count":2,"forks_count":6,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-04-08T00:12:40.703Z","etag":null,"topics":["scientific-articles","text-summarization"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2011.03161","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/salesforce.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null}},"created_at":"2020-08-20T19:07:11.000Z","updated_at":"2023-08-07T00:16:49.000Z","dependencies_parsed_at":"2023-09-24T16:57:05.749Z","dependency_job_id":null,"html_url":"https://github.com/salesforce/disentangled-sum","commit_stats":{"total_commits":7,"total_committers":3,"mean_commits":"2.3333333333333335","dds":0.2857142857142857,"last_synced_commit":"adf81dddfe22e7fbb7f8fc0be7a61f1660c53030"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fdisentangled-sum","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fdisentangled-sum/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesfo
rce%2Fdisentangled-sum/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/salesforce%2Fdisentangled-sum/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/salesforce","download_url":"https://codeload.github.com/salesforce/disentangled-sum/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223690020,"owners_count":17186538,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scientific-articles","text-summarization"],"created_at":"2024-11-08T13:21:22.531Z","updated_at":"2024-11-08T13:21:23.331Z","avatar_url":"https://github.com/salesforce.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# What's new? 
Summarizing Contributions in Scientific Literature\nAuthors: [Hiroaki Hayashi](hiroakih.me), [Wojciech Kryściński](https://twitter.com/iam_wkr), [Bryan McCann](https://bmccann.github.io/), [Nazneen Rajani](http://www.nazneenrajani.com/), [Caiming Xiong](http://cmxiong.com/)\n\n[Paper](https://arxiv.org/abs/2011.03161)\n\n## Introduction\n\nWith thousands of academic articles shared on a daily basis, it has become increasingly difficult to keep up with the latest scientific findings.\nTo overcome this problem, we introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work, making it easier to identify the key findings shared in articles.\nFor this purpose, we extend the S2ORC corpus of academic articles, which spans a diverse set of domains ranging from economics to psychology, by adding disentangled *contribution* and *context* reference labels.\nTogether with the dataset, we introduce and analyze three baseline approaches: \n\n1. a unified model controlled by input code prefixes, \n2. a model with separate generation heads specialized in generating the disentangled outputs, and \n3. a training strategy that guides the model using additional supervision coming from inbound and outbound citations.\n\nWe also propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.\nThrough a human study involving expert annotators, we show that in 79% of cases our new task is considered more helpful than traditional scientific paper summarization.\n\n![Model diagram.](assets/model.jpg)\n\n## Table of Contents\n\n1. [Updates](#updates)\n2. [Usage](#usage)\n3. [Citation](#citation)\n4. [License](#license)\n\n## Updates\n\n11/9/20: Initial code release.\n\n## Usage\n\nFirst, install the requirements listed in `requirements.txt`. 
Please note that we specify `transformers==2.11.0`.\n\n### Generating the Dataset\n\nFollow the procedure [here](filtering/README.md).\n\n### Training Models \u0026 Inference\n#### Training a model\nUse the corresponding scripts in the `scripts/` directory to train and decode from the models after modifying the `--datadir` in the scripts.\nFor example, training the ControlCode model would be\n```sh\n./scripts/train_controlcode.sh exp1\n```\nwhere `exp1` is the name of the experiment, used to store all the related data and checkpoints.\n\n#### Decoding from a trained model\nUse the corresponding scripts starting with `eval_`. For example, below is the command for evaluating the model trained with the above command.\n```sh\n./scripts/eval_controlcode.sh exp1 TARGET_TYPE RESULT_DIR SPLIT\n```\n\n### Evaluation\n\u003cimg align=\"right\" width=\"25%\" src=\"assets/eval.jpg\"\u003e\n\nFollow the instructions [here](https://github.com/Yale-LILY/SummEval#setup) for setting up the evaluation tool.\n\nFor our three-way evaluation, each instance requires four summaries as shown on the right: gold standard (contribution, context) summaries and generated (contribution, context) summaries.\nROUGE scores between the paired summaries in the figure are calculated first and then aggregated to represent relevance, purity, and disentanglement.\n\nAfter decoding texts from different models in different modes, you can run `src/evaluate.py` to get ROUGE and BERTScore results. 
Usage is as follows:\n```sh\n# Example 1: Comparing references and decoded texts within exp1.\npython evaluate.py \\\n    --exps /path/to/exp1 \\\n    --output-file exp1-ref_exp1-dec.jsonl\n\n# Example 2: Comparing references from exp1 and decoded texts from exp2.\npython evaluate.py \\\n    --exps /path/to/exp1 /path/to/exp2  \\\n    --summary-type reference decoded \\\n    --output-file exp1-ref_exp2-dec.jsonl\n```\n\nUsing the scripts above, you can obtain all the pairwise evaluations associated with each line in the figure on the right.\nFinally, use `src/calc_metrics.py` to compute the measures used for this task:\n```sh\n# Relevance:\npython calc_metrics.py \\\n    --contrib-rel /path/to/s_con_y_con.jsonl \\\n    --context-rel /path/to/s_ctx_y_ctx.jsonl \\\n    --measure relevance\n\n# Purity:\npython calc_metrics.py \\\n    --contrib-rel /path/to/s_con_y_con.jsonl \\\n    --context-rel /path/to/s_ctx_y_ctx.jsonl \\\n    --contrib-cross /path/to/s_con_y_ctx.jsonl \\\n    --context-cross /path/to/s_ctx_y_con.jsonl \\\n    --measure purity\n\n# Disentanglement:\npython calc_metrics.py \\\n    --contrib-context /path/to/s_con_s_ctx.jsonl \\\n    --measure disentanglement\n```\n\nSee `src/calc_metrics.py` for the argument details.\n\n\n\n## Citation\n```\n@article{hayashi2020whats,\n  author    = {Hiroaki Hayashi and Wojciech Kry{\\'s}ci{\\'n}ski and Bryan McCann and Nazneen Rajani and Caiming Xiong},\n  title     = {What's new? 
Summarizing Contributions in Scientific Literature},\n  journal   = {arXiv preprint arXiv:2011.03161},\n  year      = {2020},\n}\n```\n\n## License\n\nCopyright (c) 2020, Salesforce.com, Inc.\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:\n\n* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.\n\n* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.\n\n* Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fdisentangled-sum","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsalesforce%2Fdisentangled-sum","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsalesforce%2Fdisentangled-sum/lists"}