{"id":13711496,"url":"https://github.com/danieldeutsch/sacrerouge","last_synced_at":"2025-05-06T21:31:36.866Z","repository":{"id":41952675,"uuid":"246912886","full_name":"danieldeutsch/sacrerouge","owner":"danieldeutsch","description":"SacreROUGE is a library dedicated to the use and development of text generation evaluation metrics with an emphasis on summarization.","archived":false,"fork":false,"pushed_at":"2022-10-22T17:36:56.000Z","size":895,"stargazers_count":143,"open_issues_count":17,"forks_count":15,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-05-03T08:25:45.093Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danieldeutsch.png","metadata":{"files":{"readme":"Readme.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-12T19:16:05.000Z","updated_at":"2025-03-28T14:46:35.000Z","dependencies_parsed_at":"2022-08-12T00:30:34.303Z","dependency_job_id":null,"html_url":"https://github.com/danieldeutsch/sacrerouge","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Fsacrerouge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Fsacrerouge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Fsacrerouge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danieldeutsch%2Fsacrerouge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danieldeutsch","download_url":"htt
ps://codeload.github.com/danieldeutsch/sacrerouge/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252772014,"owners_count":21801830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T23:01:08.855Z","updated_at":"2025-05-06T21:31:36.368Z","avatar_url":"https://github.com/danieldeutsch.png","language":"Python","funding_links":[],"categories":["Repositories"],"sub_categories":[],"readme":"# SacreROUGE\n![Master](https://github.com/danieldeutsch/sacrerouge/workflows/Master/badge.svg?branch=master\u0026event=push)\n\n**New (2022-04-22):**\nThe metric correlation confidence intervals/hypothesis tests from [A Statistical Analysis of Summarization Evaluation Metrics Using Resampling Methods](https://arxiv.org/abs/2104.00054) and the modified system-level correlation calculations from [Re-Examining System-Level Correlations of Automatic Summarization Evaluation Metrics](https://arxiv.org/abs/2204.10216) can more easily be used with the [`nlpstats`](https://nlpstats.readthedocs.io/en/latest/index.html) library. 
\n\n**New (2021-08-04):**\nWe now have Docker versions of several evaluation metrics included in the library, which makes it even easier to run them as long as you have Docker installed.\nOur implementations are wrappers around the metrics included in the [Repro library](https://github.com/danieldeutsch/repro).\nSee [here](doc/metrics/docker/Readme.md) for more information about the Dockerized metrics.\n\nSacreROUGE is a library dedicated to the development and use of summarization evaluation metrics.\nIt can be viewed as an [AllenNLP](https://github.com/allenai/allennlp) for evaluation metrics (with an emphasis on summarization).\nThe inspiration for the library came from [SacreBLEU](https://github.com/mjpost/sacreBLEU), a library with a standardized implementation of BLEU and dataset readers for common machine translation datasets.\nSee our [paper](https://arxiv.org/abs/2007.05374) for more details or [this Jupyter Notebook](https://colab.research.google.com/drive/1RikOFUEx299c8qxd6IfCLe3KeuLX31I4?usp=sharing) that was presented at the [NLP-OSS 2020](https://nlposs.github.io/2020/) and [Eval4NLP 2020](https://nlpevaluation2020.github.io/) workshops for a demo of the library.\n\nThe development of SacreROUGE was motivated by three problems:\n\n- The official implementations for various evaluation metrics do not use a common interface, so running many of them on a dataset is frustrating and time-consuming.\nSacreROUGE wraps many popular evaluation metrics in a common interface so it is straightforward and fast to set up and run a new metric.\n\n- Evaluating metrics can be tricky.\nSeveral different correlation coefficients are commonly used, the correlation can be calculated at different levels, and comparing system summaries to human summaries requires implementing jackknifing.\nThe evaluation code in SacreROUGE is shared among all of the metrics, so once a new metric implements the common interface, all of the details of the evaluation are 
taken care of for free.\n\n- Datasets for evaluating summarization metrics are formatted differently and can be hard to parse (e.g., DUC and TAC).\nSacreROUGE addresses this problem by providing dataset readers to load and reformat the data into a common schema.\n\nThe two main uses of SacreROUGE are to evaluate summarization systems and to evaluate the evaluation metrics themselves by calculating their correlations to human judgments.\n\n## Installing\nThe easiest way to use SacreROUGE is to install the [PyPI package](https://pypi.org/project/sacrerouge/) via:\n```\npip install sacrerouge\n```\nThis will add a new `sacrerouge` bash command to your path, which serves as the primary interface for the library.\n\n## Tutorials\nWe provide several different tutorials for how to use SacreROUGE based on your use case:\n- [Using SacreROUGE to evaluate a model](doc/tutorials/evaluating-models.md)\n- [Using SacreROUGE to develop and evaluate a new metric](doc/tutorials/developing-metrics.md)\n\n### Setting up a Dataset\nSacreROUGE contains dataset readers to load some summarization datasets and save them in a common format.\nRun the `sacrerouge setup-dataset` command to see the available datasets, or check [here](doc/datasets/datasets.md).\n\n### Data Visualization\nWe have also written two data visualization tools.\nThe [first tool](https://danieldeutsch.github.io/pages/pyramid-visualization.html) visualizes a Pyramid and optional Pyramid annotations on peer summaries.\nIt accepts the `pyramid.jsonl` and `pyramid-annotations.jsonl` files which are saved by some of the dataset readers.\n\nThe [second tool](https://danieldeutsch.github.io/pages/rouge-visualization.html) visualizes the n-gram matches that are used to calculate the ROUGE score.\nIt accepts the `summaries.jsonl` files which are saved by some of the dataset readers.\n\n## Papers\nRelevant publications which are implemented in the SacreROUGE framework include:\n- [Understanding the Extent to which Summarization 
Evaluation Metrics Measure the Information Quality of Summaries](https://arxiv.org/abs/2010.12495)\n- [Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary](https://arxiv.org/abs/2010.00490)\n- [A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods](https://arxiv.org/abs/2104.00054)\n\n## Help\nIf you have any questions or suggestions, please open an issue or contact me (Dan Deutsch).\n\n## Citation\nIf you use SacreROUGE for your paper, please cite the following paper:\n```\n@inproceedings{deutsch-roth-2020-sacrerouge,\n    title = {{SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics}},\n    author = \"Deutsch, Daniel  and\n      Roth, Dan\",\n    booktitle = \"Proceedings of Second Workshop for NLP Open Source Software (NLP-OSS)\",\n    month = nov,\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/2020.nlposs-1.17\",\n    pages = \"120--125\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldeutsch%2Fsacrerouge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanieldeutsch%2Fsacrerouge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanieldeutsch%2Fsacrerouge/lists"}