{"id":19932329,"url":"https://github.com/amazon-science/fact-graph","last_synced_at":"2025-05-03T11:32:08.760Z","repository":{"id":40536231,"uuid":"483355164","full_name":"amazon-science/fact-graph","owner":"amazon-science","description":"Implementation of the paper \"FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)\"","archived":false,"fork":false,"pushed_at":"2023-07-26T03:47:07.000Z","size":2351,"stargazers_count":47,"open_issues_count":9,"forks_count":5,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-07T15:11:09.641Z","etag":null,"topics":["abstractive-summarization","factuality"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amazon-science.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-19T18:00:47.000Z","updated_at":"2024-10-03T10:20:43.000Z","dependencies_parsed_at":"2022-09-20T18:18:43.850Z","dependency_job_id":null,"html_url":"https://github.com/amazon-science/fact-graph","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ffact-graph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ffact-graph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ffact-graph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amazon-science%2Ffact-graph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/amazon-science","download_url":"https://codeload.github.com/amazon-science/fact-graph/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252184626,"owners_count":21707987,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["abstractive-summarization","factuality"],"created_at":"2024-11-12T23:09:50.176Z","updated_at":"2025-05-03T11:32:07.875Z","avatar_url":"https://github.com/amazon-science.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)\n\nThis repository contains the code for the paper \"[FactGraph](https://arxiv.org/pdf/2204.06508.pdf): Evaluating Factuality in Summarization with Semantic Graph Representations\". \n\n**FactGraph** is an adapter-based method for assessing factuality that decomposes the document and the summary into structured meaning representations (MR):\n\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/example.png\" width=\"300\"\u003e\n\u003c/p\u003e\n\nIn **FactGraph**, summary and document graphs are encoded by a graph encoder with structure-aware adapters, along with text representations using an adapter-based text encoder. 
Text and graph encoders use the same pretrained model, and only the adapters are trained:\n\u003cp align=\"center\"\u003e\n\u003cimg src=\"images/factgraph.png\" width=\"500\"\u003e\n\u003c/p\u003e\n\n## Environment\n\nThe easiest way to proceed is to create a conda environment:\n```\nconda create -n factgraph python=3.7\nconda activate factgraph\n```\n\nNext, install PyTorch and PyTorch Geometric:\n\n```\npip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html\npip install torch-scatter==2.0.9 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html\npip install torch-sparse==0.6.12 -f https://data.pyg.org/whl/torch-1.9.0+cu111.html\npip install torch-geometric==2.0.3\n```\n\nInstall the required packages:\n\n```\npip install -r requirements.txt\n```\n\nFinally, create the environment for AMR preprocessing:\n\n```\ncd data/preprocess\n./create_envs_preprocess.sh\ncd ../../\n```\n\n## FactCollect Dataset\n\nFactCollect is created by consolidating the following datasets:\n\n| Dataset        | Datapoints | Data       |\n| ------------- |:-------------:|:-------------:|\n| [Wang et al. (2020)](https://aclanthology.org/2020.acl-main.450.pdf)     | 953 | [Link](https://github.com/W4ngatang/qags/tree/master/data) |\n| [Kryscinski et al. (2020)](https://aclanthology.org/2020.emnlp-main.750.pdf)     | 1434 | [Link](https://storage.googleapis.com/sfr-factcc-data-research/unpaired_annotated_data.tar.gz) |\n| [Maynez et al. (2020)](https://aclanthology.org/2020.acl-main.173.pdf) |   2500  | [Link](https://github.com/google-research-datasets/xsum_hallucination_annotations) |\n| [Pagnoni et al. (2021)](https://aclanthology.org/2021.naacl-main.383.pdf) |  4942 | [Link](https://github.com/artidoro/frank/tree/main/data) |\n\n* FactCollect uses two datasets released under their own licenses.\n  * FactCC is under [BSD-3](https://github.com/amazon-research/fact-graph/blob/main/data/LICENSE-FACTCC.txt). 
Copyright (c) 2019, Salesforce.com, Inc. All rights reserved.\n  * XSum Hallucinations is under [CC BY 4.0](https://github.com/amazon-research/fact-graph/blob/main/data/LICENSE-XSUM-HAL.txt).\n\nTo generate the **FactCollect** dataset, execute:\n\n```\nconda activate factgraph\ncd data\n./create_dataset.sh\ncd ..\n```\n\n# Running trained FactGraph Models\n\nFirst, download the trained **FactGraph** checkpoints:\n```\ncd src\n./download_trained_models.sh\n```\n\nTo run **FactGraph**:\n```\n./evaluate.sh factgraph \u003cfile\u003e \u003cgpu_id\u003e\n```\n\nTo run the edge-level **FactGraph** model:\n```\n./evaluate.sh factgraph-edge \u003cfile\u003e \u003cgpu_id\u003e\n```\n\n`\u003cfile\u003e` is a JSON Lines file with the following format:\n```\n{'summary': summary1, 'article': article1}\n{'summary': summary2, 'article': article2}\n...\n```\nwhere `'summary'` is a single-sentence summary.\n\n# Training FactGraph\n\n## Preprocess\n\nConvert the dataset into the format required for the model:\n\n```\ncd data/preprocess\n./process_dataset_for_model.sh \u003cgpu_id\u003e\ncd ../../\n```\n\nThis step generates AMR graphs using the [SPRING model](https://github.com/SapienzaNLP/spring). See the SPRING [repository](https://github.com/SapienzaNLP/spring) for more details.\n\nDownload the pretrained parameters of the adapters:\n```\ncd src\n./download_pretrained_adapters.sh\n```\n\n## Training\n\nTo train **FactGraph** on the **FactCollect** dataset, execute:\n```\nconda activate factgraph\n./train.sh \u003cgpu_id\u003e\n```\n\n## Predicting\n\nTo predict, run:\n```\n./predict.sh \u003ccheckpoint_folder\u003e \u003cgpu_id\u003e\n```\n\n# Training FactGraph - Edge-level\n\n## Preprocess\n\nDownload the files *train.tsv* and *test.tsv* from this [link](https://drive.google.com/drive/folders/1BxUVnc7ov9PL7nxP7sS9ZUXCYo877Bcx?usp=sharing) provided by [Goyal and Durrett (2021)](https://arxiv.org/pdf/2104.04302.pdf). 
Copy those files to `data/edge_level_data`.\n\nConvert the dataset into the format required for the model:\n\n```\ncd data/preprocess\n./process_dataset_for_edge_model.sh \u003cgpu_id\u003e\ncd ../../\n```\n\n## Training\n\nTo train the edge-level **FactGraph** model on the data prepared above, execute:\n```\nconda activate factgraph\n./train_edgelevel.sh \u003cgpu_id\u003e\n```\n\n## Predicting\n\nTo predict, run:\n```\n./predict_edgelevel.sh \u003ccheckpoint_folder\u003e \u003cgpu_id\u003e\n```\n\n## Trained Models\n\nA **FactGraph** checkpoint trained on the **FactCollect** dataset can be found [here](https://public.ukp.informatik.tu-darmstadt.de/ribeiro/factgraph/factgraph.tar.gz). Test set results:\n```\n {'accuracy': 0.89, 'bacc': 0.8904, 'f1': 0.89, 'size': 600, 'cnndm': {'bacc': 0.7717, 'f1': 0.8649, 'size': 370}, 'xsum': {'bacc': 0.6833, 'f1': 0.9304, 'size': 230}}\n```\n\nA **FactGraph-edge** checkpoint trained on the **Maynez** dataset can be found [here](https://public.ukp.informatik.tu-darmstadt.de/ribeiro/factgraph/factgraph-edge.tar.gz). This checkpoint was selected using the test set. Test set results:\n```\n {'accuracy': 0.8371, 'bacc': 0.8447, 'f1': 0.8371, 'f1_macro': 0.7362, 'accuracy_edge': 0.6948, 'bacc_edge': 0.6592, 'f1_edge': 0.6948}\n```\n\n## Security\n\nSee [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.\n\n## License Summary\n\nThe documentation is made available under the CC-BY-NC-4.0 License. See the LICENSE file.\n\n## Citation\n\n```\n@inproceedings{ribeiro-etal-2022-factgraph,\n    title = \"FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations\",\n    author = \"Ribeiro, Leonardo F. R.  
and\n      Liu, Mengwen and\n      Gurevych, Iryna and\n      Dreyer, Markus and\n      Bansal, Mohit\",\n      booktitle = \"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies\",\n      year = \"2022\"\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Ffact-graph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famazon-science%2Ffact-graph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famazon-science%2Ffact-graph/lists"}