{"id":28753262,"url":"https://github.com/google-deepmind/long-form-factuality","last_synced_at":"2025-06-17T00:39:22.147Z","repository":{"id":230130074,"uuid":"777408994","full_name":"google-deepmind/long-form-factuality","owner":"google-deepmind","description":"Benchmarking long-form factuality in large language models. Original code for our paper \"Long-form factuality in large language models\".","archived":false,"fork":false,"pushed_at":"2025-04-29T16:08:05.000Z","size":773,"stargazers_count":606,"open_issues_count":17,"forks_count":73,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-05-18T04:47:39.848Z","etag":null,"topics":["benchmark","dataset","evaluation","factuality","language","language-modeling","large-language-models","metrics"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2403.18802","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-deepmind.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-25T19:52:34.000Z","updated_at":"2025-05-17T16:36:17.000Z","dependencies_parsed_at":"2024-03-28T02:31:11.837Z","dependency_job_id":"1e74242e-f088-494d-9231-204c13b2d97b","html_url":"https://github.com/google-deepmind/long-form-factuality","commit_stats":null,"previous_names":["google-deepmind/long-form-factuality"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/google-deepmind/long-form-factuality","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Flong-form-factuality","tags_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Flong-form-factuality/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Flong-form-factuality/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Flong-form-factuality/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-deepmind","download_url":"https://codeload.github.com/google-deepmind/long-form-factuality/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-deepmind%2Flong-form-factuality/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260268635,"owners_count":22983601,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","dataset","evaluation","factuality","language","language-modeling","large-language-models","metrics"],"created_at":"2025-06-17T00:39:21.157Z","updated_at":"2025-06-17T00:39:22.122Z","avatar_url":"https://github.com/google-deepmind.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Long-Form Factuality in Large Language Models\n\nThis is the official code release accompanying our paper [\"Long-form factuality in large language models\"](https://arxiv.org/abs/2403.18802).\nThis repository contains:\n\n1. **LongFact**: A prompt set of 2,280 fact-seeking prompts requiring long-form responses.\n2. 
**Search-Augmented Factuality Evaluator (SAFE)**: Automatic evaluation of model responses in long-form factuality settings.\n3. **F1@K**: Extending the F1 score to long-form settings by measuring recall against a human-preferred response length.\n4. **Experimentation pipeline** for benchmarking OpenAI and Anthropic models using LongFact + SAFE.\n\n## Installation\n\nFirst, clone our GitHub repository.\n\n```bash\ngit clone https://github.com/google-deepmind/long-form-factuality.git\n```\n\nThen navigate to the newly created folder.\n\n```bash\ncd long-form-factuality\n```\n\nNext, create a new Python 3.10+ environment using `conda`.\n\n```bash\nconda create --name longfact python=3.10\n```\n\nActivate the newly created environment.\n\n```bash\nconda activate longfact\n```\n\nAll external package requirements are listed in `requirements.txt`.\nTo install all packages, run the following command.\n\n```bash\npip install -r requirements.txt\n```\n\n## Usage\n### LongFact\nThe full prompt set for LongFact is available in the `longfact/` folder.\nSee the README in `longfact/` for more details about the dataset.\n\nTo run the data-generation pipeline that we used to generate LongFact, use the following command.\nRefer to the README in `data_creation/` for additional details about the data-generation pipeline.\n\n```bash\npython -m data_creation.pipeline\n```\n\n### SAFE\nOur full implementation of SAFE is located in `eval/safe/`.\nSee the README in `eval/safe/` for more information about how SAFE works.\n\nTo run the pipeline for evaluating SAFE against FActScore human annotations, use the following command.\nRefer to the README in `eval/` for additional details about this experiment.\n\n```bash\npython -m eval.correlation_vs_factscore\n```\n\n### Benchmarking models\nTo benchmark OpenAI and Anthropic models, first add your API keys to `common/shared_config.py` (see the README in `common/` for more information, and be sure not to publish these keys).\nTo obtain model responses for a given prompt set, 
use the following command.\nRefer to the README in `main/` for additional details about our main experimentation pipeline.\n\n```bash\npython -m main.pipeline\n```\n\nNext, to evaluate prompt-response pairs from our main experimentation pipeline using SAFE, use the following command, passing the path to the `.json` file that contains the prompt-response pairs to be evaluated via the `--result_path` argument.\n\n```bash\npython -m eval.run_eval \\\n    --result_path=\n```\n\n## Unit Tests\n\nEach file in this directory has a corresponding unit test with the `_test` suffix (e.g., `file.py` would have `file_test.py` for unit tests).\nCommands for running individual tests are given in each unit test file.\nTo run all unit tests, use the following command.\n\n```bash\npython -m unittest discover -s ./ -p \"*_test.py\"\n```\n\n## Citing this work\n\nIf you find our code useful, please cite our [paper](https://arxiv.org/abs/2403.18802):\n\n```bibtex\n@misc{wei2024long,\n  title={Long-form factuality in large language models},\n  author={Wei, Jerry and Yang, Chengrun and Song, Xinying and Lu, Yifeng and Hu, Nathan and Huang, Jie and Tran, Dustin and Peng, Daiyi and Liu, Ruibo and Huang, Da and Du, Cosmo and Le, Quoc V.},\n  year={2024},\n  url={https://arxiv.org/abs/2403.18802},\n}\n```\n\n## License and disclaimer\n\nCopyright 2024 DeepMind Technologies Limited\n\nAll software is licensed under the Apache License, Version 2.0 (Apache 2.0);\nyou may not use this file except in compliance with the Apache 2.0 license.\nYou may obtain a copy of the Apache 2.0 license at:\nhttps://www.apache.org/licenses/LICENSE-2.0\n\nAll other materials are licensed under the Creative Commons Attribution 4.0\nInternational License (CC-BY). 
You may obtain a copy of the CC-BY license at:\nhttps://creativecommons.org/licenses/by/4.0/legalcode\n\nUnless required by applicable law or agreed to in writing, all software and\nmaterials distributed here under the Apache 2.0 or CC-BY licenses are\ndistributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,\neither express or implied. See the licenses for the specific language governing\npermissions and limitations under those licenses.\n\nThis is not an official Google product.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-deepmind%2Flong-form-factuality","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-deepmind%2Flong-form-factuality","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-deepmind%2Flong-form-factuality/lists"}