{"id":13652487,"url":"https://github.com/ChanLiang/CONNER","last_synced_at":"2025-04-23T03:30:58.705Z","repository":{"id":199842643,"uuid":"703869062","full_name":"ChanLiang/CONNER","owner":"ChanLiang","description":"The implementation for EMNLP 2023 paper ”Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators“","archived":false,"fork":false,"pushed_at":"2024-01-22T16:00:29.000Z","size":16532,"stargazers_count":30,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-29T09:00:48.619Z","etag":null,"topics":["chatgpt","emnlp2023","factuality","hallucinations","large-language-models","llama","llm-evaluation","nlg-evaluation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2310.07289","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ChanLiang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-12T04:46:39.000Z","updated_at":"2024-09-20T11:50:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"e458f074-c5f3-417f-85c7-149283794952","html_url":"https://github.com/ChanLiang/CONNER","commit_stats":{"total_commits":12,"total_committers":2,"mean_commits":6.0,"dds":"0.16666666666666663","last_synced_commit":"77f99c876bdc6ca8cb3991210e2ccc2914d4971b"},"previous_names":["chanliang/conner"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChanLiang%2FCONNER","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ch
anLiang%2FCONNER/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChanLiang%2FCONNER/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ChanLiang%2FCONNER/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ChanLiang","download_url":"https://codeload.github.com/ChanLiang/CONNER/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249620346,"owners_count":21301247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chatgpt","emnlp2023","factuality","hallucinations","large-language-models","llama","llm-evaluation","nlg-evaluation"],"created_at":"2024-08-02T02:00:59.770Z","updated_at":"2025-04-23T03:30:53.696Z","avatar_url":"https://github.com/ChanLiang.png","language":"Python","readme":"# Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators\nWelcome to the repository for our EMNLP 2023 paper, \"Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators.\" In this work, we introduce **CONNER** (COmpreheNsive kNowledge Evaluation fRamework), a systematic approach designed to evaluate the output of Large Language Models (LLMs) across key dimensions such as Factuality, Relevance, Coherence, Informativeness, Helpfulness, and Validity.\n\nHere, you'll find the necessary code and resources to replicate our findings and further explore the potential of LLMs. 
We hope they make exploring the frontiers of LLMs a little easier.\n\n## CONNER Framework\n\n\n### Intrinsic Evaluation\n\n- **Factuality:** Assessing the verifiability of the information against external evidence.\n- **Relevance:** Ensuring the knowledge aligns with the user's query intent.\n- **Coherence:** Evaluating the logical flow of information at both sentence and paragraph levels.\n- **Informativeness:** Measuring the novelty or unexpectedness of the knowledge provided.\n\n### Extrinsic Evaluation\n\n- **Helpfulness:** Gauging whether the knowledge aids in enhancing performance on downstream tasks.\n- **Validity:** Certifying the factual accuracy of downstream task results when utilizing the knowledge.\n\n## Getting Started\n\n#### Setting Up the Environment\n\nBegin by setting up your Conda environment with the provided `env/environment.yaml` file, which installs all necessary packages and dependencies.\n\n```bash\nconda env create -f env/environment.yaml -n CONNER\nconda activate CONNER\n```\nIf you run into any missing packages or dependencies, please install them as needed.\n\n#### Evaluating Your LLMs\nRun the evaluation script that corresponds to your dataset and chosen metric. Replace `${data}` with your dataset choice (`nq` or `wow`) and `${metric}` with one of the following metrics: `factuality`, `relevance`, `info`, `coh_sent`, `coh_para`, `validity`, `helpfulness`.\n```bash\n# Run the evaluation script. Example usage:\n# bash scripts/nq_factuality.sh\n# bash scripts/wow_relevance.sh\nbash scripts/${data}_${metric}.sh\n```\n#### Viewing Results\nOnce the evaluation has completed, you can view the results with the provided script:\n```bash\n# Display the evaluation results. Example usage:\n# bash scripts/nq_factuality_view.sh\n# bash scripts/wow_relevance_view.sh\nbash scripts/${data}_${metric}_view.sh\n```\n\n#### Model Sources\n\nBelow are the models used in the CONNER framework for each metric:\n\n| Metric | Model | Source |\n|---|---|---|\n| Factuality | NLI-RoBERTa-large, ColBERTv2 | [Hugging Face](https://huggingface.co/sentence-transformers/nli-roberta-large), [GitHub](https://github.com/stanford-futuredata/ColBERT) |\n| Relevance | BERT-ranking-large | [GitHub](https://github.com/nyu-dl/dl4marco-bert) |\n| Sentence-level Coherence | GPT-neo-2.7B | [Hugging Face](https://huggingface.co/EleutherAI/gpt-neo-2.7B) |\n| Paragraph-level Coherence | Coherence-Momentum | [Hugging Face](https://huggingface.co/aisingapore/coherence-momentum) |\n| Informativeness | GPT-neo-2.7B | [Hugging Face](https://huggingface.co/EleutherAI/gpt-neo-2.7B) |\n| Helpfulness | LLaMA-65B | [GitHub](https://github.com/facebookresearch/llama/tree/main) |\n| Validity | NLI-RoBERTa-large, ColBERTv2 | [Hugging Face](https://huggingface.co/sentence-transformers/nli-roberta-large), [GitHub](https://github.com/stanford-futuredata/ColBERT) |\n\n## Citing Our Work\nIf you find our work helpful in your research, please cite our paper:\n```\n@misc{chen2023factuality,\n      title={Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators},\n      author={Liang Chen and Yang Deng and Yatao Bian and Zeyu Qin and Bingzhe Wu and Tat-Seng Chua and Kam-Fai Wong},\n      year={2023},\n      eprint={2310.07289},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL}\n}\n```\n","funding_links":[],"categories":["Tools"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChanLiang%2FCONNER","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FChanLiang%2FCONNER","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FChanLiang%2FCONNER/lists"}