{"id":15013314,"url":"https://github.com/maluuba/nlg-eval","last_synced_at":"2025-05-15T17:07:56.903Z","repository":{"id":43750786,"uuid":"95585342","full_name":"Maluuba/nlg-eval","owner":"Maluuba","description":"Evaluation code for various unsupervised automated metrics for Natural Language Generation.","archived":false,"fork":false,"pushed_at":"2024-08-20T16:26:23.000Z","size":96725,"stargazers_count":1381,"open_issues_count":32,"forks_count":224,"subscribers_count":27,"default_branch":"master","last_synced_at":"2025-05-14T01:08:07.060Z","etag":null,"topics":["bleu","bleu-score","cider","dialog","dialogue","evaluation","machine-translation","meteor","natural-language-generation","natural-language-processing","nlg","nlp","rouge","rouge-l","skip-thought-vectors","skip-thoughts","task-oriented-dialogue"],"latest_commit_sha":null,"homepage":"http://arxiv.org/abs/1706.09799","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Maluuba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-27T17:48:22.000Z","updated_at":"2025-05-12T18:07:35.000Z","dependencies_parsed_at":"2024-11-18T23:34:41.600Z","dependency_job_id":"a621e58a-f336-42a8-af86-43e5612bf8cf","html_url":"https://github.com/Maluuba/nlg-eval","commit_stats":{"total_commits":63,"total_committers":13,"mean_commits":4.846153846153846,"dds":0.746031746031746,"last_synced_commit":"2ab4528fad5548315cf61e40c2249fec8c8ad233"},"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/Maluuba%2Fnlg-eval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Maluuba%2Fnlg-eval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Maluuba%2Fnlg-eval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Maluuba%2Fnlg-eval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Maluuba","download_url":"https://codeload.github.com/Maluuba/nlg-eval/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384988,"owners_count":22062422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bleu","bleu-score","cider","dialog","dialogue","evaluation","machine-translation","meteor","natural-language-generation","natural-language-processing","nlg","nlp","rouge","rouge-l","skip-thought-vectors","skip-thoughts","task-oriented-dialogue"],"created_at":"2024-09-24T19:44:04.619Z","updated_at":"2025-05-15T17:07:51.885Z","avatar_url":"https://github.com/Maluuba.png","language":"Python","readme":"[![Build Status](https://travis-ci.org/Maluuba/nlg-eval.svg?branch=master)](https://travis-ci.org/Maluuba/nlg-eval)\n\n# nlg-eval\nEvaluation code for various unsupervised automated metrics for NLG (Natural Language Generation).\nIt takes as input a hypothesis file and one or more reference files, and outputs metric values.\nRows across these files should correspond to the same example.\n\n## Metrics ##\n- BLEU\n- METEOR\n- ROUGE\n- CIDEr\n- SPICE\n- SkipThought cosine similarity\n- Embedding Average cosine 
similarity\n- Vector Extrema cosine similarity\n- Greedy Matching score\n\n## Setup ##\n\nInstall Java 1.8.0 (or higher).\n\nTo install the Python dependencies, run:\n```bash\npip install git+https://github.com/Maluuba/nlg-eval.git@master\n```\n\nIf you are using macOS High Sierra or higher, then run this to allow multithreading:\n```bash\nexport OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES\n```\n\nFor a simple setup that downloads the required data (e.g. models, embeddings) and external code files, run:\n```bash\nnlg-eval --setup\n```\n\nIf you're setting this up from the source code or you're on Windows and not using a Bash terminal, then you might get errors about `nlg-eval` not being found.\nYou will need to find the `nlg-eval` script.\nSee [here](https://github.com/Maluuba/nlg-eval/issues/61) for details.\n\n### Custom Setup ###\n```bash\n# If you don't like the default path (~/.cache/nlgeval) for the downloaded data,\n# then specify a path where you want the files to be downloaded.\n# The value for the data path is stored in ~/.config/nlgeval/rc.json and can be overwritten by\n# setting the NLGEVAL_DATA environment variable.\nnlg-eval --setup ${data_path}\n```\n\n### Validate the Setup (Optional) ###\n(These examples were made with Git Bash on Windows.)\n\nAll of the data files should have been downloaded; you should see sizes like:\n```\n$ ls -l ~/.cache/nlgeval/\ntotal 6003048\n-rw-r--r-- 1 ...  289340074 Sep 12  2018 bi_skip.npz\n-rw-r--r-- 1 ...        689 Sep 12  2018 bi_skip.npz.pkl\n-rw-r--r-- 1 ... 2342138474 Sep 12  2018 btable.npy\n-rw-r--r-- 1 ...    7996547 Sep 12  2018 dictionary.txt\n-rw-r--r-- 1 ...   21494787 Jan 22  2019 glove.6B.300d.model.bin\n-rw-r--r-- 1 ...  480000128 Jan 22  2019 glove.6B.300d.model.bin.vectors.npy\n-rw-r--r-- 1 ...  663989216 Sep 12  2018 uni_skip.npz\n-rw-r--r-- 1 ...        693 Sep 12  2018 uni_skip.npz.pkl\n-rw-r--r-- 1 ... 
2342138474 Sep 12  2018 utable.npy\n```\n\nYou can also verify some checksums:\n```\n$ cd ~/.cache/nlgeval/\n$ md5sum *\n9a15429d694a0e035f9ee1efcb1406f3 *bi_skip.npz\nc9b86840e1dedb05837735d8bf94cee2 *bi_skip.npz.pkl\n022b5b15f53a84c785e3153a2c383df6 *btable.npy\n26d8a3e6458500013723b380a4b4b55e *dictionary.txt\nf561ab0b379e23cbf827a054f0e7c28e *glove.6B.300d.model.bin\nbe5553e91156471fe35a46f7dcdfc44e *glove.6B.300d.model.bin.vectors.npy\n8eb7c6948001740c3111d71a2fa446c1 *uni_skip.npz\ne1a0ead377877ff3ea5388bb11cfe8d7 *uni_skip.npz.pkl\n5871cc62fc01b79788c79c219b175617 *utable.npy\n$ sha256sum *\n8ab7965d2db5d146a907956d103badfa723b57e0acffb75e10198ba9f124edb0 *bi_skip.npz\nd7e81430fcdcbc60b36b92b3f879200919c75d3015505ee76ae3b206634a0eb6 *bi_skip.npz.pkl\n4a4ed9d7560bb87f91f241739a8f80d8f2ba787a871da96e1119e913ccd61c53 *btable.npy\n4dc5622978a30cddea8c975c871ea8b6382423efb107d27248ed7b6cfa490c7c *dictionary.txt\n10c731626e1874effc4b1a08d156482aa602f7f2ca971ae2a2f2cd5d70998397 *glove.6B.300d.model.bin\n20dfb1f44719e2d934bfee5d39a6ffb4f248bae2a00a0d59f953ab7d0a39c879 *glove.6B.300d.model.bin.vectors.npy\n7f40ff16ff5c54ce9b02bd1a3eb24db3e6adaf7712a7a714f160af3a158899c8 *uni_skip.npz\nd58740d46cba28417cbc026af577f530c603d81ac9de43ffd098f207c7dc4411 *uni_skip.npz.pkl\n790951d4b08e843e3bca0563570f4134ffd17b6bd4ab8d237d2e5ae15e4febb3 *utable.npy\n```\n\nIf you want to ensure that the setup was successful, you can run the tests:\n```bash\npip install pytest\npytest\n```\n\nThe tests might take a few minutes and you might see warnings, but they should pass.\n\n## Usage ##\nOnce setup has completed, the metrics can be evaluated with a Python API or in the command line.\n\nExamples of the Python API can be found in [test_nlgeval.py](nlgeval/tests/test_nlgeval.py).\n\n### Standalone ###\n\n    nlg-eval --hypothesis=examples/hyp.txt --references=examples/ref1.txt --references=examples/ref2.txt\n\nwhere each line in the hypothesis file is a generated sentence and the corresponding\nlines 
across the reference files are ground truth reference sentences for the\ncorresponding hypothesis.\n\n### Functional API: for the entire corpus ###\n\n```python\nfrom nlgeval import compute_metrics\nmetrics_dict = compute_metrics(hypothesis='examples/hyp.txt',\n                               references=['examples/ref1.txt', 'examples/ref2.txt'])\n```\n\n### Functional API: for only one sentence ###\n\n```python\nfrom nlgeval import compute_individual_metrics\nmetrics_dict = compute_individual_metrics(references, hypothesis)\n```\n\nwhere `references` is a list of ground truth reference text strings and\n`hypothesis` is the hypothesis text string.\n\n### Object-oriented API for repeated calls in a script - single example ###\n\n```python\nfrom nlgeval import NLGEval\nnlgeval = NLGEval()  # loads the models\nmetrics_dict = nlgeval.compute_individual_metrics(references, hypothesis)\n```\n\nwhere `references` is a list of ground truth reference text strings and\n`hypothesis` is the hypothesis text string.\n\n### Object-oriented API for repeated calls in a script - multiple examples ###\n\n```python\nfrom nlgeval import NLGEval\nnlgeval = NLGEval()  # loads the models\nmetrics_dict = nlgeval.compute_metrics(references, hypothesis)\n```\n\nwhere `references` is a list of lists of ground truth reference text strings and\n`hypothesis` is a list of hypothesis text strings. 
Each inner list in `references`\nis one set of references for the hypothesis (a list of single reference strings for\neach sentence in `hypothesis` in the same order).\n\n## Reference ##\nIf you use this code as part of any published research, please cite the following paper:\n\nShikhar Sharma, Layla El Asri, Hannes Schulz, and Jeremie Zumer.\n**\"Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation\"**\n*arXiv preprint arXiv:1706.09799* (2017)\n\n```bibtex\n@article{sharma2017nlgeval,\n    author  = {Sharma, Shikhar and El Asri, Layla and Schulz, Hannes and Zumer, Jeremie},\n    title   = {Relevance of Unsupervised Metrics in Task-Oriented Dialogue for Evaluating Natural Language Generation},\n    journal = {CoRR},\n    volume  = {abs/1706.09799},\n    year    = {2017},\n    url     = {http://arxiv.org/abs/1706.09799}\n}\n```\n\n## Example ##\nRunning\n\n    nlg-eval --hypothesis=examples/hyp.txt --references=examples/ref1.txt --references=examples/ref2.txt\n\ngives\n\n    Bleu_1: 0.550000\n    Bleu_2: 0.428174\n    Bleu_3: 0.284043\n    Bleu_4: 0.201143\n    METEOR: 0.295797\n    ROUGE_L: 0.522104\n    CIDEr: 1.242192\n    SPICE: 0.312331\n    SkipThoughtsCosineSimilarity: 0.626149\n    EmbeddingAverageCosineSimilarity: 0.884690\n    VectorExtremaCosineSimilarity: 0.568696\n    GreedyMatchingScore: 0.784205\n\n## Troubleshooting ##\nIf you have issues with METEOR, then you can try lowering the `mem` variable in meteor.py.\n\n## Important Note ##\nCIDEr by default (with the idf parameter set to \"corpus\" mode) computes IDF values using the reference sentences provided. Thus,\nthe CIDEr score for a reference dataset with only 1 image (or example for NLG) will be zero. When evaluating using one (or few)\nimages, set idf to \"coco-val-df\" instead, which uses IDF from the MSCOCO Validation Dataset for reliable results. This has\nnot been adapted in this code. 
For this use case, apply patches from\n[vrama91/coco-caption](https://github.com/vrama91/coco-caption).\n\n\n## External Data Directory ##\n\nTo mount an already prepared data directory to a Docker container or share it between\nusers, you can set the `NLGEVAL_DATA` environment variable to let nlg-eval know\nwhere to find its models and data, e.g.\n\n    NLGEVAL_DATA=~/workspace/nlg-eval/nlgeval/data\n\nThis variable overrides the value provided during setup (stored in `~/.config/nlgeval/rc.json`).\n\n## Microsoft Open Source Code of Conduct ##\nThis project has adopted the [Microsoft Open Source Code of\nConduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information, see the [Code of Conduct\nFAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com)\nwith any additional questions or comments.\n\n## License ##\nSee [LICENSE.md](LICENSE.md).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaluuba%2Fnlg-eval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmaluuba%2Fnlg-eval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmaluuba%2Fnlg-eval/lists"}