{"id":17280880,"url":"https://github.com/blmoistawinde/fense","last_synced_at":"2025-10-05T06:53:17.847Z","repository":{"id":113037428,"uuid":"412709859","full_name":"blmoistawinde/fense","owner":"blmoistawinde","description":"Fluency ENhanced Sentence-bert Evaluation (FENSE), metric for audio caption evaluation. And Benchmark dataset AudioCaps-Eval, Clotho-Eval.","archived":false,"fork":false,"pushed_at":"2023-02-01T09:57:47.000Z","size":107884,"stargazers_count":21,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-14T09:51:05.236Z","etag":null,"topics":["audio-captioning","audiocaption","benchmark","evaluation-metrics"],"latest_commit_sha":null,"homepage":"https://share.streamlit.io/blmoistawinde/fense/main/streamlit_demo/app.py","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blmoistawinde.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-10-02T06:21:48.000Z","updated_at":"2025-04-07T01:17:54.000Z","dependencies_parsed_at":"2023-06-06T20:00:43.908Z","dependency_job_id":null,"html_url":"https://github.com/blmoistawinde/fense","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/blmoistawinde/fense","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmoistawinde%2Ffense","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmoistawinde%2Ffense/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmoistawi
nde%2Ffense/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmoistawinde%2Ffense/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blmoistawinde","download_url":"https://codeload.github.com/blmoistawinde/fense/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blmoistawinde%2Ffense/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278420205,"owners_count":25983812,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["audio-captioning","audiocaption","benchmark","evaluation-metrics"],"created_at":"2024-10-15T09:22:18.128Z","updated_at":"2025-10-05T06:53:17.841Z","avatar_url":"https://github.com/blmoistawinde.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# FENSE\n\nThe metric, **F**luency **EN**hanced **S**entence-bert **E**valuation (FENSE), for audio caption evaluation, proposed in the paper [\"Can Audio Captions Be Evaluated with Image Caption Metrics?\"](https://arxiv.org/abs/2110.04684)\n\nThe `main` branch contains an easy-to-use interface for fast evaluation of an audio captioning system.\n\nOnline demo available at https://share.streamlit.io/blmoistawinde/fense/main/streamlit_demo/app.py .\n\nTo get 
the dataset (AudioCaps-Eval and Clotho-Eval) and the code to reproduce, please refer to the [experiment-code](https://github.com/blmoistawinde/fense/tree/experiment-code) branch.\n\n## Installation\n\nClone the repository and pip install it.\n\n```bash\ngit clone https://github.com/blmoistawinde/fense.git\ncd fense\npip install -e .\n```\n\n## Usage\n\n### Single Sentence\nTo get the detailed scores of each component for a single sentence:\n\n```python\nfrom fense.evaluator import Evaluator\n\nprint(\"----Using tiny models----\")\nevaluator = Evaluator(device='cpu', sbert_model='paraphrase-MiniLM-L6-v2', echecker_model='echecker_clotho_audiocaps_tiny')\n\neval_cap = \"An engine in idling and a man is speaking and then\"\nref_cap = \"A machine makes stitching sounds while people are talking in the background\"\n\nscore, error_prob, penalized_score = evaluator.sentence_score(eval_cap, [ref_cap], return_error_prob=True)\n\nprint(\"Cand:\", eval_cap)\nprint(\"Ref:\", ref_cap)\nprint(f\"SBERT sim: {score:.4f}, Error Prob: {error_prob:.4f}, Penalized score: {penalized_score:.4f}\")\n```\n\n### System Score\n\nTo get a system's overall score on a dataset by averaging sentence-level FENSE, you can use `eval_system.py`, with your system outputs prepared in a format like `test_data/audiocaps_cands.csv` or `test_data/clotho_cands.csv` .\n\nFor AudioCaps test set:\n\n```bash\npython eval_system.py --device cuda --dataset audiocaps --cands_dir ./test_data/audiocaps_cands.csv\n```\n\nFor Clotho Eval set:\n\n```bash\npython eval_system.py --device cuda --dataset clotho --cands_dir ./test_data/clotho_cands.csv\n```\n\n## Performance Benchmark\n\nWe benchmark the performance of FENSE with different choices of SBERT model and Error Detector on the two benchmark datasets AudioCaps-Eval and Clotho-Eval. 
(*) is the combination reported in the paper.\n\nAudioCaps-Eval\n\n| SBERT | echecker | HC   | HI   | HM   | MM   | total  |\n|-------|-------|------|------|------|------|--------|\n| paraphrase-MiniLM-L6-v2 |  none     | 62.1 | 98.8 | 93.7 | 75.4 | 80.4   |\n| paraphrase-MiniLM-L6-v2 | tiny  | 57.6 | 94.7 | 89.5 | 82.6 | 82.3   |\n| paraphrase-MiniLM-L6-v2 | base  | 62.6 | 98   | 82.5 | 85.4 | 85.5   |\n| paraphrase-TinyBERT-L6-v2 | none    | 64   | 99.2 | 92.5 | 73.6 | 79.6   |\n| paraphrase-TinyBERT-L6-v2 | tiny  | 58.6 | 95.1 | 88.3 | 82.2 | 82.1   |\n| paraphrase-TinyBERT-L6-v2 | base  | 64.5 | 98.4 | 91.6 | 84.6 | 85.3(*)  |\n| paraphrase-mpnet-base-v2  | none  | 63.1 | 98.8 | 94.1 | 74.1 | 80.1   |\n| paraphrase-mpnet-base-v2 | tiny  | 58.1 | 94.3 | 90   | 83.2 | 82.7   |\n| paraphrase-mpnet-base-v2 | base  | 63.5 | 98   | 92.5 | 85.9 | 85.9   |\n\n\nClotho-Eval\n\n| SBERT | echecker | HC   | HI   | HM   | MM   | total  |\n|-------|-------|------|------|------|------|--------|\n| paraphrase-MiniLM-L6-v2 | none    | 59.5 | 95.1 | 76.3 | 66.2 | 71.3   |\n| paraphrase-MiniLM-L6-v2 | tiny  | 56.7 | 90.6 | 79.3 | 70.9 | 73.3   |\n| paraphrase-MiniLM-L6-v2 | base  | 60   | 94.3 | 80.6 | 72.3 | 75.3   |\n| paraphrase-TinyBERT-L6-v2 | none  | 60   | 95.5 | 75.9 | 66.9 | 71.8   |\n| paraphrase-TinyBERT-L6-v2 | tiny  | 59   | 93   | 79.7 | 71.5 | 74.4   |\n| paraphrase-TinyBERT-L6-v2 | base  | 60.5 | 94.7 | 80.2 | 72.8 | 75.7(*)   |\n| paraphrase-mpnet-base-v2  | none  | 56.2 | 96.3 | 77.6 | 65.2 | 70.7   |\n| paraphrase-mpnet-base-v2 | tiny  | 54.8 | 91.8 | 80.6 | 70.1 | 73     |\n| paraphrase-mpnet-base-v2 | base  | 57.1 | 95.5 | 81.9 | 71.6 | 74.9   |\n\n## Reference\n\nIf you use FENSE in your research, please cite:\n\n```\n@misc{zhou2021audio,\n      title={Can Audio Captions Be Evaluated with Image Caption Metrics?}, \n      author={Zelin Zhou and Zhiling Zhang and Xuenan Xu and Zeyu Xie and Mengyue Wu and Kenny Q. 
Zhu},\n      year={2021},\n      eprint={2110.04684},\n      archivePrefix={arXiv},\n      primaryClass={cs.SD}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblmoistawinde%2Ffense","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblmoistawinde%2Ffense","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblmoistawinde%2Ffense/lists"}